Discussion:
[BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol
Robert Hanson
2017-04-09 12:42:34 UTC
Permalink
[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]

---------- Forwarded message ----------
From: Robert Hanson <***@stolaf.edu>
Date: Sat, Apr 8, 2017 at 8:12 PM
Subject: Re: [BlueObelisk-discuss] Cohen-Ingold-Prelog rules into Jmol
To: Mikko Vainio <***@abo.fi>


Super! Thanks, Mikko. That is EXACTLY what I was looking for. Really nice
that Jmol is handling the 2D->3D and hydrogen addition correctly (for all
except one structure). I have not used that in a long time!

Especially grateful for the V2000 format. Except for one structure, all
that are appropriate to my algorithm to date (sp3 carbon only; R/S, not
r/s) validated nicely:

OK cip/R/(1R)-1-cycloproply-2-methylpropan-1-ol_2D.mol 2R
OK cip/R/(1R)-1-cycloproply-2-methylpropan-1-ol_3D.mol 2R
OK cip/R/(2R)-2-hydroxybut-3-enal_3D.mol 3R
OK cip/R/(2R)-butan-2-ol_3d.mol R
OK cip/R/(3R)-pent-1-en-3-ol_2D.mol 3R
OK cip/R/(3R)-pent-1-en-3-ol_3D.mol 3R
OK cip/R/R.sdf 1R
OK cip/S/(S)-cyclobutyl(cyclopropyl)methanol_2D.mol 5S
OK cip/S/(S)-cyclobutyl(cyclopropyl)methanol_3D.mol 5S
OK cip/S/S.sdf 1S
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_2d.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_2d_noH.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_3d.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_3d_noH.mol 2R3R
OK cip/RS/(1S,5R,8S,12S,13R,15S)-12-methyl-14-oxa-18-thiahexacyclo[blabla]octadecan-8-ol.sdf 4S5R6S7S8S13R
OK cip/RS/(2S,4aS,8aS)-8a-chloro-2-fluoro-decahydronaphthalen-4a-ol.sdf 5S6S10S
OK cip/RS/(4aR,8aS)-8a-methyl-octahydro-1H-2-benzopyran.sdf 5R6S
OK cip/RS/one-R-one-S.sdf 2R6S
OK cip/RS/_1R,2R_-2-__S_-chloro_fluoro_methyl_cyclohexan-1-ol.sdf 2R3R8S
OK cip/RS/_2R,3R_-3-methylpentan-2-ol.sdf 1R2R
OK cip/RS/_2R,3S_-3-methylpentan-2-ol.sdf 1S2R
OK cip/RS/(1R,2R,4R,5R)-cyclohexane-1,2,3,4,5-pentol_2d_noH.mol RRRR
OK cip/RS/(1R,2R,4R,5R)-cyclohexane-1,2,3,4,5-pentol_3d.mol RRRR
OK cip/RS/(1S,5R)-bicyclo[3.1.0]hex-2-ene_3D.mol RS
OK cip/gibberellin_2D.mol 3R4R5S10S13S17S18R21S
OK cip/beta-eudesmol.sdf 4S5R8R
OK cip/beta-eudesmol_3d.sdf 4S5R8R
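
For anyone who wants to diff results like these against another implementation, here is a minimal Python sketch that splits one result line into its parts. It is not Jmol code. The function names are made up for illustration, and reading the number in front of each R/S as an atom index is an assumption based on the output above.

```python
import re

# One descriptor: an optional number followed by a single R/S/r/s label.
_LABEL = re.compile(r"(\d*)([RSrs])")


def parse_descriptors(text):
    """Split a string such as '4S5R8R' or 'RS' into (index, label) pairs.

    The index is None when the output gave a bare label without a number.
    """
    if not re.fullmatch(r"(?:\d*[RSrs])+", text):
        raise ValueError(f"not a descriptor string: {text!r}")
    return [(int(num) if num else None, label)
            for num, label in _LABEL.findall(text)]


def parse_result_line(line):
    """Split 'OK <path> <descriptors>' into status, path and descriptor pairs."""
    status, rest = line.split(" ", 1)
    # The descriptor string is the last space-separated token on the line.
    path, descriptors = rest.rsplit(" ", 1)
    return status, path, parse_descriptors(descriptors)
```

Two parsed lists can then be compared directly, for example the 2D and 3D results for the same compound.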

The ONE that failed was

1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol.mol

tris-(cyclopropylethyl)methyl vs. bicyclo[2.2.2]octane. I have no idea how to
fix that issue! What's the rule for that?

Bob

ps - A few of your names are slightly wrong, using "R/S" instead of "r/s".
Post by Mikko Vainio
Hi Bob,
I wrote a partial (2.5K lines) implementation of CIP stereocenter
perception for Balloon (http://users.abo.fi/mivainio/balloon). It handles
tetrahedral and trigonal pyramidal (R/S/r/s), double bond, and allene-like
(E/Z and axial Ra/ra/Sa/sa) stereocenters. The algorithm was implemented
according to, and tested on, the examples in Nomenclature of Organic
Chemistry: IUPAC Recommendations and Preferred Names 2013, Chapter P-9
Specification of Configuration and Conformation, pp. 1156-1292 (
http://dx.doi.org/10.1039/9781849733069-01156). As already pointed out on
the mailing list, a naive implementation of the CIP algorithm would do
depth-first graph traversal, which quickly becomes intractable for
polycyclic systems. And you probably do not need to do this at all, unless
you are generating names or preventing a conformer generation algorithm
from messing up pseudoasymmetric centers.
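
The point about naive depth-first traversal can be made concrete with a small Python sketch. It is not Balloon's code. It counts the nodes a full path-by-path expansion from one root atom would create, with a terminal duplicate node wherever a path closes a ring. The `ladder` helper builds a made-up toy polycycle of fused four-membered rings, and bond orders are ignored for simplicity.

```python
def count_digraph_nodes(adjacency, root):
    """Count nodes in a full path-by-path expansion from `root`.

    Every simple path from the root becomes a branch. When a path would
    revisit an atom already on it (ring closure), one terminal duplicate
    node is added and that branch stops.
    """
    def expand(atom, parent, on_path):
        total = 1  # the node for `atom` itself
        for nbr in adjacency[atom]:
            if nbr == parent:
                continue  # do not walk straight back along the same bond
            if nbr in on_path:
                total += 1  # duplicate node closing a ring
            else:
                total += expand(nbr, atom, on_path | {nbr})
        return total

    return expand(root, None, frozenset([root]))


def ladder(n):
    """Adjacency sets for two rails of n atoms joined by n rungs (toy data)."""
    adjacency = {i: set() for i in range(2 * n)}

    def bond(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)

    for i in range(n):
        bond(i, n + i)  # rung
        if i + 1 < n:
            bond(i, i + 1)          # top rail
            bond(n + i, n + i + 1)  # bottom rail
    return adjacency


if __name__ == "__main__":
    for n in range(1, 9):
        print(2 * n, "atoms ->", count_digraph_nodes(ladder(n), 0), "nodes")
```

Running it shows the node count growing much faster than the atom count as rings are fused on, which is why a practical implementation expands the graph only as far as needed to break a tie.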
As to examples, please find attached a set of sdf files with manually
checked configurations. The configurations are documented in the files as
data fields for easier automated testing. Some files are missing information
and some may be wrong; this is just a snapshot of what I have at the moment,
but it should get you started.
Best regards,
Mikko
P.S. I tried to send this to the list but the message was rejected due to
the attachment. If you wish, please put the files up somewhere for others
to use, too, if deemed useful.
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
John Mayfield
2017-04-09 15:44:28 UTC
Permalink
Hi Bob,
Post by Robert Hanson
[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]
Was that the algorithm you implemented? Because it's not correct - it
doesn't (and can't) handle ghost atoms. I'm trying to track down the example,
but Daniel Lowe constructed a small reproducible example to demonstrate why
this can never work.
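
As background, and assuming "ghost atoms" here means the duplicate atoms that CIP adds for multiple bonds and ring closures, here is a minimal Python sketch of why they matter. It is neither the CCG algorithm nor Daniel Lowe's counterexample. The function name and the input format are invented for illustration.

```python
def neighbour_set(bonds):
    """Return the atomic numbers attached to one atom, highest first.

    `bonds` lists (atomic_number, bond_order) for each neighbour, leaving
    out the bond that leads back towards the stereocentre. Each extra bond
    order contributes one duplicate ("ghost") copy of that neighbour.
    """
    numbers = []
    for atomic_number, order in bonds:
        numbers.append(atomic_number)                  # the real neighbour
        numbers.extend([atomic_number] * (order - 1))  # its duplicates
    return sorted(numbers, reverse=True)


# -CH=O carbon: a double bond to O plus one H.
cho = neighbour_set([(8, 2), (1, 1)])
# -CH2-OH carbon: a single bond to O plus two H.
ch2oh = neighbour_set([(8, 1), (1, 1), (1, 1)])
```

With the duplicate counted, `cho` is `[8, 8, 1]` and `ch2oh` is `[8, 1, 1]`, so -CHO outranks -CH2OH. Without it the two groups would look alike at this sphere.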

John
Robert Hanson
2017-04-09 15:51:48 UTC
Permalink
No, John. Don't worry. I just happened to look at that page prior to
designing my own.
John Mayfield
2017-04-09 16:03:03 UTC
Permalink
Good good,

Fake news before fake news - a paper published in the CCG journal by the
CCG.

John
Robert Hanson
2017-04-09 17:11:22 UTC
Permalink
OK, so I am reading Chapter 9 now to see the gory details. I didn't know
about the root-distance check, and so now

1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol.mol

is working. So all of this is easy enough. That's probably it for
independent stereochemistry. Where there is a dependency of one
stereochemical determination on another -- R/S after E/Z; E/Z after R/S,
E/Z after E/Z, R/S after R/S -- obviously that takes some sort of more
general iteration.
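
One way to picture that "more general iteration" is a fixed-point loop: keep sweeping over the unassigned centers, letting each one see the labels found so far, until a full sweep assigns nothing new. The Python sketch below shows only that control flow. `try_assign`, `toy_rule`, `DEPENDS_ON` and the labels are all made up, and it is not the procedure laid out in P-9.

```python
def assign_until_stable(centers, try_assign):
    """Repeat passes over `centers` until a pass assigns no new label.

    `try_assign(center, labels)` returns a label, or None if the center
    cannot be decided yet from the labels known so far.
    """
    labels = {}
    changed = True
    while changed:
        changed = False
        for center in centers:
            if center in labels:
                continue
            label = try_assign(center, labels)
            if label is not None:
                labels[center] = label
                changed = True
    return labels


# Hypothetical toy data: C7 needs C5 first, and C5 needs the double bond.
DEPENDS_ON = {"C7": ["C5"], "C5": ["C2=C3"], "C2=C3": []}
TOY_LABELS = {"C7": "R", "C5": "S", "C2=C3": "E"}


def toy_rule(center, labels):
    """Stand-in for a real ranking step: succeed once all dependencies are known."""
    if all(dep in labels for dep in DEPENDS_ON[center]):
        return TOY_LABELS[center]
    return None


result = assign_until_stable(["C7", "C5", "C2=C3"], toy_rule)
```

Centers that never become decidable are simply left out of the result, so the loop always terminates.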

I think I will have to tackle that another day.

Bob
Robert Hanson
2017-04-09 18:05:02 UTC
Permalink
OK, I don't get the logic of this:


Rule 1 (a) Higher atomic number precedes lower;
(b) A duplicated atom, with its predecessor node having the same label closer
to the root, ranks higher than a duplicated atom, with its predecessor node
having the same label farther from the root, which ranks higher than any
nonduplicated-atom-node (proposed by Custer, ref. 36)

Rule 2 Higher atomic mass number precedes lower;


Seriously? root distance is checked before isotope. Sure seems odd to me.
Why would that distance check not be after atomic number and mass??
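
To make the ordering being questioned here concrete, the Python sketch below applies the three quoted rules in sequence to a pair of nodes and reports which rule decided. It is a simplification: real CIP ranking compares whole sets of nodes sphere by sphere, not isolated pairs. The `Node` fields are invented for illustration, and giving a duplicate node a mass number at all is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    atomic_number: int
    mass_number: int
    # Root distance of the atom this node duplicates; None for an ordinary atom.
    duplicated_at_distance: Optional[int] = None


def rule_1a(node):
    return node.atomic_number


def rule_1b(node):
    if node.duplicated_at_distance is None:
        return (0, 0)  # non-duplicated nodes rank lowest under 1b
    return (1, -node.duplicated_at_distance)  # closer to the root ranks higher


def rule_2(node):
    return node.mass_number


RULES = [("1a", rule_1a), ("1b", rule_1b), ("2", rule_2)]


def compare(a, b):
    """Return (winner, rule_name), or (None, None) if these rules all tie."""
    for name, key in RULES:
        key_a, key_b = key(a), key(b)
        if key_a != key_b:
            return (a if key_a > key_b else b), name
    return None, None


duplicate_c = Node(atomic_number=6, mass_number=12, duplicated_at_distance=1)
carbon_13 = Node(atomic_number=6, mass_number=13)
winner, decided_by = compare(duplicate_c, carbon_13)
```

Here `decided_by` is `"1b"` and the duplicate carbon wins over the 13C atom, because the isotope check in Rule 2 is never reached.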

Whatever...

Bob
Noel O'Boyle
2017-04-09 18:53:57 UTC
Permalink
We need libRS. Everyone reimplementing these rules is some type of madness.
Robert Hanson
2017-04-09 22:53:51 UTC
Permalink
"re" implementing is a great way to find additional bugs and compare
strategies. This (to this point) took me two days. And if I started with a
"libRS" in Java, I would still have to modify it extensively to fit Jmol.
That said, I wouldn't mind taking a look at how others have implemented it.

In the meantime, is it OK for me to continue this discussion without libRS?
John Mayfield
2017-04-10 12:05:49 UTC
Permalink
Noel pointed out I only sent this back to Bob.
Also, why so many dots? That's considered "not good form" in SMILES.
Noel O'Boyle
2017-04-10 12:10:44 UTC
Permalink
Sorry, but I have to call you out on this, especially as this is the
Blue Obelisk mailing list.

I've no problem with anyone reimplementing anything for fun or profit, but
I have to disagree with the suggestion that having an N'th
implementation of the same algorithm is progress, or good for this
community. At a recent meeting at the EBI, I think there were at least
7 attendees who had written versions of this algorithm. The whole goal
of the Blue Obelisk is to pool our expertise to develop common
resources, to avoid exactly this situation.

- Noel
Robert Hanson
2017-04-10 12:50:25 UTC
Permalink
OK. That's fine. Point me to the algorithm. I'll say no more.
Chalk, Stuart
2017-04-10 13:21:14 UTC
Permalink
So, I have been following the conversation about implementing code to determine stereochemistry based on the CIP rules with great interest because, as a recent convert into the cheminformatics area, a few thoughts come to mind.

1) Has anyone taken the CIP rules and rewritten them as formal logic (and machine readable) rules?
2) Does anyone involved with the development of the current InChI specification have any comments to make about its implementation of the CIP rules and how that will need to change in the current InChI Trust project to do a better job with encoding stereo centers?
3) Has there been any discussion here or in IUPAC about revising the CIP rules?

I am not an expert in this area, so these may be very naive questions, but it seems to me that if the representation of chemical structure goes down a semantic path (and yes that would mean another molecular structure file format) now is the time to rethink these things in light of electronic representation (i.e. think about the computer needs before human needs).

Stuart

Stuart Chalk, Ph.D.
Associate Professor of Chemistry
Department of Chemistry, Building 50, Room 3514,
University of North Florida
1 UNF Drive, Jacksonville, FL 32224 USA
P: 904-620-1938
F: 904-620-3535
E: ***@unf.edu
W: http://www.unf.edu/coas/chemistry/faculty/Stuart_Chalk.aspx

On Apr 10, 2017, at 8:50 AM, Robert Hanson <***@stolaf.edu> wrote:

OK. That's fine. Point me to the algorithm. I'll say no more.

On Mon, Apr 10, 2017 at 7:10 AM, Noel O'Boyle <***@gmail.com> wrote:
Sorry, but I have to call you out on this, especially as this is the
Blue Obelisk mailing list.

I've no problem with anyone reimplementing anything for fun or profit, but
I have to disagree with the suggestion that having an N'th
implementation of the same algorithm is progress, or good for this
community. At a recent meeting at the EBI, I think there were at least
7 attendees who had written versions of this algorithm. The whole goal
of the Blue Obelisk is to pool our expertise to develop common
resources, to avoid exactly this situation.

- Noel
Post by Robert Hanson
"re" implementing is a great way to find additional bugs and compare
strategies. This (to this point) took me two days. And if I started with a
"libRS" in Java, I would still have to modify it extensively to fit Jmol.
That said, I wouldn't mind taking a look at how others have implemented it.
In the meantime, is it OK for me to continue this discussion without libRS?
Post by Noel O'Boyle
We need libRS. Everyone reimplementing these rules is some type of madness.
Post by Robert Hanson
Rule 1 (a) Higher atomic number precedes lower;
(b) A duplicated atom, with its predecessor node having the same label closer
to the root, ranks higher than a duplicated atom, with its predecessor node
having the same label farther from the root, which ranks higher than any
nonduplicated-atom-node (proposed by Custer, ref. 36)
Rule 2 Higher atomic mass number precedes lower;
Seriously? Root distance is checked before isotope. Sure seems odd to me.
Why would that distance check not come after atomic number and mass?
Whatever...
Bob
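A minimal sketch of the rule order quoted above, comparing individual nodes only. The `Node` class, its field names and the sample nodes are hypothetical, invented for illustration; a real implementation applies each rule across whole sets of nodes in the digraph before moving on to the next rule. The 1b key follows the wording quoted in this message, so any duplicated atom outranks a nonduplicated one of the same element before mass is ever consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One digraph node (hypothetical structure, for illustration only)."""
    label: str
    atomic_number: int
    mass_number: int
    # None for an ordinary node. For a duplicated atom: the distance in bonds
    # from the root to the node that it duplicates (0 means the root itself).
    duplicate_root_distance: Optional[int] = None


def rule_1a(node: Node) -> int:
    # Higher atomic number precedes lower.
    return node.atomic_number


def rule_1b(node: Node) -> tuple:
    # As quoted: duplicate closer to root > duplicate farther > nonduplicated.
    if node.duplicate_root_distance is None:
        return (0, 0)
    return (1, -node.duplicate_root_distance)


def rule_2(node: Node) -> int:
    # Higher mass number precedes lower.
    return node.mass_number


def rank(nodes):
    """Sort highest priority first; later rules only break earlier ties."""
    return sorted(nodes, key=lambda n: (rule_1a(n), rule_1b(n), rule_2(n)),
                  reverse=True)


nodes = [
    Node("plain 12C", 6, 12),
    Node("plain 13C", 6, 13),
    Node("duplicate of atom 3 bonds out", 6, 12, 3),
    Node("duplicate of root", 6, 12, 0),
    Node("O", 8, 16),
]
ranked_labels = [n.label for n in rank(nodes)]
```

With this ordering the duplicated 12C nodes land ahead of the plain 13C node, which is exactly the "root distance before isotope" behaviour questioned above.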
Post by Robert Hanson
OK, so I am reading Chapter 9 now to see the gory details. I didn't know
about the root-distance check, and so now
1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol.mol
is working. So all of this is easy enough. That's probably it for
independent stereochemistry. Where there is a dependency of one
stereochemical determination from another -- R/S after E/Z; E/Z after R/S,
E/Z after E/Z, R/S after R/S -- obviously that takes some sort of more
general iteration.
I think I will have to tackle that another day.
Bob
On Sun, Apr 9, 2017 at 11:03 AM, John Mayfield
Post by John Mayfield
Good good,
Fake news before fake news - a paper published in the CCG journal by
the CCG.
John
Post by Robert Hanson
No, John. Don't worry. I just happened to look at that page prior to
designing my own.
On Sun, Apr 9, 2017 at 10:44 AM, John Mayfield
Hi Bob,
Post by Robert Hanson
[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]
Was that the algorithm you implemented? Because it's not correct: it
doesn't (and can't) handle ghost atoms. I'm trying to track down the example, but
Daniel Lowe constructed a small reproducible example to demonstrate why this
can never work.
John
Peter Murray-Rust
2017-04-10 16:15:43 UTC
Permalink
I agree with Stuart. I think there are formal languages that should allow
us to express rules.

The community ran into huge problems with Daylight "canonical SMILES" since
the published paper and the actual (proprietary) code gave different
answers. I think this - and similar lack of standards - probably cost the
pharma industry zillions of dollars (my guess is ca 1 billion).

IIRC the initial InChI did not have formal documentation of the algorithm.
"The code defines the algorithm" is not a satisfactory approach.

What is necessary is BOTH a formal specification AND a large number of
tests.

As far as I remember CIP (or at least Prelog) stated that their algorithm
was incomplete - there are molecules that cause infinite recursion. So we
absolutely need Open documentation and conformance tests.

The attraction of formal specs is that it may be possible to express them
declaratively and hence generate executable algorithms independently of the
language. That at least acts as a reference even if the execution may be
slower (of course with modern compilers it might be faster).

P.
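As a rough illustration of the "large number of tests" half of that, here is a sketch of a conformance harness that any implementation could be run against. The case list reuses a few file names and expected descriptors from the validation listing earlier in this thread; `run_conformance` and `toy_assign` are hypothetical names, and `toy_assign` is only a stand-in (with one deliberately wrong answer) so that the sketch runs on its own.

```python
from typing import Callable, List, Tuple

# (structure file, expected descriptor string), taken from the validation
# listing posted earlier in this thread.
CASES: List[Tuple[str, str]] = [
    ("cip/beta-eudesmol.sdf", "4S5R8R"),
    ("cip/R/(2R)-butan-2-ol_3d.mol", "R"),
    ("cip/S/S.sdf", "1S"),
    ("cip/RS/(1R,2R)-2-chlorocyclohexanol_2d.mol", "2R3R"),
]


def run_conformance(assign: Callable[[str], str],
                    cases: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (file, expected, actual) for every case the implementation misses."""
    failures = []
    for path, expected in cases:
        actual = assign(path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


def toy_assign(path: str) -> str:
    """Stand-in for a real CIP implementation (hypothetical, canned answers)."""
    canned = {
        "cip/beta-eudesmol.sdf": "4S5R8R",
        "cip/R/(2R)-butan-2-ol_3d.mol": "S",  # deliberately wrong
        "cip/S/S.sdf": "1S",
        "cip/RS/(1R,2R)-2-chlorocyclohexanol_2d.mol": "2R3R",
    }
    return canned.get(path, "?")


failures = run_conformance(toy_assign, CASES)
```

The point of keeping the cases as plain data is that the same table can be shared between toolkits written in different languages.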
--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Andrew Dalke
2017-04-11 03:03:03 UTC
Permalink
Post by Peter Murray-Rust
The community ran into huge problems with Daylight "canonical SMILES" since the published paper and the actual (proprietary) code gave different answers. I think this - and similar lack of standards - probably cost the pharma industry zillions of dollars (my guess is ca 1 billion).
Please stop repeating this. The "huge problems" in the specific example you describe simply do not exist.

Yes, the published paper was different than the commercial algorithm which came later. It's now well known that the published paper is flawed. I'm certain CANGEN in the late 1980s was also flawed, perhaps even equally flawed. It took a long time to fix all the bugs in the implementation. The release notes for v4.71 from 2000 describe how some of the SMILES changed compared to the v4.62 release of 1999, because of a bug. Why would anyone want or expect Daylight's implementation to stay bug compatible with the algorithm in the original 1989 paper?

While it's true there are differences between the 1989 paper and the later implementations, those differences are irrelevant as the algorithm in the paper wasn't good enough for general use. In 1989 neither SMILES nor the canonicalization algorithm could handle isotopes, chirality, or stereochemistry. I struggle to think of why a pharmaceutical company in the mid-1990s would use an algorithm which can't distinguish between two stereoisomers when they could buy a Daylight license. I can't think of why more than a small number of people would want a non-isomeric canonicalization which matched the commercial Daylight algorithm, and even then mostly for intellectual curiosity.

The larger number of community members - probably in the thousands if you count all Daylight toolkit users - who needed a solution simply bought or asked their company to buy a Daylight license. It's not like graph canonicalization algorithms were a mystery, so some wrote their own canonicalization code which could handle isomeric SMILES and even non-standard SMILES extensions. Still others, like Tripos with SLN, developed alternative line notations with their own canonicalization algorithms.

In passing, that "cost" to the pharma industry that you complained about is also called "profit for software developers", which I personally am rather fond of. And companies received benefits from their costs.

I agree that there are other problems which could be addressed with a Daylight-like canonicalization algorithm. An obvious one is a CAS replacement along the lines of what InChI does. But Daylight didn't make those problems worse. Daylight didn't even have the goal of solving that problem, much less the goal of improving the profit margins of the pharmaceutical industry by giving away a free and no-cost solution, nor did they want to deal with the politics of developing a community-based specification.

On the other hand, what they did do was show that such solutions are possible, which ROSDAL, WLN, and other line notations before it failed to do. This is a step towards the end goal, not away.

My objection otherwise has nothing to do with standards, Blue Obelisk, CIP, or Jmol so I will now stop, and not continue this objection into other followup messages.

Andrew
***@dalkescientific.com
Robert Hanson
2017-04-10 18:31:10 UTC
Permalink
Structure is at https://chemapps.stolaf.edu/jmol/temp/cip-c13-test.png

John wrote back to say

1) "The SMILES should be [13C@@H]12C3C1.C2=CC3"

-- Thanks for that. Duh!

2) "The designation is S"

I'm pretty sure it's 1R 5R.

For the chirality at C1, the only question is whether C5 beats C2. The
highest-priority path via C1-C5 is C1-C5-C6 rather than C1-C5-C2 because
the duplicated atom C1 with mass 13 coming around the cyclopropane ring
C1-C5-C6-(C1) beats the alternative pathway C1-C5-C4-C3 based on Rule 2
(higher mass). And then that pathway beats C1-C2-C3-C4 for the same reason.
So C5 has higher priority than C2.

It is opposite when there is no isotope. In that case, C1-C2-C3-C4 beats
C1-C5-C6-(C1) due to the lack of substituents on the duplicated atom C1
compared to C4 in the *next round*, giving 1S 1R for the original model
that Mikko sent me.

Am I wrong?

Bob
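To make the pathways above easier to follow, here is a small sketch that enumerates the branches of the hierarchical digraph rooted at C1, with duplicated atoms shown in parentheses. It is a simplification under stated assumptions: the adjacency table is hand-written from the atom numbering used in this message, hydrogens are left out, and the duplicate atoms that the C2=C3 double bond would add are ignored. `Node`, `build_digraph` and `branches` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    atom: int
    duplicate: bool = False
    children: List["Node"] = field(default_factory=list)


def build_digraph(adjacency: Dict[int, List[int]], root: int) -> Node:
    """Expand the molecular graph into a tree; ring closures become duplicates."""
    def expand(path: Tuple[int, ...]) -> Node:
        atom = path[-1]
        parent = path[-2] if len(path) > 1 else None
        node = Node(atom)
        for nbr in adjacency[atom]:
            if nbr == parent:
                continue  # never step straight back along the incoming bond
            if nbr in path:
                # Atom already on this branch: add a duplicated atom and stop.
                node.children.append(Node(nbr, duplicate=True))
            else:
                node.children.append(expand(path + (nbr,)))
        return node
    return expand((root,))


def branches(node: Node, prefix: str = "") -> List[str]:
    """List every root-to-leaf path, e.g. 'C1-C5-C6-(C1)'."""
    label = f"(C{node.atom})" if node.duplicate else f"C{node.atom}"
    here = f"{prefix}-{label}" if prefix else label
    if not node.children:
        return [here]
    out: List[str] = []
    for child in node.children:
        out.extend(branches(child, here))
    return out


# Heavy-atom connectivity, numbered as in the message above (assumption):
# C1 and C5 are the bridgeheads, C6 is the cyclopropane CH2.
BICYCLE = {1: [2, 5, 6], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [1, 4, 6], 6: [1, 5]}
paths = branches(build_digraph(BICYCLE, 1))
```

Every branch ends in a duplicated C1, so deciding between C2 and C5 comes down to how those duplicated atoms and their neighbours compare at each distance from the root.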
John Mayfield
2017-04-10 20:56:16 UTC
Permalink
Post by Robert Hanson
I'm pretty sure it's 1R 5R.
1) Firstly, there is only one stereocentre so how do you name two?
2) What did you get for the other test case? That one checks that you have the
rank ordering for atomic masses right.
3) I'm aware of bugs in various SMILES readers; for example, ACD/ChemSketch
doesn't read SMILES stereo correctly on the first atom. Also, people mess up
the ring closure semantics. To eliminate that possibility, can you confirm
these all give you the same structure
I'll draw out the full digraph tomorrow if we can't work it out from these
tests.

To answer some other parts of the discussion.

Post by Chalk, Stuart
1) Has anyone taken the CIP rules and rewritten them as formal logic (and
machine readable) rules?
With regards to the formal logic encoding, the rules are well documented by
the original papers and a formal IUPAC document (Chapter 9 [1]). On top of
that, there was a paper by Paulina Mata [2] that provides a structured flow
chart of the program logic. There are "holes" in the rules (see Handbook of
Cheminf, Chapter 6 [3]), but there have been additions; a new one, Rule 1b,
was added recently [1]. In general the algorithm has (bad) exponential run
time for even small cases... it really is quite poor based on current
computer science knowledge. If you want to exchange structures, try to
avoid CIP rules - i.e. InChI and SMILES are preferable for exchanging
information. Only use CIP if you really need it; IMO that is
name-to-structure or structure-to-name.

Post by Chalk, Stuart
2) Does anyone involved with the development of the current InChI
specification have any comments to make about its implementation of the CIP
rules and how that will need to change in the current InChI Trust project
to do a better job with encoding stereo centers?
The InChI doesn't and shouldn't need CIP. You can rank order ligands with
any method and use that as your canonical identifier. For example, if I
generate a canonical SMILES (yes, there are different implementations) the
windings @ (anti-clockwise, left) and @@ (clockwise, right) are invariant, so
I can just name them as that.
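A hedged sketch of that idea: given a winding stated relative to some neighbour order, re-express it relative to any fixed canonical ranking by counting the parity of the reordering, since swapping two neighbours inverts the winding. The function names, the neighbour labels and the `rank` table are hypothetical and stand in for whatever canonical atom ranking a toolkit produces; no CIP rules are involved.

```python
from typing import Dict, List


def permutation_is_odd(original: List[str], reordered: List[str]) -> bool:
    """True if getting from `original` to `reordered` takes an odd number of swaps."""
    position = {atom: i for i, atom in enumerate(original)}
    indices = [position[atom] for atom in reordered]
    inversions = sum(
        1
        for i in range(len(indices))
        for j in range(i + 1, len(indices))
        if indices[i] > indices[j]
    )
    return inversions % 2 == 1


def winding_in_ranked_order(winding: str, neighbours: List[str],
                            rank: Dict[str, int]) -> str:
    """Re-express '@' / '@@' relative to neighbours sorted by canonical rank."""
    ranked = sorted(neighbours, key=lambda atom: rank[atom])
    if permutation_is_odd(neighbours, ranked):
        return "@@" if winding == "@" else "@"
    return winding


# Hypothetical canonical ranks for the four neighbours of one centre.
rank = {"O": 0, "C_a": 1, "C_b": 2, "H": 3}

# The same centre written with two different neighbour orders; exchanging the
# first two neighbours flips the written winding.
first = winding_in_ranked_order("@", ["H", "O", "C_a", "C_b"], rank)
second = winding_in_ranked_order("@@", ["O", "H", "C_a", "C_b"], rank)
```

Both input orders reduce to the same label, which is all that an exchange format needs; the label simply has no meaning to a chemist the way R/S does.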

Post by Chalk, Stuart
3) Has there been any discussion here or in IUPAC about revising the CIP
rules?
Yes, see [4,5], but could you imagine trying to get everyone to change their
definitions or use a new system! As with points 1/2, if you actually need to
exchange chemical information there are better ways of doing it.

Here are some open CIP implementations I can quickly find:
- JUMBO6 <https://bitbucket.org/wwmm/jumbo6/src/e76bf83c1eaf6ec65d794b111676913c633f1112/src/main/java/org/xmlcml/cml/tools/StereochemistryTool.java?at=default&fileviewer=file-view-default> by Peter Murray-Rust (notice a bug report from me 5 years ago <https://bitbucket.org/wwmm/jumbo6/issues/1/incorrect-stereochemistry-determination> :D)
- OPSIN <https://bitbucket.org/dan2097/opsin/src/343e6340a9ad85f68a08630f8b08de8df8f49557/opsin-core/src/main/java/uk/ac/cam/ch/wwmm/opsin/CipSequenceRules.java?at=default&fileviewer=file-view-default> by Daniel Lowe
- CDK <https://github.com/cdk/cdk/blob/master/descriptor/cip/src/main/java/org/openscience/cdk/geometry/cip/CIPTool.java> by Egon Willighagen
- RDKit <https://github.com/rdkit/rdkit/blob/f2c1a95c6e1548457c1b4bf4f6f8fc7defc5f1a7/Code/GraphMol/Chirality.cpp> by Greg Landrum
- Centres <https://github.com/johnmay/centres> by me

I also compared some commercial tools in my thesis. When Daniel and I
did an investigation at NextMove we found OPSIN/Centres agreed the most.
Centres handles more complicated cases (e.g. decalin, para-cyclohexanes,
inositols); however, I know it's still incomplete/wrong - I never bothered to
implement the fractional bond orders for mancude rings, see [1].

Anyway, if anything, this discussion has prompted me to submit the following
abstract to ACS Fall 2017. The main aim is to formalise the problems and
propose a way forward.
*Comparing CIP Implementations: The Need for an Open CIP*
Wither & Hence in the Digital Era (Oral)
The Cahn-Ingold-Prelog (CIP) priority rules have been the cornerstone in
written communication of stereochemical configuration for more than half a
century. The rules rank ligands around a stereocentre, allowing an atom
order and layout invariant stereo-descriptor to be assigned, for example R
(right) or S (left) for tetrahedral atoms. Despite their widespread daily
use, many chemists may be surprised to find that beyond trivial cases,
different software may assign different labels to the same structure
diagram.
There have been several attempts to either replace or amend the CIP rules.
This talk will highlight the more challenging aspects of the ranking and
present a comparison of software that provide CIP labels and where they
disagree. Providing an IUPAC verified free and open source CIP
implementation would allow software maintainers and vendors to validate and
improve their implementations. Ultimately this would improve the accuracy
in exchange of written chemical information for all.
John

[1] http://old.iupac.org/reports/provisional/abstract04/BB-prs310305/Chapter9.pdf
There should be a final one somewhere...
[2] http://pubs.acs.org/doi/abs/10.1021/ci00019a004
[3] http://onlinelibrary.wiley.com/book/10.1002/9783527618279
[4] http://www.sciencedirect.com/science/article/pii/S0957416600862370
[5] http://pubs.acs.org/doi/abs/10.1021/ci00012a003
Robert Hanson
2017-04-11 03:37:54 UTC
Permalink
Post by Robert Hanson
I'm pretty sure it's 1R 5R.
1) Firstly, there is only one stereocentre so how do you name two?
It's a bicyclo[3.1.0]hexene system. It has two stereocenters; only one was
needed for the SMILES string. See PNG image.

Still curious: What was your logic for 1S? What was wrong with mine?
Post by Robert Hanson
2) What did you get for the other test case, that one checks you have the
ordering ranking for atomic masses.
R.


I'll draw out the full digraph tomorrow if we can't work it out from these
tests.

Q: Is there software that does a nice job with producing digraphs from
SMILES?


Post by John Mayfield
To answer some other parts of the discussion.

1) Has anyone taken the CIP rules and rewritten them as formal logic (and
machine readable) rules?
In general the algorithm has (bad) exponential run time for even small
cases... it really is quite poor based on current computer science
knowledge.

I guess this would matter if you had 1,000,000 compounds to check; the
100-line algorithm (Rules 1 and 2) I wrote seems quite straightforward and
suitable for my purposes. Hard to believe any molecule of interest would
push the limits for such.
Post by John Mayfield
Here are some open CIP implementations I can quickly find
[...]

Q: These all implement Rule 1b and the rest of the rules? Have they been
validated in some systematic, common way, so we know they don't have any
bugs?

Post by John Mayfield
I also compared some commercial tools in my thesis. When Daniel and I
did an investigation at NextMove we found OPSIN/Centres agreed the most.
Centres handles more complicated cases (e.g. decalin, para-cyclohexanes,
inositols); however, I know it's still incomplete/wrong - I never bothered to
implement the fractional bond orders for mancude rings, see [1].

Q: Doesn't this argue against the "Why bother doing this -- it's been done
seven times already" argument? Which one is IUPAC-2013-standard?

Post by John Mayfield
Anyway, if anything, this discussion has prompted me to submit the following
abstract to ACS Fall 2017. The main aim is to formalise the problems and
propose a way forward.
*Comparing CIP Implementations: The Need for an Open CIP*
Wither & Hence in the Digital Era (Oral)
Super! One more reason I'm bummed that I won't be there... Please say hello
to Roger and Daniel for me.

Bob
John Mayfield
2017-04-11 07:37:15 UTC
Permalink
Post by Robert Hanson
2) What did you get for the other test case, that one checks you have the
ordering ranking for atomic masses.
R.
There you go, that should also be S, ordering is: *CO, *[14CH2]C, *CC, *[H]
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/.
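The ordering above falls out if Rule 1a (atomic number) is exhausted over all spheres before Rule 2 (mass) is looked at. Here is a minimal sketch under simplifying assumptions: each ligand is written out by hand as two spheres of (atomic number, mass number) pairs, every carbon is assumed to be a CH2, and nothing further out is considered. The helper names are hypothetical. The second ranking function shows one plausible mistake, comparing number and mass together sphere by sphere; it is not a claim about what any particular program does.

```python
# Spheres outward from the stereocentre; each entry is (atomic_number, mass_number).
# ASSUMPTION: fragments are exactly as abbreviated, with every carbon a CH2.
LIGANDS = {
    "*CO": [[(6, 12)], [(8, 16), (1, 1), (1, 1)]],
    "*[14CH2]C": [[(6, 14)], [(6, 12), (1, 1), (1, 1)]],
    "*CC": [[(6, 12)], [(6, 12), (1, 1), (1, 1)]],
    "*[H]": [[(1, 1)]],
}


def rule_1a_key(spheres):
    """Atomic numbers only, sphere by sphere, highest first."""
    return tuple(tuple(sorted((z for z, _ in sphere), reverse=True))
                 for sphere in spheres)


def rule_2_key(spheres):
    """Mass numbers, listed in the same highest-first atom order."""
    return tuple(tuple(m for _, m in sorted(sphere, reverse=True))
                 for sphere in spheres)


def rank_sequential(ligands):
    """Rule 1a over every sphere first; Rule 2 only breaks remaining ties."""
    return sorted(ligands,
                  key=lambda name: (rule_1a_key(ligands[name]),
                                    rule_2_key(ligands[name])),
                  reverse=True)


def rank_interleaved(ligands):
    """Mistaken variant: mass is consulted inside each sphere, too early."""
    return sorted(ligands,
                  key=lambda name: tuple(tuple(sorted(sphere, reverse=True))
                                         for sphere in ligands[name]),
                  reverse=True)


sequential = rank_sequential(LIGANDS)
interleaved = rank_interleaved(LIGANDS)
```

The interleaved variant promotes the 14C branch above the oxygen-bearing one. That exchanges the top two ligands, and a single exchange inverts the descriptor, which would be consistent with getting R where S is expected.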

Post by Robert Hanson
Q: Is there software that does a nice job with producing digraphs from
SMILES?
I think I added a utility in Centres; however, I've barely looked at the
code in 5 years. I am planning to dust it off and clean it up now, though.
BTW if you look closely, Centres is abstract and wraps around existing
toolkits - I only wrapped it around CDK, though in theory you could do the
same with Jmol.

Q: These all implement Rule 1b and the rest of the rules? Have they been
validated in some systematic, common way, so we know they don't have any
bugs?
I don't think so. IIRC 1b was introduced to fix this case:
O[***@H](C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2.
If you use that molecule you can tell whether an implementation does or
doesn't implement that rule. Without Rule 1b it should not be possible to
label it. In Centres you can change the rules of the ranking: CDKPerceptor.java
<https://github.com/johnmay/centres/blob/develop/cdk/src/main/java/uk/ac/ebi/centres/cdk/CDKPerceptor.java#L77-L108>.

Q: Doesn't this argue against the "Why bother doing this -- it's been done
seven times already" argument? Which one is IUPAC-2013-standard?
It wasn't me who said that; I'd only say don't do it because the
implementation will drive you mad :-). A "blessed" version would allow
everyone to confirm against it. As your original question asks: if you want
to test yours, it would be much simpler just to point to a complete one and
leave it there. However, from my previous testing I don't know if a complete
one exists anywhere (maybe the LHASA one:
http://pubs.acs.org/doi/abs/10.1021/ci00019a004 but of course this may
no longer exist; I will ask them).

I guess this would matter if you had 1,000,000 compounds to check; the
100-line algorithm (Rules 1 and 2) I wrote seems quite straightforward and
suitable for my purposes. Hard to believe any molecule of interest would
push the limits for such.
CHEBI:51439, whether that's of interest or not is of course subjective

John
Noel O'Boyle
2017-04-11 08:42:27 UTC
Permalink
Post by John Mayfield
Post by Robert Hanson
Post by John Mayfield
2) What did you get for the other test case, that one checks you have the
ordering ranking for atomic masses.
R.
There you go, that should also be S, ordering is: *CO, *[14CH2]C, *CC, *[H]
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/.
Post by Robert Hanson
Q: Is there software that does a nice job with producing digraphs from
SMILES?
Q: Doesn't this argue against the "Why bother doing this -- it's been done
seven times already" argument? Which one is IUPAC-2013-standard?
It wasn't me who said that,
Nor did I - "I've no problem anyone reimplementing anything for fun or
profit, but I have to disagree with the suggestion that having an N'th
implementation of the same algorithm is progress, or good for this
community." No need to get annoyed, Bob - I've said the same to John.
And probably worse. But I think you're playing devil's advocate a bit
here, or maybe I didn't put it very tactfully. I think we can all
agree that the development of a shared resource in this area (in
whatever form) would be valuable, and hopefully John's talk (and
perhaps your implementation too) will be the first steps in this
direction.
Robert Hanson
2017-04-11 12:30:05 UTC
Permalink
[sorry - forgot that this list requires "reply-all"]
---------- Forwarded message ----------
From: Robert Hanson <***@stolaf.edu>
Date: Tue, Apr 11, 2017 at 7:29 AM
Subject: Re: [BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol
Post by John Mayfield
Post by John Mayfield
2) What did you get for the other test case, that one checks you have the
ordering ranking for atomic masses.
R.
There you go, that should also be S, ordering is: *CO, *[14CH2]C, *CC, *[H]
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/.
John, what basis in the IUPAC rules leads you to this reading? It suggests
that atoms in the nth sphere cannot be ranked until atoms in the (n+1)th
sphere are checked after application of Rule 1, even if they could be
distinguished by Rule 2. Are you suggesting that after each rule is checked
(Rule 1a, Rule 1b, Rule 2 -- or is it Rule 1(a and b), Rule 2,...?) one
must expand to the next sphere before making a decision? That seems to me
(a) unsupportable by the IUPAC rules and (b) just asking for extremely
complex code and a whole lot of unnecessary checks.

My understanding is that exhaustive application of all rules is done
within the sphere first, then the process is repeated at the next sphere.
What I read is this:

*The ranking of each atom in the nth sphere depends in the first place on
the ranking of atoms of the same branch in (n − 1)th sphere, and then by
the application of the Sequence Rules to it; the smaller the number, the
higher the relative ranking. (Ranking Rule 2).*
This is certainly my understanding from all the reading I have done. You
have three atoms connected to an atom. You rank those three atoms based on
the rules. Atoms that are tied are taken to the next sphere, but not
until that process is completed.

To me that is pretty clear: We apply all rules to rank all atoms in a
single sphere. Nothing here says, "Atoms in a sphere are compared pairwise,
and if they are identical, then the comparison of this pair is continued to
the next sphere. Once this depth-first relative ranking is determined, the
procedure is repeated with all pairs of the sphere." I can certainly see
where *that* reading could drive one mad.
Post by John Mayfield
Q: Is there software that does a nice job with producing digraphs from
SMILES?
I think I added a utility in Centres; however, I've barely looked at the
code in 5 years - but am planning to dust it off and clean it up now.
BTW if you look closely, Centres is abstract and wraps around existing
toolkits - I only wrapped it around CDK, though in theory you could do the
same with Jmol.
Q: These all implement Rule 1b and the rest of the rules? Have they been
validated in some systematic, common way, so we know they don't have any
bugs?
I don't think so. IIRC 1b was introduced to fix this case:
O[***@H](C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2. If you use that molecule
you can tell whether an implementation does or doesn't implement that rule.
Without Rule 1b it should not be possible to label it. In Centres you can
change the rules of the ranking: CDKPerceptor.java
<https://github.com/johnmay/centres/blob/develop/cdk/src/main/java/uk/ac/ebi/centres/cdk/CDKPerceptor.java#L77-L108>.
Yes, that's one of the models Mikko sent me. I used it for checking Rule 1b.
Post by John Mayfield
Q: Doesn't this argue against the "Why bother doing this -- it's been done
seven times already" argument? Which one is IUPAC-2013-standard?
It wasn't me who said that; I'd only say don't do it because the
implementation will drive you mad :-). A "blessed" version would allow
everyone to confirm against it. As your original question asks: if you want
to test yours, it would be much simpler just to point to a complete one and
leave it there. However, from my previous testing I don't know if a complete
one exists anywhere (maybe the LHASA one:
http://pubs.acs.org/doi/abs/10.1021/ci00019a004 but of course this may
no longer exist; I will ask them).
Supposedly ACD/Labs has a compliant CIP-determining algorithm:
http://bulletin.acscinf.org/PDFs/247nm44.pdf
Is ACD/Labs represented on this list?
Post by John Mayfield
I guess this would matter if you had 1,000,000 compounds to check; the
100-line algorithm (Rules 1 and 2) I wrote seems quite straightforward and
suitable for my purposes. Hard to believe any molecule of interest would
push the limits for such.
CHEBI:51439, whether that's of interest or not is of course subjective
That's a nice test model.

Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
John Mayfield
2017-04-11 12:54:16 UTC
Permalink
Post by Robert Hanson
John, what basis in the IUPAC rules leads you to this reading?
That's the rules :-) - I did warn you it's madness. Read the original
CIP papers and the IUPAC document carefully.

From IUPAC:

The rules are hierarchical, i.e., each rule must be exhaustively applied in [...]

From Prelog and Helmchen 1982:

Those atoms in the n-th sphere which are of equal rank with respect to
those in the (n-1)th sphere to which they are bonded are graded by means of
the sequence *rules and these are applied exhaustively in turn: first the
entire hierarchical graph is examined by sequence rule 1. If a clear
precedence over other ligands can be established, the examination of that
particular ligand is concluded. If ligands remain whose rank is not
provided by sequence rule 1, then one uses sequence rule 2, once
again exhaustively*, and so forth. While this procedure is in accordance
with precepts published earlier it now makes clear, we hope, that a rank
established for a sphere nearer to the core remains valid with respect to
atoms in more distant spheres (Fig. 15).
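
To make the difference between the two readings concrete, here is a minimal Python sketch for the test case CC[C@@](CO)([H])[14CH2]C. It is only an illustration under stated assumptions: the ligands are hand-written as lists of spheres, atoms are (atomic number, mass number) pairs with nominal masses, only Rule 1a and Rule 2 are modelled, and the names `rule_major`, `sphere_major` and `rank` are made up for this example. It is not how Jmol, Centres or any other toolkit is implemented.

```python
from functools import cmp_to_key

# Each atom is (atomic number, mass number); nominal masses are an assumption.
H, C, C14, O = (1, 1), (6, 12), (6, 14), (8, 16)

# Ligands of the centre in CC[C@@](CO)([H])[14CH2]C, written sphere by sphere.
LIGANDS = {
    "*CO":       [[C],   [O, H, H], [H]],
    "*[14CH2]C": [[C14], [C, H, H], [H, H, H]],
    "*CC":       [[C],   [C, H, H], [H, H, H]],
    "*[H]":      [[H]],
}
Z, MASS = 0, 1  # index used by Rule 1a, index used by Rule 2


def sphere_cmp(a, b, i, prop):
    """Compare sphere i of two ligands on one property (highest values first)."""
    ka = sorted((atom[prop] for atom in a[i]), reverse=True) if i < len(a) else []
    kb = sorted((atom[prop] for atom in b[i]), reverse=True) if i < len(b) else []
    return (ka > kb) - (ka < kb)


def rule_major(a, b):
    """Prelog/Helmchen reading: one rule over ALL spheres, then the next rule."""
    for prop in (Z, MASS):
        for i in range(max(len(a), len(b))):
            score = sphere_cmp(a, b, i, prop)
            if score:
                return score
    return 0


def sphere_major(a, b):
    """The other reading: all rules within a sphere before moving outward."""
    for i in range(max(len(a), len(b))):
        for prop in (Z, MASS):
            score = sphere_cmp(a, b, i, prop)
            if score:
                return score
    return 0


def rank(compare):
    """Ligand names from highest to lowest priority under the given comparison."""
    key = cmp_to_key(lambda x, y: compare(LIGANDS[x], LIGANDS[y]))
    return sorted(LIGANDS, key=key, reverse=True)


print(rank(rule_major))    # ['*CO', '*[14CH2]C', '*CC', '*[H]']
print(rank(sphere_major))  # ['*[14CH2]C', '*CO', '*CC', '*[H]']
```

The rule-by-rule version reproduces the ordering given earlier in the thread. The sphere-by-sphere version swaps the top two ligands, and swapping two priorities flips the descriptor, which is consistent with the R versus S disagreement discussed here.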
ACD/ChemSketch is very good but I can't test in bulk/call as a library.
From what I did test it appeared to be mostly complete, but it will not
compute labels for molecules with more than n rings in a ringset (I forget
the exact value). For the example above it gives the correct answer (i.e.
S).

John
John Mayfield
2017-04-11 13:03:04 UTC
Permalink
Daniel points out this paper is a good start for understanding the rules:
http://pubs.acs.org/doi/abs/10.1021/ci990090v
Robert Hanson
2017-04-11 14:42:48 UTC
Permalink
Brilliant. So I see the logic. This is chemistry not computer science.
Thank you, John. I will adjust the algorithm. Pretty easy fix.
Robert Hanson
2017-04-11 17:25:38 UTC
Permalink
Thank you very much for that very clear tip, John. Thinking *chemically*
was the key to understanding this logic: atomic masses are subtleties that
should not radically change the picture. Jmol is now applying Rules 1 and 2
correctly, based on the suite of structures Mikko gave me. (See
https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip/)

$ load "$CC[C@@](CO)([H])[14CH2]C"
$ print {chirality != ""}.label("%i%[chirality]").join(" ");

3S

$ load "cip/S/1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol-new.mol"
$ print {chirality != ""}.label("%i%[chirality]").join(" ");

7S

$ load "cip/RS/(1S,5R)-bicyclo[3.1.0]hex-2-ene_2D.mol" filter "2D"
$ print {*}.find("SMILES")
$ print {chirality != ""}.label("%i%[chirality]").join(" ");

[C@@H]12[***@H]3C1.C2=CC3
1S 5R

$ load "cip/RS/(1S,5R)-bicyclo[3.1.0]hex-2-ene_2D.mol" filter "2D"
$ @1.element = "13C"
$ print {*}.find("SMILES")
$ print {chirality != ""}.label("%i%[chirality]").join(" ");

[13C@@H]12C3C1.C2=CC3
1S 5R

:)

Time to work on Rule 3...

Bob
Bob
2017-04-11 22:17:14 UTC
Permalink
Wow. I love seqcis! Rule 3 was trivial

Robert M. Hanson
St. Olaf College Chemistry
from my Windows phone

Robert Hanson
2017-04-12 11:44:10 UTC
Permalink
(and I wish they had thought of seqR and seqS)

In case anyone is interested, here is my pseudocode. The actual methods
aren't much longer than this.
John, do you agree this is the appropriate approach?

Bob

https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol/src/org/jmol/symmetry/CIPChirality.java

//
// getChirality(atom) {
//   if (atom.getCovalentBondCount() != 4) exit NO_CHIRALITY
//   for (each Rule) {
//     sortSubstituents()
//     if (done) exit getHandedness();
//   }
//   exit NO_CHIRALITY
// }
//
// sortSubstituents() {
//   for (all pairs of substituents a and b) {
//     score = a.compareTo(b, currentRule)
//     if (score == TIED)
//       score = breakTie(a,b)
//   }
// }
//
// breakTie(a,b) {
//   score = compareShallowly(a, b)
//   if (score != TIED) return score
//   a.sortSubstituents(), b.sortSubstituents()
//   return compareDeeply(a, b)
// }
//
// compareShallowly(a, b) {
//   for (each substituent pairing i in a and b) {
//     score = applyCurrentRule(a_i, b_i)
//     if (score != TIED) return score
//   }
//   return TIED
// }
//
// compareDeeply(a, b) {
//   for (each substituent pairing i in a and b) {
//     score = breakTie(a_i, b_i)
//     if (score != TIED) return score
//   }
//   return TIED
// }
//
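
For anyone who wants to step through the recursion, the pseudocode above can be rendered as a small runnable Python sketch. This is an illustration only, not the Jmol source: the `Node` class, the two-entry `RULES` list (atomic number, then mass number), `local_key` and `rank_ligands` are assumptions made for the example, the shallow comparison orders each side by its local key before pairing (a deviation from the pseudocode as written), and no geometry or handedness step is included.

```python
from functools import cmp_to_key
from itertools import zip_longest

TIED = 0


class Node:
    """One atom of a ligand tree; subs are its substituents away from the centre."""
    def __init__(self, name, z, mass, subs=()):
        self.name, self.z, self.mass, self.subs = name, z, mass, list(subs)


# Only two rules are sketched: atomic number (Rule 1a) and mass number (Rule 2).
RULES = [lambda n: n.z, lambda n: n.mass]


def apply_rule(a, b, r):
    """Compare two atoms on their own under rule r; a missing atom ranks lowest."""
    if a is None or b is None:
        return (a is not None) - (b is not None)
    ka, kb = RULES[r](a), RULES[r](b)
    return (ka > kb) - (ka < kb)


def local_key(r):
    """Ordering key for pairing substituents: rules 0..r applied to the atom only."""
    return lambda n: tuple(rule(n) for rule in RULES[:r + 1])


def compare_shallowly(a, b, r):
    sa = sorted(a.subs, key=local_key(r), reverse=True)
    sb = sorted(b.subs, key=local_key(r), reverse=True)
    for a_i, b_i in zip_longest(sa, sb):
        score = apply_rule(a_i, b_i, r)
        if score != TIED:
            return score
    return TIED


def compare_deeply(a, b, r):
    for a_i, b_i in zip(a.subs, b.subs):
        score = break_tie(a_i, b_i, r)
        if score != TIED:
            return score
    return TIED


def break_tie(a, b, r):
    score = compare_shallowly(a, b, r)
    if score != TIED:
        return score
    sort_substituents(a, r)
    sort_substituents(b, r)
    return compare_deeply(a, b, r)


def compare(a, b, upto):
    """Rules 0..upto in turn, each followed through the whole branch before the next."""
    for r in range(upto + 1):
        score = apply_rule(a, b, r) or break_tie(a, b, r)
        if score != TIED:
            return score
    return TIED


def sort_substituents(node, r):
    key = cmp_to_key(lambda x, y: compare(x, y, r))
    node.subs = sorted(node.subs, key=key, reverse=True)


def rank_ligands(centre):
    """Ligand names in priority order, or None if a tie survives every rule."""
    for r in range(len(RULES)):
        sort_substituents(centre, r)
        pairs = zip(centre.subs, centre.subs[1:])
        if all(compare(x, y, r) != TIED for x, y in pairs):
            return [n.name for n in centre.subs]
    return None


def h():
    return Node("H", 1, 1)


def methyl():
    return Node("CH3", 6, 12, [h(), h(), h()])


# Toy tree for the centre of CC[C@@](CO)([H])[14CH2]C.
centre = Node("C*", 6, 12, [
    Node("*CC", 6, 12, [methyl(), h(), h()]),
    Node("*CO", 6, 12, [Node("O", 8, 16, [h()]), h(), h()]),
    Node("*[H]", 1, 1),
    Node("*[14CH2]C", 6, 14, [methyl(), h(), h()]),
])
print(rank_ligands(centre))  # ['*CO', '*[14CH2]C', '*CC', '*[H]']
```

Trees only: ring closures, duplicate atoms and Rules 1b and 3 onward are left out, so this is a way to follow the control flow rather than a validated CIP ranking.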
Wolf Ihlenfeldt
2017-04-12 12:54:10 UTC
Permalink
Post by Robert Hanson
(and I wish they had thought of seqR and seqS)
In case anyone is interested, here is my pseudocode. The actual methods
aren't much longer than this.
John, do you agree this is the appropriate approach?
Bob
https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol/src/org/jmol/symmetry/CIPChirality.java
//
// getChirality(atom) {
// if (atom.getCovalentBondCount() != 4) exit NO_CHIRALITY
Insufficient. Chiral atoms where a FEP (free electron pair) is substituting
for a ligand are certainly important in real-life chemistry. But of course
that heavily depends on the atom type - not every FEP stays locked in
place. Geometry also matters - you want to treat square planar
differently from tetrahedral, so there needs to be at least some kind of
VSEPR analysis of prospective chiral centers.

Also, I do not see any iteration here. Cases where the chirality or
non-chirality of an atom can only be determined after this has been
done in its substituent sphere (say, identical substituents but one
with a stereogenic cis-DB and one with a stereogenic trans-DB, or with
an R and an S center in the substituents) are important. They can of
course be nested to any depth, and they can appear in any atom or bond
order, so there needs to be iteration as long as anything can be
resolved, which can in turn make it possible to resolve more
centers. Furthermore, you need to interweave CIP atomic and bond
stereodescriptor determination; they cannot be handled independently.
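
The iteration described above amounts to a fixed-point loop: keep sweeping over the unresolved stereo elements until a full pass resolves nothing new. A minimal Python sketch follows, with everything in it hypothetical: `label_until_stable`, the `try_label` callback and the toy three-centre example are invented for illustration, and the strings "pseudoasymmetric" and "not stereogenic" stand in for whatever a real implementation would compute from geometry.

```python
def label_until_stable(elements, try_label):
    """Sweep over atoms and bonds until a full pass resolves nothing new.

    try_label(element, labels) returns a descriptor, or None when the element
    cannot be decided yet from the labels known so far.
    """
    labels = {}
    progress = True
    while progress:
        progress = False
        for element in elements:
            if element in labels:
                continue
            descriptor = try_label(element, labels)
            if descriptor is not None:
                labels[element] = descriptor
                progress = True
    return labels


# Toy input: C2 and C4 can be labelled on their own; C3 carries two
# constitutionally identical branches and depends on the labels of C2 and C4.
INDEPENDENT = {"C2": "R", "C4": "S"}


def toy_try_label(element, labels):
    if element in INDEPENDENT:
        return INDEPENDENT[element]
    if "C2" not in labels or "C4" not in labels:
        return None  # not decidable yet, try again on a later pass
    if labels["C2"] != labels["C4"]:
        return "pseudoasymmetric"
    return "not stereogenic"


# C3 is listed first on purpose, so the first pass has to skip it.
print(label_until_stable(["C3", "C2", "C4"], toy_try_label))
# {'C2': 'R', 'C4': 'S', 'C3': 'pseudoasymmetric'}
```

Because the element list can mix atoms and double bonds, the same loop would also cover the interleaving of atom and bond descriptors; how each element is actually decided is the hard part and is not shown here.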
Post by Robert Hanson
// for (each Rule){
// sortSubstituents()
// if (done) exit getHandedness();
// }
// exit NO_CHIRALITY
// }
//
// sortSubstituents() {
// for (all pairs of substituents a and b) {
// score = a.compareTo(b, currentRule)
// if (score == TIED)
// score = breakTie(a,b)
// }
//
// breakTie(a,b) {
// score = compareShallowly(a, b)
// if (score != TIED) return score
// a.sortSubstituents(), b.sortSubstituents()
// return compareDeeply(a, b)
// }
//
// compareShallowly(a, b) {
// for (each substituent pairing i in a and b) {
// score = applyCurrentRule(a_i, b_i)
// if (score != TIED) return score
// }
// return TIED
// }
//
// compareDeeply(a, b) {
// for (each substituent pairing i in a and b) {
// score = breakTie(a_i, b_i)
// if (score != TIED) return score
// }
// return TIED
// }
//
Post by Bob
Wow. I love seqcis! Rule 3 was trivial
Robert M. Hanson
St. Olaf College Chemistry
from my Windows phone
________________________________
From: Robert Hanson
Sent: ‎4/‎11/‎2017 7:30 AM
To: BlueObelisk-Discuss
Subject: Fwd: [BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol
[sorry - forgot that this list requires "reply-all"]
---------- Forwarded message ----------
Date: Tue, Apr 11, 2017 at 7:29 AM
Subject: Re: [BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol
On Tue, Apr 11, 2017 at 2:37 AM, John Mayfield
Post by John Mayfield
Post by Bob
Post by John Mayfield
2) What did you get for the other test case, that one checks you have
the ordering ranking for atomic masses.
R.
There you go, that should also be S, ordering is: *CO, *[14CH2]C, *CC, *[H]
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/.
John, what basis in the IUPAC rules leads you to this reading? It suggests
that atoms in the nth sphere cannot be ranked until atoms in the (n+1)th
sphere are checked after application of Rule 1, even if they could be
distinguished by Rule 2. Are you suggesting that after each rule is checked
(Rule 1a, Rule 1b, Rule 2 -- or is it Rule 1(a and b), Rule 2,...?) one must
expand to the next sphere before making a decision? That seems to me (a)
unsupportable by the IUPAC rules and (b) just asking for extremely complex
code and a whole lot of unnecessary checks.
My understanding is that exhaustive application of all rules is done
within the sphere first, then the process is repeated at the next sphere.
The ranking of each atom in the nth sphere depends in the first place on the
ranking of atoms of the same branch in (n − 1)th sphere, and then by the
application of the Sequence Rules to it; the smaller the number, the higher the
relative ranking. (Ranking Rule 2).
This is certainly my understanding from all the reading I have done. You
have three atoms connected to an atom. You rank those three atoms based on
the rules. Atoms that are tied are taken to the next sphere, but not until
that process is completed.
To me that is pretty clear: We apply all rules to rank all atoms in a
single sphere. Nothing here says, "Atoms in a sphere are compared pairwise,
and if they are identical, then the comparison of this pair is continued to
the next sphere. Once this depth-first relative ranking is determined, the
procedure is repeated with all pairs of the sphere." I can certainly see
where that reading could drive one mad.
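To make the two readings debated here concrete, the following Python sketch compares two branches under both orderings: "rule first" (Rule 1a is exhausted over every sphere before Rule 2 is consulted) and "sphere first" (every rule is applied within a sphere before moving outward). The branch encoding as lists of (atomic number, mass number) tuples per sphere, the function names, and the toy branches are all assumptions for illustration; duplicate atoms and the dependence on the ranking of the previous sphere are ignored. It takes no position on which reading the IUPAC text supports; it only shows that the two can disagree.

```python
def sign(key_a, key_b):
    # +1 if a ranks higher, -1 if b ranks higher, 0 if tied.
    return (key_a > key_b) - (key_a < key_b)

def rule_1a(sphere):
    # Atomic numbers in the sphere, highest first.
    return sorted((z for z, _mass in sphere), reverse=True)

def rule_2(sphere):
    # Simplification: (atomic number, mass number) pairs, highest first.
    return sorted(sphere, reverse=True)

RULES = [rule_1a, rule_2]

def compare_rule_first(a, b):
    for rule in RULES:                  # outer loop over rules
        for sa, sb in zip(a, b):        # inner loop over spheres
            s = sign(rule(sa), rule(sb))
            if s:
                return s
    return 0

def compare_sphere_first(a, b):
    for sa, sb in zip(a, b):            # outer loop over spheres
        for rule in RULES:              # inner loop over rules
            s = sign(rule(sa), rule(sb))
            if s:
                return s
    return 0

# Hypothetical branches, truncated after the second sphere:
branch_a = [[(6, 14)], [(1, 1), (1, 1), (1, 1)]]    # a 14C methyl group
branch_b = [[(6, 12)], [(8, 16), (1, 1), (1, 1)]]   # a 12C carbon bearing O, H, H

print(compare_rule_first(branch_a, branch_b))     # -1: the oxygen in sphere 2 decides
print(compare_sphere_first(branch_a, branch_b))   # 1: the 14C in sphere 1 decides
```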
Post by John Mayfield
Post by Bob
Q: Is there software that does a nice job with producing digraphs from SMILES?
I think I added a utility in Centres; however, I've barely looked at the
code in 5 years, though I am planning to dust it off and clean it up now.
BTW if you look closely, Centres is abstract and wraps around existing
toolkits - I only wrapped it around CDK, though in theory you could do the
same with Jmol.
Post by Bob
Q: These all implement Rule 1b and the rest of the rules? Have they been
validated in some systematic, common way, so we know they don't have any
bugs?
you can tell whether it does/doesn't implement that rule. Without rule 1b it
should not be possible to label it. In centres you can change the rules of
the ranking: CDKPerceptor.java.
Yes, that's one of the models Mikko sent me. I used it for checking Rule 1b.
Post by John Mayfield
Post by Bob
Q: Doesn't this argue against the "Why bother doing this -- it's been
done seven times already" argument? Which one is IUPAC-2013-standard?
It wasn't me who said that, I'd only say don't do it because the
implementation will drive you mad :-). The "blessed" version would allow
everyone to confirm against it, as your original question asks - you want to
test yours it would be much simpler just to point to a complete one leave it
there. However from my previous testing I don't know if a complete one
http://pubs.acs.org/doi/abs/10.1021/ci00019a004 but of course this maybe
doesn't exist anymore, will ask them).
Supposedly ACD/Labs has a compliant CIP-determining algorithm.
http://bulletin.acscinf.org/PDFs/247nm44.pdf
Is ACD/Labs represented on this list?
Post by John Mayfield
Post by Bob
I guess this would matter if you had 1,000,000 compounds to check; the
100-line algorithm (Rules 1 and 2) I wrote seems quite straightforward and
suitable for my purposes. Hard to believe any molecule of interest would
push the limits for such.
CHEBI:51439, whether that's of interest or not is of course subjective
That's a nice test model.
Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
If nature does not answer first what we want,
it is better to take what answer we get.
-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
--
Wolf-D. Ihlenfeldt - Xemistry GmbH - ***@xemistry.com
Phone: +49 6174 201455 - Fax +49 6174 209665
---
xemistry gmbh – Geschäftsführer/Managing Director: Dr. W. D. Ihlenfeldt
Address: Hainholzweg 11, D-61462 Königstein, Germany
HR Königstein B7522 : Ust/VAT ID DE215316329 : DUNS 34-400-1719
Robert Hanson
2017-04-12 13:00:41 UTC
Post by Robert Hanson
Post by Robert Hanson
(and I wish they had thought of seqR and seqS)
In case anyone is interested, here is my pseudocode. The actual methods
aren't much longer than this.
John, do you agree this is the appropriate approach?
Bob
https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol/src/org/jmol/symmetry/CIPChirality.java
Post by Robert Hanson
//
// getChirality(atom) {
// if (atom.getCovalentBondCount() != 4) exit NO_CHIRALITY
Insufficient. Chiral atoms where a FEP is substituting for a ligand
are certainly important, in real-life chemistry. But of course that
heavily depends on the atom type - not every FEP stays locked in
place. Also geometry matters - you want to treat square planar
differently from tetrahedral, so there needs to be at least some kind of
VSEPR analysis of prospective chiral centers.
"Insufficient" is a relative term. I'm just showing what my algorithm does
in Jmol.

I don't know what "FEP" means.
Post by Wolf Ihlenfeldt
Also I do not see any iteration here.
This is only for Rules 1-3 right now. One thing at a time...
Post by Wolf Ihlenfeldt
Cases where the chirality or
non-chirality of an atom can only be determined after this has been
done in its substituent sphere (say, identical substituents but one
with a stereogenic cis-DB and one with a stereogenic trans-DB, or with
a R and a S center in the substituents) are important - and can of
course be nested to any depth, and they can appear in any atom or bond
order, so there needs to be iteration as long as anything can be
resolved, which then can lead to a possibility to resolve more
centers. Furthermore, you need to interweave CIP atomic and bond
stereodescriptor determination; they cannot be handled independently.
I don't doubt that. Next up is Rules 4 and 5, where this will surely
become an issue.
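The kind of iteration being asked for can be sketched as a fixed-point loop: keep sweeping over all unresolved atoms and bonds, in one shared pool, until a full sweep resolves nothing new. The Python below is only a toy under stated assumptions: the feature names (C5, db_a, ...), the resolve_all and needs_distinct helpers, and the placeholder label "stereogenic" are hypothetical, and a real resolver would apply the sequence rules rather than simply compare labels.

```python
def resolve_all(resolvers):
    """Sweep repeatedly until no further feature can be resolved."""
    known = {}
    progress = True
    while progress:
        progress = False
        for name, resolver in resolvers.items():
            if name in known:
                continue
            label = resolver(known)
            if label is not None:
                known[name] = label
                progress = True
    return known

def needs_distinct(*deps):
    """Toy resolver: resolvable only once every dependency carries a label
    and all of those labels differ (e.g. one Z arm and one E arm)."""
    def resolver(known):
        if not all(d in known for d in deps):
            return None
        labels = [known[d] for d in deps]
        return "stereogenic" if len(set(labels)) == len(labels) else None
    return resolver

# Atoms and double bonds share one pool and may appear in any order.
resolvers = {
    "C9": needs_distinct("C5", "db_a"),    # depends on a center resolved later
    "C5": needs_distinct("db_a", "db_b"),  # otherwise identical arms, one Z and one E
    "C7": needs_distinct("db_b", "db_c"),  # both arms E: never resolved here
    "db_a": lambda known: "Z",
    "db_b": lambda known: "E",
    "db_c": lambda known: "E",
}

print(resolve_all(resolvers))
```

Because C9 is listed before C5, and C5 before the double bonds, a single pass is not enough; the loop needs several sweeps before C9 can be labeled.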

Revisions to the pseudocode are welcome.

Bob
Wolf Ihlenfeldt
2017-04-12 13:04:10 UTC
Post by Robert Hanson
Post by Wolf Ihlenfeldt
Post by Robert Hanson
(and I wish they had thought of seqR and seqS)
In case anyone is interested, here is my pseudocode. The actual methods
aren't much longer than this.
John, do you agree this is the appropriate approach?
Bob
https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol/src/org/jmol/symmetry/CIPChirality.java
//
// getChirality(atom) {
// if (atom.getCovalentBondCount() != 4) exit NO_CHIRALITY
Insufficient. Chiral atoms where a FEP is substituting for a ligand
are certainly important, in real-life chemistry. But of course that
heavily depends on the atom type - not every FEP stays locked in
place. Also geometry matters - you want to treat square planar
differently from tetrahedral, so there needs to be at least some kind of
VSEPR analysis of prospective chiral centers.
"Insufficient" is a relative term. I'm just showing what my algorithm does
in Jmol.
I don't know what "FEP" means.
Free electron pair.

VSEPR - valence shell electron pair repulsion. The simplest method to
guess the fundamental ligand geometry.
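As a rough illustration of that point, a pre-filter could count a free electron pair as a fourth "ligand" but only for atom types where the pair is configurationally stable. The Python sketch below is an assumption-laden toy: the function name, the geometry argument, and the short element list (S and P, as in sulfoxides and phosphines, versus ordinary amine N, which inverts rapidly) are illustrative choices, not a complete VSEPR analysis and not what Jmol does.

```python
# Assumption: elements whose single lone pair is treated as locked in place.
STABLE_LONE_PAIR_CENTERS = {"S", "P"}

def is_candidate_center(element, n_ligands, n_lone_pairs, geometry="tetrahedral"):
    """Return True if the atom is worth passing to the CIP ranking step."""
    if geometry != "tetrahedral":
        # Square planar and other geometries need separate handling.
        return False
    if n_ligands + n_lone_pairs != 4:
        return False
    if n_lone_pairs == 0:
        return True
    # Exactly one lone pair, and only on an element assumed to hold it in place.
    return n_lone_pairs == 1 and element in STABLE_LONE_PAIR_CENTERS

print(is_candidate_center("C", 4, 0))   # four bonded ligands
print(is_candidate_center("S", 3, 1))   # sulfoxide-type sulfur
print(is_candidate_center("N", 3, 1))   # ordinary amine nitrogen
```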
Post by Robert Hanson
Post by Wolf Ihlenfeldt
Also I do not see any iteration here.
This is only for Rules 1-3 right now. One thing at a time...
Post by Wolf Ihlenfeldt
Cases where the chirality or
non-chirality of an atom can only be determined after this has been
done in its substituent sphere (say, identical substituents but one
with a stereogenic cis-DB and one with a stereogenic trans-DB, or with
a R and a S center in the substituents) are important - and can of
course be nested to any depth, and they can appear in any atom or bond
order, so there needs to be iteration as long as anything can be
resolved, which then can lead to a possibility to resolve more
centers. Furthermore, you need to interweave CIP atomic and bond
stereodescriptor determination; they cannot be handled independently.
I don't doubt that. Next up is Rules 4 and 5, where this will surely become
an issue.
Revisions to the pseudocode are welcome.
Bob
--
Wolf-D. Ihlenfeldt - Xemistry GmbH - ***@xemistry.com
Phone: +49 6174 201455 - Fax +49 6174 209665
---
xemistry gmbh – Geschäftsführer/Managing Director: Dr. W. D. Ihlenfeldt
Address: Hainholzweg 11, D-61462 Königstein, Germany
HR Königstein B7522 : Ust/VAT ID DE215316329 : DUNS 34-400-1719
Rzepa, Henry S
2017-04-12 13:11:47 UTC
Can I ask a more general question? Quite a number of codes, including various commercial, <claim> to detect and assign CIP. Three that I use are Gaussview, ChemDraw and ChemDoodle. There are I am sure many others. I presume InChI in the stereochemical layer might have a go as well (?)

Rarely declared however are the heuristics behind such detection, and in particular how the “difficult” cases are handled. Might we assume caveat emptor when it comes to all these codes? I suppose one should really test them against each other?

I am sure Jmol, when its entry into this fold is mature, will compete with the best of them, but how many of these other codes have been tested against proper validation sets and the results reported?
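Testing codes against each other could be as simple as diffing their descriptor strings atom by atom. The Python sketch below assumes each tool's output has been reduced to the compact form used in the validation listing earlier in this thread (for example 4S5R8R: atom index followed by the descriptor). The function names are hypothetical, the second input string is made up purely to show a disagreement, and labels without an atom index (a bare "R") are not handled.

```python
import re

def parse_descriptors(label):
    """Turn '4S5R8R' into {4: 'S', 5: 'R', 8: 'R'}."""
    return {int(index): d for index, d in re.findall(r"(\d+)([RSrs])", label)}

def disagreements(label_a, label_b):
    """Map each atom index to the pair of descriptors on which two tools differ.
    A missing descriptor shows up as None."""
    a, b = parse_descriptors(label_a), parse_descriptors(label_b)
    return {
        i: (a.get(i), b.get(i))
        for i in sorted(a.keys() | b.keys())
        if a.get(i) != b.get(i)
    }

print(disagreements("4S5R8R", "4S5R8R"))   # identical assignments
print(disagreements("4S5R8R", "4S5S8R"))   # hypothetical second tool disagrees at atom 5
```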

Henry Rzepa, http://orcid.org/0000-0002-8635-8390
John Mayfield
2017-04-12 13:56:55 UTC
Please read back in the discussion, Henry,

I've linked multiple comparisons - and am planning on doing a more
comprehensive one for the next ACS (I'll add Gaussview to the list).
ChemDraw's is more advanced than ChemDoodle's. CIP should not be used for
finding stereocenters, canonicalizing, or comparing (matching) - as said
before *InChI does not, should not, and will never include or need a CIP
implementation*.

John
Post by Rzepa, Henry S
Can I ask a more general question? Quite a number of codes, including
various commercial, <claim> to detect and assign CIP. Three that I use
are Gaussview, ChemDraw and ChemDoodle. There are I am sure many others.
I presume InChI in the stereochemical layer might have a go as well (?)
Rarely declared however are the heuristics behind such detection, and in
particular how the “difficult” cases are handled. Might we assume caveat
emptor when it comes to all these codes? I suppose one should really test
them against each other?
I am sure Jmol when its entry into this fold is mature will compete with
the best of them, but how many of these other codes have been tested
against proper validation sets and the results reported?
Henry Rzepa, http://orcid.org/0000-0002-8635-8390