[BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol

Discussion:

Robert Hanson

2017-04-09 12:42:34 UTC

[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]

---------- Forwarded message ----------
From: Robert Hanson <***@stolaf.edu>
Date: Sat, Apr 8, 2017 at 8:12 PM
Subject: Re: [BlueObelisk-discuss] Cohen-Ingold-Prelog rules into Jmol
To: Mikko Vainio <***@abo.fi>

Super! Thanks, Mikko. That is EXACTLY what I was looking for. Really nice
that Jmol is handling the 2D->3D and hydrogen addition correctly (for all
except one structure). I have not used that in a long time!

Especially grateful for the V2000 format. Except for one structure, all
that are appropriate to my algorithm to date (SP3 carbon only; R/S, not
r/s) validated nicely:

OK cip/gibberellin_2D.mol 3R4R5S10S13S17S18R21S
OK cip/beta-eudesmol.sdf 4S5R8R
OK cip/beta-eudesmol_3d.sdf 4S5R8R
OK cip/R/(1R)-1-cycloproply-2-methylpropan-1-ol_2D.mol 2R
OK cip/R/(1R)-1-cycloproply-2-methylpropan-1-ol_3D.mol 2R
OK cip/R/(2R)-2-hydroxybut-3-enal_3D.mol 3R
OK cip/R/(2R)-butan-2-ol_3d.mol R
OK cip/R/(3R)-pent-1-en-3-ol_2D.mol 3R
OK cip/R/(3R)-pent-1-en-3-ol_3D.mol 3R
OK cip/R/R.sdf 1R
OK cip/S/(S)-cyclobutyl(cyclopropyl)methanol_2D.mol 5S
OK cip/S/(S)-cyclobutyl(cyclopropyl)methanol_3D.mol 5S
OK cip/S/S.sdf 1S
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_2d.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_2d_noH.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_3d.mol 2R3R
OK cip/RS/(1R,2R)-2-chlorocyclohexanol_3d_noH.mol 2R3R
OK cip/RS/(1S,5R,8S,12S,13R,15S)-12-methyl-14-oxa-18-thiahexacyclo[blabla]octadecan-8-ol.sdf 4S5R6S7S8S13R
OK cip/RS/(2S,4aS,8aS)-8a-chloro-2-fluoro-decahydronaphthalen-4a-ol.sdf 5S6S10S
OK cip/RS/(4aR,8aS)-8a-methyl-octahydro-1H-2-benzopyran.sdf 5R6S
OK cip/RS/one-R-one-S.sdf 2R6S
OK cip/RS/_1R,2R_-2-__S_-chloro_fluoro_methyl_cyclohexan-1-ol.sdf 2R3R8S
OK cip/RS/_2R,3R_-3-methylpentan-2-ol.sdf 1R2R
OK cip/RS/_2R,3S_-3-methylpentan-2-ol.sdf 1S2R
OK cip/RS/(1R,2R,4R,5R)-cyclohexane-1,2,3,4,5-pentol_2d_noH.mol RRRR
OK cip/RS/(1R,2R,4R,5R)-cyclohexane-1,2,3,4,5-pentol_3d.mol RRRR
OK cip/RS/(1S,5R)-bicyclo[3.1.0]hex-2-ene_3D.mol RS

The ONE that failed was

1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol.mol

tris-(cyclopropylethyl)methyl vs. bicyclo[2.2.2]octane. I have no idea how to
fix that issue! What's the rule for that?

Bob

ps - A few of your names are slightly wrong, using "R/S" instead of "r/s".
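
For anyone who wants to compare result lines like the ones above automatically, here is a minimal Python sketch. It assumes each line has the form "OK <path> <descriptors>", that the digits in a string such as 3R4R5S10S13S17S18R21S are atom numbers, and that a bare R or RRRR carries no atom numbers. The names parse_descriptors and parse_result_line are made up for illustration and are not part of Jmol.

```python
import re

# One token = optional atom number followed by a descriptor letter.
# Assumption: digits are atom numbers; lowercase r/s would mark
# pseudoasymmetric centers.
TOKEN = re.compile(r"(\d*)([RSrs])")


def parse_descriptors(text):
    """Split a compact string such as '4S5R8R' into (atom, label) pairs."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {text!r}")
        atom = int(m.group(1)) if m.group(1) else None
        pairs.append((atom, m.group(2)))
        pos = m.end()
    return pairs


def parse_result_line(line):
    """Parse 'OK <path> <descriptors>' into (status, path, pairs)."""
    status, rest = line.split(" ", 1)
    # The descriptor string is the last whitespace-separated token.
    path, descriptors = rest.rsplit(" ", 1)
    return status, path, parse_descriptors(descriptors)
```

Two such parsed lists can then be compared pair by pair, so a mismatch on a single center is reported instead of a failed whole-string comparison.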

Hi Bob,
I wrote a partial (2.5K lines) implementation of CIP stereocenter
perception for Balloon (http://users.abo.fi/mivainio/balloon). It handles
tetrahedral and trigonal pyramidal (R/S/r/s), double bond, and allene-like
(E/Z and axial Ra/ra/Sa/sa) stereocenters. The algorithm was implemented
according to and tested on the examples in Nomenclature of Organic
Chemistry: IUPAC Recommendations and Preferred Names 2013, Chapter P-9
Specification of Configuration and Conformation, p 1156-1292 (
http://dx.doi.org/10.1039/9781849733069-01156). As already pointed out on
the mailing list, a naive implementation of the CIP algorithm would do
depth-first graph traversal, which quickly becomes intractable for
polycyclic systems. You probably do not need to do this at all unless you
are generating names or preventing a conformer generation algorithm from
messing up pseudoasymmetric centers.
As to examples, please find attached a set of sdf files with manually
checked configurations. The configurations are documented in the files as
data fields for easier automated testing. Some files are missing information
and some may be wrong; this is just a snapshot of what I have at the moment,
but it should get you started.
Best regards,
Mikko
P.S. I tried to send this to the list but the message was rejected due to
the attachment. If you wish, please put the files up somewhere for others
to use, too, if deemed useful.
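
To illustrate the remark above about full traversal becoming intractable for polycyclic systems, here is a small Python sketch (not Balloon's code). It counts the simple paths that start at one atom, which is the number of non-duplicate nodes a fully expanded hierarchical digraph would contain. The test molecule is a made-up ladder of fused four-membered rings; ladder and count_paths are illustrative names only.

```python
def ladder(n):
    """Adjacency of a 2 x n ladder graph (n - 1 fused four-membered rings).

    Atoms are numbered 0..2n-1; atoms 2*i and 2*i+1 form rung i.
    """
    adj = {a: set() for a in range(2 * n)}
    for i in range(n):
        top, bottom = 2 * i, 2 * i + 1
        adj[top].add(bottom)
        adj[bottom].add(top)
        if i + 1 < n:
            # Connect each rail atom to the next atom on the same rail.
            for a in (top, bottom):
                adj[a].add(a + 2)
                adj[a + 2].add(a)
    return adj


def count_paths(adj, root):
    """Count simple paths starting at root (the root alone counts as one)."""
    def visit(atom, seen):
        total = 1
        for nbr in adj[atom]:
            if nbr not in seen:
                total += visit(nbr, seen | {nbr})
        return total

    return visit(root, {root})


if __name__ == "__main__":
    for n in range(1, 11):
        print(2 * n, "atoms:", count_paths(ladder(n), 0), "paths")
```

The path count grows much faster than the atom count, which is why exploring sphere by sphere and stopping at the first difference matters in practice.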

John Mayfield

2017-04-09 15:44:28 UTC

Hi Bob,

Post by Robert Hanson
[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]

Robert Hanson

2017-04-09 15:51:48 UTC

No, John. Don't worry. I just happened to look at that page prior to
designing my own.

Hi Bob,

Post by Robert Hanson
[I actually do know it is Cahn; pulled "Cohen" without thinking from
https://www.chemcomp.com/journal/chiral.htm. Serves me right. Duh!]

Was that the algorithm you implemented? Because it's not correct: it
doesn't (and can't) handle ghost atoms. I'm trying to track down the
example, but Daniel Lowe constructed a small reproducible example to
demonstrate why this can never work.
John
------------------------------------------------------------
------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

John Mayfield

2017-04-09 16:03:03 UTC

Good good,

Fake news before fake news - a paper published in the CCG journal by the
CCG.

John

Robert Hanson

2017-04-09 17:11:22 UTC

OK, so I am reading Chapter 9 now to see the gory details. I didn't know
about the root-distance check, and so now

1-(bicyclo[2.2.2]octan-1-yl)-1-[1,5-dicyclopropyl-3(2-cyclopropylethyl)-pentan-3-yl]methan-1-ol.mol

is working. So all of this is easy enough. That's probably it for
independent stereochemistry. Where one stereochemical determination
depends on another -- R/S after E/Z; E/Z after R/S, E/Z after E/Z,
R/S after R/S -- obviously that takes some sort of more general iteration.

I think I will have to tackle that another day.

Bob

Robert Hanson

2017-04-09 18:05:02 UTC

OK, I don't get the logic of this:

Rule 1 (a) Higher atomic number precedes lower;
(b) A duplicated atom, with its predecessor node having the same label
closer to the root, ranks higher than a duplicated atom, with its
predecessor node having the same label farther from the root, which ranks
higher than any nonduplicated-atom-node (proposed by Custer, ref. 36)

Rule 2 Higher atomic mass number precedes lower;

Seriously? Root distance is checked before isotope? Sure seems odd to me.
Why would that distance check not be after atomic number and mass??

Whatever...

Bob
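
For what it is worth, the ordering of the criteria quoted above can be written as a sort key. This is only a node-level Python sketch under my own assumptions: the Node class, its field names, and the example nodes are invented, and the real procedure compares sets of nodes sphere by sphere and exhausts Rule 1 over the whole digraph before Rule 2 is consulted. It does show the consequence being questioned: a duplicated 12C ends up ahead of a nonduplicated 13C.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    label: str
    atomic_number: int
    mass_number: int
    # For a duplicated atom: distance from the root of the node it duplicates.
    # None marks an ordinary (nonduplicated) node.
    duplicate_of_distance: Optional[int] = None


def rank_key(node):
    """Sort key in the quoted order; smaller tuples rank higher."""
    if node.duplicate_of_distance is None:
        rule_1b = float("inf")  # nonduplicated nodes rank below any duplicate
    else:
        rule_1b = node.duplicate_of_distance  # closer to the root ranks higher
    # Rule 1a, then Rule 1b, then Rule 2.
    return (-node.atomic_number, rule_1b, -node.mass_number)


nodes = [
    Node("12C", 6, 12),
    Node("13C", 6, 13),
    Node("C dup far", 6, 12, duplicate_of_distance=3),
    Node("O", 8, 16),
    Node("C dup near", 6, 12, duplicate_of_distance=1),
]
ranked = [n.label for n in sorted(nodes, key=rank_key)]
print(ranked)
# ['O', 'C dup near', 'C dup far', '13C', '12C']
```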

Noel O'Boyle

2017-04-09 18:53:57 UTC

We need libRS. Everyone reimplementing these rules is some type of madness.

Robert Hanson

2017-04-09 22:53:51 UTC

"re" implementing is a great way to find additional bugs and compare
strategies. This (to this point) took me two days. And if I started with a
"libRS" in Java, I would still have to modify it extensively to fit Jmol.
That said, I wouldn't mind taking a look at how others have implemented it.

In the meantime, is it OK for me to continue this discussion without libRS?

John Mayfield

2017-04-10 12:05:49 UTC

Noel pointed out I only sent this back to Bob.

Also, why so many dots? That's considered "not good form" in SMILES.

Noel O'Boyle

2017-04-10 12:10:44 UTC

Sorry, but I have to call you out on this, especially as this is the
Blue Obelisk mailing list.

I've no problem with anyone reimplementing anything for fun or profit, but
I have to disagree with the suggestion that having an N'th
implementation of the same algorithm is progress, or good for this
community. At a recent meeting at the EBI, I think there were at least
7 attendees who had written versions of this algorithm. The whole goal
of the Blue Obelisk is to pool our expertise to develop common
resources, to avoid exactly this situation.

- Noel

Robert Hanson

2017-04-10 12:50:25 UTC

OK. That's fine. Point me to the algorithm. I'll say no more.
