[BlueObelisk-discuss] Proposed amendment to CIP Rule 1b

Discussion:

Robert Hanson

2017-05-15 22:47:47 UTC

I'm interested in two things. First, feedback on a proposed amendment to
CIP Rule 1b. Second, suggestions for how to officially propose this.

Current Rule 1:

*(1a) higher atomic number precedes lower;*
*(1b) a duplicate atom node whose corresponding nonduplicated atom node is
the root or closer to the root ranks higher than a duplicate atom node
whose corresponding nonduplicated atom node is farther from the root.*

Said differently but with the same meaning:

*(1a) higher atomic number precedes lower;*
*(1b) in comparing two duplicate nodes, lower root distance precedes higher
root distance, where "root distance" for a duplicate node is defined as
the distance from its corresponding nonduplicated atom node to the root
node.*

Proposed amended rule:

*(1a) higher atomic number precedes lower;*
*(1b) in comparing two duplicate nodes, lower root distance precedes higher
root distance, where "root distance" is defined as: (i) in the case of a
duplicate atom for which the atomic number is averaged over two or more
atoms in applying Rule 1a, the distance from the duplicate node itself to
the root node; and (ii) in all other cases, the distance from its
corresponding nonduplicated atom node to the root node.*
If that means nothing to you, ignore this. But it is a critically important
addition for any algorithm that is to assign stereochemistry correctly,
even for very simple compounds, based on CIP Rules 1-5. For example,
without that modification, an algorithm following the rules in the IUPAC
2013 Blue Book will arrive at "S" as the descriptor for this compound:

[image: Inline image 1]
C1=CC=CC(O)=C1[***@H](C2=CC=CC=C2O)O
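To make the difference between the current and the proposed definitions of "root distance" concrete, here is a minimal Python sketch. The `DigraphNode` class and its fields (`depth`, `source_depth`, `averaged`) are hypothetical names for this illustration only, not part of any toolkit: `depth` is the duplicate node's own distance from the root, `source_depth` is the distance of its corresponding nonduplicated atom node, and `averaged` marks a duplicate whose Rule 1a atomic number was averaged. The node values are made up, chosen only to show that the two definitions can order the same pair of duplicates differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DigraphNode:
    """Hypothetical duplicate node in a CIP hierarchical digraph."""
    depth: int         # distance from this duplicate node itself to the root
    source_depth: int  # distance from its corresponding nonduplicated atom node to the root
    averaged: bool     # True if its Rule 1a atomic number was averaged (Kekule case)


def root_distance_current(node: DigraphNode) -> int:
    # Current Rule 1b: always measure from the corresponding nonduplicated atom node.
    return node.source_depth


def root_distance_proposed(node: DigraphNode) -> int:
    # Proposed amendment: (i) averaged duplicates use their own position,
    # (ii) every other duplicate uses its nonduplicated atom node.
    return node.depth if node.averaged else node.source_depth


def compare_rule_1b(a: DigraphNode, b: DigraphNode,
                    root_distance: Callable[[DigraphNode], int]) -> int:
    """Return -1 if a precedes b, 1 if b precedes a, 0 if Rule 1b ties them."""
    da, db = root_distance(a), root_distance(b)
    return (da > db) - (da < db)


if __name__ == "__main__":
    # Made-up values chosen so that the two definitions disagree.
    a = DigraphNode(depth=5, source_depth=2, averaged=True)
    b = DigraphNode(depth=4, source_depth=3, averaged=True)
    print(compare_rule_1b(a, b, root_distance_current))   # -1: a precedes b
    print(compare_rule_1b(a, b, root_distance_proposed))  # 1: b precedes a
```

Only averaged duplicates behave differently under the amendment; for all other duplicates both functions return the same value.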

My second question is, having said that, how do I go about officially
stating this? Publish? Where?

Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

John May

2017-05-15 23:08:08 UTC

- John

I'm interested in two things. First, feedback on a proposed amendment to CIP Rule 1b. Second, suggestions for how to officially propose this.
(1a) higher atomic number precedes lower;
(1b) a duplicate atom node whose corresponding nonduplicated atom node is the root or closer to the root ranks higher than a duplicate atom node whose corresponding nonduplicated atom node is farther from the root.
(1a) higher atomic number precedes lower;
(1b) in comparing two duplicate nodes, lower root distance precedes higher root distance, where "root distance" for a duplicate node is defined as the distance from its corresponding nonduplicated atom node to the root node.
(1a) higher atomic number precedes lower;
(i) in the case of a duplicate atom for which the atomic number is averaged over two or more atoms in applying Rule 1a, the distance from the duplicate node itself to the root node; and
(ii) in all other cases, the distance of its corresponding nonduplicated atom node to the root node.
<image.png>
My second question is, having said that, how do I go about officially stating this? Publish? Where?
Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
If nature does not answer first what we want,
it is better to take what answer we get.
-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

John May

2017-05-15 23:25:24 UTC

I need to think more about it tomorrow. I think your logic is correct, but I wouldn't say it's critically important. You're conflating two procedures: (a) finding stereochemistry vs. (b) naming it. You only need CIP for (b); (a) is more efficiently and correctly handled with group theory.

"Everything looks like a nail to a man with a hammer"

- John

John Mayfield

2017-05-16 16:16:21 UTC

Hi Bob,

Daniel says he'd seen another example in ChEBI essentially the same as this
where if you add Rule 1b it breaks the tie when it shouldn't.

John

Robert Hanson

2017-05-17 03:36:58 UTC

So you agree? Any particular reason no one has published on this? Just too
minor a detail?

Any example with two similar, functionalized benzene rings (a substituted
biphenyl, for example) stands a 50% chance of failing this test. I'm quite
surprised that it wasn't discovered early on. I guess they just never
considered this Kekule issue. I think that is apparent from the Kekule fix
for Rule 1a, where it is stated:

[image: Inline image 2]

Well, that certainly is not true, is it?!
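The Kekulé provision of Rule 1a discussed in this thread gives a duplicate atom in an aromatic ring an atomic number averaged over the Kekulé structures. A minimal Python sketch of that averaging, assuming the Kekulé structures are already known and supplied as sets of double bonds between atom indices; `averaged_duplicate_z` is a hypothetical helper, and pyridine with hand-written Kekulé structures is only an illustration. It reflects our reading of the averaging and handles only ring atoms that carry exactly one double bond in every structure, so check the 2013 IUPAC recommendations for the exact wording.

```python
from fractions import Fraction
from typing import Dict, Iterable, Set, Tuple

Bond = Tuple[int, int]


def averaged_duplicate_z(atom: int,
                         atomic_numbers: Dict[int, int],
                         kekule_structures: Iterable[Set[Bond]]) -> Fraction:
    """Mean atomic number of the duplicate attached to `atom` over Kekule structures.

    In each structure the duplicate copies the atom at the other end of
    `atom`'s double bond, so we average that partner's atomic number.
    """
    partner_zs = []
    for double_bonds in kekule_structures:
        partners = [b if a == atom else a
                    for a, b in double_bonds if atom in (a, b)]
        if len(partners) != 1:
            raise ValueError(f"atom {atom} is not in exactly one double bond")
        partner_zs.append(atomic_numbers[partners[0]])
    if not partner_zs:
        raise ValueError("at least one Kekule structure is required")
    return Fraction(sum(partner_zs), len(partner_zs))


# Pyridine: atom 0 is N, atoms 1-5 are C, ring bonds 0-1-2-3-4-5-0.
pyridine_z = {0: 7, 1: 6, 2: 6, 3: 6, 4: 6, 5: 6}
pyridine_kekule = [{(0, 1), (2, 3), (4, 5)}, {(1, 2), (3, 4), (5, 0)}]

print(averaged_duplicate_z(1, pyridine_z, pyridine_kekule))  # 13/2 (N in one structure, C in the other)
print(averaged_duplicate_z(3, pyridine_z, pyridine_kekule))  # 6
```

Exact fractions are used so that averaged values such as 6.5 compare reliably against integer atomic numbers.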

Questions for John:

Q1: Does the CDK implement the Kekule considerations required for
application of Rule 1a?

Q2: Does the CDK implement Rule 1b?

All I can find is a rudimentary atom number/mass consideration -- Rule 2
and part of Rule 1a. I can't find Rules 3, 4, or 5. I know I must be
missing something major here.

Q3: Is the CDK validation suite for CIP on the GitHub site somewhere? I
can't find it.

Bob

John Mayfield

2017-05-17 07:41:52 UTC

I think I agree, but I need to draw out the digraph to convince myself. The
whole reason for 1b was to fix this case (I believe originally from the WDI
hashcode paper, IIRC):

CC(C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2

I think splitting ties between things that are the same is undesirable, but
worse is naming two different things the same. As I said, you can fix the
first one with a different and better algorithm. For the second, you have
these examples, which are different but get the same R/S labels:

C[***@H]1[C@@H](C)[C@@H](C)[C@@H](C)[***@H](C)[***@H](C)[***@H](C)[C@@H]1C
C[***@H]1[***@H](C)[C@@H](C)[***@H](C)[***@H](C)[***@H](C)[C@@H](C)[***@H]1C

Correct, the CDK is rudimentary and only handles simple cases, although in
practice that is most cases :-). But as I've said multiple times, Centres
(https://github.com/johnmay/centres) is a more complete implementation that
does Rules 1-5 and auxiliary descriptors. It's a little difficult to follow
because I wrote it to be toolkit-independent, so downstream users plug in
to certain interfaces saying how to access atomic numbers, connected atoms,
etc. The parts are spread out a bit, but you can see how the rules are
configured here:
https://github.com/johnmay/centres/blob/develop/cdk/src/main/java/uk/ac/ebi/centres/cdk/CDKPerceptor.java#L78-L106
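The toolkit-independent design described above, where each toolkit supplies a small adapter for atomic numbers and connected atoms, can be sketched in Python with a structural `Protocol`. The names `MoleculeAdapter`, `DictMolecule`, and `rank_by_atomic_number` are hypothetical; this is not the actual Centres API (which is Java), only an illustration of the pattern, and the ranking function covers just the first sphere of Rule 1a.

```python
from typing import Dict, Iterable, List, Protocol, Sequence, Tuple


class MoleculeAdapter(Protocol):
    """What the CIP code needs from any toolkit (hypothetical interface)."""

    def atomic_number(self, atom: int) -> int: ...

    def neighbors(self, atom: int) -> Sequence[int]: ...


class DictMolecule:
    """Toy adapter backed by plain dicts, standing in for a real toolkit."""

    def __init__(self, atomic_numbers: Dict[int, int],
                 bonds: Iterable[Tuple[int, int]]) -> None:
        self._z = dict(atomic_numbers)
        self._adj: Dict[int, List[int]] = {a: [] for a in self._z}
        for a, b in bonds:
            self._adj[a].append(b)
            self._adj[b].append(a)

    def atomic_number(self, atom: int) -> int:
        return self._z[atom]

    def neighbors(self, atom: int) -> Sequence[int]:
        return self._adj[atom]


def rank_by_atomic_number(mol: MoleculeAdapter, centre: int) -> List[int]:
    """Order a centre's neighbours by atomic number, highest first.

    Only the first sphere of Rule 1a; ties keep their input order.
    """
    return sorted(mol.neighbors(centre), key=mol.atomic_number, reverse=True)


# Bromochlorofluoromethane: C0 bonded to Br1, Cl2, F3, H4.
chbrclf = DictMolecule({0: 6, 1: 35, 2: 17, 3: 9, 4: 1},
                       [(0, 1), (0, 2), (0, 3), (0, 4)])
print(rank_by_atomic_number(chbrclf, 0))  # [1, 2, 3, 4]
```

The rule code depends only on the two adapter methods, so a different toolkit only needs its own small adapter class.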

I never spent enough time on it to do the fractional bond orders. I was
considering doing it for this ACS talk, but we'll see; it's a lot of effort
for very little gain. The validation of Centres was done in my thesis, but
there are a handful of examples here:
https://github.com/johnmay/centres/tree/develop/cdk/src/test/resources/uk/ac/ebi/centres/cdk
Again, I was planning on putting together a comprehensive validation set
for the talk.

John

Robert Hanson

2017-05-17 17:01:57 UTC

Post by John Mayfield
I think I agree but need to draw out the digraph to convince myself. The
whole reason for 1b was to fix this case (I believe originally from WDI
hashcode paper IIRC):
CC(C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2
I think splitting ties when they're the same is undesirable but worse is
naming two different things the same. As I said you can fix the first one
with a different and better algorithm.

? I'm missing this reference. Better algorithm than what? Or do you just
mean that, in general, if you get a null result, at least you are only
missing something? Either way, I think, you need a better algorithm. :)

Post by John Mayfield
For the second you have these examples which are different but get the
same R/S labels.

Oh, that is very cool. So you think this is a failure of Rule 4b in the
IUPAC rules? Very impressive. I don't think Jmol is making any mistake
here, do you?

Post by John Mayfield
Correct, the CDK is rudimentary and only handles simple cases, although in
practice that is most cases :-). But as I've said multiple times, Centres
(https://github.com/johnmay/centres) is a more complete implementation that
does Rules 1-5 and auxiliary descriptors. It's a little difficult to follow
because I wrote it to be toolkit-independent, so downstream users plug in
to certain interfaces saying how to access atomic numbers, connected atoms,
etc. The parts are spread out a bit, but you can see how the rules are
configured here:
https://github.com/johnmay/centres/blob/develop/cdk/src/main/java/uk/ac/ebi/centres/cdk/CDKPerceptor.java#L78-L106

Thanks for that link. You will find that when you implement 1a fully, you
will need to pull it apart from 1b, applying Rule 1a exhaustively before
1b. Otherwise it messes up. I'm pretty sure that is true for 4a, 4b, and 4c
as well. I guess that's obvious to you; it took me a while to catch on to
that. Is Centres doing that? 4a and 4b are both in PairRule, right?

I think it's interesting that the Kekule consideration introduces the
situation where a duplicated atom can break a tie either in its own sphere
(due to its mass or its root distance) or in its substituents' sphere (due
to its massless phantom atoms). So the idea of a "simplified digraph" that
hides the phantom atoms and doesn't indicate duplicated atom mass or root
distance is difficult to interpret: you have to remember to apply the atom
mass exhaustively first (possibly moving its priority *above* its
corresponding nonduplicated atom), then root distance, then, in the next
sphere, its phantom atom masses. Very tricky.
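The point about applying Rule 1a exhaustively before Rule 1b can be illustrated with a toy comparison. In this Python sketch each branch is a list of spheres and each node is a hypothetical `(atomic_number, root_distance)` pair. Giving non-duplicate nodes (including phantom atoms) an infinite root distance is an assumption of the sketch, not a CIP rule, and real CIP code also orders the sets within each sphere by their parent branches, which is skipped here. The made-up branches show that exhausting Rule 1a over all spheres and interleaving 1a/1b sphere by sphere can pick different winners.

```python
from typing import Callable, List, Tuple

Node = Tuple[float, float]    # (atomic number, root distance)
Sphere = List[Node]
Branch = List[Sphere]
Rule = Callable[[Sphere, Sphere], int]
NOT_DUPLICATE = float("inf")  # assumption: non-duplicates never win on Rule 1b


def _first_difference(xs: List[float], ys: List[float]) -> int:
    """-1 if xs is lexicographically smaller, 1 if larger, 0 if tied."""
    for x, y in zip(xs, ys):
        if x != y:
            return -1 if x < y else 1
    return 0


def rule_1a(a: Sphere, b: Sphere) -> int:
    # Higher atomic number precedes: compare descending lists, larger wins.
    za = sorted((z for z, _ in a), reverse=True)
    zb = sorted((z for z, _ in b), reverse=True)
    return -_first_difference(za, zb)


def rule_1b(a: Sphere, b: Sphere) -> int:
    # Lower root distance precedes: compare ascending lists, smaller wins.
    return _first_difference(sorted(d for _, d in a), sorted(d for _, d in b))


def compare_exhaustive(a: Branch, b: Branch, rules: List[Rule]) -> int:
    """Apply each rule over every sphere before moving on to the next rule."""
    for rule in rules:
        for sa, sb in zip(a, b):
            result = rule(sa, sb)
            if result:
                return result  # negative: a precedes b
    return 0


def compare_interleaved(a: Branch, b: Branch, rules: List[Rule]) -> int:
    """Apply all rules within one sphere before moving outward."""
    for sa, sb in zip(a, b):
        for rule in rules:
            result = rule(sa, sb)
            if result:
                return result
    return 0


# Sphere 1: a real C plus a duplicate C; sphere 2: the real C's child and a phantom (Z = 0).
branch_a = [[(6, NOT_DUPLICATE), (6, 3)], [(8, NOT_DUPLICATE), (0, NOT_DUPLICATE)]]
branch_b = [[(6, NOT_DUPLICATE), (6, 2)], [(6, NOT_DUPLICATE), (0, NOT_DUPLICATE)]]

print(compare_exhaustive(branch_a, branch_b, [rule_1a, rule_1b]))   # -1: A wins on 1a in sphere 2
print(compare_interleaved(branch_a, branch_b, [rule_1a, rule_1b]))  # 1: B wins on 1b in sphere 1
```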

Post by John Mayfield
I never spent enough time on it to do the fractional bond orders, I was
considering doing it for this ACS talk but we'll see - it's a lot of effort
for very little gain.

It was only about 50 lines, actually, at least since all I did was handle
the important cases (6-membered rings). Feel free to use it, of course. You
will need it anyway for the 1b correction, or at least you will need some
sort of Kekule check for that. Maybe you already have that somewhere
else...
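A Kekulé check needs, at minimum, the set of Kekulé structures for a ring system. One generic way to get them (not the 50-line Jmol code mentioned above, which is not shown here) is to enumerate perfect matchings of the ring atoms by backtracking. This Python sketch assumes a small, fully conjugated ring system in which every atom takes part in exactly one double bond; `kekule_structures` is a hypothetical helper, and exhaustive enumeration is only practical for small systems.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[int, int]


def kekule_structures(n_atoms: int, bonds: List[Bond]) -> List[FrozenSet[Bond]]:
    """Enumerate every placement of double bonds giving each atom exactly one."""
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    found: List[FrozenSet[Bond]] = []

    def search(matched: FrozenSet[int], doubles: List[Bond]) -> None:
        free = [i for i in range(n_atoms) if i not in matched]
        if not free:
            found.append(frozenset(doubles))
            return
        atom = free[0]  # always extend the lowest unmatched atom: no duplicates
        for partner in sorted(adjacency[atom]):
            if partner not in matched:
                search(matched | {atom, partner},
                       doubles + [(min(atom, partner), max(atom, partner))])
        # If no free partner exists, this branch is a dead end and adds nothing.

    search(frozenset(), [])
    return found


benzene = [(i, (i + 1) % 6) for i in range(6)]
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
               (6, 7), (7, 8), (8, 3), (8, 9), (9, 0)]
print(len(kekule_structures(6, benzene)))       # 2
print(len(kekule_structures(10, naphthalene)))  # 3
```

Each returned structure is a set of double bonds, which is the input shape a per-atom averaging step would need.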

I really wish they had restricted Rule 1b to ring-type duplicated atoms
only. Alas!

Post by John Mayfield
The validation of centres was done in my thesis but there are a handful of
examples here:
https://github.com/johnmay/centres/tree/develop/cdk/src/test/resources/uk/ac/ebi/centres/cdk
Again, I was planning on putting together a comprehensive validation set
for the talk.

Great. I will incorporate those into my test suite. Are the target
designations in the files?

John Mayfield

2017-05-17 19:20:13 UTC

Post by Robert Hanson
? Missing this reference. Better algorithm than what? Or you mean just in
general, if you get a null result, at least you are just missing something.
Either case, I think, you need a better algorithm. :)

As I said before, for determining whether atoms/bonds are stereogenic you
should not use CIP. A better algorithm is based on Morgan relaxation/partition
refinement, similar to what InChI does, but you can do it better.
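The partition-refinement (Morgan-style relaxation) approach suggested here for deciding stereogenicity can be sketched as follows. This minimal Python version starts from simple atom invariants (element, heavy-atom degree, hydrogen count), repeatedly splits classes by the sorted classes of each atom's neighbours until the partition stops changing, and then flags a tetrahedral atom as a candidate when its four substituents fall into four different classes. The function names are hypothetical, and this is not InChI's or the CDK's algorithm. Refinement can leave genuinely different atoms in one class in some highly regular graphs, and it ignores branches that differ only in their own stereo configuration, so treat it as a first-pass filter.

```python
from typing import List, Sequence, Tuple

Invariant = Tuple[str, int, int]  # (element, heavy-atom degree, hydrogen count)


def _rank(keys: Sequence[tuple]) -> List[int]:
    """Replace each key by the position of its value among the sorted distinct keys."""
    order = {key: rank for rank, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def refine_classes(invariants: Sequence[Invariant],
                   adjacency: Sequence[Sequence[int]]) -> List[int]:
    """Split atoms into classes until neighbour classes no longer distinguish them."""
    classes = _rank(invariants)
    while True:
        # Signature = own class + sorted neighbour classes, so classes only split.
        signatures = [(classes[i], tuple(sorted(classes[j] for j in adjacency[i])))
                      for i in range(len(classes))]
        refined = _rank(signatures)
        if len(set(refined)) == len(set(classes)):
            return classes  # no class was split: the partition is stable
        classes = refined


def is_stereocentre_candidate(atom: int, classes: List[int],
                              adjacency: Sequence[Sequence[int]],
                              h_count: Sequence[int]) -> bool:
    """Four substituents (heavy neighbours + implicit H) in four different classes."""
    neighbour_classes = [classes[j] for j in adjacency[atom]]
    if h_count[atom] > 1 or len(neighbour_classes) + h_count[atom] != 4:
        return False
    # At most one implicit H remains, and it can never match a heavy neighbour.
    return len(set(neighbour_classes)) == len(neighbour_classes)


def candidate_stereocentres(elements: Sequence[str],
                            adjacency: Sequence[Sequence[int]],
                            h_count: Sequence[int]) -> List[int]:
    invariants = [(elements[i], len(adjacency[i]), h_count[i])
                  for i in range(len(elements))]
    classes = refine_classes(invariants, adjacency)
    return [i for i in range(len(elements))
            if is_stereocentre_candidate(i, classes, adjacency, h_count)]


if __name__ == "__main__":
    # Butan-2-ol, CC(O)CC: atom 1 carries CH3, OH, C2H5 and H.
    print(candidate_stereocentres(["C", "C", "O", "C", "C"],
                                  [[1], [0, 2, 3], [1], [1, 4], [3]],
                                  [3, 1, 1, 2, 3]))  # [1]
    # Pentan-3-ol, CCC(O)CC: the two ethyl groups end up in the same classes.
    print(candidate_stereocentres(["C", "C", "C", "O", "C", "C"],
                                  [[1], [0, 2], [1, 3, 4], [2], [2, 5], [4]],
                                  [3, 2, 1, 1, 2, 3]))  # []
```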

Post by Robert Hanson
Oh, that is very cool. So you think this is a failure of Rule 4b in the
IUPAC rules? Very impressive. I don't think Jmol is making any mistake
here, do you?

It's from a German thesis that found/proved it; see the Handbook of
Cheminformatics. Yes, Jmol is correct.

Post by Robert Hanson
Thanks for that link. You will find that when you implement 1a fully, you
will need to pull it apart from 1b, applying Rule 1a exhaustively before
1b. Otherwise it messes up. Pretty sure that is true with 4a, 4b, and 4c as
well. I guess that's obvious to you; took me a while to catch on to that.
Is Centres doing that? 4a and 4b are both in PairRule, right?

Yes, it already does the hierarchy correctly (I feel like I'm repeating
myself, but see
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/). I wrote
Centres before the IUPAC document was officially published and need to
adjust/split out some rules. But the pair rule, if I remember correctly,
was the like vs. unlike comparison, though it may also do 4a (I can't
remember).

Post by Robert Hanson
It was only about 50 lines, actually, at least since all I did was for the
important cases (6-membered rings). Feel free to utilize it, of course. You
will need it anyway for the 1b correction. Or at least, for that you will
need some sort of Kekule check. Maybe you already have that somewhere
else....

What do you do for non-periodic atoms? For example

*[***@H](=O)CC

If it's a polymer (e.g. a carbohydrate) you can cyclise it and then compute
CIP, but in this case I decided to handle it by ranking R between H and He:
H < R < He < ... etc.

Post by Robert Hanson
Great. I will incorporate those into my test suite. Are the target
designations in the files?

Unfortunately not; they're in a Java test class somewhere. I'll update them
to SMILES (CXSMILES) soon, which will make it easier than the CML.

Robert Hanson

2017-05-17 19:42:29 UTC

Post by Robert Hanson
Oh, that is very cool. So you think this is a failure of Rule 4b in the
IUPAC rules? Very impressive. I don't think Jmol is making any mistake
here, do you?

Post by John Mayfield
It's from a German thesis that found it/proves it, see Handbook of
Cheminformatics. Yes Jmol is correct.

Post by John Mayfield
Yes it already does the hierarchy correctly (I feel like I'm repeating
myself but see https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/).

You probably are repeating yourself. Sometimes that is necessary with email
threads like this. Thanks.

Post by John Mayfield
I wrote centres before the IUPAC document was officially published and
need to adjust/split out some rules. But the pair rule if I remember
correctly was the like vs unlike but maybe does 4a (I can't remember).

Post by John Mayfield
What do you do for non-periodic atoms? For example

It's Jmol. I don't think "non-periodic atoms" are part of CIP, are they?
In one of your examples, you have an "R" group. Jmol reads that as an atom
with atomic number 0, so that ends up R < H < C < O.
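The two conventions for an unspecified "R" or `*` atom under Rule 1a that come up in this exchange, Jmol treating it as atomic number 0 (R < H) and the H < R < He placement used in Centres, differ only in the sort key. A minimal Python sketch: the value 1.5 used to slot R between H and He is an arbitrary choice for illustration, the convention names are hypothetical, and the tiny atomic-number table holds just enough elements for the example.

```python
from typing import List

ATOMIC_NUMBERS = {"H": 1, "He": 2, "C": 6, "N": 7, "O": 8}
PSEUDO_ATOMS = {"R", "*"}


def rule_1a_key(symbol: str, convention: str) -> float:
    """Sort key for Rule 1a; a higher key means higher priority."""
    if symbol in PSEUDO_ATOMS:
        if convention == "jmol":
            return 0.0  # treated as atomic number 0, so it sorts below H
        if convention == "between_h_and_he":
            return 1.5  # arbitrary value that sorts above H and below He
        raise ValueError(f"unknown convention: {convention}")
    return float(ATOMIC_NUMBERS[symbol])


def lowest_to_highest(symbols: List[str], convention: str) -> List[str]:
    return sorted(symbols, key=lambda s: rule_1a_key(s, convention))


print(lowest_to_highest(["O", "R", "C", "H"], "jmol"))              # ['R', 'H', 'C', 'O']
print(lowest_to_highest(["O", "R", "C", "H"], "between_h_and_he"))  # ['H', 'R', 'C', 'O']
```

Either way the choice only matters when a pseudo-atom competes directly with hydrogen; against C, N, or O both conventions rank it lowest.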

Post by John Mayfield
If it's a polymer (e.g. carbohydrate) you can cyclise it and then compute
CIP but this case I decided to handled by making it H < R < He < .. etc.

Post by Robert Hanson
Great. I will incorporate those into my test suite. Are the target
designations in the files?

Post by John Mayfield
Unfortunately not they're in a Java Test class somewhere. I'll update it
to SMILES (CXSMILES) soon which will make it easier than the CML.

OK, I will track those down.

Bob