Discussion:
[BlueObelisk-discuss] Jmol - CIP update
Robert Hanson
2017-04-24 03:36:50 UTC
Jmol.___JmolVersion="14.15.2" // 4/23/17

- CIPChirality.java: 633 lines; implements everything except Rule 4b (Mata)
and Rule 5 for those cases.
- adds P, S, As, Se, Sb, Te, Bi, Po trigonal pyramidal and tetrahedral
- validates on 79 known chiral compounds testing a variety of nuances
- still some opportunity for optimization
- multi-path Mata analysis data is collected; just not being parsed.

Pseudocode follows. Actual code is a bit more than this, but this is the
basic idea. It's slightly modified from my original statement in that the
returned score, rather than just 1, 0, or -1, is a number that indicates
both the winner (positive or negative) and the sphere in which that win was
achieved (magnitude). This allows efficient and rapid forward processing
through the spheres.
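
A minimal sketch of that signed-score idea, not taken from CIPChirality.java. The class name SphereScore, its method names, and the convention that a negative value means the first branch wins are all assumptions made for illustration.

```java
// Hypothetical helper, not from CIPChirality.java: packs "who wins" and
// "in which sphere" into a single int, as described in the post.
final class SphereScore {
    static final int TIED = 0;

    /** comparison < 0: a wins; > 0: b wins (assumed convention). Sphere is 1-based. */
    static int of(int comparison, int sphere) {
        if (sphere < 1) throw new IllegalArgumentException("sphere must be >= 1");
        return Integer.signum(comparison) * sphere;
    }

    /** -1, 0, or +1: which side won, or TIED. */
    static int winner(int score) { return Integer.signum(score); }

    /** The sphere in which the decision was made (0 when tied). */
    static int sphere(int score) { return Math.abs(score); }

    /** Of two scores, prefer the decisive one reached in the earlier sphere. */
    static int earlier(int s1, int s2) {
        if (s1 == TIED) return s2;
        if (s2 == TIED) return s1;
        return Math.abs(s2) < Math.abs(s1) ? s2 : s1;
    }

    public static void main(String[] args) {
        int s = of(-1, 3); // a wins in sphere 3
        int t = of(+1, 2); // b wins in sphere 2
        System.out.println(earlier(s, t)); // prints 2: the sphere-2 decision takes precedence
    }
}
```

Carrying the sphere in the magnitude means a caller can discard any later-sphere result as soon as an earlier-sphere one is known, without a second return value.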

The idea is to do both a "shallow" (intra-sphere) and a "deep"
(inter-sphere) comparison, iterating as necessary. Extensive use of
auxiliary descriptors is made by cloning atoms on the fly and following
their paths backward rather than forward. Note that the methods
sortSubstituents, breakTie, and compareDeeply are mutually recursive, so
the digraph is followed exactly as per IUPAC specifications.

Rules 4 and 5 still need a little fleshing out. All the information needed
is there; I'm just not processing it. I decided to add P and S and lone
pairs today first. One thing at a time!

Wolf and John, you had concerns that a full treatment might explode in some
way, I think. I don't doubt that that might be the case -- I can't prove
otherwise. I would not be surprised if certain combinations of alkenes and
chiral centers could do something like that, but extensive use of auxiliary
descriptors is made here, and I think that removes much of that issue. What
do you think?

The code is designed to fail gracefully by not assigning some centers
rather than assigning them incorrectly. Not saying it is done -- just
saying that I am satisfied that it is quite manageable for general use.

Continued thanks to all for contributing to this discussion -- especially
the skepticism!

Bob

getChirality(molecule) {
  checkForAlkenes()
  if (haveAlkenes) checkForSmallRings()
  for (all atoms) getChirality(applyRules1-3)
  for (all double bonds) checkEZ()
  for (all atoms still without designations) getChirality(applyRules4and5)
  if (haveAlkenes) removeUnnecessaryEZDesignations()
}

getChirality(atom) {
  for (each Rule) {
    sortSubstituents()
    if (done) exit checkHandedness()
  }
  exit NO_CHIRALITY
}

sortSubstituents() {
  for (all pairs of substituents a and b) {
    score = a.compareTo(b, currentRule)
    if (score == TIED)
      score = breakTie(a, b)
  }
}

breakTie(a, b) {
  score = compareShallowly(a, b)
  if (score != TIED) return score
  a.sortSubstituents(), b.sortSubstituents()
  return compareDeeply(a, b)
}

compareShallowly(a, b) {
  for (each substituent pairing i in a and b) {
    score = applyCurrentRule(a_i, b_i)
    if (score != TIED) return score
  }
  return TIED
}

compareDeeply(a, b) {
  finalScore = TIED
  absScore = Integer.MAX_VALUE
  for (each substituent pairing i in a and b) {
    score = breakTie(a_i, b_i)
    if (score == TIED) continue
    // keep the decisive score from the earliest sphere (smallest magnitude)
    if (abs(score) < absScore) { absScore = abs(score); finalScore = score }
  }
  return finalScore
}
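
The recursion in the pseudocode can be tried out on a toy tree. What follows is a sketch under heavy assumptions, not Jmol's code: ToyAtom is a hypothetical class, only Rule 1a (atomic number) is applied, there are no rings, duplicate atoms, or auxiliary descriptors, and a negative score is taken to mean that the first branch wins.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; this is not Jmol's CIPChirality code.
// A node knows just its atomic number, so only Rule 1a is applied.
final class ToyAtom {
    static final int TIED = 0;
    final int z;                                   // atomic number
    final List<ToyAtom> subs = new ArrayList<>();  // substituents, one sphere out

    ToyAtom(int z, ToyAtom... children) {
        this.z = z;
        for (ToyAtom c : children) subs.add(c);
        subs.sort((x, y) -> Integer.compare(y.z, x.z)); // shallow order: highest Z first
    }

    private ToyAtom sub(int i) { return i < subs.size() ? subs.get(i) : null; }

    /** Negative: a outranks b; positive: b outranks a; magnitude: deciding sphere. */
    static int applyRule(ToyAtom a, ToyAtom b, int sphere) {
        int za = (a == null ? 0 : a.z), zb = (b == null ? 0 : b.z); // missing atom counts as Z = 0
        return za == zb ? TIED : (za > zb ? -sphere : sphere);
    }

    /** Fully order the substituents found in the given sphere, recursing on ties. */
    void sortSubstituents(int sphere) {
        subs.sort((x, y) -> {
            int score = applyRule(x, y, sphere);
            return score != TIED ? score : x.breakTie(y, sphere + 1);
        });
    }

    /** Compare two atoms that tied; their substituents sit in the given sphere. */
    int breakTie(ToyAtom b, int sphere) {
        int score = compareShallowly(b, sphere);
        if (score != TIED) return score;
        sortSubstituents(sphere);
        b.sortSubstituents(sphere);
        return compareDeeply(b, sphere);
    }

    int compareShallowly(ToyAtom b, int sphere) {
        int n = Math.max(subs.size(), b.subs.size());
        for (int i = 0; i < n; i++) {
            int score = applyRule(sub(i), b.sub(i), sphere);
            if (score != TIED) return score;
        }
        return TIED;
    }

    int compareDeeply(ToyAtom b, int sphere) {
        int finalScore = TIED, absScore = Integer.MAX_VALUE;
        int n = Math.min(subs.size(), b.subs.size());
        for (int i = 0; i < n; i++) {
            int score = subs.get(i).breakTie(b.subs.get(i), sphere + 1);
            if (score != TIED && Math.abs(score) < absScore) {
                absScore = Math.abs(score); // earliest deciding sphere so far
                finalScore = score;
            }
        }
        return finalScore;
    }

    static ToyAtom h() { return new ToyAtom(1); }

    public static void main(String[] args) {
        // -CH2-CH2-OH versus -CH2-CH3, both attached to the same (sphere 0) center
        ToyAtom a = new ToyAtom(6, new ToyAtom(6, new ToyAtom(8, h()), h(), h()), h(), h());
        ToyAtom b = new ToyAtom(6, new ToyAtom(6, h(), h(), h()), h(), h());
        System.out.println(a.breakTie(b, 2)); // prints -3: a wins, decided in sphere 3
    }
}
```

Both branches show (C, H, H) in sphere 2, so the shallow comparison ties and the deep comparison descends one level, where O beats H in sphere 3.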
Robert Hanson
2017-04-25 16:26:30 UTC
Basic Mata analysis is in. Still two structures (Examples 4 and 6, pp 1200
and 1208) are not validating, but I know why. Haven't implemented
multi-reference like/unlike checking or a final check for complex
pseudochirality involving Mata analysis. Both should be simple enough.
We'll see! All other structures are validating. I've asked IUPAC for their
validation set.

699 lines.

Bob
Robert Hanson
2017-04-26 05:39:47 UTC
Jmol update. Rule 4 is complete. Just 1/80 structures -- Example 6, p. 1208
-- is not validating. But that's just because I have not worked on Rule 5,
and that structure is only resolved there.

Two other structures (Example 4, p. 1200, and O-methyl
(S)-(benzenesulfonothioate), P-93.2.5, p. 1216) I believe are
incorrectly annotated in the SDF file. In the first case, it looks like an
error in choosing the reference configuration when the two branches have
different Rule-1 priorities; in the second, it looks like S=O is not
properly considered to be a single bond (so no duplicated S on O2). These
seem to me to be easy things to miss and, at least the second one, to fix.
Jmol has no problem now with the reference issue. It is handling all
components of Mata's description of Rule 4b. Again, the code for that is
highly recursive.

So I'm fairly certain that the overall approach I am using exactly
implements IUPAC 2013 Chapter 9 Rules 1-4 for all cases considered,
including S and P with lone pairs (but not imines yet). There are some
great examples there that are really demanding of an algorithm and caught
some rather subtle issues.

Now I'm definitely getting interested in what other algorithms are out
there and how they perform. Very interested in what you find, John. Are you
planning to test these other implementations using a standard validation
suite? I've put mine up at
https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip, but
I think it's missing some key tests.

I'm hoping to finish Rule 5 sometime tomorrow or Thursday.

743 lines.

Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
Robert Hanson
2017-04-27 05:21:53 UTC
Rule 5 is done. Fully validating using the first validation set that Mikko
sent me (86 compounds, roughly, some 2D/3D duplicates). I'm sure there are
more cases it needs testing with, though.

My algorithm implementation handles Rules 4 and 5 lexicographically so that
a simple Arrays.sort(String[]) does the job. Kind of interesting, perhaps.
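
One way to picture the lexicographic trick, as a sketch only: the encoding below (one character per descriptor pairing, 'l' for like and 'u' for unlike, most significant first) and the class name LikeUnlikeSort are assumptions for illustration, not the string format Jmol actually builds.

```java
import java.util.Arrays;

// Hypothetical encoding: each branch is summarized as a string of
// 'l' (like) and 'u' (unlike) pairings, most significant first.
final class LikeUnlikeSort {
    /** Returns the branches in priority order without modifying the input. */
    static String[] rank(String... branches) {
        String[] sorted = branches.clone();
        Arrays.sort(sorted); // natural String order: 'l' < 'u', so like ranks ahead of unlike
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rank("lul", "llu", "ull")));
        // prints [llu, lul, ull]
    }
}
```

The same ordering would work for Rule 5 if branches were written with the characters R and S, since 'R' sorts before 'S' and R precedes S under that rule.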
763 lines, Wolf.

Bob
Robert Hanson
2017-04-29 19:54:30 UTC
OK, final update for a while. (816 lines, Wolf. I have no idea how that
compares to yours or others' algorithms.)

Open-source, validated IUPAC 2013 preferred IUPAC name (PIN) stereochemical
designations.

Jmol.___JmolVersion="14.15.2" // 4/29/17

-- 816 lines
-- validation data are at https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip/
-- validates for 160 structures (some duplicates; both cip_examples.zip
and stereo_test_cases.sdf)
-- validates for all cases considered:
   -- simple R/S and E/Z
   -- small-ring removal of E/Z
   -- parallel-strand Rule 4b and Rule 5 (Mata)
   -- pseudochiral r/s and m/p
   -- odd and even cumulenes
   -- atropisomers
   -- P, S, As, Se, Sb, Te, Bi, Po trigonal pyramidal and tetrahedral
   -- imine and diazine E/Z chirality

The algorithm will fail for some more complex nested aspects of Rule 4b. I
decided to be satisfied for now with only those examples in IUPAC Blue Book
2013 Chapter 9. My understanding is that even ACD/Labs did not fully
implement Rules 4 and 5 much beyond that.

Working version in JavaScript can be tested at
https://chemapps.stolaf.edu/jmol/jsmol/jsmetest2.htm
Binary and source at https://sourceforge.net/projects/jmol/files/Jmol/

A great challenge for April!

Bob
John Mayfield
2017-04-30 09:38:14 UTC
Post by Robert Hanson
The algorithm will fail for some more complex nested aspects of Rule 4b. I
decided to be satisfied for now with only those examples in IUPAC Blue Book
2013 Chapter 9. My understanding is that even ACD/Labs did not fully
implement Rules 4 and 5 much beyond that.
When you say fail, does it give no answer or the wrong answer?
Robert Hanson
2017-04-30 12:57:57 UTC
Unfortunately, the wrong answer. At least for now. The problem is with the
mixed mode like/unlike with R/seqCis, R/M, etc. in Rule 4b. I haven't
implemented those at all. When I tried, I got crazy results.

I would like to do some serious cross-checking with a new Rule 4b test
suite. John, sounds like this is something you want to bring up in your
presentation. There's no doubt that if nothing else we need a discussion of
the IUPAC 2013 rules. Maybe an "open" spec, but really to me that means an
IUPAC project. But maybe others aren't interested in that. How do you want
to proceed? You sort of "claimed" this space. I don't want to step on your
toes. Sorry I can't make the fall ACS meeting this year.

Personally, I think we should do better than passing structures back and
forth over this list. But I'm not in a position to be able to run all these
tests on other systems; just Jmol. (And, of course, you don't need me to do
that.) At the very least, we should be using SMILES, not images. While
we're at it, what about this one?

C/1=C\C[CH@](C)C/C=C/C[CH@](C)C\1

Bob
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
Robert Hanson
2017-05-09 18:48:27 UTC
OK, 967 lines. This now includes chiral bridgehead nitrogens and is just
missing a small piece of code for integrating M/P and seqCis/seqTrans into
Rule 4b. The algorithm is solid. No particular issues other than that. Does
not implement atom-number averaging for mancude rings, but does remove
Kekule dependencies for aromatic rings. This last bit required a slight
additional "friendly amendment" to Rule 1b, which I don't think was crafted
with aromatics in mind. Pseudocode is still pretty much the same as
initially described. There are no issues with multi-center dependencies.

Validates on the full 236-compound Chapter 9 validation suite from ACD/Labs
except:

var skip = ({27 229}) || // ignore -- BB has E/Z only; missing chirality
  ({95 96 98 99 100 101 102 103 104 108 109 110 111 112 200}) || // trigonal planar, square planar, or hypervalent
  ({32 33}) || // helicene
  ({212 213}) || // chiral conformation 1,4-benzene in a ring
  ({38}) || // ignoring -- Mancude for cyclopentadienyl -- will require some thought
  ({170}) // failing (mixed Rule 4b)
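
For anyone mirroring this skip list in a Java test harness, a rough sketch follows. The class SkipList, the map layout, and the method names are hypothetical; only the indices and the reasons come from the list above, and whether the indices are 0-based or 1-based in the suite is not stated here.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical harness helper: groups the skipped structure indices by reason.
final class SkipList {
    private static final Map<String, int[]> REASONS = new LinkedHashMap<>();
    static {
        REASONS.put("BB has E/Z only; missing chirality", new int[] {27, 229});
        REASONS.put("trigonal planar, square planar, or hypervalent",
            new int[] {95, 96, 98, 99, 100, 101, 102, 103, 104, 108, 109, 110, 111, 112, 200});
        REASONS.put("helicene", new int[] {32, 33});
        REASONS.put("chiral conformation 1,4-benzene in a ring", new int[] {212, 213});
        REASONS.put("mancude cyclopentadienyl", new int[] {38});
        REASONS.put("failing (mixed Rule 4b)", new int[] {170});
    }

    /** Union of all skipped indices, analogous to the || of bitsets above. */
    static BitSet skipped() {
        BitSet skip = new BitSet();
        for (int[] ids : REASONS.values())
            for (int id : ids) skip.set(id);
        return skip;
    }

    /** The recorded reason for skipping an index, or null if it is tested. */
    static String reasonFor(int id) {
        for (Map.Entry<String, int[]> e : REASONS.entrySet())
            for (int i : e.getValue()) if (i == id) return e.getKey();
        return null;
    }

    public static void main(String[] args) {
        System.out.println(skipped().cardinality() + " of 236 structures skipped");
        // prints "23 of 236 structures skipped"
    }
}
```

Keeping the reason next to each index makes it easy to see that only one of the 23 skipped structures (170) is an actual failure.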

In the process, I have found four erroneous assignments in IUPAC Blue Book
2013. These are being checked by IUPAC. There's actually quite a large
errata page for that book already.

With regards to an "Open" CIP -- I strongly suggest not going there. If you
are seriously interested in this, join/form an IUPAC project.

Bob
John Mayfield
2017-05-09 20:19:05 UTC
Post by Robert Hanson
With regards to an "Open" CIP -- I strongly suggest not going there. If
you are seriously interested in this, join/form an IUPAC project.
For me the main motivation is to not reinvent the wheel, or perhaps not
reinvent the wheel worse than it already was. The less people have to think
and worry about CIP the better.

Maybe an IUPAC project is the way to go there... but a stepping stone is to
have a 'standard/verified' implementation (i.e. Open CIP) first. You've
done well on your implementation and it may be one to recommend as a
'verified' version, but if we do nothing, in 2 years' time someone else will
come along and waste more time with the same bugs/corner cases.

- You've still not done the iteration quite right, you only give 4 labels
for this one:
[C@]3([C@]1([H])CC[C@@]([H])(C)CC1)([C@]2([H])CC[C@@]([H])(C)CC2)CC[C@]([H])(C)CC3
- What CIP labels do you get for CHEBI:51439
<http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A51439>? I believe a
correct implementation should never finish... or at least run out of memory
when building the digraph.
John Mayfield
2017-05-09 20:23:47 UTC
Forgot to check the symmetry when I wrote it. Now correct and same problem,
the previous one should have 4 centres/labels, this one should have 6.
Robert Hanson
2017-05-09 21:43:21 UTC
Thanks, again, John. That fix is checked in. I had forgotten to check for
r and s at other than the root atom.

That reminds me to say that the BB validation suite is missing a lot of
good tests such as this one. So one really great contribution would be to
create an open validation set that would go far beyond the examples in the
BB.

Bob
John Mayfield
2017-05-09 22:21:45 UTC
And the CHEBI one? :p
Robert Hanson
2017-05-10 03:04:19 UTC
yeah! I'll pass on that one -- after all, you said, "if it's a good
algorithm, this one will blow up."
Robert Hanson
2017-05-11 15:01:07 UTC
Final Jmol update for a while. The next release of Jmol will have a CIP
algorithm implementation that passes all the tests I have at this time
(ACD/Labs 236-model test suite + Mikko's 64-model suite). There are many
errors/omissions in the ACD/Labs suite, and I have not had the time to
check every one of them against the actual text of the BB. So all I can
really say is that in every case that I have checked the BB text, Jmol
agrees -- or I have submitted a correction report with a digraph analysis
-- and in every case that it disagrees with the ACD/Labs suite, it is the
ACD/Labs suite that is mistaken, not Jmol.

Thanks again, John, for the great test cases. Do keep sending me any tough
cases you might have.

1018 lines completes Rule 4b (for now!)

Bob
Chalk, Stuart
2017-05-09 20:27:38 UTC
Bob, John and the group

If there is interest in starting up an ‘Open CIP’ project I can offer space on the newly minted IUPAC GitHub account.
Let me know if that would be of use and, if so, who would be the admin for the project. We can invite collaborators after that.


Stuart

On May 9, 2017, at 4:19 PM, John Mayfield <***@gmail.com<mailto:***@gmail.com>> wrote:

With regards to an "Open" CIP -- I strongly suggest not going there. If you are seriously interested in this, join/form an IUPAC project.

For me the main motivation is to not reinvent the wheel, or perhaps not reinvent the wheel worse than it already was. The less people have to think and worry about CIP the better.

Maybe an IUPAC project is the way to go there... but a stepping stone is to have a 'standard/verified' implementation (i.e. Open CIP) first. You've done well on your implementation and it may be one to recomend as a 'verified' version but if we do nothing in 2 years time someone else will come along and waste more time with the same bugs/corner cases.

- You've still not done the iteration quite right, you only give 4 labels for this one: [C@]3([C@]1([H])CC[C@@]([H])(C)CC1)([C@]2([H])CC[C@@]([H])(C)CC2)CC[C@]([H])(C)CC3
- What CIP labels do you get for CHEBI:51439<http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A51439>? I believe a correct implementation should never finish... or at least run of memory when building the digraph.

On 9 May 2017 at 19:48, Robert Hanson <***@stolaf.edu<mailto:***@stolaf.edu>> wrote:
OK, 967 lines. This now includes chiral bridgehead nitrogens and is just missing a small piece of code for integrating M/P and seqCis/seqTrans into Rule 4b. The algorithm is solid. No particular issues other than that. Does not implement atom-number averaging for mancude rings, but does remove Kekule dependencies for aromatic rings. This last bit required a slight additional "friendly amendment" to Rule 1b, which I don't think was crafted with aromatics in mind. Pseudocode is still pretty much the same as initially described. There are no issues with multi-center dependencies.

Validates on the full 236-compound Chapter 9 validation suite from ACD/Labs except:

var skip = ({27 229}) || // ignore -- BB has E/Z only; missing chirality
({95 96 98 99 100 101 102 103 104 108 109 110 111 112 200}) || // trigonal planar, square planar, or hypervalent
({32 33}) || // helicene
({212 213})|| // chiral conformation 1,4-benzene in a ring
({38}) || // ignoring -- Mancude for cyclopentadienyl -- will require some thought
({170}) // failing (mixed Rule 4b)

In the process, I have found four erroneous assignments in IUPAC Blue Book 2013. These are being checked by IUPAC. There's actually quite a large errata page for that book already.

With regards to an "Open" CIP -- I strongly suggest not going there. If you are seriously interested in this, join/form an IUPAC project.

Bob


On Sat, Apr 29, 2017 at 2:54 PM, Robert Hanson <***@stolaf.edu<mailto:***@stolaf.edu>> wrote:
OK, final update for a while. (816 lines, Wolf. I have no idea how that compares to yours or others' algorithms.)

Open-source, validated IUPAC 2013 preferred IUPAC name (PIN) stereochemical designations.

Jmol.___JmolVersion="14.15.2" // 4/29/17

-- 816 lines
-- validation data are at https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip/
-- validates for 160 structures (some duplicates; both cip_examples.zip and stereo_test_cases.sdf)
-- validates for all cases considered:
   -- simple R/S and E/Z
   -- small-ring removal of E/Z
   -- parallel-strand Rule 4b and Rule 5 (Mata)
   -- pseudochiral r/s and m/p
   -- odd and even cumulenes
   -- atropisomers
   -- P, S, As, Se, Sb, Te, Bi, Po trigonal pyramidal and tetrahedral
   -- imine and diazine E/Z chirality

The algorithm will fail for some more complex nested aspects of Rule 4b. I decided to be satisfied for now with only those examples in IUPAC Blue Book 2013 Chapter 9. My understanding is that even ACD/Labs did not fully implement Rules 4 and 5 much beyond that.

Working version in JavaScript can be tested at https://chemapps.stolaf.edu/jmol/jsmol/jsmetest2.htm
Binary and source at https://sourceforge.net/projects/jmol/files/Jmol/

A great challenge for April!

Bob



On Thu, Apr 27, 2017 at 12:21 AM, Robert Hanson <***@stolaf.edu> wrote:
Rule 5 is done. Fully validating using the first validation set that Mikko sent me (86 compounds, roughly, some 2D/3D duplicates). I'm sure there are more cases it needs testing with, though.

My algorithm implementation handles Rules 4 and 5 lexicographically so that a simple Arrays.sort(String[]) does the job. Kind of interesting, perhaps.

763 lines, Wolf.
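
To illustrate the lexicographic idea mentioned above: if each branch's descriptors are written out as a string, the natural ordering of Java strings can rank the branches. The encoding below (one character per descriptor, read outward from the center) is an assumption for illustration and not necessarily the one Jmol uses; it relies only on 'R' sorting before 'S', which matches Rule 5's preference for R over S. `LexSort` and `rank` are hypothetical names.

```java
import java.util.Arrays;

class LexSort {
  // Returns a sorted copy; String's natural order is lexicographic.
  static String[] rank(String[] descriptorPaths) {
    String[] copy = descriptorPaths.clone();
    Arrays.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    // Hypothetical descriptor strings, one per branch.
    String[] branches = { "RS", "SR", "RR" };
    System.out.println(Arrays.toString(rank(branches))); // [RR, RS, SR]
  }
}
```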

Bob
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss


John Mayfield
2017-05-13 11:18:26 UTC
Permalink
Thanks Stuart,

I've submitted an abstract to the Fall ACS, intending to see if there's
interest and will look at possible options after that.

John
Post by Chalk, Stuart
Bob, John and the group
If there is interest in starting up an ‘Open CIP’ project I can offer
space on the newly minted IUPAC GitHub account.
Let me know if that would be of use and, if so, who would be the admin for
the project. We can invite collaborators after that.

Stuart
Post by Robert Hanson
With regards to an "Open" CIP -- I strongly suggest not going there. If
you are seriously interested in this, join/form an IUPAC project.
For me the main motivation is to not reinvent the wheel, or perhaps not
reinvent the wheel worse than it already was. The less people have to think
and worry about CIP the better.
Maybe an IUPAC project is the way to go there... but a stepping stone is
to have a 'standard/verified' implementation (i.e. Open CIP) first. You've
done well on your implementation and it may be one to recommend as a
'verified' version, but if we do nothing, in 2 years' time someone else will
come along and waste more time with the same bugs/corner cases.
Robert Hanson
2017-05-14 04:43:15 UTC
Permalink
OK, final (?) report. Always a good sign when code simplifies. I've
tightened up the algorithm, which passes all pertinent Chapter 9 tests,
plus several more. After removing unnecessary code, Jmol's CIP
implementation is now back to 970 lines, even with added (minimal) Kekule
considerations and full R/S, seqCis/seqTrans, and M/P mixing for Rules 4b
and 5. It would not take much more to add helicene identification.

I'm happy to report that the following simplified pseudocode is sufficient.
There is nothing magical here. The successful one-pass use of auxiliary
descriptors and finite digraphs should put to rest any concerns that CIP
determination of stereochemical descriptors has any cyclical dependencies
or routinely blows up, except perhaps for structures with too many atoms
and too high symmetry. Every program will have its limits in this regard,
of course, and I think this algorithm could certainly be made more
efficient. In any case,
the algorithm I have implemented demonstrates that this process can be a
one-pass process through all eight rules: 1a, 1b, 2, 3, 4a, 4b, 4c, and 5;
once through does it.
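
As a rough illustration of what "one pass through the rules" can mean, here is a toy sketch. It assumes each substituent already carries one integer key per rule, with the higher key ranking first; in a real implementation those comparisons come from walking the digraph. The names `RulePass`, `Sub`, and `firstDecidingRule` are hypothetical and do not come from Jmol.

```java
import java.util.Arrays;

class RulePass {
  // The eight sequence rules named in the text, in order of application.
  enum Rule { R1A, R1B, R2, R3, R4A, R4B, R4C, R5 }

  // Toy substituent: one precomputed key per rule (missing keys count as 0).
  static final class Sub {
    final String name;
    final int[] keys;
    Sub(String name, int... keys) { this.name = name; this.keys = keys; }
  }

  static int key(Sub s, int r) { return r < s.keys.length ? s.keys[r] : 0; }

  // Compare using every rule up to and including 'last'; negative = a first.
  static int compare(Sub a, Sub b, Rule last) {
    for (int r = 0; r <= last.ordinal(); r++) {
      int c = Integer.compare(key(b, r), key(a, r));
      if (c != 0) return c;
    }
    return 0;
  }

  // Applies the rules in order and stops at the first one that leaves no ties.
  static Rule firstDecidingRule(Sub[] subs) {
    for (Rule rule : Rule.values()) {
      Arrays.sort(subs, (a, b) -> compare(a, b, rule));
      boolean done = true;
      for (int i = 1; i < subs.length; i++)
        if (compare(subs[i - 1], subs[i], rule) == 0) done = false;
      if (done) return rule;
    }
    return null; // still tied after Rule 5: no descriptor is assigned
  }

  public static void main(String[] args) {
    Sub[] subs = {
      new Sub("CH3", 6, 0, 12), new Sub("Br", 35),
      new Sub("13CH3", 6, 0, 13), new Sub("Cl", 17)
    };
    System.out.println(firstDecidingRule(subs)); // R2
    System.out.println(subs[0].name);            // Br
  }
}
```

Here the two methyl groups tie under Rules 1a and 1b and are separated by mass number under Rule 2, so the loop stops there without ever revisiting an earlier rule.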

There are some nuances that are problematic -- a dependency on generating
multiple Kekule models, and a problem with Rule 1b as currently stated
actually introducing its own Kekule problems. But Rule 4b looks to me now
to be no major problem. A bit complicated, for sure, but not so bad in the
end. And I'm sure John will find some issues with what I have here. The
key, as Peter Murray-Rust mentioned, is the construction of a very good set
of test structures. There's more testing to do. I don't believe the
structures that are in papers or in the IUPAC 2013 Blue Book are nearly
enough to cover the bases. So I have no doubt that I missed something here,
with the limited number of examples available to me. But I'm confident that
the overall strategy is sound, and that additional issues will be minor.

Take a look; let me know what you think. Thanks again for all the great
comments and especially for great test cases.

Bob

// getChirality(molecule) {
//   prefilterAtoms()
//   checkForAlkenes()
//   checkForSmallRings()
//   checkForBridgeheadNitrogens()
//   checkForKekuleIssues()
//   checkForAtropisomerism()
//   for (all filtered atoms) getAtomChirality(atom)
//   if (haveAlkenes) {
//     for (all double bonds) getBondChirality(a1, a2)
//     removeUnnecessaryEZDesignations()
//   }
// }
//
// getAtomChirality(atom) {
//   for (each Rule) {
//     sortSubstituents()
//     if (done) return checkHandedness()
//   }
//   return NO_CHIRALITY
// }
//
// getBondChirality(a1, a2) {
//   atop = getAlkeneEndTopPriority(a1)
//   btop = getAlkeneEndTopPriority(a2)
//   return (atop >= 0 && btop >= 0
//       ? getEneChirality(atop, a1, a2, btop) : NO_CHIRALITY)
// }
//
// sortSubstituents() {
//   for (all pairs of substituents a1 and a2) {
//     score = a1.compareTo(a2, currentRule)
//     if (score == TIED)
//       score = breakTie(a1, a2)
//   }
// }
//
// breakTie(a, b) {
//   score = compareShallowly(a, b)
//   if (score != TIED) return score
//   a.sortSubstituents(), b.sortSubstituents()
//   return compareDeeply(a, b)
// }
//
// compareShallowly(a, b) {
//   for (each substituent pairing i in a and b) {
//     score = applyCurrentRule(a_i, b_i)
//     if (score != TIED) return score
//   }
//   return TIED
// }
//
// compareDeeply(a, b) {
//   bestScore = Integer.MAX_VALUE
//   for (each substituent pairing i in a and b) {
//     bestScore = min(bestScore, breakTie(a_i, b_i))
//   }
//   return bestScore
// }
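
The shallow/deep tie-break in the pseudocode above can be tried out on a toy model. The sketch below is not Jmol's code. It assumes Rule 1a only (atomic number), an acyclic tree with explicit hydrogens and no duplicate atoms, and a score convention in which a negative value means the first branch ranks higher, a positive value means the second does, and the magnitude is the sphere in which the difference was found. All names (`TieBreakSketch`, `Node`, `PHANTOM`, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TieBreakSketch {
  static final int TIED = 0;

  // Toy digraph node: an atomic number plus outward branches (tree only).
  static final class Node {
    final int z;
    final List<Node> kids;
    Node(int z, Node... kids) {
      this.z = z;
      this.kids = new ArrayList<>(Arrays.asList(kids));
    }
  }

  static final Node PHANTOM = new Node(0); // stands in for a missing substituent

  static Node h() { return new Node(1); }
  static Node methyl() { return new Node(6, h(), h(), h()); }

  static Node kid(Node n, int i) {
    return i < n.kids.size() ? n.kids.get(i) : PHANTOM;
  }

  // Rule 1a on two atoms: -1 if a has the higher atomic number, +1 if b does.
  static int compareAtoms(Node a, Node b) {
    return Integer.signum(b.z - a.z);
  }

  // Compare two branches whose root atoms sit in the given sphere.
  static int compare(Node a, Node b, int sphere) {
    int c = compareAtoms(a, b);
    return c != TIED ? c * sphere : breakTie(a, b, sphere);
  }

  // Fully rank the substituents of n, which sit in the given sphere.
  static void sortSubstituents(Node n, int sphere) {
    n.kids.sort((x, y) -> compare(x, y, sphere));
  }

  // a and b are tied as atoms in 'sphere': look one sphere out, then deeper.
  static int breakTie(Node a, Node b, int sphere) {
    a.kids.sort(TieBreakSketch::compareAtoms); // atom-level order is enough here
    b.kids.sort(TieBreakSketch::compareAtoms);
    int score = compareShallowly(a, b, sphere + 1);
    if (score != TIED) return score;
    sortSubstituents(a, sphere + 1);
    sortSubstituents(b, sphere + 1);
    return compareDeeply(a, b, sphere + 1);
  }

  // Intra-sphere comparison of the ranked substituents, pair by pair.
  static int compareShallowly(Node a, Node b, int sphere) {
    int n = Math.max(a.kids.size(), b.kids.size());
    for (int i = 0; i < n; i++) {
      int c = compareAtoms(kid(a, i), kid(b, i));
      if (c != TIED) return c * sphere;
    }
    return TIED;
  }

  // Inter-sphere comparison: keep the decision made in the earliest sphere;
  // among equal spheres the higher-ranked pairing (lower i) is kept.
  static int compareDeeply(Node a, Node b, int sphere) {
    int best = TIED;
    int n = Math.max(a.kids.size(), b.kids.size());
    for (int i = 0; i < n; i++) {
      int s = breakTie(kid(a, i), kid(b, i), sphere);
      if (s != TIED && (best == TIED || Math.abs(s) < Math.abs(best))) best = s;
    }
    return best;
  }

  public static void main(String[] args) {
    Node ethyl = new Node(6, h(), methyl(), h());
    System.out.println(breakTie(ethyl, methyl(), 1)); // -2: ethyl wins in sphere 2
  }
}
```

One deliberate difference from the pseudocode: this sketch selects the nonzero score with the smallest magnitude rather than taking a plain `min`, since under the signed-score convention assumed here a plain minimum would always favor negative values.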
Robert Hanson
2017-05-17 19:46:18 UTC
Permalink
* 5/17/17 Jmol 14.15.5. adds helicene M/P chirality; 959 lines
* validated using CCDC structures HEXHEL02 HEXHEL03 HEXHEL04 ODAGOS ODAHAF
* http://pubs.rsc.org/en/content/articlehtml/2017/CP/C6CP07552E