Discussion:
[BlueObelisk-discuss] New OpenSMILES specification posted
Craig James
2012-12-14 17:25:06 UTC
Thanks to Tim, there is a new version of the OpenSMILES specification, in
both HTML and PDF formats:

http://www.opensmiles.org

Tim did a great deal of reformatting with nice style sheets, and there is a
whole new section on stereo centers.

I'm leaving on vacation so I haven't had time to review it, but I invite
everyone to read it and make comments.

This message is cross-posted to several mailing lists. If you're on the
BlueObelisk-SMILES list, please reply to that list. If you're not on that
list but have comments, post them where you can; perhaps someone can
cross-post them to the BO-SMILES list.

Craig
Wolf Ihlenfeldt
2012-12-14 18:17:41 UTC
I am absolutely convinced that specifying the aromatic/lowercase form
as preferred output style is a really, really bad decision.

Reconstructing a proper Kekule form from complex multi-ring structures
with multiple N in mixed pyridine/pyrrole roles in combination with
implicit hydrogen quickly becomes a pretty hard problem. Determining
aromaticity using whatever model from a Kekule form is much easier.
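
[Editorial illustration, not part of the original message.] The asymmetry
described above can be sketched with a toy example. Assigning Kekule double
bonds amounts to finding a perfect matching over the atoms that still need a
double bond, and that atom set depends on knowing which nitrogens are
pyrrole-type. The sketch below is a simplification under stated assumptions:
the names `ring` and `kekulize` are hypothetical, it handles only single
rings given as plain adjacency lists, and it ignores charges, fused systems,
and real SMILES parsing.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Bond = Tuple[int, int]


def ring(n: int) -> Dict[int, List[int]]:
    """Adjacency list of a simple n-membered ring (hypothetical helper)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def kekulize(neighbors: Dict[int, List[int]],
             needs_double: Set[int]) -> Optional[List[Bond]]:
    """Try to give every atom in needs_double exactly one double bond.

    Returns a list of double bonds, or None if no assignment exists.
    Plain backtracking search: fine for a toy ring, not for production use.
    """
    def solve(unmatched: FrozenSet[int],
              chosen: List[Bond]) -> Optional[List[Bond]]:
        if not unmatched:
            return chosen
        a = min(unmatched)  # always extend from the lowest unmatched atom
        for b in neighbors[a]:
            if b in unmatched:
                result = solve(unmatched - {a, b}, chosen + [(a, b)])
                if result is not None:
                    return result
        return None  # atom a cannot be paired: dead end

    return solve(frozenset(needs_double), [])


# Pyridine-like six-ring: all six atoms need a double bond.
pyridine = kekulize(ring(6), set(range(6)))

# Pyrrole written with an explicit hydrogen (c1ccc[nH]1): the N at index 4
# donates its lone pair, so only the four carbons need a double bond.
pyrrole_with_h = kekulize(ring(5), {0, 1, 2, 3})

# The same five-ring with the hydrogen left implicit and the N assumed to be
# pyridine-type: five atoms need a double bond, and no assignment exists.
pyrrole_no_h = kekulize(ring(5), set(range(5)))

print(pyridine is not None, pyrrole_with_h is not None, pyrrole_no_h is None)
```

In this toy case the failure is easy to diagnose. With several nitrogens
spread over fused rings, a reader of the lowercase form has to search over
which of them carry the hydrogen before any matching can succeed.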

Also, the lowercase form implicitly encodes a specific aromaticity
model, which can easily lead to a mismatch when another batch of
structures that was encoded with a different aromaticity model is
merged into a dataset.

Nobody uses canonical SMILES for structure identity checks because there
are so many canonicalization variants, and much more robust methods
exist, so the argument about having to choose an exact location of
single and double bonds and thus introducing ambiguity has no real
merit.
--
Wolf-D. Ihlenfeldt - Xemistry GmbH - ***@xemistry.com
Phone: +49 6174 201455 - Fax +49 6174 209665
---
xemistry gmbh – Geschäftsführer/Managing Director: Dr. W. D. Ihlenfeldt
Address: Hainholzweg 11, D-61462 Königstein, Germany
HR Königstein B7522 : Ust/VAT ID DE215316329 : DUNS 34-400-1719
Peter Murray-Rust
2012-12-15 18:50:54 UTC
[Reply to BO list]
I am very pleased to see progress on Open SMILES - the closed nature of
previous SMILES implementations has probably cost hundreds of millions of
dollars, if not more, through incompatible information. Open specs are
critical, and Neelie Kroes at the EC is leading a very strong agenda on
OSS and standards.

"If you can build a molecule from a modeling kit, you can name it."

This is an interesting and largely true statement but with qualifications.
Names are categorical and discrete - geometry is often continuous. I agree
that Constitution - through graphs - seems to be largely discrete (but
breaks down where the existence of bonds is subjective or variable).
However Configuration can be continuous and Conformation frequently is. So
I'd argue that there should be a pragmatic line drawn - and it's mainly
that SMILES, InChI (which is isomorphic) and constitutional graphs start to
degrade when stepping outside mainstream organic chemistry.

Of the extensions my votes would be for:
* R-groups on static bonds (i.e. not to ring centres) - yes, we need this.
* crystals - absolutely not. DW tried to do this without understanding
crystals. It can never work.
* polymers - gets tricky very quickly. There are almost always some free
variables (n's and m's).
* twisted SMILES. Please, no. Yes, it works for chair cyclohexane. For almost
everything else it gets messy very quickly. Geometry does not behave
prettily.

Physical properties attached to molecules and parts of molecules: no. The
concept and the syntax aren't designed for this. InChI is in danger of
trying to canonicalize real numbers computed by machines. It can't be done.

If you think this is too conservative, ask how you represent "Aluminium
Chloride".

P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069