Discussion:
[BlueObelisk-discuss] Building a CML code library
Peter Murray-Rust
2012-02-06 10:37:32 UTC
As a follow-up to the workshop we ran on Semantic Physical Science (see
http://blogs.ch.cam.ac.uk/pmr/2012/01/15/semantic-physical-science/ and
later blog posts), we are publishing a thematic issue in J. Cheminform.
(cf. last year's http://www.jcheminf.com/series/semantic_mol_future ). The
emphasis is on implementing semantic systems, particularly in the area of
chemistry and materials science. I will publish the proposed papers in the
next day or so.

One of the papers is:
"Building a CML code library"
In this we formulate the essential and optional components of a software
library to support CML. The scope is ingest, building, modification,
serialization, and validation. There may also be scope for code-specific
dictionaries.
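
A minimal sketch of what those five components (ingest, building, modification, serialization, validation) might look like for a very small CML subset, assuming Python's standard xml.etree.ElementTree and the elements molecule, atomArray, atom, bondArray and bond. All function names here are hypothetical and are not taken from any of the libraries named in this thread, and the validation step only checks this subset, not the CML schema.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # write CML as the default namespace


def q(tag):
    """Return the namespace-qualified form of a CML element name."""
    return f"{{{CML_NS}}}{tag}"


def build_molecule(mol_id, atoms, bonds):
    """Building: atoms are (id, elementType), bonds are (id1, id2, order)."""
    mol = ET.Element(q("molecule"), {"id": mol_id})
    atom_array = ET.SubElement(mol, q("atomArray"))
    for atom_id, element in atoms:
        ET.SubElement(atom_array, q("atom"),
                      {"id": atom_id, "elementType": element})
    bond_array = ET.SubElement(mol, q("bondArray"))
    for a1, a2, order in bonds:
        ET.SubElement(bond_array, q("bond"),
                      {"atomRefs2": f"{a1} {a2}", "order": order})
    return mol


def serialize(mol):
    """Serialization: element tree to an XML string."""
    return ET.tostring(mol, encoding="unicode")


def ingest(text):
    """Ingest: XML string to an element tree."""
    return ET.fromstring(text)


def set_element(mol, atom_id, element):
    """Modification: change the elementType of one atom, found by id."""
    for atom in mol.iter(q("atom")):
        if atom.get("id") == atom_id:
            atom.set("elementType", element)
            return
    raise KeyError(atom_id)


def validate(mol):
    """Validation (subset only): return a list of problem descriptions."""
    problems = []
    if mol.tag != q("molecule"):
        problems.append("root element is not a CML molecule")
    ids = [atom.get("id") for atom in mol.iter(q("atom"))]
    if len(ids) != len(set(ids)):
        problems.append("duplicate atom ids")
    for atom in mol.iter(q("atom")):
        if not atom.get("elementType"):
            problems.append(f"atom {atom.get('id')} has no elementType")
    for bond in mol.iter(q("bond")):
        refs = (bond.get("atomRefs2") or "").split()
        if len(refs) != 2 or any(ref not in ids for ref in refs):
            problems.append(f"bond has bad atomRefs2: {refs}")
    return problems
```

A round trip through serialize() and ingest() followed by validate() gives a cheap consistency check for whichever subset a library chooses to support.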

This is being copied to the Blue Obelisk to gather current practice in CML
libraries. This is not limited to Open Source implementations. These
implementations need not be "complete" - in fact the theme of the paper is
to show that subsets can be addressed robustly. The implementations should
at least show a willingness to address problems of conformance ("Open
Standards" in the BO ODOSOS).

At present I am aware of roughly the following:

JUMBO (PMR-group, Java)
Cameron Neylon, Python
FoX (Andrew Walker et al, FORTRAN90)
CDK Java
cclib ??
Open Babel C++ (Avogadro)
Chem4Word (C#)
InChI-interface (C++), PMR group

I believe that some other implementations exist, but I am not sure whether
they offer an identifiable library/API or address conformance:
Chemdraw
ACDLabs
Chemaxon

I believe there are others - I just haven't come across them recently

I'd like to make a reasonably comprehensive list and add objective comments
if possible.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Wolf Ihlenfeldt
2012-02-06 10:52:21 UTC
Post by Peter Murray-Rust
I believe that some other implementations exist, but I am not sure whether
they offer an identifiable library/API or address conformance:
Chemdraw
ACDLabs
Chemaxon
The Cactvs toolkit has a pretty good implementation, including I/O of
reactions and NMR spectra.
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
--
Wolf-D. Ihlenfeldt - Xemistry GmbH - ***@xemistry.com
Phone: +49 6174 201455 - Fax +49 6174 209665
---
xemistry gmbh – Geschäftsführer/Managing Director: Dr. W. D. Ihlenfeldt
Address: Hainholzweg 11, D-61462 Königstein, Germany
HR Königstein B7522 : Ust/VAT ID DE215316329 : DUNS 34-400-1719
Peter Murray-Rust
2012-02-06 11:11:12 UTC
Post by Wolf Ihlenfeldt
The Cactvs toolkit has a pretty good implementation, including I/O of
reactions and NMR spectra.
Thanks very much. Is there a public API that we can point to? Is there
anything we can say about CML versions, conformance, etc. (We do not expect
current systems to be conformant to have them included here, but it is
useful to know if the problem has been addressed).
Wolf Ihlenfeldt
2012-02-06 15:33:59 UTC
Post by Peter Murray-Rust
Thanks very much. Is there a public API that we can point to?
It is a shared library for the Cactvs toolkit (filex_cml.so/dll/dylib in
standard distributions) and thus contains a standard I/O module header with
module information and function pointers for the basic required functions
(format identification, file open, record skipping/resync, input, output,
file close). Outside the toolkit, it is probably not very useful. It also
shares a lot of code with the ChemAxon mrv module; the source is actually a
common file with #ifdefs. The module deposits data directly into the
internal data structures of the toolkit.
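
A rough sketch of that kind of I/O module table, written in Python rather than C and with entirely hypothetical names (this is not the Cactvs API), might look like the following. It assumes, purely for illustration, an in-memory text stream in which each non-empty line is one record.

```python
import io
from dataclasses import dataclass
from typing import Callable, Optional, TextIO


def _identify(head: str) -> bool:
    """Format identification from the first characters of the input."""
    return "<cml" in head or "<molecule" in head


def _open_stream(text: str) -> TextIO:
    """File open; an in-memory stream stands in for a real file here."""
    return io.StringIO(text)


def _read(stream: TextIO) -> Optional[str]:
    """Input: return the next non-empty line, or None at end of stream."""
    for line in stream:
        if line.strip():
            return line.strip()
    return None


def _skip(stream: TextIO) -> bool:
    """Record skipping: consume one record, report whether one was found."""
    return _read(stream) is not None


def _write(stream: TextIO, record: str) -> None:
    """Output: append one record to the stream."""
    stream.write(record + "\n")


def _close(stream: TextIO) -> None:
    """File close."""
    stream.close()


@dataclass(frozen=True)
class IOModule:
    """Module header: descriptive information plus the function table."""
    name: str
    identify: Callable[[str], bool]
    open_stream: Callable[[str], TextIO]
    skip: Callable[[TextIO], bool]
    read: Callable[[TextIO], Optional[str]]
    write: Callable[[TextIO, str], None]
    close: Callable[[TextIO], None]


CML_MODULE = IOModule("cml", _identify, _open_stream,
                      _skip, _read, _write, _close)
MODULES = [CML_MODULE]


def find_module(head: str) -> Optional[IOModule]:
    """Return the first registered module that recognises the input."""
    for module in MODULES:
        if module.identify(head):
            return module
    return None
```

The host application only ever talks to the table, so a new format is added by registering another IOModule rather than by changing the callers.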
Post by Peter Murray-Rust
Is there anything we can say about CML versions, conformance, etc. (We do
not expect current systems to be conformant to have them included here, but
it is useful to know if the problem has been addressed).
If you have a test suite, I'd like to run it. Of course we try to keep the
code up to date. We can confidently state that the code works for
structure and spectra retrieval from NMRShiftDB, and in the context of
KNIME working with CML cells. Generally, output uses the latest published
standards (as far as we are aware of them), while the reader is rather
lenient in what it accepts. As I wrote, if you have test files, please send
them...
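
The "strict on output, lenient on input" policy can be illustrated with a small Python sketch. It assumes just one kind of leniency, namely accepting atom elements with or without the CML namespace, and it uses hypothetical function names; it says nothing about what the Cactvs reader actually tolerates.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # write CML as the default namespace


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_atoms_leniently(text):
    """Collect (id, elementType) pairs from atom elements in any namespace."""
    root = ET.fromstring(text)
    atoms = []
    for el in root.iter():
        if isinstance(el.tag, str) and local_name(el.tag) == "atom":
            atoms.append((el.get("id"), el.get("elementType")))
    return atoms


def write_atoms_strictly(mol_id, atoms):
    """Always emit namespace-qualified CML, whatever the input looked like."""
    mol = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})
    atom_array = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for atom_id, element in atoms:
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom",
                      {"id": atom_id, "elementType": element})
    return ET.tostring(mol, encoding="unicode")
```

Reading a namespace-free file and writing it back therefore upgrades it to namespaced output without the caller having to care which form came in.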
Egon Willighagen
2012-02-06 12:54:10 UTC
Post by Peter Murray-Rust
CDK Java
For CML writing, the CDK uses your CMLXOM library. For CML reading, the
CDK still uses the library I started for Jmol and JChemPaint a very
long time ago, which got us in contact originally... this was
somewhere in 1998, I think :)

For this, please cite the matching paper: Willighagen, E. L. Internet
Journal of Chemistry 2001, 4, 4+.

(freely available from:
http://www.scribd.com/doc/14333194/Processing-CML-Conventions-in-Java)

This paper is tracked by Web of Science, and citing this paper when
discussing the CML parser of the CDK is the easiest way to support my
research career :)

Jmol, of course, also parses CML fine, but it no longer uses my original
CML library; nowadays it has a more custom parser.

Egon
--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
Jean Brefort
2012-02-06 21:52:21 UTC
Not sure it is relevant, but I wrote a CML plugin for GChemPaint and
associated programs. It is quite incomplete, but supports molecules (2d
and 3d) and crystal cells, both read/write. Current code is at
http://svn.savannah.nongnu.org/viewvc/trunk/gchemutils/plugins/loaders/cml/cml.cc?revision=1598&root=gchemutils&view=markup

Best regards,
Jean Brefort
Peter Murray-Rust
2012-02-06 22:38:33 UTC
Post by Jean Brefort
Not sure it is relevant, but I wrote a CML plugin for GChemPaint and
associated programs. It is quite incomplete, but supports molecules (2d
and 3d) and crystal cells, both read/write. Current code is at
http://svn.savannah.nongnu.org/viewvc/trunk/gchemutils/plugins/loaders/cml/cml.cc?revision=1598&root=gchemutils&view=markup
Thanks - it is very relevant.

P.
Marcus D. Hanwell
2012-02-09 14:13:39 UTC
Post by Peter Murray-Rust
Thanks - it is very relevant.
I also wanted to point out that David Lonie (with minimal guidance
from me as his mentor) added a CML reader to the Visualization Toolkit
(VTK) as part of his Google Summer of Code project last year. I also
have a C++/Qt plugin for Avogadro that reads CML directly and displays
the document structure (not yet merged) and am experimenting with CML
and JSON in the context of the work we are doing at Kitware.

These are very much C++ oriented, with some wrapping in other
languages, and a strong emphasis on libxml2 for the XML parsing. I
have fixed the occasional bug in the Open Babel CML reader too, and
was looking at the possibility of augmenting it so that we could produce
newer CML variants.

There are various pieces of code I could point to, and in order to
better use CML with the richer output expected from NWChem, I expect to
extend our approach significantly.

Marcus
