Peter Murray-Rust
2011-06-05 10:24:32 UTC
**Copied to both Blue Obelisk and Quixote lists. Please be careful in
replying to both and please keep to topic.**
Background
=========
The Quixote project (http://quixote.wikispot.org/) is an Open Source/Data
project to create and capture compchem calculations in semantic form and
make them available Openly to all. We have now (Friday) released our
prototype repository at http://quixote.ch.cam.ac.uk/content/index.html and
expect it to include tens of thousands of contributed logfiles and their
semantic (CML) conversions. We are already receiving feedback and
questions.
The Blue Obelisk (
http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Main_Page)
has created and maintains many major Open Source codes and Open Data/Open
Standards resources. It has set up a site
(http://blueobelisk.shapado.com/) where anyone can ask and answer
questions on Open (Data/Source/Standards).
This has proven its worth and has many contributors. The software is an Open
version of the very successful Stack Overflow (etc.) sites for software
questions (http://stackoverflow.com/).
Requirement
==========
Many people involved in compchem (users, consumers, coders, interfacers,
etc.) have questions that cannot be easily answered. There are many reasons:
* novice status - don't know where to look or who to ask
* unexpected program behaviour (PMR had this with GAMESS-US - lines > 80
chars corrupt silently)
* undocumented behaviour or output
* questions about strategy
etc.
Here are some questions I would like to ask:
* why are there only 7 Fukui components in NWChem (I expected 8)?
* what does "Stoichiometry" mean in Gaussian logfiles?
* what level of optimization is required before calculating 13C NMR shifts?
* why does this GAMESS-US input fail to give the right number of atoms?
* can NWChem calculate chemical shifts?
* which Open Source codes do QM/MM?
Shapado approach
===============
Although Open sites in chemistry have very variable uptake, I think that
compchem is an exception and will work extremely well.
* its scope is well-bounded (it's relatively easy to see whether something
is compchem or not)
* there is a very large number of practitioners at all levels
(undergraduate, method user, method developer, central facility - e.g. Grid,
supercomputer)
* there are tens of millions of jobs run each year
There are several very well understood axes of classification (by method, by
basis, by code, by strategy, by compound type, etc.). It is therefore
relatively easy to see which questions have already been asked, and easy to
browse through (say) all the GAMESS-UK questions.
StackOverflow is incredibly powerful and I get answers within minutes. Here's
an example from PMR (
http://stackoverflow.com/questions/5879546/parsing-dates-with-variable-spaces).
I think we could fairly soon get a critical mass of questions and
answers. The SO experience shows that it covers the whole area from
undergraduate to unanswered research problems (there's a convention that
"homework" questions are treated sympathetically but that you don't give the
full answer to start with ("plz send me teh codez")).
Proposal
=======
We create a Shapado for computational chemistry (QC, MD and similar) which
answers any questions (not just those on Open Source). [I am assuming this
does not already exist]. The Openness comes from the Open discussion and the
desire to create Open practices and resources but it should follow the SO
practice that any codes, any data, any workplace is welcome. (Product
information in response to genuine questions is allowed but not product
placement - advertising). It must be Open in that the site should not be
"closable" later as happens with many free-as-in-beer sites. I am assuming
that it will have a good membership of BO+Q members but anyone can play.
Action
=====
We need to know what is involved in setting up such a site and running it. I
know that Egon has been heavily involved - who else? I'm guessing that there
would be Quixotans who can get started.
As long as your mails are in scope of this question and the way forward is
being debated it's probably a good idea to copy both lists. There will come
a time when we need to change this.
P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069