Discussion:
[BlueObelisk-discuss] Fwd: [Jmol-users] PDB - OK, who's the wise guy?
Egon Willighagen
2013-01-19 12:03:41 UTC
Anyone know?

---------- Forwarded message ----------
From: Robert Hanson <***@stolaf.edu>
Date: Fri, Jan 18, 2013 at 11:26 PM
Subject: [Jmol-users] PDB - OK, who's the wise guy?
To: "jmol-***@lists.sourceforge.net" <jmol-***@lists.sourceforge.net>


Where does THIS come from? Hex code in a PDB file? Is that spec?

ATOM 99998 H1 TIP3W3304 -28.543 60.673 40.064 1.00 0.00 WT5 H
ATOM 99999 H2 TIP3W3304 -27.773 60.376 41.353 1.00 0.00 WT5 H
ATOM 186a0 OH2 TIP3W3305 -24.713 61.533 47.372 1.00 0.00 WT5 O
ATOM 186a1 H1 TIP3W3305 -25.652 61.772 47.519 1.00 0.00 WT5 H
ATOM 186a2 H2 TIP3W3305 -24.713 61.625 46.379 1.00 0.00 WT5 H

Bob

--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
Chair, Chemistry Department
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900


_______________________________________________
Jmol-users mailing list
Jmol-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jmol-users



--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
Nina Jeliazkova
2013-01-19 12:46:35 UTC
Egon,

I'm not a pdb expert, but was intrigued and it turns out hex numbers
are in this spec :

https://www.schrodinger.com/AcrobatFile.php?type=supportdocs&type2=&ident=530

p.10:
Options for the pdbconvert command.
-hex
Use hexadecimal encoding for atom numbers greater than 99999 and for
residue numbers greater than 9999.


Regards,
Nina
_______________________________________________
Blueobelisk-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
Peter Murray-Rust
2013-01-19 19:40:20 UTC
Post by Nina Jeliazkova
Egon,
I'm not a pdb expert, but was intrigued and it turns out hex numbers
are in this spec :
https://www.schrodinger.com/AcrobatFile.php?type=supportdocs&type2=&ident=530

This is not a PDB spec (PDB = Protein Data Bank) but a proprietary document
by the Schroedinger Company. The format it describes is probably "nearly" PDB
but with proprietary extensions. The formal PDB spec
http://deposit.rcsb.org/adit/docs/pdb_atom_format.html describes these five
characters as "Integer". I doubt very much whether RCSB would agree that
Schroedinger's document represents their spec.

This proliferation of unauthorised mutant documents simply pollutes and
destroys the quality interchange of chemical information. It happened with
SMILES, which is why community efforts such as Open SMILES are important.
The only PDB spec we should use is RCSB's.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Robert Hanson
2013-01-19 22:00:35 UTC
Well, I heartily agree. This is a VERY SHORT-SIGHTED idea. Incredible! In the
same file:


ATOM 19999 OH2 TIP3W2742 -7.467 -15.016 7.560 1.00 0.00 WT1 O
ATOM 20000 H1 TIP3W2742 -7.659 -14.310 8.177 1.00 0.00 WT1 H

ATOM 1ffff OH2 TIP3W7613 14.728 120.645 53.959 1.00 0.00 WT8 O
ATOM 20000 H1 TIP3W7613 15.610 120.717 54.253 1.00 0.00 WT8 H

...You've got to be kidding!


Mind you, this was in NAMD, not Gaussian.
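
The excerpts above show why this field cannot be decoded one line at a time: the token 20000 means 20000 early in the file and 0x20000 (131072) right after 1ffff. Here is a minimal Python sketch of a reader that keeps state to cope with that. It rests on an assumption the file itself never declares, namely that the writer emits decimal up to 99999 and hexadecimal from then on without switching back. The name `decode_serials` is made up for illustration.

```python
def decode_serials(fields):
    """Decode a sequence of 5-character atom serial fields, in file order.

    Assumption (not declared anywhere in the file): decimal is used up to
    99999 and hexadecimal for every serial after that.
    """
    serials = []
    in_hex = False
    for field in fields:
        text = field.strip()
        # A letter such as "a" or "f" can only mean the writer is in hex mode.
        if not in_hex and not text.isdigit():
            in_hex = True
        value = int(text, 16) if in_hex else int(text)
        # After decimal 99999 the next field would have to be hex (186a0).
        if not in_hex and value == 99999:
            in_hex = True
        serials.append(value)
    return serials


print(decode_serials(["19999", "20000", "99999", "186a0", "1ffff", "20000"]))
```

The same string "20000" decodes to two different numbers depending on where it appears, which is exactly the extra bookkeeping a reader has to carry.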
Andrew Dalke
2013-01-20 12:08:34 UTC
Post by Peter Murray-Rust
The only PDB spec we should use [is] RCSB's.
What you've written is not possible, unless you have a PDB file which
came directly from the RCSB. Certain fields are "mandatory" but can only
be filled in by the RCSB, or are widely ignored during export. Other
fields do not have enough space to handle what people reasonably want.

1) For example, the PDB specification says:

All records in a PDB coordinate entry must appear in a defined order.
Mandatory record types are present in all entries.
...
HEADER Mandatory
...
COLUMNS    DATA TYPE     FIELD            DEFINITION
 1 -  6    Record name   "HEADER"
11 - 50    String(40)    classification   Classifies the molecule(s).
51 - 59    Date          depDate          Deposition date. This is the date the
                                          coordinates were received at the PDB.
63 - 66    IDcode        idCode           This identifier is unique within the PDB.


If you wish to generate a PDB file for a structure that the PDB
doesn't know about, then what do you put for the depDate and idCode?

For kicks, I pulled up my copy of the PDB format from 1974. It says that
the PDB file has: COMPND, AUTHOR, CRYST1, DECODE, REMARK

2) Certain other fields are also mandatory, but widely ignored by
structure-based PDB writers. For example,

SEQRES Mandatory Mandatory if ATOM records exist.

A quick search of the CDK shows that it is one of the many packages which do
not write the required SEQRES on output, so does not generate a valid PDB file,
as defined by the specification.

However, not including the required HEADER and SEQRES records is perfectly
acceptable by the de facto community standard, which is:

- follows the PDB spec in the domain of applicability
- other programs know how to read and write it, within that same domain


3) I wrote "domain of applicability" because there are structures which cannot
be represented in the PDB format. Some are too large, and run into the atom
serial number limit. As Egon discovered, this limits the PDB file to 99999
atoms.

Indeed, there are some submissions to the PDB which are too large, so the
actual submission is broken across two or more different PDB records.

The PDB solution is to use the "SPLIT" header, which lists the PDB entries
for all of the PDB records which make up the full submission. For example,
the entries 2J00, 2J01, 2J02, and 2J03 must be combined to make the
"Structure of the 70S ribosome complexed with mRNA and tRNA."

This solution is not available to any organization other than the RCSB.
(I.e., how do you make a new ID? How does your software know how to map
the new ID to a file? What happens when you run out of identifiers?)

What should a program do if the user asks to save a structure as "PDB"
format and the structure cannot be represented in PDB format? Should it
refuse to save the structure? Or should it switch to an application-
specific extension? (I will not say that a documented extension which
is publicly readable and which has no constraints on uptake by others
is "proprietary".)

Schrodinger decided to switch to hex-encoded numbers. This lets them
handle up to 1,048,574 atoms.
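
A minimal Python sketch of the writer side of that hex convention, based only on what is visible in Bob's file (decimal through 99999, then lowercase hex starting at 186a0). The function name `encode_serial_hex` is hypothetical, and this is not taken from Schrodinger's code.

```python
def encode_serial_hex(n):
    """Return a 5-character serial field: decimal up to 99999, hex above.

    Assumption: lowercase hex digits, as seen in the file quoted in this
    thread. Anything that needs more than five hex digits is rejected.
    """
    if n < 0 or n > 0xFFFFF:
        raise ValueError(f"serial {n} does not fit in five hex digits")
    if n <= 99999:
        return f"{n:5d}"   # right-justified decimal
    return f"{n:5x}"       # right-justified lowercase hex


for n in (99998, 99999, 100000, 100001):
    print(repr(encode_serial_hex(n)))
```

Note that nothing in the output marks where the switch happens, so a reader has to infer it.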

X-PLOR (back in the 1980s) decided to treat the first character as
being in base 36, so that "99999" was followed by "A0000", "A0001",
etc. When I wrote VMD's PDB parser in the 1990s, I followed the X-PLOR
convention. This let us handle up to 359,963 atoms. This is the -hybrid36
option in the Schrodinger document which Nina dug up.
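
The convention as described here (first character in base 36, remaining four characters decimal) can be sketched in Python as follows. This follows the description in this post only; the function names are made up, and schemes published elsewhere under the name hybrid-36 may treat the remaining characters differently.

```python
import string

# "0".."9" followed by "A".."Z": the 36 possible values of the first character.
_LEAD = string.digits + string.ascii_uppercase


def encode_serial_lead36(n):
    """Encode n so that 99999 is followed by A0000, A0001, ..."""
    if n < 0:
        raise ValueError("serial must be non-negative")
    if n <= 99999:
        return f"{n:5d}"
    lead, rest = divmod(n, 10000)
    if lead >= len(_LEAD):
        raise ValueError(f"serial {n} does not fit in this scheme")
    return f"{_LEAD[lead]}{rest:04d}"


def decode_serial_lead36(field):
    """Inverse of encode_serial_lead36; plain decimal fields pass through."""
    text = field.strip()
    if text.isdigit():
        return int(text)
    return _LEAD.index(text[0].upper()) * 10000 + int(text[1:])


print(encode_serial_lead36(100000), decode_serial_lead36("A0001"))
```

Unlike the hex scheme, every value above 99999 starts with a letter, so a field is never ambiguous with a decimal one.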

Both are *wrong* according to the PDB spec. But we didn't care about
that detail because we wanted to support >99,999 atoms, and the PDB
offered no guidance about how to handle this case, nor is guidance
for this in their mandate.

In any case, this field can sometimes be ignored on input. The PDB requires
(or required - I don't see it in the current spec) that atom serial
numbers be strictly serial. It used to be (pre-1992) that missing serial
numbers could indicate missing atoms. In any case, the serial number is
only important if you want to use other parts of the format to identify
a given atom. If you're working with large scale structures (e.g., in MD)
and don't care about structure annotations, then this field can be ignored.
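
For the coordinates-only case, ignoring the field might look like the Python sketch below: the reader never parses columns 7-11 and numbers atoms by their position in the file. The column positions used (atom name in 13-16, x, y, z in 31-38, 39-46, 47-54) are those of the standard ATOM record. `read_atoms` is a hypothetical name, and the sample line is rebuilt with proper column widths because the lines quoted in this thread have had their whitespace collapsed.

```python
def read_atoms(lines):
    """Return (index, name, x, y, z) tuples, ignoring the serial field."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        # Coordinates are three fixed-width 8-character fields.
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        # The index comes from file order, not from columns 7-11.
        atoms.append((len(atoms) + 1, name, x, y, z))
    return atoms


sample = ("ATOM  " + "186a0" + " " + "OH2 " + " TIP3W3305    "
          + "%8.3f%8.3f%8.3f" % (-24.713, 61.533, 47.372))
print(read_atoms([sample]))
```

The cost is that CONECT and similar records, which refer to atoms by serial number, can no longer be resolved.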



Like the UK's unwritten constitution, there is no written
specification describing the de facto 'PDB exchange' format,
much less one which provides guidance about the various PDB
extensions which different pieces of software use.

Some 15 years ago, I started such a project. I quickly gave up,
in part because I moved away from macromolecular structures, and
in part because it's not easy. I can't say that it would be worth
the effort should someone here decide to take it up.


In any case, we're left with the reality that we can't say that "the
only PDB spec we should use is RCSB's", because the only organization that
can follow the RCSB's PDB specification is the RCSB itself.

Not only that, but:

This proliferation of unauthorised mutant documents simply pollutes
and destroys the quality interchange of chemical information.

is incorrect, because the only reason the PDB format is so common is that
people decided not to follow the strict letter of the Brookhaven/RCSB
PDB specification but instead to follow its spirit.

This works because PDB-as-coordinate-file (which is all most people
care about) is different and simpler than PDB-as-structure-annotation-file,
which is itself different than deposition-as-PDB-files-curated-by-the-RCSB.
The PDB specification is *only* for the last use case, but it's been
expropriated for other uses.


Andrew
***@dalkescientific.com
Andrew Dalke
2013-01-21 02:14:46 UTC
Post by Andrew Dalke
For kicks, I pulled up my copy of the PDB format from 1974. It says that
the PDB file has: COMPND, AUTHOR, CRYST1, DECODE, REMARK
Err, that was supposed to be deleted. I found that I didn't have a complete
spec, dug through the old PDB newsletters, looked for old printouts in my
files, couldn't find it, got distracted, and managed to forget to delete
these two lines.

So I don't know if the 1970s spec says that those records were supposed to
be in every PDB file, but I'm pretty certain that they were in every file
that they distributed.

Andrew
***@dalkescientific.com
Robert Hanson
2013-01-21 23:09:35 UTC
Thanks, Andrew, that history helps. And I must say, the hybrid-36 scheme
really is very clever. You would probably like the base-90 scheme I cooked
up for the JVXL surface file format.

Unlike Peter, I have no problem with custom specifications, as long as they
are clear, unambiguous, and published. Something like this should not have
to be "discovered" 100,000 lines into a file. It would have been immensely
helpful if there had been a required REMARK record to the effect that atom
numbers/residue numbers were in some specific format. Far too often someone
comes up with a solution to their problem for their program and implements
it without ever considering how it might impact other programs. As it is,
Jmol, for instance, with some tweaks, can now read a file with both the
hybrid-36 and Schroedinger's HEX solution (upon the next release), but it
involves additional checking just to be able to be on the look-out for this
issue, and that potentially slows down all PDB file reading by Jmol (or any
other program).
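
To make the REMARK idea concrete, here is a Python sketch of how a reader could use such a declaration. Everything about the record is hypothetical: the tag `REMARK SERIAL-ENCODING` does not exist in any PDB specification or in any program mentioned in this thread, and `declared_encoding` is a made-up name.

```python
# Hypothetical tag, invented for this sketch; not part of any PDB spec.
SERIAL_REMARK = "REMARK SERIAL-ENCODING "


def declared_encoding(lines, default="decimal"):
    """Return the encoding named in the header, or the default if absent.

    Scanning stops at the first coordinate record, so the declaration
    costs nothing once the atoms start.
    """
    for line in lines:
        if line.startswith(SERIAL_REMARK):
            return line[len(SERIAL_REMARK):].strip().lower()
        if line.startswith(("ATOM", "HETATM")):
            break
    return default


header = ["REMARK SERIAL-ENCODING HEX", "ATOM  186a0  OH2 TIP3W3305"]
print(declared_encoding(header))
```

With a declaration like this the reader chooses its decoder once, before the first ATOM line, instead of checking every serial field for surprises.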

Bob