This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Keep coordinate listing brief

Clifford Felder, Chem. Phys., Weizmann Inst., tel. 8343759 (felder@sgjs2.weizmann.ac.il)
Mon, 7 Nov 1994 09:58:43 +0200


Dear Macromolecular structure community,
I am a technical research assistant working with Prof. Joel Sussman,
currently head of the PDB. I approach the question of formatting and
organizing molecular coordinate information from the point of view of a
molecular modeller and programmer rather than that of a database specialist.

Compared with the newly proposed mmCIF and ZINC standards, the existing
classical PDB format is remarkably well suited to being read and written by
modelling applications, and its compactness conserves storage and network
transmission resources. Indeed, it has apparently become the accepted
standard among the vast majority of modelling packages.
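To illustrate why fixed columns suit modelling programs, here is a minimal Python sketch that reads and writes an ATOM/HETATM record purely by column position. The positions follow the published PDB format description for the classic record (record name in columns 1-6, serial number 7-11, atom name 13-16, residue name 18-20, chain ID 22, residue number 23-26, and coordinates in 31-54 as three 8.3 fields). The `Atom` class and function names are illustrative only, and the atom-name alignment rule is simplified.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atom(line: str) -> Atom:
    """Slice a classic PDB ATOM/HETATM line by fixed columns (1-based columns in comments)."""
    return Atom(
        record=line[0:6].strip(),       # cols 1-6
        serial=int(line[6:11]),         # cols 7-11
        name=line[12:16].strip(),       # cols 13-16
        res_name=line[17:20].strip(),   # cols 18-20
        chain=line[21],                 # col 22
        res_seq=int(line[22:26]),       # cols 23-26
        x=float(line[30:38]),           # cols 31-38, format 8.3
        y=float(line[38:46]),           # cols 39-46
        z=float(line[46:54]),           # cols 47-54
    )

def format_atom(a: Atom) -> str:
    """Write the same fields back into their fixed columns (cols 1-54)."""
    # Simplified alignment: 4-character names fill cols 13-16; shorter names start in col 14.
    name = a.name if len(a.name) == 4 else " " + a.name.ljust(3)
    return (f"{a.record:<6}{a.serial:>5} {name} {a.res_name:>3} {a.chain}"
            f"{a.res_seq:>4}    {a.x:8.3f}{a.y:8.3f}{a.z:8.3f}")

line = "ATOM      1  N   MET A   1      27.340  24.430   2.614"
atom = parse_atom(line)
# atom.res_name == "MET", atom.x == 27.34; format_atom(atom) reproduces line
```

Reading a record needs nothing beyond string slicing, and writing one needs only a format string, which is much of what makes the format easy for modelling programs to handle.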

While the so-called "HEADER" portions of PDB entries should rightfully be
organized in the manner most conducive to on-line data management, search and
retrieval, the so-called "ATOM" portions (e.g. the ATOM, HETATM and TER
records) should be retained in their present format, or something very
similar. These latter record types are much less likely to be the subjects
of direct data searches (any search concerning molecular geometry would most
likely require intermediary software to derive that geometry from the
coordinates), and they constitute the bulk of most macromolecular data
entries.
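As an example of such intermediary software, the sketch below derives one simple geometric property, pairs of C-alpha atoms closer than a cutoff, directly from ATOM records. It assumes classic PDB column positions and coordinates in Ångströms. The function names and the "residue name + number + chain" label format are illustrative choices, and the all-pairs loop is quadratic, which is adequate only for small structures.

```python
import math
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]

def ca_coords(lines: Iterable[str]) -> List[Tuple[str, Coord]]:
    """Collect (label, (x, y, z)) for C-alpha atoms from classic PDB ATOM lines."""
    out = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Label such as "GLY1A": residue name, residue number, chain ID.
            label = f"{line[17:20].strip()}{int(line[22:26])}{line[21]}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            out.append((label, xyz))
    return out

def close_pairs(atoms: List[Tuple[str, Coord]], cutoff: float) -> List[Tuple[str, str, float]]:
    """Return every pair of atoms whose distance is below cutoff (in Angstroms)."""
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i][1], atoms[j][1])
            if d < cutoff:
                pairs.append((atoms[i][0], atoms[j][0], round(d, 3)))
    return pairs
```

The search itself runs on derived quantities (distances), not on the stored text, which is why the coordinate records gain little from being reorganized for database queries.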

However, some minor modifications might be considered: allowing atom names of
up to five characters, residue names of up to four characters, and X-, Y- and
Z-coordinates with six decimal places instead of the existing three. The
less important atom number and comment fields could be moved to the right,
beyond column 80.
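The suggested extensions can be made concrete with a sketch. The column positions below are entirely hypothetical, chosen only to show one way of fitting five-character atom names, four-character residue names and six-decimal coordinates within column 80 while moving the atom serial number and a comment beyond it; they are not part of any PDB or mmCIF specification, and the function names are likewise illustrative.

```python
# HYPOTHETICAL widened layout (invented for illustration, not a PDB standard):
#   cols  1-6   record name          cols 24-62  x, y, z as three 13.6 fields
#   cols  8-12  atom name (5 chars)  cols 81-87  atom serial number
#   cols 14-17  residue name (4)     cols 89-    free-text comment
#   col  19     chain ID
#   cols 20-23  residue number

def format_wide_atom(record, name, res_name, chain, res_seq, xyz, serial, comment=""):
    core = (f"{record:<6} {name:<5} {res_name:<4} {chain}{res_seq:>4}"
            + "".join(f"{c:13.6f}" for c in xyz))
    if len(core) > 80:
        raise ValueError("essential fields overflow column 80")
    # Pad the essential fields to column 80; the secondary fields follow beyond it.
    return f"{core:<80}{serial:>7} {comment}".rstrip()

def parse_wide_atom(line):
    return {
        "record": line[0:6].strip(),
        "name": line[7:12].strip(),
        "res_name": line[13:17].strip(),
        "chain": line[18],
        "res_seq": int(line[19:23]),
        "xyz": tuple(float(line[i:i + 13]) for i in (23, 36, 49)),
        "serial": int(line[80:87]),
        "comment": line[88:],
    }
```

A program that only needs geometry can still stop reading at column 80, while the serial number and comment remain on the same line for tools that want them.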

Indeed, Prof. Sussman has remarked, and I agree, that no matter what format
is adopted, each data item should be presented on a single line, even if
that line extends beyond column 80. Nowadays, longer lines do not present
the kinds of problems they once did, and several Unix and ed tools, such as
'sort' and 'grep', operate line by line and cannot handle multi-line
formats. Scripts and macros are also much harder to write for multi-line
formats.
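Because each record sits on one line, filtering and sorting are per-line operations with no state carried between lines. The minimal Python sketch below mimics a `grep` for C-alpha atoms followed by a `sort` on chain and residue number, assuming classic PDB column positions (atom name in columns 13-16, chain ID in column 22, residue number in columns 23-26); the function names are hypothetical.

```python
from typing import Iterable, Iterator, List

def select(lines: Iterable[str], record: str = "ATOM", atom_name: str = "CA") -> Iterator[str]:
    """Keep lines whose record type and atom name match, like grep on fixed columns."""
    for line in lines:
        line = line.rstrip("\n")
        # Every record is complete on one line, so each line is tested on its own.
        if line[0:6].strip() == record and line[12:16].strip() == atom_name:
            yield line

def sort_by_residue(lines: Iterable[str]) -> List[str]:
    """Sort whole records by chain ID (col 22), then residue number (cols 23-26)."""
    return sorted(lines, key=lambda l: (l[21], int(l[22:26])))
```

For example, `sort_by_residue(select(open("entry.pdb")))` would return the C-alpha records of a hypothetical file `entry.pdb` in residue order. If a record were split across two lines, both functions would first have to buffer and reassemble lines before testing or sorting them.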

-- 


Sincerely yours, Clifford Felder <felder@sgjs2.weizmann.ac.il>