E0719

TRANSLATING PDB ENTRIES INTO MMCIF. Philip E. Bourne, San Diego Supercomputer Center, PO Box 85608, San Diego, CA 92186-9785, USA, Herbert J. Bernstein, Bernstein + Sons, 5 Brewster Lane, Bellport, NY 11713-2803, USA and Frances C. Bernstein, Protein Data Bank, Chemistry Dept., Brookhaven National Laboratory, Upton, NY 11973-5000, USA.

The essential steps needed to map Protein Data Bank (PDB) entries into valid mmCIF data sets are discussed. Examples of converting both routine and complex structures using actual PDB entries with the program pdb2cif are given.

The Protein Data Bank format has been used for over 20 years to archive macromolecular data. Many refinement programs produce it, and many applications accept it as input. As the number of structures continues to grow exponentially, more data must be represented explicitly and in a form that computers can parse. The IUCr's pending adoption of the mmCIF dictionary in response to this need has made translation from PDB format to mmCIF format a pressing issue.

In this talk we review the techniques needed to move from structures represented in PDB format to mmCIF format. Some data items map directly with only minor syntactic adjustment, for example author names and journal references. Other data items, however, require us to recast our thinking along new lines. For example, the PDB format works with chains and HET groups, while mmCIF uses entities (discrete chemical components). Proper identification of entities in a PDB entry may require looking for sequence homologies. As another example, consider beta sheets. The PDB format treats a bifurcated sheet as two distinct sheets that happen to have certain strands in common, while mmCIF allows all the strands involved to be represented as a single sheet. Going from PDB format to mmCIF therefore requires strand matching and alignment. We will discuss what has been automated in pdb2cif so far and what still requires human intervention.
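The three kinds of translation described above can be illustrated with a minimal Python sketch. It is not the pdb2cif implementation, and all function names are hypothetical. It assumes the following simplifications. Author names arrive in the PDB "P.E.BOURNE" style and are rewritten as "Bourne, P.E." using naive title-casing. Entities are found by exact sequence identity, which is a stand-in for the homology search mentioned above, and only polymer chains are handled, not HET groups. Strands have already been matched and aligned into identical keys of the form (chain, start residue, end residue), so that PDB sheets sharing any strand can be merged into one sheet.

```python
from itertools import combinations


def pdb_author_to_cif(name):
    """Rewrite a PDB-style author name such as 'P.E.BOURNE' as 'Bourne, P.E.'.

    Assumption: initials precede the surname and end with '.'; the surname
    case conversion is a naive title-case (e.g. 'MCDONALD' -> 'Mcdonald').
    """
    initials, _, surname = name.rpartition(".")
    if not initials:
        return name.title()  # no initials present
    return f"{surname.title()}, {initials}."


def group_chains_into_entities(chain_sequences):
    """Group chain IDs whose residue sequences are identical.

    Simplification: exact identity stands in for a homology search.
    Returns lists of chain IDs in first-seen order.
    """
    entities = {}  # sequence tuple -> chain IDs sharing that sequence
    for chain_id, seq in chain_sequences.items():
        entities.setdefault(tuple(seq), []).append(chain_id)
    return list(entities.values())


def merge_bifurcated_sheets(pdb_sheets):
    """Merge PDB sheets that share at least one strand into single sheets.

    pdb_sheets maps a PDB sheet ID to a set of strand keys; keys are assumed
    to be already aligned (chain, start_residue, end_residue) tuples.
    """
    parent = {sid: sid for sid in pdb_sheets}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Any two sheets with a common strand belong to the same mmCIF sheet.
    for a, b in combinations(pdb_sheets, 2):
        if pdb_sheets[a] & pdb_sheets[b]:
            parent[find(a)] = find(b)

    merged = {}
    for sid, strands in pdb_sheets.items():
        merged.setdefault(find(sid), set()).update(strands)
    # Deterministic order: sort sheets by their smallest strands.
    return sorted(merged.values(), key=lambda s: sorted(s))
```

For example, PDB sheets {"S1": {("A", 10, 15), ("A", 20, 25)}, "S2": {("A", 20, 25), ("A", 40, 45)}} share the strand ("A", 20, 25), so they are returned as one sheet of three strands. A union-find structure is used so that chains of overlaps (S1 shares a strand with S2, and S2 shares one with S3) also collapse into a single sheet.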

Work supported in part by US NSF grant no. BIR 9310154 (for PEB), US NSF, PHS, NIH, NCRR, NIGMS, NLM and DOE under contract DE-AC02-76CH00016 (for FCB).