TRANSLATING MMCIF DATA INTO PDB ENTRIES.
Frances C. Bernstein, Protein Data Bank, Chemistry Dept., Brookhaven National Laboratory, Upton, NY 11973-5000, USA and Herbert J. Bernstein, Bernstein + Sons, 5 Brewster Lane, Bellport, NY 11713-2803, USA.
We present the major steps needed to translate mmCIF data into a "pseudo-PDB" format (one close enough to standard PDB format to be accepted by most applications), with examples drawn from the program cif2pdb. The objective is twofold: to help application developers and CIF authors recognize uses of mmCIF that will hinder translation to PDB format, and to help users familiar with PDB format understand the new mmCIF constructs.
The Protein Data Bank format has been used for over 20 years to archive macromolecular data, is produced by many refinement programs, and is used as an input format by many applications. Adoption of the mmCIF dictionary by the IUCr will lead to the creation of a significant pool of mmCIF data sets. However, it may be some time before existing application programs can handle mmCIF input, so facilities to translate mmCIF data into PDB format are needed if CIFs are to be used with existing programs. The PDB is developing a CIF-based AutoDep program, which uses the WWW. Where there is a one-to-one correspondence between mmCIF tokens and PDB records, AutoDep will use the mmCIF tokens to produce the appropriate PDB records. In other areas, however, more complex transformations are needed, and in user labs it is not always possible, or even desirable, to make a perfect PDB entry from an mmCIF data set. If the only purpose of the translation is to display a molecule, there may be no reason to reorganize residues and het groups or to change atom names to match PDB standards. We discuss both the production of a "pseudo-PDB" format that can be converted into rigorous PDB format with further processing, and the capabilities of the PDB AutoDep program. We consider ways to construct a valid PDB ATOM/HETATM/TER list, and approaches for handling mmCIF chain identifiers and atom names whose values may not fit into PDB field sizes.
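As an illustration of the last point, here is a minimal Python sketch (not the cif2pdb code) that builds a pseudo-PDB ATOM/HETATM/TER list from already-parsed atom_site rows. The AtomSite container and its field names, the chain-ID assignment policy, and the plain truncation of over-long atom names are assumptions made for this example only; the column positions follow the standard fixed-column PDB ATOM/HETATM/TER layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AtomSite:
    """One parsed mmCIF _atom_site row (hypothetical container; fields assumed)."""
    group_pdb: str  # "ATOM" or "HETATM"
    atom_id: str    # atom name; mmCIF allows more than 4 characters
    alt_id: str     # alternate location, "." or "?" when absent
    comp_id: str    # residue name, e.g. "ALA"
    asym_id: str    # chain label; mmCIF allows more than 1 character
    seq_num: int    # residue number (assumed already an integer)
    x: float
    y: float
    z: float
    occupancy: float
    b_iso: float
    element: str    # e.g. "C", "FE"


def chain_map(asym_ids: List[str]) -> Dict[str, str]:
    """Give each distinct mmCIF chain label a one-character PDB chain ID.

    One-character labels are kept; longer ones get the next unused
    character from A-Z, a-z, 0-9, in first-seen order (assumed policy)."""
    pool = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
    pool += [chr(c) for c in range(ord("a"), ord("z") + 1)]
    pool += [str(d) for d in range(10)]
    ordered = list(dict.fromkeys(asym_ids))  # unique labels, input order
    mapping = {lab: lab for lab in ordered if len(lab) == 1}
    used = set(mapping.values())
    for lab in ordered:
        if lab in mapping:
            continue
        free = next((c for c in pool if c not in used), None)
        if free is None:
            raise ValueError("too many chains for one-character chain IDs")
        mapping[lab] = free
        used.add(free)
    return mapping


def pdb_atom_name(atom_id: str, element: str) -> str:
    """Fit an atom name into PDB columns 13-16.

    Names longer than 4 characters are simply truncated here. Names of up
    to 3 characters for a one-letter element start in column 14."""
    name = atom_id[:4]
    if len(name) < 4 and len(element.strip()) == 1:
        return " " + name.ljust(3)
    return name.ljust(4)


def atom_record(a: AtomSite, serial: int, chains: Dict[str, str]) -> str:
    """Format one fixed-column ATOM/HETATM record (78 characters)."""
    alt = "" if a.alt_id in (".", "?") else a.alt_id[:1]
    return (
        f"{a.group_pdb:<6}{serial:>5} {pdb_atom_name(a.atom_id, a.element)}"
        f"{alt:1}{a.comp_id[:3]:>3} {chains[a.asym_id]}{a.seq_num:>4}    "
        f"{a.x:8.3f}{a.y:8.3f}{a.z:8.3f}{a.occupancy:6.2f}{a.b_iso:6.2f}"
        f"{'':10}{a.element:>2}"
    )


def pdb_lines(atoms: List[AtomSite]) -> List[str]:
    """Build an ATOM/HETATM/TER list; atoms must be grouped by chain."""
    chains = chain_map([a.asym_id for a in atoms])
    lines: List[str] = []
    serial = 0
    has_polymer = False
    for i, a in enumerate(atoms):
        serial += 1  # renumber, since mmCIF _atom_site.id need not be numeric
        lines.append(atom_record(a, serial, chains))
        has_polymer = has_polymer or a.group_pdb == "ATOM"
        last = i + 1 == len(atoms) or atoms[i + 1].asym_id != a.asym_id
        if last:
            if has_polymer:  # TER closes chains that contain ATOM records
                serial += 1
                lines.append(f"TER   {serial:>5}      {a.comp_id[:3]:>3} "
                             f"{chains[a.asym_id]}{a.seq_num:>4}")
            has_polymer = False
    lines.append("END")
    return lines


if __name__ == "__main__":
    # Made-up example rows: a two-atom chain "A" and a water labelled "WAT1".
    demo = [
        AtomSite("ATOM", "N", ".", "ALA", "A", 1, 1.0, 2.0, 3.0, 1.0, 10.0, "N"),
        AtomSite("ATOM", "CA", ".", "ALA", "A", 1, 2.0, 2.0, 3.0, 1.0, 10.0, "C"),
        AtomSite("HETATM", "O", ".", "HOH", "WAT1", 101, 5.0, 5.0, 5.0, 1.0, 20.0, "O"),
    ]
    print("\n".join(pdb_lines(demo)))
```

Assigning one character per distinct mmCIF chain label is the simplest possible policy. If each ligand or water set carries its own mmCIF chain label, it can exhaust the 62 available characters quickly, which is one reason a rigorous translation regroups het groups under their polymer chains and keeps a record of renamed chains and atoms.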
Work supported in part by US NSF, PHS, NIH, NCRR, NIGMS, NLM and DOE under contract DE-AC02-76CH00016.