Feature article

PDB Annual Report 2001

The PDB was founded in 1971 as the international repository for three-dimensional structure data of biological macromolecules. The PDB processes, stores, and disseminates structural coordinates and related information about proteins, nucleic acids, carbohydrates, and protein-nucleic acid complexes. The resource distributes information about all aspects of structural biology, including structural genomics, data representation formats, software, and educational materials. Currently, in an average month, approximately 260 structures are deposited, 200 structures are released, and 2.6 million files of individual structure entries are downloaded from the PDB. The PDB’s Data Uniformity Project enhances the consistency of existing (legacy) entries and maintains a consistent method of annotating current depositions.

All legacy PDB entries and the recent RCSB entries are available in mmCIF format from the PDB beta FTP site at ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmcif/. The files follow the latest version of the mmCIF dictionary supplemented by an exchange dictionary developed by the PDB and the EBI. This exchange dictionary can be obtained from http://deposit.pdb.org/mmcif/. An application program called CIFTr was made available for translating files in mmCIF format into files in PDB format. CIFTr works on UNIX platforms, and can be downloaded at http://deposit.pdb.org/software/ (see below for more information).

Several new searching functions have been released. It is now possible to search by the number of chains on a structural backbone or in a complex. Users can also search by source organism, including synonyms and common names, so that searches on 'human' and 'Homo sapiens' return the same entries. The keyword search function now performs both exact and partial word matching, and it is possible to query on the titles of entries. A new interface can reveal how many structures exist at a given level in the Enzyme Commission (EC) hierarchy. This required the accurate assignment of enzyme numbers to all relevant PDB structures and integration of EC nomenclature into the database system. Remediated files, related software, and update notices are archived at the Data Uniformity Project Web page at www.rcsb.org/pdb/uniformity/index.html. The home page has been revised to emphasize mirror sites, to permit both keyword and PDB ID searching, and to improve access to documents, format descriptions, and other materials.
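As an illustration of the EC hierarchy counts mentioned above, the following minimal Python sketch tallies how many entries fall under each level of an EC number (class, subclass, sub-subclass, and full number). It is not the PDB's implementation: the function names, the entry identifiers, and the assignment table are hypothetical and serve only to show the idea of counting by EC prefix.

```python
from collections import Counter


def ec_prefixes(ec_number):
    """Yield every level of an EC number, e.g. '3', '3.4', '3.4.21', '3.4.21.4'."""
    parts = ec_number.split(".")
    for depth in range(1, len(parts) + 1):
        yield ".".join(parts[:depth])


def count_by_ec_level(assignments):
    """Count distinct entries at every level of the EC hierarchy.

    `assignments` maps an entry identifier to the list of EC numbers
    assigned to it. An entry is counted once per level, even when
    several of its EC numbers share that level.
    """
    counts = Counter()
    for ec_numbers in assignments.values():
        levels = set()
        for ec in ec_numbers:
            levels.update(ec_prefixes(ec))
        counts.update(levels)
    return counts


# Hypothetical entry identifiers and assignments, for illustration only.
example = {
    "ENTRY_A": ["3.4.21.4"],
    "ENTRY_B": ["3.4.21.5"],
    "ENTRY_C": ["3.2.1.17"],
    "ENTRY_D": ["1.1.1.1", "3.4.21.4"],
}

counts = count_by_ec_level(example)
for level in ("3", "3.4.21", "3.4.21.4"):
    print(level, counts[level])
```

Counting distinct entries per prefix, rather than per assignment, keeps an entry with several enzyme numbers from being counted twice at a shared level.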

CIFTr, a tool used by PDB staff in data processing that translates from mmCIF to PDB format, has been released publicly. The program works on UNIX platforms, and can be downloaded at http://deposit.pdb.org/software/.

The validation software used by ADIT to run checks on structures as a part of primary data processing and as part of data uniformity has been compiled into a suite of programs available for download. Designed to work with files in mmCIF or PDB format, the beta version of this validation software can be downloaded in binary form for SGI, SUN, and Linux platforms from http://deposit.pdb.org/software/. Reports produced include an Atlas entry, a summary report, and a collection of structural diagnostics including bond distance and angle comparisons, base morphology comparison (for nucleic acids), and molecular graphic images. Reports from PROCHECK, NUCHECK, and SFCHECK are also made available. The PDB has compiled a variety of structural genomics links at www.rcsb.org/pdb/strucgen.html.

Taken from the PDB Annual Report

IUCr Newsletter
