Biological databases

[PDB logo]
In structural biology we have one of the oldest established biological databases, the Protein Data Bank (PDB). Remarkably, it was effectively established in 1971, when only a handful of atomic-resolution crystal structures had been solved. Its formation was catalyzed by advances in computer graphics, which, despite having no more than 32k of memory to work with, allowed molecules the size of proteins to be displayed relatively easily. For the graphics programs to work effectively across the entire database, the files needed to be in a standard format, covering even such basic concerns as whether the coordinates were measured in centimeters or inches. With the establishment of the PDB it became possible to look at any of the known protein structures in the same way.

[PDB booth] Masami Kusunoki, Kyle Burkhardt, Bohdan Schneider, and Wolfgang Bluhm at the PDB booth at the IUCr meeting.
The PDB has moved from a centralized location at the Brookhaven National Laboratory to a consortium encompassing three sites. Its content has grown from a few structures to well in excess of 15,000 today, and is on course to reach ~45,000 entries by 2005. The completeness of the PDB has been assured by the policy of journals, including Nature Structural Biology, that structures cannot be published in their pages without simultaneous deposition. The accuracy of the database is policed by suites of checking programs that detect errors in coordinates when they are submitted.

There is much to be gained by establishing compatible formats and ontology across a range of databases. With compatibility, pieces of software called ‘intelligent agents’ will be able to extract related information from any and all databases in which it may reside. Such standardization would almost abolish the need for databases to be comprehensive because data would be available from more than one source. Comparing information gleaned from several sources could provide an assessment of the accuracy of those data. By standardizing the way atomic coordinates are documented, the PDB has proven an invaluable boon to structural biology. How much more useful to bioinformatics, in all its guises, would be standardization of the databases themselves?

Nature Structural Biology, Vol. 9, No. 7, July 2002