Structural Informatics

Speakers at the ACA Transactions Symposium on Structural Informatics. Back row (left to right): Garland Marshall, Sean Eddy, Andrej Sali, Otto Ritter and Philip Bourne; front row (left to right): Gary Gilliland, Helen Berman, Shoshana Wodak, Stephen Bryant, Janet Thornton and Suzanne Fortier.

The subject of this year's ACA Transactions Symposium was the development of specialist databases that combine the information stored in primary archives, such as the Protein Data Bank and the Cambridge Structural Database, with biological data about the molecules of interest.

The Protein Kinase Resource (PKR; http://www.sdsc.edu/kinases/) is one example of this type of highly specialized resource. The PKR, a joint project of the San Diego Supercomputer Ctr. and the Chem. Dept. of the U. of California, San Diego, covers in great detail a single protein family important in signal transduction. The database contains multiple sequence alignments and family classification based on sequence, structure annotation, comparison, and conformational analysis, as well as information on investigators, meetings of interest, and literature references. The PKR is a model for an automatically maintained database and query engine that can be used for a variety of protein families. It is based on dictionaries for enzymology, sequence features, tables, and overall family classification, and is developed in STAR/CIF, which complements the existing mmCIF dictionary.

A database dedicated to providing structural information about HIV protease (HIV PR; http://www-fbsc.ncifcrf.gov/HIVdb/) has been created at the Nat'l Cancer Inst. and contains structural data for three PR variants, namely HIV-1, HIV-2, and simian immunodeficiency virus PRs. The HIV PR database will be a source of all available structures in a unified and fully annotated format. The "Informal" part of the database contains information about both the protein and the inhibitors present in the complexes of HIV PR, as well as the original sets of coordinates. Descriptions of the complexes, database-unique labels, PDB names, descriptions of the inhibitors, and references are in the main table, which is the gateway between the more detailed information about the PRs and the data on inhibitors. The latter include chemical formulas, two-dimensional and three-dimensional models, detailed descriptions of the compounds, and conditions of the Ki measurements. The "Analytical" part of the database is organized either by services giving access to various tools or by results of specific analyses. When completed, this part of the database will provide tools for the analysis of the structures.

One of the first specialist structural databases, the Nucleic Acid Database (NDB; http://ndbserver.rutgers.edu/), was established as a resource for the nucleic acid community. Data are organized in a searchable relational database that contains primary information about the crystallographic experiment and information about geometric features. The NDB includes an illustrated atlas of nucleic acid structures with crystal packing pictures, bibliographic references, and standard dictionaries for nucleic acid components. The NDB is a test bed for new database and information technology, including the use of mmCIF as its exchange format. More recently, the NDB Project has transferred its technologies for archiving and querying nucleic acid structures to create a more general tool called Protein Finder. Protein Finder enables the user to search for structures contained in the PDB and to interactively create reports based on the PDB file. These are but a few examples of the value-added databases that continue to emerge as scientists combine their interests in particular research areas with new computer technologies.

H. Berman, P. Bourne and A. Wlodawer