Feature article

The Worldwide Protein Data Bank: safeguarding an indispensable archive


July 1, 2013, marks the 10th anniversary of the founding of the Worldwide Protein Data Bank (wwPDB; wwpdb.org), the international collaboration that manages the PDB archive [1].

 

 

From modest beginnings

[Subtilisin] Subtilisin, one of the first structures deposited to the PDB archive (PDB entry 1sbt). R. A. Alden, J. J. Birktoft, J. Kraut, J. D. Robertus and C. S. Wright. Atomic coordinates for subtilisin BPN′ (or Novo). Biochem. Biophys. Res. Commun. 45, 337-344 (1971).

Starting from just 7 protein crystal structures in 1971, the PDB archive has grown rapidly over the past 42 years. In 2012 alone, 9,972 new structures were deposited, more than in the first 25 years of the PDB combined. Today, the archive contains over 90,000 structures and, at its current rate of growth, will reach the 100,000-structure mark in 2014, the International Year of Crystallography.

On July 1, 2003, the way in which the PDB archive was managed was transformed by the founding of the Worldwide Protein Data Bank organization. From its inception, the PDB has been an international archive and the establishment of the wwPDB ensured that these valuable data will continue to be stored, managed and kept freely available for the benefit of scientists worldwide.

The wwPDB organization now consists of four partners: the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB; http://rcsb.org) and BioMagResBank (BMRB; http://bmrb.wisc.edu), both in the USA; the Protein Data Bank in Europe (PDBe; http://pdbe.org); and the Protein Data Bank Japan (PDBj; http://pdbj.org).

wwPDB milestones through the years

 
[Timeline] Selected highlights from the first 10 years of the wwPDB. 2003: The wwPDB is established. 2004: First PDB entry from Africa released (1ydk[2]). 2006: BMRB joins the wwPDB. 2007: First archive-wide remediation includes improved representation of virus assemblies. 2008: The PDB archive passes the 50,000 entry mark. 2009: Second release of remediated data includes improved representation of binding sites of ligands and metal ions (shown: 1rjk[6]). 2011: Historic symposium commemorating 40 years of the archive held at Cold Spring Harbor Laboratory; the PDB is the oldest surviving electronic archive of biomolecular data. 2012: PDB data and EMDB maps become part of the same ftp tree, simplifying distribution of these two important structural archives (shown: EMD-5476[7]). 2013: Testing of the next-generation wwPDB system for deposition and annotation begins.

2003 wwPDB established.

2004 First X-ray crystal structure from Africa is released in the PDB (1ydk[2]).

2005 PDB data available in PDBML/XML format.

2006 BMRB joins the wwPDB.

2007 First archive-wide remediation includes updated sequence information and primary citations, improved representation of virus assemblies, and standardized chemistry and nomenclature for monomers and ligands.

2008 The PDB archive reaches 50,000 entries. Deposition of experimental data becomes mandatory. X-ray Validation Task Force (VTF) is convened[3].

2009 Second release of remediated data includes details about the chemistry of polymers and the ligands bound to them, biological assemblies, and binding sites of ligands and metal ions. NMR VTF is convened. Deposition of chemical shifts becomes mandatory for NMR structures.

2010 Provision of wwPDB validation reports becomes a requirement for manuscript submission, starting with the IUCr journals. 3DEM VTF is convened[4].

2011 PDB40 symposium commemorating four decades of the archive is held at Cold Spring Harbor Laboratory; the PDB is now the oldest electronic archive of biomolecular data. At a wwPDB workshop, the major developers of X-ray structure-determination software agree to adopt PDBx/mmCIF as the principal format for structure deposition.

2012 PDB data and EMDB maps become part of the same ftp tree, simplifying distribution of these two important structural archives. SAS Task Force is convened[5].

2013 Testing of the next-generation wwPDB system for deposition and annotation begins. 10,000th NMR structure is released. PDBx/mmCIF is announced as the future standard format for deposition and distribution of PDB data. Updated wwPDB Charter goes into effect on July 1, starting the second decade of the wwPDB.

wwPDB activities

The wwPDB partner sites each act as deposition, processing and distribution centres for PDB data. They work together and in consultation with the wider community to define deposition and annotation policies, file formats and validation standards for structural data. This close collaboration between the member organizations is vital to guarantee that the global community of PDB users is provided with reliable and consistent data.

While working jointly on all aspects of data representation and processing, each partner site also offers independent tools and services that help make the wealth of data about biomacromolecular structure and function easily accessible to the user community.

[PDB members] Members of the PDB, past and present, in attendance at the PDB40 symposium. www.wwpdb.org/PDB40.html (Photo by Constance Brukin)

wwPDB activities are overseen by an international advisory committee comprising experts in X-ray crystallography, 3DEM, NMR and bioinformatics.

Future challenges

The increasing volume, diversity and complexity of biological data being deposited in the PDB, and the emergence of hybrid techniques to obtain structural insights into biologically relevant molecules, complexes and molecular machines, all present major challenges for the management and presentation of structural data.

To address these challenges, the wwPDB partners are jointly developing a software system that will allow complex and diverse macromolecular structures, along with the underlying experimental data, to be deposited, validated and annotated through a single interface. This new system will go into full production at all the wwPDB deposition sites early in 2014 and will then be able to handle depositions of structures of any size, determined using diffraction, NMR and/or EM methods.

Validation will be an integral part of the new deposition and annotation system. Assessment of coordinates, experimental data and associated metadata at the time of deposition is vital for improving the quality of the archive. In addition, it will help users with little or no structural biology background to select the most appropriate structural models for their purposes.

Whatever new challenges the next 10 years bring, the wwPDB will remain committed to maintaining high standards of quality, integrity and consistency of the macromolecular structure archive, and to making it freely available to an increasingly large, diverse and demanding global community of users.

References

1. H. M. Berman, K. Henrick and H. Nakamura. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10, 980 (2003), doi:10.1038/nsb1203-980.

2. D. C. Kuhnert, Y. Sayed, S. Mosebi, M. Sayed, T. Sewell and H. W. Dirr. Tertiary interactions stabilise the C-terminal region of human glutathione transferase A1-1: a crystallographic and calorimetric study. J. Mol. Biol. 349, 825-838 (2005).

3. R. J. Read, P. D. Adams, W. B. Arendall III, A. T. Brunger, P. Emsley, R. P. Joosten, G. J. Kleywegt, E. B. Krissinel, T. Lütteke, Z. Otwinowski, A. Perrakis, J. S. Richardson, W. H. Sheffler, J. L. Smith, I. J. Tickle, G. Vriend and P. H. Zwart. A New Generation of Crystallographic Validation Tools for the Protein Data Bank. Structure 19, 1395-1412 (2011).

4. R. Henderson, A. Sali, M. L. Baker, B. Carragher, B. Devkota, K. H. Downing, E. H. Egelman, Z. Feng, J. Frank, N. Grigorieff, W. Jiang, S. J. Ludtke, O. Medalia, P. A. Penczek, P. B. Rosenthal, M. G. Rossmann, M. F. Schmid, G. F. Schröder, A. C. Steven, D. L. Stokes, J. D. Westbrook, W. Wriggers, H. Yang, J. Young, H. M. Berman, W. Chiu, G. J. Kleywegt and C. L. Lawson. Outcome of the First Electron Microscopy Validation Task Force Meeting. Structure 20, 205-214 (2012).

5. J. Trewhella, W. A. Hendrickson, M. Sato, T. Schwede, D. Svergun, J. A. Tainer, J. Westbrook, G. J. Kleywegt and H. M. Berman. Meeting Report of the wwPDB Small-Angle Scattering Task Force: Data Requirements for Biomolecular Modeling and the PDB. Structure, in the press.

6. J. L. Vanhooke, M. M. Benning, C. B. Bauer, J. W. Pike and H. F. DeLuca. Molecular structure of the rat vitamin D receptor ligand binding domain complexed with 2-carbon-substituted vitamin D3 hormone analogues and a LXXLL-containing coactivator peptide. Biochemistry 43, 4101-4110 (2004).

7. S. Benlekbir, S. A. Bueler and J. L. Rubinstein. Structure of the vacuolar-type ATPase from Saccharomyces cerevisiae at 11 Å resolution. Nat. Struct. Mol. Biol. 19, 1356-1362 (2012).