Sharing structural data
The crystallographic community is remarkable for its data-sharing practices, which allow scientists worldwide to gain new insights and knowledge from collaboratively collected data. These exemplary practices have enabled a number of crystallographic databases to exist for over 50 years, and researchers have been able to learn from comprehensive archives of crystal structures covering inorganic, organic, metal-organic and biological macromolecular compounds, which have been of enormous value to a wide range of research applications. In crystallography, however, there isn’t just one database but a variety of them, differing in subject area or chemical space, experimental or computational method, and database implementation, sustainability and access models. Clear distinctions were drawn between some databases when they were established, and researchers needed to know into which database to deposit their data and which database to search for their particular subject area. In the decades since these databases were set up the world has changed, as have the needs and expectations of researchers. Advances in science mean that the distinctions between historically separate disciplines and sectors have become blurred; for instance, research to design new batteries, gas storage systems, zeolites, catalysts, magnets and fuel additives crosses the boundaries between the worlds of inorganic and organic structures. This is coupled with a desire from researchers for more integrated databases and the expectation that a single search will be able to find all the required information.
Over the last few decades these desires and expectations have led to more collaboration, integration and linking between data sources, whether between publishers and repositories or between the data repositories themselves. The Cambridge Crystallographic Data Centre (CCDC) is one of many organisations that have started to adapt to this changing landscape.
Earlier this year we saw the launch of joint deposition and access services for crystallographic data across all chemistry through a collaboration between the CCDC and FIZ Karlsruhe – Leibniz Institute for Information Infrastructure (FIZ Karlsruhe). As a result, researchers and educators worldwide, working across all fields of chemistry, are now able to explore over one million crystallographic structures through a joint Access Structures service, enabling them to view and retrieve deposited data sets associated with structures in the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD). This was the result of a large-scale project to unify deposition and access processes at the CCDC and FIZ Karlsruhe, and both organisations are agreed on the value this has for the community and look forward to collaborating further in the years to come.
In parallel with this work the CCDC has been working closely with the Protein Data Bank (PDB), and these close ties have resulted in a variety of joint activities. An example is the integration of Mogul into the PDB’s ligand validation pipeline for newly deposited structures, which allows researchers to use the knowledge-based library of molecular geometry derived from the CSD to check the geometry of their ligands. Three years ago we also saw the introduction of links between the PDB's chemical component dictionary and structures in the CSD, allowing researchers to move more easily between the two data resources. More recently the CCDC has also developed CSD-CrossMiner to allow users to search databases such as the CSD and the PDB simultaneously using pharmacophore queries. This delivers an interactive search experience with application areas in interaction searching, scaffold hopping and the identification of novel fragments for specific protein environments. While developed from a drug discovery perspective, these methods are potentially applicable in broader fields, for example to find potential co-former molecules for crystallisation or to identify promising new ligating moieties in organometallic compound design.
We hope that these community initiatives are of value to researchers worldwide and we look forward to being driven towards new opportunities and collaborations based on the needs of the community we serve.
Interactive and Versatile Navigation of Structural Databases, O. Korb, B. Kuhn, J. Hert, N. Taylor, J. Cole, C. Groom and M. Stahl (2016). J. Med. Chem. 59, 4257-4266; DOI: 10.1021/acs.jmedchem.5b01756