A vision involving raw data archiving via local archives as a supplement to the existing processed data archives (PDB, CSD, ICDD etc.)

John R. Helliwell*a, Brian McMahon*b and Thomas C. Terwilligerc

aSchool of Chemistry, University of Manchester, M13 9PL, UK; bIUCr, 5 Abbey Square, Chester CH1 2HU, UK; cLos Alamos National Laboratory, Mailstop M888, Los Alamos, NM 87545, USA

Abstract

The International Union of Crystallography (IUCr) is investigating the advantages for the crystallographic community of requiring, rather than only encouraging, the archiving of raw, unprocessed experimental data (typically diffraction images) measured from a crystal, a fibre or a solution. The issue is being evaluated by an IUCr Working Group (see http://forums.iucr.org/). Such archiving would give processing methods that have yet to be developed access to the originally measured data, and could help to reduce the occurrence of scientific fraud.

On the other hand, raw data sets are orders of magnitude larger than structure factors and molecular coordinates. The debate within our community about this much larger proposed archiving effort revolves around the issue of 'cost versus benefit'. Costs can be reduced by preserving the raw data in local repositories, either at centralized synchrotron and neutron research institutes, or at research universities.

In parallel developments, concern to avoid research malpractice is encouraging universities to establish their own data repositories for their research and academic staff. These various 'raw data archives' would complement the existing processed data collections available through the curated structural databases and IUCr journals.

Such local archives would, however, most likely have gaps in their global coverage, because the level of resources that individual institutions are able to provide will vary. In some cases, there are initiatives to provide centralized resources to multiple institutions (for example, the TARDIS project [1] provides federated indexing and discovery services for Australian universities). Assignment of unique identifiers by central authorities such as DataCite [2] will also help to assemble in a uniform format the metadata required to allow the creation of global federated services for the management and interrogation of archives of raw data sets.
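As a purely illustrative sketch of how such a federated service might combine uniform metadata from several local archives, the Python fragment below merges per-repository record lists into a single index keyed by persistent identifier. The record fields, function names, repository names and DOIs are all hypothetical assumptions made for illustration; they are not taken from TARDIS, DataCite or any existing service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDataRecord:
    """Minimal, hypothetical metadata for one archived raw data set."""
    doi: str          # persistent identifier, e.g. one registered centrally
    repository: str   # institution holding the diffraction images
    sample: str       # free-text description of the sample
    size_gb: float    # size of the raw images on disk


def build_federated_index(*repositories):
    """Merge per-repository record lists into one index keyed by DOI.

    Only metadata is gathered centrally; the raw images stay in the
    local archive named in each record.
    """
    index = {}
    for records in repositories:
        for record in records:
            # The first registration of a DOI wins; duplicates are ignored.
            index.setdefault(record.doi, record)
    return index


def find_by_sample(index, keyword):
    """Return records whose sample description mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        (r for r in index.values() if keyword in r.sample.lower()),
        key=lambda r: r.doi,
    )


# Illustrative records only; the DOIs and sizes are invented placeholders.
synchrotron = [
    RawDataRecord("10.9999/example.0001", "Synchrotron A", "lysozyme crystal", 12.5),
]
university = [
    RawDataRecord("10.9999/example.0002", "University B", "collagen fibre", 3.0),
    RawDataRecord("10.9999/example.0001", "University B", "lysozyme crystal", 12.5),
]

index = build_federated_index(synchrotron, university)
hits = find_by_sample(index, "lysozyme")
```

In this sketch the central index holds only small metadata records, so the storage cost of the raw images remains with the institutions that measured them, while a single query can still span all participating archives.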

We therefore believe that a realistic vision for the future includes a sufficiently large raw data archive, with reasonable global coverage, achieved through distributed and federated repository services.

[1] Androulakis, S., Schmidberger, J., Bate, M. A., DeGori, R., Beitz, A., Keong, C., Cameron, B., McGowan, S., Porter, C. J., Harrison, A., Hunter, J., Martin, J. L., Kobe, B., Dobson, R. C. J., Parker, M. W., Whisstock, J. C., Gray, J., Treloar, A., Groenewegen, D., Dickson, N. & Buckle, A. M. (2008). Federated repositories of X-ray diffraction images. Acta Cryst. D64, 810-814. doi:10.1107/S0907444908015540

[2] DataCite (2009). http://www.datacite.org