Crystallographic data

Motivations, challenges, horror stories and opportunities: Experiences of diffraction data management, archival and publication at the UK National Crystallography Service

Simon J. Coles

UK National Crystallography Service, Chemistry, Faculty of Natural and Environmental Sciences, University of Southampton, Highfield, Southampton, SO17 1BJ, UK

Abstract

 The UK National Crystallography Service (NCS) provides data collection and full structural analysis services on small molecules. It is a 'Mid-range Facility' operating between the local research laboratory and the national synchrotron. The NCS has driven large-scale data management projects, starting with CombeChem [1] and evolving into the Web 2.0 world we work in today. The laboratory also houses the most powerful laboratory-based small-molecule diffractometer in the world. This level of service support has significant implications for the management, archival and publication of the diffraction data and structural results produced [2]:

1) Scale and volume (management): The NCS generates approximately an order of magnitude more data than a conventional research laboratory. As a result, simply writing data to 'local' disks and labelling them has not been an option for many years. Diffraction images call for a different data management approach from derived and results data, and for the images we have adopted more conventional long-term storage options.

2) Scale and volume (dissemination): Generating so much data raises the follow-on problem of publishing and disseminating it. Derived and results data are stored and made available through information management, electronic laboratory notebook and repository systems. These are permanently available to authorised users, or in some cases openly online (http://ecrystals.chem.soton.ac.uk/). Managing raw data is considerably more complicated and will be discussed in detail.

3) Accountability: As service providers we have a 'duty of care' for the diffraction data we collect on behalf of others. It is often many years before a collaborator comes back to the NCS to point out a problem with the analysis or to ask to publish the results. Moreover, as a large-scale service operating today, we are moving into a policy-driven era of 'Service Level Agreements' and 'Terms and Conditions of Use', in which we must commit to long-term archival and curation of data on behalf of our users.

4) Disciplinary differences: As research becomes ever more inter- or multi-disciplinary, the systems we put in place for one discipline do not necessarily support the working practices and culture of others. This matters because the NCS laboratory now also houses instrumentation supporting macromolecular crystallography and materials science.

5) Institutional boundaries: The NCS provides data for researchers working in other institutions and operates out of both university and synchrotron facilities. Its data management must therefore cross different technical, administrative and cultural boundaries.

[1] S.J. Coles, J.G. Frey, M.B. Hursthouse, M.E. Light, A.J. Milsted, L.A. Carr, D. DeRoure, C.J. Gutteridge, H.R. Mills, K.E. Meacham, M. Surridge, E. Lyon, R. Heery, M. Duke & M. Day, An E-Science Environment for Service Crystallography - from Submission to Dissemination, J. Chem. Inf. Model., (2006), 46, 1006-1016.

[2] S.J. Coles & P.A. Gale, Changing and challenging times for service crystallography, Chem. Sci., (2012), 3, 683-689.