Towards policy for archiving raw data for macromolecular crystallography: Recent experience
(a) Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands; (b) School of Chemistry, University of Manchester, Brunswick Street, Manchester, M13 9PL, UK
Abstract
There is a pressing need to decide what constitutes exemplary levels of metadata and raw diffraction data deposition, including the possible use of lossless compression, to accompany future macromolecular crystallography publications. We have recently carried out a practical test of procedures for transferring diffraction images over the network between two distant research centres, Utrecht and Manchester, within a real research study (the binding of two anti-cancer drugs to a test protein, lysozyme). The prospects for local university archiving are being explored within the Research Data Management Service (RDMS) of the University of Manchester. Data from eleven lysozyme crystals, co-crystallized with cisplatin, carboplatin, DMSO and NAG (N-acetylglucosamine), were measured on two diffractometers and processed with four software packages [1]. Most of the crystals diffracted to 1.7 Å. In total the data occupy 35.3 GB of disk space, and 20 GB when compressed losslessly with the LZW algorithm of ncompress. The two diffractometers were equipped with very different detectors: an R-AXIS IV image plate (3000 × 3000 pixels, 300 mm diameter) and a Bruker Platinum135 CCD (1024 × 1024 binned pixels, 92 × 92 mm²). Measurement strategies are determined by the detector aperture (size and distance) and by the design of the goniometer. Based on 4/mmm point-group symmetry, the average redundancy of the Rigaku and Bruker data was 21.1 and 22.4 respectively, which corresponds to 4.5 and 12.0 per GB. The data collection software of each instrument writes vital information about the experimental set-up into the image headers, such as the detector type, pixel size, overflow handling, wavelength (trivial for a home source), sample-to-detector distance, goniometer axes and detector swing angle, and the rotation axis and range. This is hopefully sufficient for any third-party data processing software to interpret the data correctly.
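ncompress implements the classic LZW scheme of the Unix compress utility. The following is a minimal Python sketch of the core LZW dictionary algorithm only, not ncompress itself: it omits the variable-width bit packing, the code-width limit and the dictionary resets of compress, and simply emits integer codes. The function names are hypothetical. It illustrates why diffraction images, with large regions of near-uniform background, compress without loss: repeated byte sequences are replaced by single dictionary codes, and decompression rebuilds the original bytes exactly.

```python
from __future__ import annotations


def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as LZW codes (unbounded dictionary, no bit packing)."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            out.append(dictionary[w])  # emit the longest known prefix
            dictionary[wc] = next_code  # learn the new sequence
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress, rebuilding the dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined now.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


# A synthetic "image": mostly zero background plus a repeating pattern.
image = bytes([0] * 1000 + [7, 8, 9] * 50)
codes = lzw_compress(image)
assert lzw_decompress(codes) == image  # lossless round trip
print(len(image), "bytes ->", len(codes), "codes")
```

Real detector images are noisier than this toy input, which is why the saving on the archived data (35.3 GB down to 20 GB) is far more modest than on synthetic background.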
The images were processed with the manufacturers' own software, d*TREK (Rigaku) or Proteum2 (Bruker), as well as with Mosflm [2] and EVAL [3]. Because the EVAL software suite can read compressed images, all EVAL processing was done directly on the compressed raw data. The data processing and refinement statistics are compared, and the problems encountered will be discussed. These concern distortion and flood-field corrections, the need to refine the cell orientation matrices, the detection of crystal slippage and of loss of tetragonal symmetry, and the removal of reflections affected by ice diffraction. The paper describing this analysis [4] will cite the raw data archived at the University of Manchester, and each data set will have an associated DOI and a reference to the RCSB PDB.
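One simple way to remove reflections affected by ice diffraction is to exclude narrow resolution shells around the ice-ring d-spacings. The sketch below assumes a tetragonal cell (a = b, all angles 90°), for which 1/d² = (h² + k²)/a² + l²/c², and uses approximate ring positions commonly quoted for hexagonal ice with a hypothetical ±0.03 Å half-width. The function names and tolerance are illustrative assumptions, and this is not necessarily the procedure used in the study.

```python
import math

# Approximate centres (in Å) of the strongest hexagonal-ice rings, as commonly
# used for resolution-range exclusion; treated here as assumed values.
ICE_RING_D = (3.897, 3.669, 3.441, 2.671, 2.249,
              2.072, 1.948, 1.918, 1.883, 1.721)


def d_spacing_tetragonal(h, k, l, a, c):
    """Interplanar spacing d(hkl) for a tetragonal cell (a = b, 90° angles)."""
    inv_d2 = (h * h + k * k) / (a * a) + (l * l) / (c * c)
    if inv_d2 == 0:
        raise ValueError("d-spacing is undefined for (0, 0, 0)")
    return 1.0 / math.sqrt(inv_d2)


def in_ice_ring(d, half_width=0.03):
    """True if d lies within half_width Å of any listed ice ring."""
    return any(abs(d - d_ice) <= half_width for d_ice in ICE_RING_D)


def remove_ice_reflections(reflections, a, c, half_width=0.03):
    """Keep only (h, k, l, intensity) tuples that fall outside the ice rings."""
    kept = []
    for h, k, l, intensity in reflections:
        d = d_spacing_tetragonal(h, k, l, a, c)
        if not in_ice_ring(d, half_width):
            kept.append((h, k, l, intensity))
    return kept
```

For example, with a lysozyme-like cell of roughly a = 79.1 Å and c = 37.9 Å (illustrative values only), `remove_ice_reflections(refl, 79.1, 37.9)` drops every reflection whose d-spacing falls within 0.03 Å of a listed ring. Excluding whole shells also discards the genuine protein reflections that happen to lie in them, which is the price of this simple approach.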
[1] Tanley, S.W.M., Schreurs, A.M.M., Kroon-Batenburg, L.M.J., Meredith, J., Prendergast, R., Walsh, D., Bryant, P., Levy, C. & Helliwell, J.R. (2012). Acta Cryst. D68, 601-612.
[2] Leslie, A.G.W. & Powell, H.R. (2007). Evolving Methods for Macromolecular Crystallography, 245, 41-51. ISBN 978-1-4020-6314-5.
[3] Schreurs, A.M.M., Xian, X. & Kroon-Batenburg, L.M.J. (2009). J. Appl. Cryst. 43, 70-82.
[4] Tanley, S.W.M., Schreurs, A.M.M., Helliwell, J.R. & Kroon-Batenburg, L.M.J. (2012). J. Appl. Cryst. Submitted.