Dear all
Loes has hit the nail on the head here - any format is only useful while there is software to read & interpret it correctly.
As both Loes and Wladek have said, conversion from one format to another can (and usually does) lead to loss of information. My view is that it is best avoided.
It's great that CrysAlis Pro, HKL & EVAL can read many formats (BTW, so can Mosflm [which is the only one for which all the source code is freely available] and XDS, and d*Trek, etc...), but depending on these for the future is somewhat limiting. CrysAlis Pro and HKL are (at least partly) commercial, closed-source software - so what happens if the company goes out of business or the current developers decide they no longer want to support the programs? EVAL & XDS are closed source - what happens when the developers retire? Even though Mosflm is open source, its funding (and probably support) will finish during the next few years.
It strikes me that a good way forward with this might be to support inclusion of the 300-odd detector formats in a project like DIALS - which is open source and designed so that new detector formats can be added "at the drop of a hat" (I think the Pilatus 12M at Diamond took about a morning). While adding support for hundreds of "historical" formats is outside the immediate scope (and funding) of DIALS, the nature of the project means that it is something that can be done by interested parties - then checked and used by third parties.
One thing has been missing from the discussion, though - and that's the storage of what I call the "brass plate" images used for distortion corrections, alongside the images from the data collections themselves. Uncorrected images can be somewhat challenging to process...

On Mon 7 Dec 2015, at 09:50, Kroon-Batenburg, L.M.J. (Louise) wrote:

Dear all,

I agree totally with Herbert that the imgCIF/CBF and NeXus/HDF5 data formats meet our requirements for long-term storage. Currently at synchrotrons this is the "natural" format; see the mail of Andy Gotz on ESRF data policy. Metadata are stored separately as I understand, but also in HDF5/NeXus (right?). I believe that Dectris will not adapt their use of the miniCBF header for PILATUS, but will adhere to the full HDF5/NeXus format for EIGER. So it is not the future large-facility data for which we should fear loss of fidelity, but the earlier synchrotron data from various detectors, which often lack vital information in their headers. Home-source data also come in many detector formats, with more accurate header information but often tightly linked to the equipment's software. Exporting the images often comes with loss of information. It is great that processing software (like HKL and EVAL) can decipher all these 280 (binary) image data formats and headers, but indeed, alongside data archiving we should also store the software, as Herbert rightly mentions. Thus, for future home-source data archiving it would be necessary to convince manufacturers to write their data in full imgCIF/CBF or HDF5/NeXus. Granted, some manufacturers have documented their detector file formats very well, but they should still take responsibility for writing conversion software to imgCIF/CBF, allowing users to meet the data-archiving requirements of their funding agencies.

Best wishes,
Loes
___________________________________________________________
Dr. Loes Kroon-Batenburg
Dept. of Crystal and Structural Chemistry
Bijvoet Center for Biomolecular Research
Utrecht University
Padualaan 8, 3584 CH Utrecht
The Netherlands
E-mail : l.m.j.kroon-batenburg@uu.nl
phone : +31-30-2532865
fax : +31-30-2533940

From: dddwg [dddwg-bounces@iucr.org] on behalf of Wladek Minor [wladek@iwonka.med.virginia.edu]
Sent: Sunday, December 06, 2015 6:12 AM
To: IUCr Working Group on Diffraction data Deposition; Kamil Dziubek
Subject: Re: [dddwg] Initiation of formal proposal resulting from discussions at the DDDWG satellite meeting of the ECM Croatia

Dear All,

HKL processes around 280 frame formats. Among them are data that were converted to more popular formats in order to 'be processed'. The conversion usually leads to suboptimal data. For some experiments (ordinary small molecules and protein MR) this usually does not affect results significantly. However, for SAD and absolute configuration it may lead to serious problems. Processing that looks OK does not necessarily produce correct data - see several paper withdrawals from Nature and Science. Header proliferation can easily be limited by the idea of a short header that is the same for all formats and detectors and allows the data to be processed. Images are more difficult due to various compression schemes and peculiarities.

Best regards
Wladek

On 12/5/2015 5:10 PM, Herbert J. Bernstein wrote:

Dear Colleagues,

Having a "hub" format is certainly a useful idea for conversions, but to be useful for the full range of formats it would be a good idea for it to be able to faithfully preserve all data and metadata likely to appear in any of the images we need to deal with, and for it to deal with modern multi-image files.
Also, it would be a good idea for such a hub to support a range of appropriate compressions, so that whether you are dealing with, say, a few 1-megapixel images or with, say, a run of 3600 18-megapixel 32-bit-pixel images collected with an Eiger 16M in less than a minute, the conversions could be managed with reasonable network and storage requirements. At the same time, it would be desirable for such a hub format to be acceptable for use both with home sources and at beamlines. I am not certain that there is any one format that can satisfy all these requirements for all applications, but at the moment I believe that the combination of imgCIF/CBF and NeXus/HDF5 comes fairly close, which is why the IUCr Committee on the Maintenance of the CIF Standard and the NeXus International Advisory Committee have been working for the past few years on making those two formats fully interoperable, using the Dectris Eiger as used in MX data collection as a test case. There is much work still to be done, but the results so far look promising. I would suggest carefully considering the results of that effort in designing any archiving strategy.

Regards,
Herbert
_______________________________________________
dddwg mailing list
dddwg@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/dddwg
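For a sense of scale behind Herbert's Eiger 16M example, here is a rough back-of-the-envelope sketch of the uncompressed data volume. It assumes exactly 18 x 10^6 pixels per image (a rounding of "18-megapixel", not the exact detector pixel count), 4 bytes per pixel, no file-header overhead and a full 60 s for the run; these are simplifications, not measured figures, and the function names are purely illustrative.

```python
def raw_run_size_bytes(n_images: int, pixels_per_image: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of a run of detector images, in bytes (headers ignored)."""
    return n_images * pixels_per_image * bytes_per_pixel


def required_rate_bytes_per_s(total_bytes: int, seconds: float) -> float:
    """Average throughput needed to move total_bytes within the given time."""
    return total_bytes / seconds


# Figures from Herbert's example: 3600 images, ~18 megapixels each,
# 32-bit (4-byte) pixels, collected in (at most) one minute.
N_IMAGES = 3600
PIXELS = 18_000_000  # rounded "18-megapixel"; an assumption, not the exact pixel count
BYTES_PER_PIXEL = 4
SECONDS = 60.0

total = raw_run_size_bytes(N_IMAGES, PIXELS, BYTES_PER_PIXEL)
rate = required_rate_bytes_per_s(total, SECONDS)

print(f"uncompressed run: {total / 1e9:.1f} GB")
print(f"sustained rate over {SECONDS:.0f} s: {rate / 1e9:.2f} GB/s")
# A lossless codec achieving compression ratio r would divide both figures by r.
```

Run as written, this prints about 259.2 GB for the run and 4.32 GB/s as the sustained rate, which illustrates why support for appropriate compression matters for any hub format.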
--
Dr. Wladek Minor
Professor of Molecular Physiology and Biological Physics
Phone: 434-243-6865
Fax: 434-982-1616
http://krzys.med.virginia.edu/CrystUVa/wladek.htm
US-mail address:
Department of Molecular Physiology and Biological Physics
University of Virginia
PO Box 800736, Charlottesville, VA 22908-0736
Fed-Ex address:
Department of Molecular Physiology and Biological Physics
1340 Jefferson Park Avenue
University of Virginia
Charlottesville, VA 22908

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0QH
Chairman of International Union of Crystallography Commission on Crystallographic Computing Chairman of European Crystallographic Association SIG9 (Crystallographic Computing)