Discussion List Archives

Re: [dddwg] Initiation of formal proposal resulting from discussions at the DDDWG satellite meeting of the ECM Croatia

Dear all

Loes has hit the nail on the head here - any format is only useful while there is software to read & interpret it correctly.

As both Loes and Wladek have said, conversion from one format to another can (and usually does) lead to loss of information. My view is that it is best avoided.

It's great that CrysAlisPro, HKL & EVAL can read many formats (BTW, so can Mosflm [which is the only one for which all the source code is freely available] and XDS, and d*Trek, etc...), but depending on these for the future is somewhat limiting. CrysAlisPro and HKL are (at least partly) commercial, closed source software - so what happens if the company goes out of business or the current developers decide they no longer want to support the programs? EVAL & XDS are closed source - what happens when the developers retire? Even though Mosflm is open source, its funding (and probably support) will finish during the next few years.

It strikes me that a good way forward with this might be to support inclusion of the 300-odd detector formats in a project like DIALS - which is open source and designed so that new detector formats can be added "at the drop of a hat" (I think the Pilatus 12M at Diamond took about a morning). While adding support for hundreds of "historical" formats is outside the immediate scope (and funding) of DIALS, the nature of the project means that it is something that can be done by interested parties - then checked and used by third parties.
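To illustrate the kind of design that makes adding a format cheap, here is a minimal sketch of a reader registry in Python. It is not the DIALS API: the class names, the `understands` check and the toy magic strings are all hypothetical, chosen only to show how a new format could be contributed as one small class that third parties can check independently.

```python
from typing import List, Optional, Type


class FormatReader:
    """Base class for one image format (hypothetical, not a DIALS class)."""

    name = "base"

    @staticmethod
    def understands(header: bytes) -> bool:
        """Return True if the first bytes of a file look like this format."""
        raise NotImplementedError


# Every reader class registers itself here; no other code needs editing.
REGISTRY: List[Type[FormatReader]] = []


def register(cls: Type[FormatReader]) -> Type[FormatReader]:
    """Class decorator that adds a reader to the registry."""
    REGISTRY.append(cls)
    return cls


@register
class ToyFormatA(FormatReader):
    name = "toy-a"

    @staticmethod
    def understands(header: bytes) -> bool:
        return header.startswith(b"TOYA")  # made-up magic string


@register
class ToyFormatB(FormatReader):
    name = "toy-b"

    @staticmethod
    def understands(header: bytes) -> bool:
        return header.startswith(b"TOYB")  # made-up magic string


def find_reader(header: bytes) -> Optional[Type[FormatReader]]:
    """Return the first registered reader that recognises the header."""
    for reader in REGISTRY:
        if reader.understands(header):
            return reader
    return None
```

With this layout, supporting one more "historical" format means adding one decorated class; `find_reader(b"TOYB...")` would then pick it up without changes elsewhere.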

One thing has been missing from the discussion, though - and that's the storage of what I call the "brass plate" images used for distortion corrections along with the images from the data collections themselves. Uncorrected images can be somewhat challenging to process...

On Mon 7 Dec 2015, at 09:50, Kroon-Batenburg, L.M.J. (Louise) wrote:

Dear all,

I agree totally with Herbert that imgCIF/CBF and NeXus/HDF5 data formats meet our requirements for long term storage. Currently at synchrotrons this is the "natural" format; see the mail of Andy Gotz on ESRF data policy. Metadata are stored separately as I understand, but also in HDF5/NeXus (right?). I believe that Dectris will not adapt their use of the miniCBF header for PILATUS, but will adhere to the full HDF5/NeXus format for EIGER. So it is not the future large facility data for which we should fear loss of fidelity, but the earlier synchrotron data from various detectors, which often lack vital information in the headers. Home source data also come in many detector formats, with more accurate header information but often tightly linked with the equipment's software. Exporting the images often comes with loss of information. It is great that processing software (like HKL and EVAL) can decipher all these 280 (binary) image data formats and headers, but indeed alongside data archiving we should also store the software, as Herbert rightly mentions. Thus for future home source data archiving it would be necessary to convince manufacturers to write their data in full imgCIF/CBF or HDF5/NeXus. Granted: some manufacturers have documented their detector file formats very well, but still they should take the responsibility for writing conversion software to imgCIF/CBF, allowing the user to meet data archiving requirements of their funding agencies.

Best wishes,
Loes

___________________________________________________________

Dr. Loes Kroon-Batenburg

Dept. of Crystal and Structural Chemistry
Bijvoet Center for Biomolecular Research
Utrecht University
Padualaan 8, 3584 CH Utrecht
The Netherlands

E-mail : l.m.j.kroon-batenburg@uu.nl
phone  : +31-30-2532865
fax    : +31-30-2533940


From: dddwg [dddwg-bounces@iucr.org] on behalf of Wladek Minor [wladek@iwonka.med.virginia.edu]
Sent: Sunday, December 06, 2015 6:12 AM
To: IUCr Working Group on Diffraction data Deposition; Kamil Dziubek
Subject: Re: [dddwg] Initiation of formal proposal resulting from discussions at the DDDWG satellite meeting of the ECM Croatia


Dear All,

HKL processes around 280 frame formats. Among them are data that were converted to more popular formats in order to 'be processed'. The conversion usually leads to suboptimal data. For some experiments (ordinary small molecules and protein MR) this usually does not affect results significantly. However, for SAD and absolute configuration determination it may lead to serious problems. Processing that looks OK does not necessarily produce correct data - see several paper withdrawals from Nature and Science.

The header proliferation can easily be limited by defining a short header that is the same for all formats and detectors and allows the data to be processed.

Images are more difficult due to various compression schemes and peculiarities.

Best regards

Wladek


On 12/5/2015 5:10 PM, Herbert J. Bernstein wrote:
Dear Colleagues,

  Having a "hub" format is certainly a useful idea for conversions, but to be useful for the full range of formats it would be a good idea for it to be able to faithfully preserve all data and metadata likely to appear in any of the images we need to deal with, and for it to deal with modern multi-image files.  Also, it would be a good idea for such a hub to support some range of appropriate compressions, so that whether you are dealing with, say, a few 1 megapixel images, or with, say, a run of 3600 18-megapixel 32-bit-pixel images collected with an Eiger 16M in less than a minute, the conversions could be managed with reasonable network and storage requirements.  At the same time it would be desirable for such a hub format to be acceptable for use both with home sources and at beamlines.
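As a rough worked example of why compression matters for the run described above, the arithmetic can be sketched in Python. The assumptions are mine, not Herbert's: "megapixel" is taken as 10**6 pixels, the data are uncompressed, and header overhead is ignored.

```python
# Assumptions (not from the original message): "megapixel" means 10**6 pixels,
# and the run is stored uncompressed with no header overhead.
N_IMAGES = 3600
PIXELS_PER_IMAGE = 18 * 10**6
BYTES_PER_PIXEL = 32 // 8
RUN_SECONDS = 60  # upper bound taken from "less than a minute"

bytes_per_image = PIXELS_PER_IMAGE * BYTES_PER_PIXEL
total_bytes = N_IMAGES * bytes_per_image

total_gb = total_bytes / 1e9
min_rate_gb_per_s = total_bytes / (RUN_SECONDS * 1e9)

print(f"per image: {bytes_per_image / 1e6:.0f} MB")                   # 72 MB
print(f"whole run: {total_gb:.1f} GB")                                # 259.2 GB
print(f"sustained rate: at least {min_rate_gb_per_s:.2f} GB/s")       # 4.32 GB/s
```

Under these assumptions a single run is roughly 259 GB uncompressed and needs more than 4 GB/s sustained, which is the scale a hub format's compression would have to bring down.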

  I am not certain that there is any one format that can satisfy all these requirements for all applications, but at the moment, I believe that the combination of imgCIF/CBF and NeXus/HDF5 comes fairly close, which is why the IUCr Committee on the Maintenance of the CIF Standard and the NeXus International Advisory Committee have been working for the past few years at making those two formats fully interoperable, using the Dectris Eiger as used in MX data collection as a test case.  There is much work still to be done, but the results so far look promising.

  I would suggest carefully considering the results of that effort in designing any archiving strategy.

  Regards,
    Herbert


On Sat, Dec 5, 2015 at 3:55 PM, Kamil Dziubek <rumianek@amu.edu.pl> wrote:

Dear All,

Indeed Esperanto is merely one of very many diffraction image formats, and therefore perhaps not worthy of particular attention. The interesting feature is however not the format itself, but its use as a vehicle to transform other common area-detector data formats via a translator software. With that advantage, CrysAlisPro is one of few commercial diffraction data software packages (provided with lab setups) capable of importing and processing not only native, but also foreign image formats. This conversion tool is described in the paper I referred to in my previous email and seems to work pretty well.

With the observation in mind, if it would not be feasible to quickly "convince all developers to rewrite their firmware to output a common image format" (as Mike said in his outline) an external interconversion tool could be actually helpful. For example, the software called Open Babel (http://openbabel.org) is known to convert over 110 file formats and data in the fields of molecular modeling, computational chemistry and cheminformatics.

Yours,
Kamil

On 2015-12-04 20:10, Herbert J. Bernstein wrote:

Dear Colleagues,

  CrysAlis Esperanto is one of the very large number of good image formats for diffraction images.  There are more than 200 of them.  If we get into the habit of archiving all images in their native formats, then we had better also archive all the necessary software to read those images, or when the time comes to read back images from that archive, we may find it very difficult a few years later without that software and some way to run it on then-current systems.

  Regards,
    Herbert

On Fri, Dec 4, 2015 at 10:58 AM, Kamil Dziubek <rumianek@amu.edu.pl> wrote:

Dear All,

Thank you Mike for your brief recapitulation about reaching a consensus on a common raw data storage format. As John and Brian have noted, MX and 'small molecule' single crystal diffraction studies (and not only molecular crystals, also minerals, inorganics, etc.) are at antipodes concerning the commonly used data collection setups. I hope that as soon as such a generic image format is generally accepted, the authors of the software provided with home lab diffractometers can include conversion tools in the updated versions of their programs.

I would also like to draw your attention to the fact that one of the companies providing instruments for single crystal diffraction experiments, namely Rigaku Oxford Diffraction, introduced a generic data image format called 'Esperanto', and included it in the commercial data processing software package CrysAlisPro. This format is an efficient tool for converting the most common area-detector data formats (Dectris, Rigaku d*trek, BrukerAXS saxi, Mar/Rayonix, Stoe IPDS) so that they can be imported into the CrysAlisPro software. It has proved useful in a number of cases, including high pressure single crystal diffraction experiments (I have already used this method to process the data collected at two beamlines at the ESRF and one at SOLEIL).

The details of the method are given in the following paper:

http://scripts.iucr.org/cgi-bin/paper?S0909049513018621

Best wishes,

Kamil

On 2015-12-03 18:00, John Helliwell wrote:

Dear Mike,
Many thanks for bringing your proposal about area detector raw data image formats, that you aired in Rovinj, forward.
You mention imgcif and HDF5/NeXus explicitly and so we invite Herbert Bernstein, as chair of that work, to respond directly to your proposal and its possible practical implementation.
Thank you,
John and Brian
PS Just one, admittedly very specific detail: whilst the bulk of MX data is collected at the synchrotron (estimated at around 90%), we believe that about 95% of 'small molecule' single crystal area detector data is measured on home lab set ups.

Emeritus Prof of Chemistry John R Helliwell DSc_Physics
Perspectives in Crystallography
 
 
From: dddwg [dddwg-bounces@iucr.org] on behalf of Michael Probert [Michael.Probert@newcastle.ac.uk]
Sent: 03 December 2015 15:05
To: dddwg@iucr.org
Subject: Re: [dddwg] Initiation of formal proposal resulting from discussions at the DDDWG satellite meeting of the ECM Croatia
 

Dear All,

 

following a lively and entertaining discussion at this year's DDDWG satellite meeting in Croatia, I feel that we should attempt to formalise some of the thoughts discussed. Therefore I enclose a starting point for discussion in a proposal at the bottom of this email. I feel very strongly about the need for advancement in this area and that the time is right to initiate this. It has recently been pointed out that some institutions are already archiving raw data, and defining sensible protocols for this seems highly desirable, if not an absolute necessity, for the longevity of such projects.

 

Please feel free to comment on the outline below - I would hope that we could come to some agreed position that could then be taken forward by the group leaders as representative of our collective feelings on the issue of data storage, usefulness and to a certain extent future proofing.

 

I hope that I have managed to convey my ideas clearly and that the proposal makes sense. I am certain that there are aspects that need clarification and am equally certain that a large degree of finessing may be required before this can be taken to the next step. However we must start somewhere and condensing ideas from the meeting seems a good place to start.

 

Many thanks for your time, bye for now

 

Mike

 

The need for fully archived data is becoming more apparent and the
volume of said data is becoming ever greater. One of the larger
hurdles to this process is that for the data archived to be useful it
must be stored in a format that allows other users the ability to
interact with it. Some years ago the idea of imgCIF was created, but
for various reasons instrument manufacturers were reluctant to adapt
to this format. Since then with the advent of newer detector
technologies there has been a small explosion in the number and
variety of frame formats that are currently in use. It now seems a
daunting uphill task to convince all developers to rewrite their
firmware to output a common image format, therefore an alternative
must be found. As a community we currently archive data (positions and
structure factors) in a common format - CIF. There is no reason why
this philosophy would not work for the raw data as well. Users
currently convert all of their processed data into CIF format for
publication, therefore I put it to the DDDWG that one sensible way
forward would be to have users archive their raw data in a common format
(be that imgCIF or HDF5/NeXus) at the point of submission. There are
currently image conversion utilities available for some image formats
and it would not take a large investment of time to generate these for
all users; indeed, I am sure nearly all of these are written in various
places around the world. If the conversion is lossless and all
information on the experimental setup is maintained then there is no
reason for any degradation of data, but there is the huge advantage
that this information would then be of use to everyone for
reinvestigation or authentication protocols. I believe this results in
one moderately sized problem in deciding which format is the best to
use for archiving. This problem can be approached in different ways
although there is, I believe, a simple and pragmatic answer; the
majority of raw data is now produced at synchrotrons due to the
technologies employed - therefore we should take the direction from
them as they are mostly working towards something common in format.
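The condition in the proposal ("if the conversion is lossless and all information on the experimental setup is maintained") can be tested mechanically. The following is a minimal sketch in Python, assuming the pixels are available as a flat array and the header as a simple key/value mapping. The zlib step is only a stand-in for a real imgCIF or HDF5/NeXus writer, and the function names, header keys and values are hypothetical.

```python
import hashlib
import zlib
from array import array
from typing import Dict, List


def pixel_digest(pixels: array) -> str:
    """SHA-256 of the raw pixel bytes, used as a fingerprint of one frame."""
    return hashlib.sha256(pixels.tobytes()).hexdigest()


def to_archive(pixels: array) -> bytes:
    """Stand-in for a real converter: losslessly compress the pixel bytes."""
    return zlib.compress(pixels.tobytes(), 6)


def from_archive(blob: bytes, typecode: str) -> array:
    """Inverse of to_archive: decompress and rebuild the pixel array."""
    restored = array(typecode)
    restored.frombytes(zlib.decompress(blob))
    return restored


def pixels_preserved(original: array, converted: array) -> bool:
    """True only if the pixel type and every pixel value are unchanged."""
    return (original.typecode == converted.typecode
            and pixel_digest(original) == pixel_digest(converted))


def lost_metadata(original: Dict[str, str], converted: Dict[str, str]) -> List[str]:
    """Header keys that are missing or altered after conversion."""
    return sorted(k for k, v in original.items() if converted.get(k) != v)


# Toy frame and header; all values are invented for illustration.
frame = array("I", [0, 3, 17, 65535, 2, 0, 0, 1])
header = {"wavelength": "0.9795", "distance": "250.0"}

restored = from_archive(to_archive(frame), frame.typecode)
clipped = array("I", (min(p, 255) for p in frame))  # a lossy "conversion"
partial_header = {"wavelength": "0.9795"}           # distance was dropped

print(pixels_preserved(frame, restored))       # True
print(pixels_preserved(frame, clipped))        # False
print(lost_metadata(header, partial_header))   # ['distance']
```

A check of this kind, run at the point of submission, would let a depositor confirm that the archived copy can stand in for the native images before the originals are discarded.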


 
 
 
Dr Michael R. Probert
Head of Crystallography
Lecturer in Inorganic Chemistry
School of Chemistry
Newcastle University
Bedson Building
Newcastle upon Tyne
NE1 7RU

tel: +44(0) 191 208 6641
fax: +44(0) 191 208 6929
_______________________________________________
dddwg mailing list
dddwg@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/dddwg
 
 


-- 
Dr. Wladek Minor
Professor of Molecular Physiology and Biological Physics
Phone: 434-243-6865
Fax: 434-982-1616
http://krzys.med.virginia.edu/CrystUVa/wladek.htm


US-mail address:
Department of Molecular Physiology and Biological Physics
University of Virginia
PO Box 800736, Charlottesville, VA 22908-0736

Fed-Ex address:
Department of Molecular Physiology and Biological Physics
1340 Jefferson Park Avenue
University of Virginia
Charlottesville, VA 22908


Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0QH

Chairman of International Union of Crystallography Commission on Crystallographic Computing
Chairman of European Crystallographic Association SIG9 (Crystallographic Computing) 




International Union of Crystallography
