Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics
- From: James Hester <jamesrhester@xxxxxxxxx>
- Date: Fri, 10 Sep 2010 17:51:56 +1000
- In-Reply-To: <alpine.BSF.2.00.1009030735110.95035@epsilon.pair.com>
Thanks Herbert for this detailed information, which is a great help to me in forming an opinion. Please understand that we are not even close to considering excluding imgCIF from CIF. Rather, I am collecting information in order to form an opinion and work with everybody to find a solution which then goes back to the DDLm group and then on to COMCIFS regarding CIF2. Speculation about potential consequences for imgCIF is just part of the information-gathering process. In general terms, CIF is now a 'framework', which I think will make bringing XML and HDF5 developments under the CIF umbrella relatively simple.

Please also understand that my comments about the usefulness of CBFlib were in the context of a typical beamline user wishing to handle their data, rather than from a programmer's point of view. I was not casting aspersions on CBFlib, rather seeking more information (which you have provided).

I am afraid that terminology here may be confusing me: I would like to talk about imgCIF as a pure ASCII format (e.g. IT Vol G p 40 para 15) and CBF as the binary equivalent. However, your previous statements indicate that imgCIF could also be written in UTF16 encoding. So: when you speak of the Dectris detector output as 'imgCIF', what encoding is used?

The point you make about embedding imgCIF into a text-only format (in this case XML) is, I agree, a use-case that we have to consider. I see merit in the position that 'CIF2 content' inside a container is not constrained by encoding, in those cases where the container is able to specify the encoding itself. This is *pedantically* true already in that the 'header' of the container file as a whole is *not* the CIF2 magic header. So: what does everyone think of the following statement being included in the standard?

"Note that a CIF2-conformant character stream that forms part of a larger stream is not constrained to be in UTF8 encoding if the encoding of the CIF2 stream is specified in a standards-conformant manner within the enclosing stream. For example, CIF2 content within an XML file is not constrained to be UTF8-encoded as standard XML attributes can be used to manage encoding."

(Perhaps John B, who has shown superior wordsmithing capabilities, could polish this up a bit?)

On Fri, Sep 3, 2010 at 11:10 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
> Here is more detail on the use of CBFlib.
>
> I know for sure that CBFlib is used directly by mosflm and adxv. While XDS uses code that was prototyped in the Fortran part of CBFlib, they work with their own versions. However, Kay Diederichs has also used the CBFlib C code for work on simulations. Paul Ellis started HKL2000 off with CBFlib, but I don't know if they stayed with it.
>
> As a practical matter, whether or not someone uses CBFlib itself, it is an essential part of the documentation that people use to understand how the various compression schemes work, and they use the utility cif2cbf from the package both as an external converter and as a validator and as a debugger when they don't want to put all the functionality in their own code. If you have a funny CBF in any of the semi-infinite number of representations, cif2cbf allows you to check it, get a hex dump of it or convert it to a specific compression scheme or format that some other program needs to process that file.
>
> In other words, CBFlib on its own _is_ useful.
>
> Sorry about not giving you a list re imgCIF use, I thought you were asking me about CBFlib use -- every beamline that uses a Dectris Pilatus 6M produces imgCIF as the default. This had been a byte-offset compressed binary with a mini-header. Dectris has now moved up to writing a full header. There were some beamlines with some of the older smaller Dectris detectors that were producing TIFF, but all currently delivered Dectris detectors of all sizes produce imgCIF as the default.
>
> All the major detector manufacturers now offer CBF as an option except for Bruker which is debugging an optional CBF output. When I checked at the ACA meeting in July they all also said that their processing packages can accept CBF as an input.
>
> On the XML use, I would suggest a more broad-minded attitude. Judging from the workshop I was at in January at ESRF, it has much broader support than just from Diamond, especially for spectra which have smaller data volume than images. HDF5 is the most widely accepted scientific binary data format for the physics community, and XML is the easiest and most reliable way to port smaller HDF5 datasets from site to site. The problem with XML is that for large files such as crystallographic images ordinary straight-text XML produces huge, impractical files. binutf allows for a compromise in which you have a true XML UCS-2 file but with the binary having only a 7% overhead.
>
> I have no choice -- I _will_ (indeed already do) produce CIFs with UCS-2 binary sections. If COMCIFS repeats the unfortunate decision of 1997 of saying that what the synchrotron community needs can't be called CIF, we'll just go back to calling it imgNCIF (which is an acronym for image-not-CIF), but we will still have to produce it for the community. In 1998 after we had a face-to-face discussion at a BNL workshop, that decision was reversed and what the synchrotron community needed was folded under the CIF umbrella, and imgNCIF became imgCIF. I hope we can have discussions now to avoid the need for a pointless schism.
>
> Your proposal on the relationship between CIF2 and imgCIF sounds like a replay of the discussions we had in 1997, with CIF headers following one standard and binary sections following another. You can make that work, but it is clumsy and hard for users to work with. It is better if we have one simple, comprehensible standard for the files they work with as a whole.
>
> Let me be clear -- imgCIF is produced worldwide and used for thousands of images daily. These older "legacy" imgCIF images will be around for a long time to come, and whatever new imgCIF (or if you force us to it, imgNCIF) images we produce will need to be, and will be, supported by software that handles both the legacy and the new images and has a clean interface to HDF5 and XML as well. I would greatly prefer that this be coordinated with COMCIFS and done in a way that helps the community to understand the relationship between CIF and imgCIF, but if COMCIFS feels a need to return to its 1997 position and exclude the data we work with from its charge, then imgCIF can return to being imgNCIF.
>
> If we are to resolve this, then, as in 1998, we need a meeting or e-meeting. Once you have a web-cam, I would suggest you and I have a skype meeting to frame the issues in dispute and organize a wider meeting.
>
> -- Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Fri, 3 Sep 2010, James Hester wrote:
>
>>> On Fri, 3 Sep 2010, James Hester wrote:
>>>
>>>> Thanks Herbert for providing the imgCIF perspective.
>>>>
>>>> I am unfortunately severely restricted in my ability to attend overseas meetings at present, for family and work reasons. I am also keen to have our discussions written down and available for perusal by those that will come later.
>>>
>>> How about an e-meeting?
>>
>> OK, I think we need to try online as my carefully crafted arguments seem to be misunderstood more often than not. Let me buy a web cam first!
>>
>>>> We need to discuss the relationship of imgCIF to CIF2 explicitly, if imgCIF is going to influence our decision-making. Some questions for Herbert to answer for the record:
>>>>
>>>> 1. How widely used are non-CBF forms of imgCIF at present? By "widely used" I mean both
>>>> (a) supported by software packages that allow one to do "useful work", most obviously to extract diffraction spots
>>>
>>> I assume by "non-CBF" you mean the forms that do the binary sections in something that is not pure binary -- all software that uses CBFlib supports them automatically for reading. For writing, most software chooses one representation, usually byte-offset or packed binary, except when we have to debug -- then the ASCII forms, esp. the hexdump form, are very useful.
>>
>> You are correct in interpreting what I mean by "non-CBF".
>>
>> I understand that CBFlib supports everything, but CBFlib on its own is not useful. Do you know approximately what programs use CBFlib? I know only of rasmol, but you presumably know of many more.
>>
>>>> (b) provided as an output format (even optionally) by beamlines or detector manufacturers
>>
>>> See above
>>
>> I see nothing in your reply on the availability of imgCIF files from detectors or instruments.
>>
>>>> 2. What is the advantage of having "pure text" image files? Why isn't a format like CBF more appropriate?
>>>
>>> While I agree, when we deal with people who like XML, e.g. the NeXus form of imgCIF, then we have no choice -- no binary is allowed, so UCS-2 becomes important. Don't ask me to defend XML. It is simply a fact of life.
>>
>> I am guessing that this NeXuS-XML requirement is coming from Diamond, and if this is what they want I can see why you are keen to integrate imgCIF into HDF5, so that HDF5-XML conversion can be carried out the standard HDF5 way, rather than encapsulating the entire imgCIF file as a NeXuS-XML dataset. OK: so apart from this relatively recent and frankly crazy-weird use case, is there any other use-case for pure-text imgCIF? Can we regard the "Diamond" case as a bureaucratically-driven kluge that will be resolved via your HDF5 work, leaving no other reason to create a space-efficient CIF2 version of imgCIF?
>>
>>>> 3. What is the problem with a scenario where "pure text" imgCIF remains in its current CIF1 form, and CIF2 advances are incorporated into the CIF sections of CBF?
>>>
>>> I don't understand this question, nor the assumptions behind it.
>>
>> Let me be less obtuse:
>> I envision a CBF2 format, which is a CBF file with CIF2 instead of CIF1 syntax. A corresponding imgCIF2 format exists. We *do not care* about the space-efficiency of these imgCIF2 files. We recommend that all new crystallographic image-handling applications should target CBF2 only, rendering space-efficiency of imgCIF2 files irrelevant. Legacy applications, of which there are very few, will be restricted to the original imgCIF, which is very rarely produced in any case (anticipating your answers to my above questions).
>>
>> What are your (Herbert's, anybody else's) thoughts on such a plan?
>>
>>>> Herbert: your work merging a DDL2-based version with DDLm-like features in HDF5 format sounds interesting. Are you planning to present a motivation and/or discussion of this work at some stage?
>>>
>>> This is the subject of some grant applications, so not appropriate for detailed open discussion in this forum at this time.
>>> The motivations are simple -- to satisfy the demands of several major facilities for easy integration of crystallographic synchrotron images into HDF5-based data management systems while preserving access to metadata, and to extend HDF5 with relational meta-data access. This second aspect is an increasingly critical need and will go forward in any case. If we have a meeting or e-meeting, I can explain better.
>>
>> OK, I think reading between the lines I see where this is coming from (read your CACM article as well, BTW). It'd be good to discuss some of these plans at some stage.
>>
>>>>
>>>> On Tue, Aug 24, 2010 at 11:31 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
>>>>>
>>>>> Dear James,
>>>>>
>>>>> I have not been at all reticent -- imgCIF will be very poorly supported by CIF2 as currently proposed. Of necessity, imgCIF changes encodings internally -- that is why it uses MIME -- same problem as email with images, same solution.
>>>>>
>>>>> Any purely text version has at least a 7% overhead as compared to pure binary. Restricting to UTF-8 increases the overhead to at least 50%. We may get away with the 7% (UTF-16). The 50% version (UTF-8) will be ignored by the community as unworkable. The most likely to be used version will be the current DDL2-based version with embedded compressed binaries that I am augmenting with DDLm-like features and merging in with HDF5.
>>>>>
>>>>> As I noted many months ago, the unfortunate reality is that the current CIF2 effort will not merge well with imgCIF. If avoiding a split is important -- we need a meeting. I would suggest involving Bob Sweet and holding it at BNL in conjunction with something relevant to NSLS-II.
>>>>>
>>>>> Regards,
>>>>> Herbert
>>>>>
>>>>> =====================================================
>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>> +1-631-244-3035
>>>>> yaya@dowling.edu
>>>>> =====================================================
>>>>>
>>>>> On Tue, 24 Aug 2010, James Hester wrote:
>>>>>
>>>>>> Hi Herbert: regarding imgCIF, I agree that splitting it off is not a desirable outcome. I would like to get an idea of how well imgCIF can be accommodated under the various encoding proposals currently floating around, as you have been rather reticent to bring it up. My naive take on things is that a UTF8-only encoding scheme for CIF2 would not pose significant issues for imgCIF, and a decorated UTF16 encoding in the style of Scheme B would be even better, and quite adequate, so imgCIF is not actually presenting any problems and so was a red herring.
>>>>>>
>>>>>> I'm not sure that face-to-face or Skype discussions are necessarily going to be more productive. Writing things down, while slower, allows me at least to collect my thoughts and those of other participants, and hopefully make a reasoned contribution (my apologies if I am too long-winded) and as an added bonus those thoughts are recorded for later reference. For example, where would I now find the background on why a container format for imgCIF is such a bad idea? Presumably that was all thrashed out in face to face discussions, and no record now remains.
>>>>>>
>>>>>> On Tue, Aug 24, 2010 at 8:56 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
>>>>>>>
>>>>>>> Dear Colleagues,
>>>>>>>
>>>>>>> James' and John's last interchange is so voluminous, I doubt any of us has been able to fully appreciate the rich complexity of ideas contained therein. For example, one of the suggestions far down in the text is:
>>>>>>>
>>>>>>> (James now) Indeed.
>>>>>>> My intent with this specification was to ensure that third parties would be able to recover the encoding. If imgCIF is going to cause us to make such an open-ended specification, it is probably a sign that imgCIF needs to be addressed separately. For example, should we think about redefining it as a container format, with a CIF header and UTF16 body (but still part of the "Crystallographic Information Framework")?
>>>>>>>
>>>>>>> The idea of an imgCIF "header" in CIF format and an image in another is an old, well-established, thoroughly discussed, and mistaken idea, rejected in 1998. The handling of multiple images in a single file (e.g. a jpeg thumbnail and crystal image and a full-size diffraction image) requires the ability to switch among encodings within the file -- something handled by the current DDL2 and MIME-based imgCIF format and which would be a serious problem in CIF2 as currently proposed, increasing the chances that we will have to move imgCIF entirely into HDF5 and abandon the CIF representation entirely, sharing only the dictionary and not the framework.
>>>>>>>
>>>>>>> If you look carefully, you will see a similar trend with mmCIF, in which an XML representation sharing the dictionary plays a much more important role than the CIF format.
>>>>>>>
>>>>>>> Is it really desirable to make the new CIF format so rigid and unadaptable that major portions of macromolecular crystallography end up migrating to very different formats, as they already are doing? Yes, there is great value in having a common dictionary, but would there not be additional value in having a sufficiently flexible common format to allow for more software sharing than we now have? Is it really desirable for us to continue in the direction of a single macromolecular experiment having to deal with HDF5 and CIF/DDL2/MIME representations of the image data during collection, CCP4-style CIF representations during processing and deposition and legacy PDB and PDBML representations in subsequent community use? If we could be a little bit more flexible, we might be able to reduce the data interchange software burdens a little. Right now, this discussion seems headed in the direction of simply adding yet another data representation (DDLm/CIF2) to the mix, increasing the chances of mistranslation and confusion, rather than reducing them.
>>>>>>>
>>>>>>> Please, step back a bit from the detailed discussion of UTF8 and look at the work-flow of doing and publishing crystallographic experiments and let us try to make a contribution that simplifies it, not one that makes it more complex than it needs to be.
>>>>>>>
>>>>>>> I suggest we need to meet and talk, either face-to-face, or by skype.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Herbert
>>>>>>>
>>>>>>> =====================================================
>>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>
>>>>>>> +1-631-244-3035
>>>>>>> yaya@dowling.edu
>>>>>>> =====================================================
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> cif2-encoding mailing list
>>>>>>> cif2-encoding@iucr.org
>>>>>>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding
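
The container idea in the proposed statement (CIF2 content inside an enclosing stream that declares its own encoding) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wrapper element names `container` and `cif2`, the sample block name and the sample data name are all hypothetical, and nothing here is taken from CBFlib, imgCIF or any CIF2 draft. It also assumes Python 3.8 or later for the `xml_declaration` argument of `ElementTree.tostring`. The point it tries to show is that the XML declaration (plus the byte-order mark) tells the reader the stream is UTF-16, so the embedded CIF2 text does not have to be UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIF2 fragment: the block name, data name and value are
# illustrative only and are not taken from any dictionary.
CIF2_TEXT = "#\\#CIF_2.0\ndata_example\n_example.note 'caf\u00e9'\n"


def wrap_in_xml(cif_text: str, encoding: str = "utf-16") -> bytes:
    """Serialize CIF2 text inside a hypothetical <container><cif2> wrapper.

    The XML declaration written by ElementTree records the encoding of the
    enclosing stream, so the CIF2 text itself carries no encoding marker.
    """
    root = ET.Element("container")
    ET.SubElement(root, "cif2").text = cif_text
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def unwrap_from_xml(blob: bytes) -> str:
    """Recover the CIF2 text; the XML parser decodes using the container's
    own encoding information rather than anything inside the CIF2 text."""
    node = ET.fromstring(blob).find("cif2")
    return node.text if node is not None and node.text is not None else ""


if __name__ == "__main__":
    blob = wrap_in_xml(CIF2_TEXT)
    # The recovered text should be identical to what was embedded.
    print(unwrap_from_xml(blob) == CIF2_TEXT)
```

The same round trip works if `encoding="utf-8"` is passed instead, which is the sense in which the container, not the CIF2 content, manages the encoding in this sketch.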