Discussion List Archives

Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics

Comments interpolated below.

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 3 Sep 2010, James Hester wrote:

> Thanks Herbert for providing the imgCIF perspective.
>
> I am unfortunately severely restricted in my ability to attend
> overseas meetings at present, for family and work reasons.  I am also
> keen to have our discussions written down and available for perusal by
> those that will come later.

How about an e-meeting?

>
> We need to discuss the relationship of imgCIF to CIF2 explicitly, if
> imgCIF is going to influence our decision-making.  Some questions for
> Herbert to answer for the record:
>
> 1. How widely used are non-CBF forms of imgCIF at present?  By "widely
> used" I mean both
>   (a) supported by software packages that allow one to do "useful
> work", most obviously to extract diffraction spots

I assume by "non-CBF" you mean the forms that encode the binary sections
as something other than pure binary.  All software that uses CBFlib
supports them automatically for reading.  For writing, most software
uses a single representation, usually byte-offset or packed binary,
except when we have to debug; then the ASCII forms, especially the
hexdump form, are very useful.
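For readers unfamiliar with the byte-offset representation mentioned above, here is a minimal Python sketch of the idea as I understand it from the CBFlib description.  Each pixel is stored as its difference from the previous pixel: one signed byte when the difference is small, with escape markers (0x80, then 0x8000, then 0x80000000) introducing wider little-endian fields.  The function names are hypothetical, this is not CBFlib's own code, and it assumes signed integer pixels whose successive differences fit in 64 bits.

```python
import struct


def byte_offset_compress(pixels):
    """Sketch of CBF-style byte-offset compression (hypothetical helper)."""
    out = bytearray()
    prev = 0
    for value in pixels:
        delta = value - prev
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)                  # common case: 1 byte
        else:
            out += b"\x80"                                   # escape: 16-bit field follows
            if -32767 <= delta <= 32767:
                out += struct.pack("<h", delta)
            else:
                out += struct.pack("<h", -32768)             # 0x8000: 32-bit field follows
                if -2147483647 <= delta <= 2147483647:
                    out += struct.pack("<i", delta)
                else:
                    out += struct.pack("<i", -2147483648)    # 0x80000000: 64-bit follows
                    out += struct.pack("<q", delta)
        prev = value
    return bytes(out)


def byte_offset_decompress(data):
    """Inverse of byte_offset_compress: rebuild pixels from the stored deltas."""
    pixels = []
    prev = 0
    pos = 0
    while pos < len(data):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == -128:                                    # 0x80 escape
            (delta,) = struct.unpack_from("<h", data, pos)
            pos += 2
            if delta == -32768:                              # 0x8000 escape
                (delta,) = struct.unpack_from("<i", data, pos)
                pos += 4
                if delta == -2147483648:                     # 0x80000000 escape
                    (delta,) = struct.unpack_from("<q", data, pos)
                    pos += 8
        prev += delta
        pixels.append(prev)
    return pixels


pixels = [0, 5, 300, 100000, 99990, 2**40]
packed = byte_offset_compress(pixels)
assert byte_offset_decompress(packed) == pixels
print(len(packed), "bytes:", packed.hex())
```

Smooth detector data mostly produces the one-byte case, which is where the compression comes from.  `packed.hex()` gives a hexdump-style ASCII rendering of the same bytes, roughly the debugging use described above; the actual imgCIF hexdump form has its own layout, which this sketch does not reproduce.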

>   (b) provided as an output format (even optionally) by beamlines or
> detector manufacturers

See above

> 2. What is the advantage of having "pure text" image files?  Why isn't
> a format like CBF more appropriate?

I agree, but when we deal with people who like XML, e.g. for the NeXus
form of imgCIF, we have no choice: no binary is allowed, so UCS-2
becomes important.  Don't ask me to defend XML.  It is simply a fact
of life.

> 3. What is the problem with a scenario where "pure text" imgCIF
> remains in its current CIF1 form, and CIF2 advances are incorporated
> into the CIF sections of CBF?

I don't understand this question, nor the assumptions behind it.
>

> Herbert: your work merging a DDL2-based version with DDLm-like
> features in HDF5 format sounds interesting.  Are you planning to
> present a motivation and/or discussion of this work at some stage?

This is the subject of some grant applications, so it is not appropriate
for detailed open discussion in this forum at this time.  The motivations
are simple: to satisfy the demands of several major facilities for easy
integration of crystallographic synchrotron images into HDF5-based data
management systems while preserving access to metadata, and to extend
HDF5 with relational metadata access.  This second aspect is an
increasingly critical need and will go forward in any case.  If we have
a meeting or e-meeting, I can explain better.

>
> On Tue, Aug 24, 2010 at 11:31 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear James,
>>
>>  I have not been at all reticent: imgCIF will be very poorly supported
>> by CIF2 as currently proposed.  Of necessity, imgCIF changes encodings
>> internally; that is why it uses MIME.  It is the same problem as email
>> with images, and the same solution.
>>
>>  Any purely text version has at least a 7% overhead compared to
>> pure binary.  Restricting to UTF-8 increases the overhead to at least 50%.
>> We may get away with the 7% (UTF-16).  The 50% version (UTF-8) will be
>> ignored by the community as unworkable.  The version most likely to be
>> used will be the current DDL2-based version with embedded compressed
>> binaries, which I am augmenting with DDLm-like features and merging
>> with HDF5.
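One way to see how overheads of this size can arise, under an assumed packing scheme that is hypothetical and not what any imgCIF software is known to use: put 15 bits of binary data into each Unicode character drawn from U+0800 to U+87FF, a range with no control characters and no surrogates.  Each such character costs 16 bits in UTF-16 (about 7% overhead) but three bytes in UTF-8 (60% overhead).  The helper names `pack_15` and `unpack_15` are illustrative only.

```python
def pack_15(data: bytes) -> str:
    """Hypothetical scheme: 15 payload bits per character, U+0800..U+87FF."""
    nbits = len(data) * 8
    nchars = -(-nbits // 15)                                  # ceiling division
    bits = int.from_bytes(data, "big") << (nchars * 15 - nbits)  # zero-pad on the right
    return "".join(
        chr(0x0800 + ((bits >> (15 * k)) & 0x7FFF))          # most significant chunk first
        for k in range(nchars - 1, -1, -1)
    )


def unpack_15(text: str, nbytes: int) -> bytes:
    """Inverse of pack_15; the original byte count must be known."""
    bits = 0
    for ch in text:
        bits = (bits << 15) | (ord(ch) - 0x0800)
    return (bits >> (len(text) * 15 - nbytes * 8)).to_bytes(nbytes, "big")


raw = bytes(i % 256 for i in range(1500))                    # 12000 bits -> 800 characters
text = pack_15(raw)
assert unpack_15(text, len(raw)) == raw
for codec in ("utf-16-le", "utf-8"):                          # utf-16-le: no byte-order mark
    size = len(text.encode(codec))
    print(f"{codec}: {size} bytes, overhead {size / len(raw) - 1:.1%}")
```

This prints an overhead of 6.7% for UTF-16 and 60.0% for UTF-8.  The UTF-8 penalty comes entirely from every character in the chosen range needing three bytes rather than two.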
>>
>>  As I noted many months ago, the unfortunate reality is that the
>> current CIF2 effort will not merge well with imgCIF.  If avoiding
>> a split is important, we need a meeting.  I would suggest
>> involving Bob Sweet and holding it at BNL in conjunction with
>> something relevant to NSLS-II.
>>
>>  Regards,
>>    Herbert
>>
>> On Tue, 24 Aug 2010, James Hester wrote:
>>
>>> Hi Herbert: regarding imgCIF,  I agree that splitting it off is not a
>>> desirable outcome.  I would like to get an idea of how well imgCIF can
>>> be accommodated under the various encoding proposals currently
>>> floating around, as you have been rather reticent to bring it up.  My
>>> naive take on things is that a UTF8-only encoding scheme for CIF2
>>> would not pose significant issues for imgCIF, and a decorated UTF16
>>> encoding in the style of Scheme B would be even better, and quite
>>> adequate, so imgCIF does not actually present any problems and
>>> was a red herring.
>>>
>>> I'm not sure that face-to-face or Skype discussions are necessarily
>>> going to be more productive.  Writing things down, while slower,
>>> allows me at least to collect my thoughts and those of other
>>> participants, and hopefully make a reasoned contribution (my apologies
>>> if I am too long-winded) and as an added bonus those thoughts are
>>> recorded for later reference.  For example, where would I now find the
>>> background on why a container format for imgCIF is such a bad idea?
>>> Presumably that was all thrashed out in face-to-face discussions, and
>>> no record now remains.
>>>
>>> On Tue, Aug 24, 2010 at 8:56 PM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>> Dear Colleagues,
>>>>
>>>>   James' and John's last interchange is so voluminous, I doubt any of
>>>> us has been able to fully appreciate the rich complexity of ideas
>>>> contained therein.  For example, one of the suggestions far down in
>>>> the text is:
>>>>
>>>> (James now)  Indeed.  My intent with this specification was to ensure
>>>> that third parties would be able to recover the encoding. If imgCIF is
>>>> going to cause us to make such an open-ended specification, it is
>>>> probably a sign that imgCIF needs to be addressed separately.  For
>>>> example, should we think about redefining it as a container format,
>>>> with a CIF header and UTF16 body (but still part of the
>>>> "Crystallographic Information Framework")?
>>>>
>>>> The idea of an imgCIF "header" in CIF format and an image in another is an
>>>> old, well-established, thoroughly discussed, and mistaken idea, rejected
>>>> in 1998.  The handling of multiple images in a single file (e.g.
>>>> a JPEG thumbnail, a crystal image and a full-size diffraction image)
>>>> requires the ability to switch among encodings within the file,
>>>> something handled by the current DDL2 and MIME-based imgCIF format and
>>>> which would be a serious problem in CIF2 as currently proposed,
>>>> increasing the chances that we will have to move imgCIF entirely into
>>>> HDF5 and abandon the CIF representation, sharing only
>>>> the dictionary and not the framework.
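The email analogy (one message carrying text alongside images, each part declaring its own transfer encoding) can be illustrated with Python's standard `email` package.  This is a sketch of the general MIME mechanism only: it is not imgCIF's actual binary-section syntax, and the part contents below are made-up placeholders.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def bundle(cif_text: str, thumbnail: bytes, diffraction: bytes) -> str:
    """Pack text and two binary payloads into one MIME multipart message."""
    msg = MIMEMultipart("mixed")
    msg.attach(MIMEText(cif_text))               # ASCII text, sent as 7bit
    msg.attach(MIMEImage(thumbnail, "jpeg"))     # binary, base64 by default
    msg.attach(MIMEApplication(diffraction))     # octet-stream, base64 by default
    return msg.as_string()


def unbundle(raw: str):
    """Return (content type, transfer encoding, decoded bytes) for each part."""
    parts = []
    for part in message_from_string(raw).walk():
        if part.is_multipart():
            continue                             # skip the multipart container itself
        parts.append((part.get_content_type(),
                      part.get("Content-Transfer-Encoding"),
                      part.get_payload(decode=True)))
    return parts


# Placeholder payloads: hypothetical CIF text, fake JPEG bytes, fake pixel data.
raw = bundle("data_demo\n_demo.item 1\n", b"\xff\xd8 placeholder", bytes(range(256)))
for ctype, cte, payload in unbundle(raw):
    print(ctype, cte, len(payload))
```

A reader never needs a single file-wide encoding: each part's headers say how to decode that part, which is the property the message argues imgCIF relies on.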
>>>>
>>>> If you look carefully, you will see a similar trend with mmCIF, in which
>>>> an XML representation sharing the dictionary plays a much more
>>>> important role than the CIF format.
>>>>
>>>> Is it really desirable to make the new CIF format so rigid and
>>>> unadaptable that major portions of macromolecular crystallography
>>>> end up migrating to very different formats, as they already are
>>>> doing?  Yes, there is great value in having a common dictionary,
>>>> but would there not be additional value in having a sufficiently
>>>> flexible common format to allow for more software sharing than
>>>> we now have?  Is it really desirable for us to continue in the
>>>> direction of a single macromolecular experiment having to
>>>> deal with HDF5 and CIF/DDL2/MIME representations of the image data
>>>> during collection, CCP4-style CIF representations during processing
>>>> and deposition, and legacy PDB and PDBML representations in subsequent
>>>> community use?  If we could be a little more flexible, we might be
>>>> able to reduce the data interchange software burden a little.
>>>> Right now, this discussion seems headed in the direction of simply
>>>> adding yet another data representation (DDLm/CIF2) to the mix,
>>>> increasing the chances of mistranslation and confusion, rather
>>>> than reducing them.
>>>>
>>>> Please, step back a bit from the detailed discussion of UTF8 and
>>>> look at the work-flow of doing and publishing crystallographic
>>>> experiments and let us try to make a contribution that simplifies
>>>> it, not one that makes it more complex than it needs to be.
>>>>
>>>> I suggest we need to meet and talk, either face-to-face or by Skype.
>>>>
>>>> Regards,
>>>>   Herbert
>>>>
>>>> _______________________________________________
>>>> cif2-encoding mailing list
>>>> cif2-encoding@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>>>>
>>>
>>>
>>>
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148
>>
>

International Union of Crystallography