
Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics

Dear James,

   I have not been at all reticent -- imgCIF will be very poorly supported
by CIF2 as currently proposed.  Of necessity, imgCIF changes encodings
internally -- that is why it uses MIME -- same problem as email with
images, same solution.
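
   As a rough illustration of the email analogy only (not of imgCIF
itself), the sketch below uses Python's standard email package to
build one message whose parts carry different transfer encodings.
The text content and the file name "frame.bin" are made-up
placeholders, and the byte payload merely stands in for image data.

```python
from email.message import EmailMessage

# A plain-text part; short ASCII text is sent with the 7bit encoding.
msg = EmailMessage()
msg.set_content("Header text describing the image that follows.\n")

# A binary part (placeholder bytes, hypothetical file name); the
# standard library base64-encodes binary attachments by default.
msg.add_attachment(
    bytes(range(256)),
    maintype="application",
    subtype="octet-stream",
    filename="frame.bin",
)

# Each part announces its own encoding in its MIME headers.
encodings = [part["Content-Transfer-Encoding"] for part in msg.iter_parts()]
print(encodings)
```

Each part declares its own Content-Transfer-Encoding, so a reader can
switch decoders part by part within a single file.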

   Any purely text version has at least a 7% overhead as compared to
pure binary.  Restricting to UTF-8 increases the overhead to at least 50%.
We may get away with the 7% (UTF-16).  The 50% version (UTF-8) will be 
ignored by the community as unworkable.  The most likely to be used 
version will be the current DDL2-based version with embedded 
compressed binaries that I am augmenting with DDLm-like features
and merging in with HDF5.
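
   One way to arrive at figures of this size is sketched below. Both
schemes are assumptions made for illustration, not a statement of how
imgCIF encodes data: (1) each byte is mapped to the code point of the
same value and written as UTF-8, and (2) 15 payload bits are packed
into each 16-bit UTF-16 code unit.

```python
# Every byte value equally often: a stand-in for compressed binary data.
payload = bytes(range(256)) * 100

# Assumed scheme 1: byte value -> code point of the same value -> UTF-8.
# Values 0x00-0x7F take one byte, values 0x80-0xFF take two.
as_utf8 = payload.decode("latin-1").encode("utf-8")
utf8_overhead = len(as_utf8) / len(payload) - 1

# Assumed scheme 2: 15 payload bits carried per 16-bit UTF-16 code unit.
utf16_overhead = 16 / 15 - 1

print(f"UTF-8 overhead:  {utf8_overhead:.1%}")   # 50.0%
print(f"UTF-16 overhead: {utf16_overhead:.1%}")  # 6.7%
```

Under these assumptions the UTF-8 figure is a lower bound, since
control characters that are not allowed in a text file would need
further escaping.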

   As I noted many months ago, the unfortunate reality is that the
current CIF2 effort will not merge well with imgCIF.  If avoiding
a split is important -- we need a meeting.  I would suggest
involving Bob Sweet and holding it at BNL in conjunction with
something relevant to NSLS-II.


  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


On Tue, 24 Aug 2010, James Hester wrote:

> Hi Herbert: regarding imgCIF,  I agree that splitting it off is not a
> desirable outcome.  I would like to get an idea of how well imgCIF can
> be accommodated under the various encoding proposals currently
> floating around, as you have been rather reticent to bring it up.  My
> naive take on things is that a UTF8-only encoding scheme for CIF2
> would not pose significant issues for imgCIF, and a decorated UTF16
> encoding in the style of Scheme B would be even better, and quite
> adequate, so imgCIF is not actually presenting any problems and so was
> a red herring.
> I'm not sure that face-to-face or Skype discussions are necessarily
> going to be more productive.  Writing things down, while slower,
> allows me at least to collect my thoughts and those of other
> participants, and hopefully make a reasoned contribution (my apologies
> if I am too long-winded) and as an added bonus those thoughts are
> recorded for later reference.  For example, where would I now find the
> background on why a container format for imgCIF is such a bad idea?
> Presumably that was all thrashed out in face to face discussions, and
> no record now remains.
> On Tue, Aug 24, 2010 at 8:56 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear Colleagues,
>>   James' and John's last interchange is so voluminous, I doubt any of
>> us has been able to fully appreciate the rich complexity of ideas
>> contained therein.  For example, one of the suggestions far down in
>> the text is:
>> (James now)  Indeed.  My intent with this specification was to ensure
>> that third parties would be able to recover the encoding. If imgCIF is
>> going to cause us to make such an open-ended specification, it is
>> probably a sign that imgCIF needs to be addressed separately.  For
>> example, should we think about redefining it as a container format,
>> with a CIF header and UTF16 body (but still part of the
>> "Crystallographic Information Framework")?
>> The idea of an imgCIF "header" in CIF format and an image in another is an
>> old, well-established, thoroughly discussed, and mistaken idea, rejected
>> in 1998.  The handling of multiple images in a single file (e.g.
>> a jpeg thumbnail and crystal image and a full-size diffraction image)
>> requires the ability to switch among encodings within the file --
>> something handled by the current DDL2 and MIME-based imgCIF format and
>> which would be a serious problem in CIF2 as currently proposed,
>> increasing the chances that we will have to move imgCIF entirely into
>> HDF5 and abandon the CIF representation entirely, sharing only
>> the dictionary and not the framework.
>> If you look carefully, you will see a similar trend with mmCIF, in which
>> an XML representation sharing the dictionary plays a much more
>> important role than the CIF format.
>> Is it really desirable to make the new CIF format so rigid and
>> unadaptable that major portions of macromolecular crystallography
>> end up migrating to very different formats, as they already are
>> doing?  Yes, there is great value in having a common dictionary,
>> but would there not be additional value in having a sufficiently
>> flexible common format to allow for more software sharing than
>> we now have?  Is it really desirable for us to continue in the
>> direction of a single macromolecular experiment having to
>> deal with HDF5 and CIF/DDL2/MIME representations of the image data
>> during collection, CCP4-style CIF representations during processing
>> and deposition and legacy PDB and PDBML representations in subsequent
>> community use?  If we could be a little bit more flexible, we might be
>> able to reduce the data interchange software burdens a little.
>> Right now, this discussion seems headed in the direction of simply
>> adding yet another data representation (DDLm/CIF2) to the mix,
>> increasing the chances of mistranslation and confusion, rather
>> than reducing them.
>> Please, step back a bit from the detailed discussion of UTF8 and
>> look at the work-flow of doing and publishing crystallographic
>> experiments and let us try to make a contribution that simplifies
>> it, not one that makes it more complex than it needs to be.
>> I suggest we need to meet and talk, either face-to-face or by Skype.
>> Regards,
>>   Herbert
>> =====================================================
>>  Herbert J. Bernstein, Professor of Computer Science
>>    Dowling College, Kramer Science Center, KSC 121
>>         Idle Hour Blvd, Oakdale, NY, 11769
>>                  +1-631-244-3035
>>                  yaya@dowling.edu
>> =====================================================
>> _______________________________________________
>> cif2-encoding mailing list
>> cif2-encoding@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148