Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 24 Aug 2010 09:31:51 -0400 (EDT)
- In-Reply-To: <[email protected]>
- References: <[email protected]><[email protected]><[email protected]><8F77913624F7524AACD2A92EAF3BFA54166122952D@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><8F77913624F7524AACD2A92EAF3BFA541661229533@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><8F77913624F7524AACD2A92EAF3BFA541661229542@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><8F77913624F7524AACD2A92EAF3BFA541661229552@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><8F77913624F7524AACD2A92EAF3BFA5416659DED8C@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><[email protected]><[email protected]>
Dear James,
I have not been at all reticent -- imgCIF will be very poorly supported
by CIF2 as currently proposed. Of necessity, imgCIF changes encodings
internally -- that is why it uses MIME -- same problem as email with
images, same solution.
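To illustrate the email analogy, here is a minimal Python sketch using the standard-library `email` package: a generic MIME multipart container in which each part declares its own Content-Transfer-Encoding (plain 7-bit text, quoted-printable text, and base64 binary). It is an illustration of the general MIME mechanism only, not the actual imgCIF/CBF layout; the function names `build_container` and `read_container` and the sample contents are hypothetical.

```python
# Hypothetical sketch: a generic MIME multipart container whose parts each
# carry their own Content-Transfer-Encoding. NOT the actual imgCIF/CBF layout.
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_container(header_text: str, note: str, image_bytes: bytes) -> bytes:
    """Pack text and binary sections, each with its own transfer encoding."""
    container = MIMEMultipart("mixed")
    # Pure ASCII text is stored unencoded ("7bit").
    container.attach(MIMEText(header_text, "plain", "us-ascii"))
    # Latin-1 text containing non-ASCII characters is sent as quoted-printable.
    container.attach(MIMEText(note, "plain", "iso-8859-1"))
    # Raw binary (e.g. a compressed image) is base64-encoded by default.
    container.attach(MIMEApplication(image_bytes))
    return container.as_bytes()


def read_container(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (transfer encoding, decoded payload bytes) for each leaf part."""
    message = email.message_from_bytes(raw)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the outer container itself
        encoding = part.get("Content-Transfer-Encoding", "7bit")
        parts.append((encoding, part.get_payload(decode=True)))
    return parts


if __name__ == "__main__":
    image = bytes(range(256))  # stand-in for binary image data
    raw = build_container("data_example", "Crystal photo, 20 \u00b5m", image)
    for encoding, payload in read_container(raw):
        print(encoding, len(payload))
```

Each part is decoded according to its own header, so text and binary sections with different encodings can coexist in one file.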
Any purely text version has at least a 7% overhead as compared to
pure binary. Restricting to UTF-8 increases the overhead to at least 50%.
We may get away with the 7% (UTF-16). The 50% version (UTF-8) will be
ignored by the community as unworkable. The version most likely to be
used will be the current DDL2-based version with embedded
compressed binaries that I am augmenting with DDLm-like features
and merging in with HDF5.
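The message does not say which packing schemes the 7% and 50% figures assume. One arithmetic that reproduces both, sketched below under that explicit assumption, treats overhead as encoded bits divided by payload bits, minus one: 15 payload bits per 16-bit UTF-16 code unit gives 16/15 - 1, about 6.7%, while 16 payload bits carried in each three-byte UTF-8 character gives 24/16 - 1 = 50%. The helper `overhead` is hypothetical.

```python
# Hypothetical reconstruction: the packing schemes below are assumptions,
# not taken from the message; only the overhead formula is general.
from fractions import Fraction


def overhead(payload_bits: int, encoded_bits: int) -> Fraction:
    """Fractional size increase when payload_bits of binary data
    occupy encoded_bits of stored text."""
    return Fraction(encoded_bits, payload_bits) - 1


# Assumed: 15 payload bits packed into each 16-bit UTF-16 code unit.
utf16_overhead = overhead(15, 16)
# Assumed: 16 payload bits carried by each three-byte UTF-8 character.
utf8_overhead = overhead(16, 24)

print(f"UTF-16: {float(utf16_overhead):.1%}")  # UTF-16: 6.7%
print(f"UTF-8:  {float(utf8_overhead):.1%}")   # UTF-8:  50.0%
```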
As I noted many months ago, the unfortunate reality is that the
current CIF2 effort will not merge well with imgCIF. If avoiding
a split is important, we need a meeting. I would suggest
involving Bob Sweet and holding it at BNL in conjunction with
something relevant to NSLS-II.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Tue, 24 Aug 2010, James Hester wrote:
> Hi Herbert: regarding imgCIF, I agree that splitting it off is not a
> desirable outcome. I would like to get an idea of how well imgCIF can
> be accommodated under the various encoding proposals currently
> floating around, as you have been rather reticent to bring it up. My
> naive take on things is that a UTF8-only encoding scheme for CIF2
> would not pose significant issues for imgCIF, and a decorated UTF16
> encoding in the style of Scheme B would be even better, and quite
> adequate, so imgCIF is not actually presenting any problems and so was
> a red herring.
>
> I'm not sure that face-to-face or Skype discussions are necessarily
> going to be more productive. Writing things down, while slower,
> allows me at least to collect my thoughts and those of other
> participants, and hopefully make a reasoned contribution (my apologies
> if I am too long-winded) and as an added bonus those thoughts are
> recorded for later reference. For example, where would I now find the
> background on why a container format for imgCIF is such a bad idea?
> Presumably that was all thrashed out in face to face discussions, and
> no record now remains.
>
> On Tue, Aug 24, 2010 at 8:56 PM, Herbert J. Bernstein
> <[email protected]> wrote:
>> Dear Colleagues,
>>
>> James' and John's last interchange is so voluminous that I doubt any of
>> us has been able to fully appreciate the rich complexity of ideas
>> contained therein. For example, one of the suggestions far down in
>> the text is:
>>
>> (James now) Indeed. My intent with this specification was to ensure
>> that third parties would be able to recover the encoding. If imgCIF is
>> going to cause us to make such an open-ended specification, it is
>> probably a sign that imgCIF needs to be addressed separately. For
>> example, should we think about redefining it as a container format,
>> with a CIF header and UTF16 body (but still part of the
>> "Crystallographic Information Framework")?
>>
>> The idea of an imgCIF "header" in CIF format and an image in another is an
>> old, well-established, thoroughly discussed, and mistaken idea, rejected
>> in 1998. The handling of multiple images in a single file (e.g.
>> a JPEG thumbnail and crystal image and a full-size diffraction image)
>> requires the ability to switch among encodings within the file --
>> something handled by the current DDL2 and MIME-based imgCIF format and
>> which would be a serious problem in CIF2 as currently proposed,
>> increasing the chances that we will have to move imgCIF into
>> HDF5 and abandon the CIF representation entirely, sharing only
>> the dictionary and not the framework.
>>
>> If you look carefully, you will see a similar trend with mmCIF, in which
>> an XML representation sharing the dictionary plays a much more
>> important role than the CIF format.
>>
>> Is it really desirable to make the new CIF format so rigid and
>> unadaptable that major portions of macromolecular crystallography
>> end up migrating to very different formats, as they already are
>> doing? Yes, there is great value in having a common dictionary,
>> but would there not be additional value in having a sufficiently
>> flexible common format to allow for more software sharing than
>> we now have? Is it really desirable for us to continue in the
>> direction of a single macromolecular experiment having to
>> deal with HDF5 and CIF/DDL2/MIME representations of the image data
>> during collection, CCP4-style CIF representations during processing
>> and deposition, and legacy PDB and PDBML representations in subsequent
>> community use? If we could be a little more flexible, we might be
>> able to reduce the data interchange software burdens a little.
>> Right now, this discussion seems headed in the direction of simply
>> adding yet another data representation (DDLm/CIF2) to the mix,
>> increasing the chances of mistranslation and confusion, rather
>> than reducing them.
>>
>> Please, step back a bit from the detailed discussion of UTF8 and
>> look at the work-flow of doing and publishing crystallographic
>> experiments and let us try to make a contribution that simplifies
>> it, not one that makes it more complex than it needs to be.
>>
>> I suggest we need to meet and talk, either face-to-face or by Skype.
>>
>> Regards,
>> Herbert
>>
>>
>> _______________________________________________
>> cif2-encoding mailing list
>> [email protected]
>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148