Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics. .. .

Dear Colleagues,

   I guess I do not understand the role of COMCIFS.  It appears that some 
of us think that COMCIFS has the power to control what people do.  I 
disagree.  We don't.  We cannot _require_ anything.  We can make 
reasonable suggestions, and, if they are indeed reasonable, many people 
will follow those suggestions. If, as in the case of requiring people not 
to use ordinary text editors to edit a CIF, what is being suggested is an 
inconvenient nuisance, we will simply be ignored, and we will have a large 
supply of non-compliant, unidentified pseudo-CIF2 files.

>>  Let me propose what I think would be a reasonable resolution:
>>
>> 1.  We come to a final resolution on what _information_ is in CIF2,
>> independent of the representation used.  I think we have that in hand.
>>
>> 2.  We present one UTF-8 based _representation_ of that information
>> for two essential purposes:
>>  2.1.  To have a concrete way in which to present examples of CIF2; and
>>  2.2.  To have a default assumed representation in which a CIF2 the
>> representation of which is not otherwise identified is most likely
>> to have been presented.
>>
>> 3.  That we suggest some reasonable mechanisms for helping software
>> developers and users to determine which of the very large number of
>> possible representations has been used for a given file, including,
>> but not limited to:
>>  BOM
>>  Magic number
>>  Extended identifying comments
>>  Encoding tags in the file itself
>> with sufficient detail to allow developers to get started, but
>> with a final decision deferred on everything other than the BOM
>> to allow for broad-based community discussion of what is clearly
>> a contentious issue.  The BOM is going to be in _any_ final list
>> because it is well-supported by several existing text
>> editors, and keeps getting forced into files without any user
>> control.

is about as far as we can go and have any hope of any degree of 
compliance.
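
As a rough illustration of the BOM item in the list above, here is a
minimal Python sketch of BOM-based detection with a UTF-8 default.  The
function names sniff_encoding and decode_cif are hypothetical, nothing
here comes from a CIF2 draft, and the fallback to UTF-8 simply mirrors
point 2.2 of the proposal.

```python
import codecs

# Check longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE). A UTF-16-LE file whose first character is
# U+0000 would still be misread as UTF-32-LE; that ambiguity is ignored here.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) guessed from a leading BOM.

    With no BOM present, fall back to the assumed default (UTF-8).
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_cif(data: bytes) -> str:
    """Decode raw file bytes to text, dropping any BOM that was found."""
    encoding, skip = sniff_encoding(data)
    return data[skip:].decode(encoding)
```

A reader built this way accepts a BOM when an editor has inserted one
and otherwise treats the file as UTF-8; the other mechanisms in the
list (magic number, identifying comments, encoding tags) would need
their own checks once they are defined.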

To be blunt and specific -- if we _mandate_ a checksum in the file, we 
will just end up with lots of files with checksums that don't agree.  Our 
only hope is to recommend a checksum, and, most importantly, provide 
widely supported software to calculate it both for automatic insertions 
and for validation.
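
To make the "insert and validate" idea concrete, here is a minimal
Python sketch.  The trailing-comment marker and the choice of SHA-256
are assumptions made purely for illustration; no checksum format has
been agreed in this thread, and add_checksum and verify_checksum are
hypothetical names.

```python
import hashlib

# Hypothetical trailer marker, chosen only for this sketch.
TAG = b"#checksum-sha256: "


def add_checksum(body: bytes) -> bytes:
    """Append a comment line carrying the SHA-256 digest of the body."""
    if not body.endswith(b"\n"):
        body += b"\n"
    digest = hashlib.sha256(body).hexdigest().encode("ascii")
    return body + TAG + digest + b"\n"


def verify_checksum(data: bytes):
    """Return True or False if a trailer is present, None if there is none."""
    head, sep, tail = data.rpartition(TAG)
    if not sep or (head and not head.endswith(b"\n")):
        return None  # no checksum line: nothing to validate
    claimed = tail.strip().decode("ascii", "replace")
    return hashlib.sha256(head).hexdigest() == claimed
```

Note that a file without a checksum is reported as "nothing to
validate" rather than rejected, which matches recommending a checksum
rather than mandating one.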

As for

> Herbert: you have not reacted to a suggestion that we simply reserve
> the first line of a CIF2 file for future expansion, and state that non
> UTF8 encodings for CIF2 would be considered by COMCIFS as the need
> arose.

this brings us back to being unreasonably rigid and fussy and certain to 
be ignored.  I cannot figure out what "reserve the first line of a CIF2 
file" means in practice, and "non UTF8 encodings ... be considered by 
COMCIFS" has no practical meaning for what a poor user or software 
developer in, say, a code-page or UCS2 environment is supposed to do now.

At this rate, CIF2 will _never_ be adopted (and first we have to propose 
something), and that is a great shame.  I believe that what I suggested 
above is as far as we can go if we want to stop debating how many angels 
can dance on the head of a pin and get on with having a CIF2 for real 
people to use with real data and real software.

Regards,
   Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Mon, 13 Sep 2010, James Hester wrote:

> Your step 3 is what we are discussing here.  It is not sufficient to
> simply "suggest" reasonable mechanisms, as this leaves developers
> bewildered as to which of these potentially vague suggestions they can
> and should support, leading to confusion and inability of programs to
> communicate with one another.  Far better to *specify* mechanisms,
> which is what we are groping towards doing here.
>
> Herbert: you have not reacted to a suggestion that we simply reserve
> the first line of a CIF2 file for future expansion, and state that non
> UTF8 encodings for CIF2 would be considered by COMCIFS as the need
> arose.
>
> On Sun, Sep 12, 2010 at 12:33 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear Colleagues,
>>
>>  Let me propose what I think would be a reasonable resolution:
>>
>> 1.  We come to a final resolution on what _information_ is in CIF2,
>> independent of the representation used.  I think we have that in hand.
>>
>> 2.  We present one UTF-8 based _representation_ of that information
>> for two essential purposes:
>>  2.1.  To have a concrete way in which to present examples of CIF2; and
>>  2.2.  To have a default assumed representation in which a CIF2 the
>> representation of which is not otherwise identified is most likely
>> to have been presented.
>>
>> 3.  That we suggest some reasonable mechanisms for helping software
>> developers and users to determine which of the very large number of
>> possible representations has been used for a given file, including,
>> but not limited to:
>>  BOM
>>  Magic number
>>  Extended identifying comments
>>  Encoding tags in the file itself
>> with sufficient detail to allow developers to get started, but
>> with a final decision deferred on everything other than the BOM
>> to allow for broad-based community discussion of what is clearly
>> a contentious issue.  The BOM is going to be in _any_ final list
>> because it is well-supported by several existing text
>> editors, and keeps getting forced into files without any user
>> control.
>>
>> Regards,
>>  Herbert
>>
>> =====================================================
>>  Herbert J. Bernstein, Professor of Computer Science
>>   Dowling College, Kramer Science Center, KSC 121
>>        Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                 +1-631-244-3035
>>                 yaya@dowling.edu
>> =====================================================
>>
>> On Sat, 11 Sep 2010, SIMON WESTRIP wrote:
>>
>>> Dear all
>>>
>>> I have found recent exchanges, especially Herbert's contributions
>>> regarding the real-world use of imgCIF, very enlightening. Primarily
>>> for reasons of flexibility, I now find myself inclined to support a
>>> CIF specification that allows a variety of encodings, provided that
>>> such are "clearly and unambiguously defined".
>>>
>>> To me, the clear and unambiguous definition should encompass a clear
>>> and unambiguous *declaration* of the encoding; in the absence of such
>>> a declaration in the CIF or in its container, a default encoding
>>> should be assumed, either the default CIF encoding (which I think most
>>> agree should be UTF8) or inherited from the container?
>>>
>>> Though CIF1 has been successful without such a declaration (largely
>>> because of the ASCII restriction), I believe it is essential in the
>>> case of CIF2.
>>>
>>> Cheers
>>>
>>> Simon
>>>
>>>
>>> ____________________________________________________________________________
>>> From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
>>> To: Group for discussing encoding and content validation schemes for CIF2
>>> <cif2-encoding@iucr.org>
>>> Sent: Friday, 10 September, 2010 19:24:05
>>> Subject: Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics. ..
>>> .
>>>
>>>
>>> On Friday, September 10, 2010 11:02 AM, Herbert J. Bernstein wrote:
>>>> As I have said before, we went through this approach
>>>> in 1997 and ended up going the other way -- treating the
>>>> text-based CIF and the binary CBF as parts of the _same_
>>>> format, not two different formats, not one being a serialization
>>>> of the other, but the same format.  This may seem like a
>>>> minor distinction, but it actually has strong implications
>>>> for software design and implementation, ensuring that
>>>> binaries in a CIF context are just a particular type of data
>>>> handled with all the same mechanisms as ASCII data, allowing,
>>>> for example, multiple diffraction images and thumbnails in
>>>> one file in an order-independent way.
>>>>
>>>> You may be interested to know that the false dichotomy between
>>>> binary and text-based representations is now starting
>>>> to impact HDF5, requiring some significant effort to now
>>>> work in database access, an aspect CIF1 supports -- why
>>>> throw it away for CIF2?
>>>
>>> Herb,
>>>
>>> Perhaps you're reading more into my comments than I intended to put
>>> there.  In particular, I did not aim to suggest one on-disk/wire format
>>> should be a serialization of another, but rather that *all* on-disk/wire
>>> formats be characterized in terms of serialization of the Unicode
>>> character sequences described by most of the spec.  I meant "text" in
>>> that sense -- a sequence of Unicode characters -- not in the sense of a
>>> sequence of bytes conforming to some particular set of local conventions
>>> for text.  I meant "serialization" in the general sense of any
>>> reversible transformation of CIF text into a byte sequence, including
>>> those that rely on interpreting the CIF syntax.  That's aimed primarily
>>> at recognizing the use case in which CIF2 is embedded in or transformed
>>> into some other format, such as XML.
>>>
>>> I postulate, but do not specify, a serialization form defining the CIF2
>>> version of what we have conventionally called "a CIF."  The details of
>>> that form are exactly what this list was established to discuss, and I
>>> did not intend to imply a particular resolution of our ongoing debate.
>>> It was perhaps a mistake to include imgCIF/CBF on the list of possible
>>> alternative serialization forms, as it is far from settled whether it
>>> will fit under the umbrella of the 'CIF File' serialization form.  I
>>> apologize if that caused confusion.
>>>
>>> [... I wrote:]
>>>>> I think this matter would be best addressed by explicitly adopting an
>>>>> idea that we have discussed before: a formal separation between the
>>>>> definition of CIF text (i.e. James's "CIF2-conformant character stream")
>>>>> and the particular kind of packaging that we are accustomed to calling
>>>>> "a CIF" or "a CIF file".  James's suggestion implies such a separation
>>>>> anyway, so let's not do it halfway.  Given such a separation, the
>>>>> explanatory comment could be as simple as:
>>>>>
>>>>> "This specification's definition of the 'CIF File' serialization form
>>>>> for CIF2 text is not intended to preclude definition or use of other
>>>>> serialization forms, such as HDF5-based forms, XML-based forms, or
>>>>> imgCIF/CBF."
>>>>>
>>>>> I choose the term "serialization form" because it puts primary emphasis
>>>>> on the CIF text (which after all is the subject of the bulk of the
>>>>> specification).  Every correct serialization of CIF text is, by
>>>>> definition, transformable into CIF text form.
>>>
>>>
>>> Regards,
>>>
>>> John
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>>
>>>
>>>
>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>
>>> _______________________________________________
>>> cif2-encoding mailing list
>>> cif2-encoding@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>>>
>>
>
>
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>