Hello All,

On Friday, September 10, 2010 6:25 AM, Herbert J. Bernstein wrote:
[James Hester wrote:]
>> "Note that a CIF2-conformant character stream that forms part of a
>> larger stream is not constrained to be in UTF8 encoding if the
>> encoding of the CIF2 stream is specified in a standards-conformant
>> manner within the enclosing stream.  For example, CIF2 content within
>> an XML file is not constrained to be UTF8-encoded as standard XML
>> attributes can be used to manage encoding."
>is almost reasonable, but basically says that it will be easier to handle CIF2 in almost any external container, rather than as itself.
>I would suggest saying:
>The description of a conformant CIF2 in terms of a UTF8 encoding is intended to provide clarity in the description of a CIF2, not to prevent use of CIF2 in terms of other encodings, such as UCS-2 Unicode or code-page-based encodings needed for editors in particular systems, nor to prevent use of transformed CIF2 in other containers such as HDF5 and XML or imgCIF/CBF, as long as the decodings/encodings or other transformations that would be necessary to go to and from a UTF8 CIF2 representation are clearly and unambiguously defined.

I think this matter would be best addressed by explicitly adopting an idea that we have discussed before: a formal separation between the definition of CIF text (i.e. James's "CIF2-conformant character stream") and the particular kind of packaging that we are accustomed to calling "a CIF" or "a CIF file".  James's suggestion implies such a separation anyway, so let's not do it halfway.  Given such a separation, the explanatory comment could be as simple as:

"This specification's definition of the 'CIF File' serialization form for CIF2 text is not intended to preclude definition or use of other serialization forms, such as HDF5-based forms, XML-based forms, or imgCIF/CBF."

I choose the term "serialization form" because it puts primary emphasis on the CIF text (which after all is the subject of the bulk of the specification).  Every correct serialization of CIF text is, by definition, transformable into CIF text form.

There remains a minor question of which CIF details are considered part of the serialization form, and which are integral to the CIF text.  Character encoding and initial BOM (however we feel about that) are surely part of the serialization form.  Including end-of-line conventions as a serialization detail might also be convenient.  The biggest question for me is how to categorize the CIF version comment.  I am inclined to make it part of the serialization form, but I can see arguments both ways.
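
To make the proposed split concrete, here is a rough Python sketch of what "transformable into CIF text form" could mean for the 'CIF File' serialization.  It is only an illustration under my own assumptions: the function name and its `encoding` parameter are hypothetical, not taken from any specification or library, and the sketch deliberately leaves the version comment alone because its categorization is the open question.

```python
import codecs


def cif_text_from_file_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: undo the serialization details of a 'CIF File'.

    Returns the CIF text as a string of characters, independent of how
    the bytes were encoded or how lines were terminated.
    """
    # Serialization detail 1: an optional initial BOM on a UTF-8 stream.
    # (Python's "utf-16" codec consumes its own BOM while decoding.)
    if codecs.lookup(encoding).name == "utf-8" and data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]

    # Serialization detail 2: the character encoding.
    text = data.decode(encoding)

    # Serialization detail 3 (if we choose to treat it as one):
    # end-of-line conventions, normalized here to a single "\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With this split, a UTF-8 file with a BOM and CR-LF line ends and a UTF-16 file with bare LF line ends are two serializations of the same CIF text: both calls return an identical string.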



