Discussion List Archives

Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics

I find John's approach in terms of 'serialisation forms' reasonable
and acceptable insofar as it takes care of those situations where CIF
content is contained within something else.  This formulation, if
adopted, would solve the particular real-life use case that Herbert
put forward, of embedding an imgCIF in an XML file.  Herbert: do you
have any other real-life use cases of imgCIF that this solution does
not address?

I would note that adoption of the 'serialisation form' approach would
also immediately provide a somewhat hackish workaround for those
requiring non-UTF8 encoding: simply embed the CIF material in an XML
file and use the XML encoding declaration.  Perhaps these sorts of hacks are
what Herbert means by "it will be easier to handle CIF2 in almost any
external container".  John's formulation is also less restrictive than
my original proposal in that it does not require the container to be
able to handle encoding, but I'm prepared to leave that part out of
the spec and simply add a note to remind readers that they should
consider this issue.
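The XML workaround described above can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under stated assumptions: the `<cif>` element name and the sample CIF fragment are hypothetical, not part of any agreed specification. The point it shows is that the XML declaration carries the encoding, so CIF text stored in a non-UTF8 (here ISO-8859-1) container is recovered exactly, with no CIF-level encoding rules involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIF2 fragment; the 'Å' is outside ASCII, so the encoding matters.
CIF_TEXT = "#\\#CIF_2.0\ndata_example\n_cell.length_a 5.431(1)  # in Å\n"


def embed_in_xml(cif_text: str, encoding: str) -> bytes:
    """Wrap CIF text in a hypothetical <cif> element, serialized in `encoding`.

    ElementTree writes an XML declaration naming the encoding, which is what
    lets a reader recover the text later.
    """
    root = ET.Element("cif")  # element name is an assumption, not a standard
    root.text = cif_text
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def extract_from_xml(xml_bytes: bytes) -> str:
    """Parse the container; the XML declaration tells the parser how to decode."""
    root = ET.fromstring(xml_bytes)
    return root.text or ""


if __name__ == "__main__":
    doc = embed_in_xml(CIF_TEXT, "ISO-8859-1")
    print(doc.splitlines()[0])  # the XML declaration, naming ISO-8859-1
    assert b"\xc5" in doc  # 'Å' stored as a single Latin-1 byte, not UTF-8
    assert extract_from_xml(doc) == CIF_TEXT
```

Note that the CIF text itself never has to say anything about its encoding here; all of that responsibility sits with the container.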

To my mind, the encoding of plain CIF files remains an open issue.  I
do not view the mechanisms for managing file encoding that are
provided by current OSs to be sufficiently robust, widespread or
consistent that we can rely on developers or text editors respecting
them, so we require something like Scheme B for all files (not only at
the point of transfer to another OS).

On Sat, Sep 11, 2010 at 12:47 AM, Bollinger, John C
<John.Bollinger@stjude.org> wrote:
> Hello All,
> On Friday, September 10, 2010 6:25 AM, Herbert J. Bernstein wrote:
> [James Hester wrote:]
>>> "Note that a CIF2-conformant character stream that forms part of a
>>> larger stream is not constrained to be in UTF8 encoding if the
>>> encoding of the CIF2 stream is specified in a standards-conformant
>>> manner within the enclosing stream.  For example, CIF2 content within
>>> an XML file is not constrained to be UTF8-encoded as standard XML
>>> attributes can be used to manage encoding."
>>is almost reasonable, but basically says that it will be easier to handle CIF2 in almost any external container, rather than as itself.
>>I would suggest saying:
>>The description of a conformant CIF2 in terms of a UTF8 encoding is intended to provide clarity in the description of a CIF2, not to prevent use of CIF2 in terms of other encodings, such as UCS-2 Unicode or code-page-based encodings needed for editors in particular systems, nor to prevent use of transformed CIF2 in other containers such as HDF5 and XML or imgCIF/CBF, as long as the decodings/encodings or other transformations that would be necessary to go to and from a UTF8 CIF2 representation are clearly and unambiguously defined.
> I think this matter would be best addressed by explicitly adopting an idea that we have discussed before: a formal separation between the definition of CIF text (i.e. James's "CIF2-conformant character stream") and the particular kind of packaging that we are accustomed to calling "a CIF" or "a CIF file".  James's suggestion implies such a separation anyway, so let's not do it halfway.  Given such a separation, the explanatory comment could be as simple as:
> "This specification's definition of the 'CIF File' serialization form for CIF2 text is not intended to preclude definition or use of other serialization forms, such as HDF5-based forms, XML-based forms, or imgCIF/CBF."
> I choose the term "serialization form" because it puts primary emphasis on the CIF text (which after all is the subject of the bulk of the specification).  Every correct serialization of CIF text is, by definition, transformable into CIF text form.
> There remains a minor question of which CIF details are considered part of the serialization form, and which are integral to the CIF text.  Character encoding and an initial BOM (however we feel about that) are surely part of the serialization form.  Including end-of-line conventions as a serialization detail might also be convenient.  The biggest question for me is how to categorize the CIF version comment.  I am inclined to make it part of the serialization form, but I can see arguments both ways.
> Regards,
> John
> _______________________________________________
> cif2-encoding mailing list
> cif2-encoding@iucr.org
> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
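John's proposed separation between CIF text and its "CIF File" serialization form can be sketched in Python as a pair of functions. Everything here is illustrative and hypothetical: the function names are invented, and UTF-8 with an optional initial BOM plus LF/CRLF/CR line endings are assumed to be the serialization details. The CIF version comment is simply left in the text, because the thread has not decided which side it belongs on. A third function stands in for Herbert's "other encodings" case, where the transformation to the same CIF text is defined by naming the source encoding.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def normalize_eol(text: str) -> str:
    """Map CRLF and lone CR to LF, so CIF text carries a single EOL form."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def read_cif_file(data: bytes) -> str:
    """Hypothetical 'CIF File' form -> CIF text.

    Treats UTF-8 encoding, an optional initial BOM and end-of-line
    conventions as serialization details. The CIF version comment stays
    in the text (an assumption; its categorization is still open).
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError.
    return normalize_eol(data.decode("utf-8"))


def write_cif_file(text: str, eol: str = "\n", bom: bool = False) -> bytes:
    """CIF text -> hypothetical 'CIF File' form with the chosen EOL and BOM."""
    body = normalize_eol(text).replace("\n", eol).encode("utf-8")
    return (UTF8_BOM + body) if bom else body


def from_other_form(data: bytes, encoding: str) -> str:
    """A different serialization form (e.g. a code-page file) whose encoding
    is known out of band; only the decoding step differs."""
    return normalize_eol(data.decode(encoding))


if __name__ == "__main__":
    text = "#\\#CIF_2.0\ndata_example\n_cell.length_a 5.431(1)  # in Å\n"
    windows_style = write_cif_file(text, eol="\r\n", bom=True)
    assert read_cif_file(windows_style) == text
    assert from_other_form(text.encode("cp1252"), "cp1252") == text
```

In this sketch every serialization form reduces to the same Python string, which is one way of reading the requirement that transformations to and from the UTF8 representation be "clearly and unambiguously defined".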

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
International Union of Crystallography