Discussion List Archives


Re: [ddlm-group] UTF-8 BOM

On Mon, Jun 14, 2010 at 03:09:56PM -0500, Bollinger, John C wrote:
>>                                             Of course the world does
>> contain CIFs created other than by fully-conformant CIF writers. To
>> an extent the community should decide for itself how best to attempt
>> to handle deviations from full conformance. It would help, perhaps, if
>> those of us writing CIF readers would document specific practices that
>> the software takes to accommodate such deviations. Ideally, such
>> software should have a verbose logging mode that can be activated
>> whenever surprising behaviour in reading CIFs is encountered by
>> the user.
> I think it's exceedingly optimistic to expect "the community" to arrive
> at and abide by a single, consistent set of best practices.  The best
> you can hope for is that a small number of organizations and / or
> programs will exert enough influence to establish their own de facto
> standards.

I'm an optimist :-)

> We can exert some influence there, however.  Either the CIF spec or
> a companion spec could establish conformance requirements for CIF
> *processors*, including, for example, the ability to diagnose
> particular malformations.  The XML spec does this, as do some
> programming language specs.
> Such a document could also establish, perhaps, that CIF processors
> must be able to accept the UTF-8 encoding, and maybe even that they
> must assume UTF-8 by default.  That would establish the baseline and
> a guaranteed interoperability mode that we would otherwise lose by
> pushing character encoding outside the format specification.

Probably this is the route that I would prefer. Make the formal CIF spec
as clean as possible, even if it appears somewhat harsh, but sanction
particular processing protocols to accommodate well-defined and somewhat
frequent edge cases. We've grappled with this sort of thing before, in
the context of coercion rules for robust lexer/parsers. Again my
preference would be for the CIF spec to be strict, but the coercion
rules to be documented as a basis for building processing software
capable of handling certain well-characterised deviations from the
strict specification.
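
As a rough illustration of the strict-spec-plus-documented-coercion
idea above, a reader could decode input as UTF-8 by default, apply one
named coercion, and log it whenever it fires. The following is only a
minimal Python sketch under stated assumptions: the function name
decode_cif_bytes, the logger name and the strict flag are hypothetical,
and treating a leading UTF-8 BOM as the deviation to be coerced is
chosen purely for illustration, since the BOM question was still to be
voted on at this point in the thread.

```python
import codecs
import logging

# Hypothetical logger name; a real reader would use its own.
logger = logging.getLogger("cif_reader_sketch")


def decode_cif_bytes(data: bytes, strict: bool = False) -> str:
    """Decode raw CIF bytes, assuming UTF-8 by default.

    Assumption for illustration only: a leading UTF-8 BOM is treated
    as a well-characterised deviation. In strict mode it is rejected;
    otherwise it is dropped and the coercion is logged.
    """
    if data.startswith(codecs.BOM_UTF8):
        if strict:
            raise ValueError("leading UTF-8 BOM rejected in strict mode")
        logger.warning("coercion applied: dropped leading UTF-8 BOM")
        data = data[len(codecs.BOM_UTF8):]
    # Invalid UTF-8 raises UnicodeDecodeError rather than being guessed at.
    return data.decode("utf-8")
```

Calling decode_cif_bytes(raw) gives the tolerant behaviour with a log
entry for each coercion, while decode_cif_bytes(raw, strict=True)
behaves like a diagnosing processor that refuses the same input.
Keeping each coercion as a separate, logged step is one way to make
the documented list of accommodations match what the software does.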

Having said that, I am not in favour of unpicking what we have already
effectively agreed by consensus. I'll be very happy to respond to James's
forthcoming call for a vote on the BOM issue and help if I can with
integrating the recent small refinements to the final draft specification.
It's more important to have a fixed spec that we can work with, than
to spend forever striving for a perfect solution.

ddlm-group mailing list
