Re: [ddlm-group] UTF-8 BOM

On Mon, Jun 14, 2010 at 03:09:56PM -0500, Bollinger, John C wrote:
>>                                             Of course the world does
>> contain CIFs created other than by fully-conformant CIF writers. To
>> an extent the community should decide for itself how best to attempt
>> to handle deviations from full conformance. It would help, perhaps, if
>> those of us writing CIF readers would document specific practices that
>> the software takes to accommodate such deviations. Ideally, such
>> software should have a verbose logging mode that can be activated
>> whenever surprising behaviour in reading CIFs is encountered by
>> the user.
> I think it's exceedingly optimistic to expect "the community" to arrive
> at and abide by a single, consistent set of best practices.  The best
> you can hope for is that a small number of organizations and / or
> programs will exert enough influence to establish their own de facto
> standards.

I'm an optimist :-)

> We can exert some influence there, however.  Either the CIF spec or
> a companion spec could establish conformance requirements for CIF
> *processors*, including, for example, the ability to diagnose
> particular malformations.  The XML spec does this, as do some
> programming language specs.
> Such a document could also establish, perhaps, that CIF processors
> must be able to accept the UTF-8 encoding, and maybe even that they
> must assume UTF-8 by default.  That would establish the baseline and
> a guaranteed interoperability mode that we would otherwise lose by
> pushing character encoding outside the format specification.

Probably this is the route that I would prefer. Make the formal CIF spec
as clean as possible, even if it appears somewhat harsh, but sanction
particular processing protocols to accommodate well-defined and somewhat
frequent edge cases. We've grappled with this sort of thing before, in
the context of coercion rules for robust lexer/parsers. Again my
preference would be for the CIF spec to be strict, but the coercion
rules to be documented as a basis for building processing software
capable of handling certain well-characterised deviations from the
strict specification.
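
To make the idea concrete, here is a rough sketch of what one documented
coercion rule might look like in a reader. It is only an illustration:
the function name, the logger name and the "strict" switch are all
hypothetical, and treating a leading BOM as a deviation in strict mode is
an assumption made for the example, not a statement of what the spec
says. It assumes UTF-8 by default and logs the coercion so that a verbose
mode can report it.

```python
import codecs
import logging

# Hypothetical logger name; a real reader would use its own.
log = logging.getLogger("cif_reader")


def decode_cif_bytes(raw: bytes, strict: bool = False) -> str:
    """Decode raw CIF bytes, assuming UTF-8 by default.

    Assumption for this sketch: strict mode rejects a leading UTF-8 BOM,
    while the tolerant mode drops it and logs that it did so.
    """
    if raw.startswith(codecs.BOM_UTF8):
        if strict:
            raise ValueError("leading UTF-8 BOM rejected in strict mode")
        # Documented coercion: discard the BOM and record the fact.
        log.warning("coercion applied: dropped leading UTF-8 BOM")
        raw = raw[len(codecs.BOM_UTF8):]
    # Invalid UTF-8 raises UnicodeDecodeError rather than being guessed at.
    return raw.decode("utf-8")
```

Calling decode_cif_bytes(data) gives the tolerant behaviour, and
decode_cif_bytes(data, strict=True) gives the harsh one, so the same code
can serve as both a validator and a forgiving reader.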

Having said that, I am not in favour of unpicking what we have already
effectively agreed by consensus. I'll be very happy to respond to James's
forthcoming call for a vote on the BOM issue and help if I can with
integrating the recent small refinements to the final draft specification.
It's more important to have a fixed spec that we can work with, than
to spend forever striving for a perfect solution.
