Discussion List Archives


Re: [ddlm-group] Latest draft specification

Brian has now posted the document. You can find it at:

On Mon, Jul 5, 2010 at 2:19 PM, James Hester <jamesrhester@gmail.com> wrote:
I am happy to proceed as Brian suggests. As far as actually preparing a draft goes, on May 7 John B kindly provided me with an editable version of the original draft specification, with those of his suggested changes that were uncontroversial already included. I have updated that draft, making the following specific changes:

1. Change the 2048-byte limit to a 2048-character limit
2. Incorporate XML-type newline handling
3. Refer to UTF-8 as the designated encoding for files conformant to the specification
4. State that U+FEFF is not part of the allowed character set (i.e. it would be a syntax error wherever it appears). I include this because the voting on this point, such as it was, gave a slight majority to option 2(a) over option 2(c)(ii).
5. Disallow Unicode non-characters. I have *not* dealt with the issue of disallowing non-printing characters. As the draft currently stands, non-printing characters are acceptable.
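
For readers who want to see how the five changes above might combine in practice, here is a minimal Python sketch. It is not taken from the draft: the function names are hypothetical, "character" is assumed to mean a Unicode code point, the 2048 limit is assumed to apply per line, and "XML-type newline handling" is assumed to mean the XML 1.0 rule that CR LF and a lone CR are both read as LF. Non-printing characters are deliberately not checked, since the draft as it stands accepts them.

```python
from typing import List

MAX_LINE_CHARS = 2048  # item 1: counted in characters, not bytes (assumed per line)


def normalize_newlines(text: str) -> str:
    """Item 2 (assumed XML 1.0 rule): CR LF and lone CR both become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def is_noncharacter(cp: int) -> bool:
    """Item 5: U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_draft_rules(raw: bytes) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        text = raw.decode("utf-8")  # item 3: UTF-8 is the designated encoding
    except UnicodeDecodeError as exc:
        return ["not valid UTF-8: " + exc.reason]

    problems = []
    lines = normalize_newlines(text).split("\n")
    for lineno, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_CHARS:  # len() counts code points, not bytes
            problems.append(
                "line %d: %d characters exceeds %d" % (lineno, len(line), MAX_LINE_CHARS)
            )
        for ch in line:
            cp = ord(ch)
            if cp == 0xFEFF:  # item 4: U+FEFF is an error anywhere, even at the start
                problems.append("line %d: U+FEFF is not allowed" % lineno)
            elif is_noncharacter(cp):
                problems.append("line %d: noncharacter U+%04X" % (lineno, cp))
    return problems
```

Under these assumptions, a line of 2048 two-byte characters (4096 bytes in UTF-8) passes, whereas it would have failed under the old byte-based limit; a file beginning with a byte-order mark is reported as an error.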

The updated draft is in Brian's hands, and I'm hoping he will post it to the IUCr website shortly for your comment.


On Fri, Jul 2, 2010 at 6:25 PM, Brian McMahon <bm@iucr.org> wrote:

Like Buridan's ass we are starving to death between the equally
enticing mound of hay that is UTF-8 and the smorgasbord of mixed
vegetables offered by multiple encodings.

I suggest that this group complete a *draft* CIF2 specification
that describes (if necessary) specific character allusions in
terms of a canonical UTF-8 encoding, and states that UTF-8 is the
designated encoding for files conformant to the specification.

Post the completed draft in the first instance to the cif-developers
list (since that is supposed to be the most relevant target audience),
but certainly to other lists at the same time if folk think that would
be productive. By all means accompany the release with a commentary on
the difficulties we have faced over the encoding issue; by all means
implement a survey and analyse the results to assess community demand
for an upward revision of the draft - but let us give people something
concrete to begin with, and challenge them actively to protest if the
proposal will impede their work.

Note that this proposal doesn't necessarily reflect a personal
preference for a single mandatory encoding - I still cannot
decide which I "prefer". But if the suggested draft is published,
I will not vote against it unless I suddenly see clearly a real
problem that it would throw up in the way of any applications I
would envisage writing. I would hope "the community" would respond
in similar vein, so that stated objections would both represent real
difficulties and help to define the environments giving rise to these
real difficulties.

Best wishes
ddlm-group mailing list

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148


International Union of Crystallography
