Discussion List Archives


Re: [Cif2-encoding] How we wrap this up

OK, final comments before this is wrapped up (hopefully):

1. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII' recently posted here and to COMCIFS.
2. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII', together with Brian's *recommendations*
3. UTF8-only as in the original draft
4. UTF8 + UTF16
5. UTF8, UTF16 + "local"

These can be broken down to:

'any encoding' (1, 2, and 5)

'specified encoding' (3 and 4)

Note that I put 5 in the 'any encoding' category, as I think 'local' could be interpreted as any encoding.

To me, the 'any encoding' approach is unsatisfactory, given that CIF is a data-exchange format
and should be specified in terms that allow the consumer to know exactly what to expect (i.e. no uncertainty in encoding).
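To make the point concrete, here is a minimal Python sketch showing that a single byte sequence decodes to different text depending on which encoding the consumer assumes. The helper name `decode_all` is hypothetical (not part of any CIF tool); the character Å is chosen because it is common in crystallographic text, and the three candidate encodings simply stand in for what a 'local' encoding might turn out to be.

```python
# The bytes a producer would write for "Å" if it used UTF8.
data = "Å".encode("utf-8")  # b'\xc3\x85'

def decode_all(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode raw bytes under each candidate encoding, recording failures."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = "<undecodable>"
    return results

# Same bytes, three different readings; only the producer knows which is right.
for enc, text in decode_all(data, ["utf-8", "latin-1", "cp1252"]).items():
    print(f"{enc:8} -> {text!r}")
```

Note that latin-1 never reports an error, so a consumer guessing wrongly may not even notice.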

'Specified encodings' can be seen as restrictive, especially if there is only one. A list of specified encodings could be
seen as inflexible and perhaps arbitrary (e.g. why isn't UTF32 on the list?).
If encoding is to be specified, it could be in terms of UTF8 + any Unicode encoding that is inherently identifiable
(which in reality boils down to the UTF family).
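As a rough illustration of what 'inherently identifiable' could mean in practice, the sketch below assumes that the encoding is identified solely by a leading byte-order mark (BOM), with BOM-less input treated as the UTF8 default. The function names are hypothetical; this is not a proposed CIF2 rule, and it does not handle BOM-less UTF16/UTF32, which would need some other detection scheme.

```python
import codecs

# Byte-order marks, checked longest first so that the UTF32-LE BOM (FF FE 00 00)
# is not mistaken for the UTF16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def identify_encoding(raw: bytes) -> tuple[str, int]:
    """Return (encoding name, BOM length); assume UTF8 when there is no BOM."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return "utf-8", 0

def decode_cif_text(raw: bytes) -> str:
    """Strip any BOM and decode strictly (invalid bytes raise an error)."""
    enc, skip = identify_encoding(raw)
    return raw[skip:].decode(enc)
```

Because every encoding in the list announces itself, the consumer never has to guess; anything outside the UTF family has no such marker.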

In either case, a degree of work will be required to accommodate user practice and the legacy of CIF1.
If the 'any encoding' approach is taken, I believe there should be a wealth of supporting material for
both users and developers to encourage the use of a default encoding (i.e. UTF8). Hence my recent support
for something along the lines of (2) above. This approach avoids mandating some of the less-satisfactory
schemes we have been discussing (e.g. declaration of encoding), but at least makes them available
to conscientious developers.

Equally, if CIF2 adopts 'specified encodings', there should be a wealth of supporting material for
both users and developers to enable transcoding.
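For the transcoding side, a minimal sketch follows. It assumes the source encoding is already known from outside the file (which is exactly what cannot be guaranteed under 'any encoding'); `transcode_to_utf8` is a hypothetical helper, not a function from an existing CIF library.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known source encoding as UTF8 without a BOM."""
    text = raw.decode(source_encoding)  # strict: fails loudly on bad input
    if text.startswith("\ufeff"):
        # Drop a BOM that the source codec left in place (e.g. plain "utf-8").
        text = text[1:]
    return text.encode("utf-8")

# Example: a legacy Windows-1252 file containing an Angstrom sign.
legacy = "_chemical_name_common 'Å test'".encode("cp1252")
converted = transcode_to_utf8(legacy, "cp1252")
```

The conversion itself is trivial; the hard part, and the subject of this thread, is knowing `source_encoding` in the first place.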

The pedant in me would like to see 'specified encodings' (preferably UTF8 default + any inherently identifiable Unicode encoding),
but if the 'any encoding' approach is to be taken, I think it has to be described as Herbert proposes, with any schemes
for identifying the encoding left out of the 'specification' (let the specification reflect the uncertainty that is the encoding of a CIF) :-)

Cheers

Simon





From: James Hester <jamesrhester@gmail.com>
To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@iucr.org>
Sent: Thursday, 23 September, 2010 1:37:48
Subject: [Cif2-encoding] How we wrap this up

Dear CIF2 encoding participants,

As Herbert has indicated, we are starting to run out of time for
resolution of the encoding issue.  I believe that we have now explored
the various proposals sufficiently to all have a good understanding of
the consequences and advantages of each approach.  So, after a round
of final comments, I propose that we vote on the general scheme that
we recommend.  We can then flesh out the details of the particular
scheme that we have settled on, and take this completed proposal to
the DDLm group for their approval, following which we will present the
entire CIF2 syntax document to COMCIFS for a formal vote.

The proposals that I believe are still on the table are:

1. Herbert's 'as for CIF1 proposal' recently posted here and to COMCIFS.
2. Herbert's 'as for CIF1 proposal', together with Brian's proposal
(if you agree that they are compatible)
3. UTF8-only as in the original draft
4. UTF8 + UTF16
5. UTF8, UTF16 + "local"

I have not included the hashcode proposal as I believe it no longer
has any supporters.

We would need to conduct a preferential vote.  I stress that this is
purely to determine the recommendation of this working group, and is
not in any way binding on COMCIFS.

James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding

International Union of Crystallography
