Re: [ddlm-group] [THREAD 4] UTF8
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] [THREAD 4] UTF8
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Wed, 28 Oct 2009 15:16:40 +0800
- In-Reply-To: <4AE5B593.3090404@mcmaster.ca>
It seems the consensus is that UTF-8 is the encoding standard and everything else is a broken file, notwithstanding that any application may parse such a file, attempt to determine its encoding and produce a corrected file. That is nice, but not a required part of the standard (which is the standard, to quote David, quoting me).
It seems to me, Herb, that the easiest thing you can do with regard to CBF (since it can’t be a real CIF) is to adopt everything from CIF-2 except that CBF must be encoded as UCS-2. That may give you a bigger footprint per ASCII character, but since there are only a small number of data values in a CBF it may well be worth the price. That way at least the binary part is efficiently encoded.
If in the DDLm we manage to flag order dependency in the loop_ header (along with a flag to indicate that row ordering is important in the loop_, something John W wanted), then CBF will essentially be CIF-2 except that the encoding is UCS-2.
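To put a rough number on the "bigger footprint per ASCII character", here is a minimal sketch in Python. The header text is a made-up, CBF-like example rather than real CBF content, and since Python has no codec named UCS-2, `utf-16-le` is used as a stand-in (the two coincide for characters in the Basic Multilingual Plane). The helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in a UCS-2 stand-in."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Assumption: UTF-16-LE without a byte-order mark stands in for UCS-2.
        # This holds only for Basic Multilingual Plane characters.
        "ucs-2": len(text.encode("utf-16-le")),
    }


# Hypothetical ASCII-only header lines, invented for illustration.
header = "_example.tag_one 1\n_example.tag_two 1.0\n"

sizes = encoded_sizes(header)
print(sizes["utf-8"], sizes["ucs-2"])  # ASCII text doubles in size under UCS-2
```

For pure ASCII text the UCS-2 size is exactly twice the UTF-8 size, which is the price referred to above. For non-ASCII characters the comparison can go the other way: a character such as the euro sign takes three bytes in UTF-8 but two in UCS-2.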
On 26/10/09 10:43 PM, "David Brown" <idbrown@mcmaster.ca> wrote:
James has asked for the views of those of us who have just been watching this discussion (it seems to go by faster than I can follow).
For what it is worth, my stance is strongly the same as James'. He has laid out all the arguments very succinctly - I will just briefly reinforce those I think most important.
1. Permitting one or more other encodings (presumably these must constitute a well defined list as their names must be recognized) immediately invites people to use them. To quote Nick, if we have a standard it should be a standard. If people decide to bend the standard they do so at their own peril.
2. If other encodings are deprecated (yes, that word again) in order to encourage people to use the real standard, the people who still use them are likely to be unaware that their program has used a deprecated encoding and has therefore failed to identify it. If the writer of the CIF knows that it is using a deprecated encoding, it can make the conversion. Only if it does not know, and therefore neither converts nor identifies the encoding, will the deprecated encoding get through.
3. Providing a space (at this stage) for identifying encodings that may (or may not) later become part of the standard is unnecessary since it can be added if and when such other encodings are allowed. UTF-8 then becomes the default.
4. Having a single standard means that readers need only consider one encoding and writers need only support a conversion from the native encoding to the CIF2 standard. Allowing 5 or 10 other encodings makes life easier for the writer, since it does not have to provide any conversion, but there is a price: every reader must be able to read 5 or 10 different encodings, because it is not allowed to reject any of the deprecated standards. With a single standard the IUCr may or may not decide that they will handle different encodings, but that is their choice. Making different encodings legal removes that choice from the reader: it has to handle all possibilities, which is only likely to discourage people from writing local programs for occasional use.
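As a rough illustration of point 4, the division of labour under a single standard might look like the following Python sketch. The function names `read_cif2_text` and `write_cif2_bytes` are hypothetical and not part of any CIF library, and byte-order-mark handling is deliberately left out.

```python
def read_cif2_text(raw: bytes) -> str:
    """Reader side: accept UTF-8 only and reject everything else."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # No guessing at other encodings: the file is simply not conformant.
        raise ValueError(f"not valid UTF-8 (byte offset {exc.start})") from exc


def write_cif2_bytes(native: bytes, native_encoding: str) -> bytes:
    """Writer side: convert from the known native encoding to UTF-8."""
    return native.decode(native_encoding).encode("utf-8")


# A writer that knows its text is Latin-1 converts before emitting the file.
latin1_text = "\u00c5".encode("latin-1")  # the Angstrom letter as one byte
converted = write_cif2_bytes(latin1_text, "latin-1")
print(read_cif2_text(converted))
```

The reader needs one decode call and one error path, while the only encoding-specific knowledge sits with the writer, which already knows what it produced. One caveat: a strict decode only catches byte sequences that are invalid UTF-8, so text in another encoding that happens to form valid UTF-8 would pass unnoticed.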
David
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
cheers
Nick
--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering
The University of Western Australia t: +61 (0)8 6488 3452
35 Stirling Highway f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
MBDP M002
CRICOS Provider Code: 00126G
e: Nick.Spadaccini@uwa.edu.au
- Follow-Ups:
- Re: [ddlm-group] [THREAD 4] UTF8 (Herbert J. Bernstein)
- References:
- Re: [ddlm-group] [THREAD 4] UTF8 (David Brown)