
Re: [ddlm-group] What we have resolved so far

To: [email protected], Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] What we have resolved so far
From: James Hester <[email protected]>
Date: Thu, 19 Nov 2009 13:55:18 +1100
In-Reply-To: <C72AC749.124D2%[email protected]>
References: <[email protected]><C72AC749.124D2%[email protected]>

We should split the unresolved issues into separate threads; otherwise
we're going to have a hell of a time tracking it all.  To those
reading this message: if I haven't managed to initiate the relevant
threads by the time you feel moved to reply, please do so yourself.

On Thu, Nov 19, 2009 at 12:58 PM, Nick Spadaccini <[email protected]> wrote:
> A quick response to JRH's email, to put things in context. My more detailed
> summary is still coming.
>
>
> On 19/11/09 8:22 AM, "James Hester" <[email protected]> wrote:
>
>> Nick's forthcoming email notwithstanding, here is a quick list of what
>> I think we have resolved and not resolved so far:
>>
>> RESOLVED:
>>
>> 1. The new standard is called CIF2
>>
>> 2. All files conforming to the new standard must have a header
>> containing something like the characters "#CIF2"
>>
>> 3. Non-quote-delimited strings may not contain any
>> syntactically-significant characters (exact character set has been
>> specified by Nick, but before UTF-8 decision)
>>
>> 4. Quote-delimited strings may not contain instances of the
>> terminating character, regardless of following whitespace.
>
> Unless you do as in (5)
>
>> 5. In a quote-delimited string, a reverse solidus escapes the
>> following character, if that character is otherwise syntactically
>> meaningful
>>
>> 6. Files are UTF-8 encoded
>>
>> 7. No tuples
>>
>> UNRESOLVED (with notes)
>>
>> 1. Do we maintain the fixed line length restriction?
>>     - I will post something to the relevant thread to provoke a resolution
>
> Currently at 2048 bytes. I will propose maintaining this in deference to
> legacy and future Fortran programmes.
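
Since the files are UTF-8 encoded, a limit stated in bytes is not the same
as a limit in characters. A minimal Python sketch of such a check follows.
It assumes the 2048-byte limit applies to the encoded bytes of each line,
excluding the line terminator; that detail and the name overlong_lines are
my assumptions, not part of any proposal.

```python
from typing import List


def overlong_lines(data: bytes, limit: int = 2048) -> List[int]:
    """Return the 1-based numbers of lines longer than `limit` bytes.

    Assumption: the terminator (\\n, \\r or \\r\\n) does not count
    towards the limit.
    """
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        if len(line) > limit:  # len() of bytes counts bytes, not characters
            bad.append(number)
    return bad
```

A line of 1025 two-byte characters is reported even though it is well under
2048 characters long.
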
>
>> 2. Is an escaping reverse solidus part of the datavalue?
>>     - This conversation didn't appear to resolve itself
>
> I will propose yes, that it is left to a downstream application. This is
> actually consistent with how Python works. My email timestamped
>
> Mon, 09 Nov 2009 10:35:41 +0800
>
> Explained my reasoning.
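
To make the two options concrete, here is a rough Python sketch of the
"reverse solidus stays in the datavalue" reading. The scanner returns the
raw text between the quotes, and removing the escapes is a separate
downstream step. It assumes that only the closing quote and the reverse
solidus itself are syntactically meaningful inside a quoted string; that
set has not been agreed. scan_quoted and unescape are hypothetical names.

```python
from typing import Tuple


def scan_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Scan a quote-delimited string that opens at text[start].

    Returns (raw_value, index_just_past_the_closing_quote). The raw value
    keeps every reverse solidus exactly as written in the file.
    """
    quote = text[start]
    if quote not in ("'", '"'):
        raise ValueError("not positioned at an opening quote")
    i = start + 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # the escaped character can never terminate the string
            continue
        if text[i] == quote:
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")


def unescape(raw: str, quote: str) -> str:
    """Downstream step: drop a reverse solidus that precedes the quote
    character or another reverse solidus (assumed meaningful set)."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in (quote, "\\"):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)
```

If the escape were instead defined as not being part of the datavalue, the
parser would apply unescape itself before handing the value on.
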
>
>>
>> 3. Are square brackets permitted in datanames? (getting close to resolution)
>
> I will propose a character set restricted with only _ and . as allowed
> punctuation characters. All data names can be identifiers in dREL, and even
> those we assume won't be in dREL can be because someone writing a completely
> different dictionary can import our definitions and then add our data names
> to their dREL scripts.
>
> To simplify this issue I suggest avoiding the problem. Legacy CIF1 names
> will be aliased in CIF dictionaries so that when we read a CIF1 data name in
> a CIF1 file we can immediately map it to its CIF2 name (this avoids the need
> to remediate all existing CIF1 files).
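
A small Python sketch of how a reader might combine the restricted
character set with dictionary aliases. The regular expression assumes
"restricted" means ASCII letters, digits, underscore and full stop after a
leading underscore. The alias table and both data names in it are invented
for illustration; in practice the mapping would be loaded from the alias
definitions in the dictionary.

```python
import re

# Assumption: a leading underscore, then ASCII letters, digits, "_" and ".".
NAME_RE = re.compile(r"_[A-Za-z0-9_.]+")

# Hypothetical alias table mapping a legacy CIF1 name to its CIF2 name.
ALIASES = {
    "_example_matrix[1][1]": "_example.matrix_11",
}


def is_restricted_name(name: str) -> bool:
    """True if the name uses only the assumed restricted character set."""
    return NAME_RE.fullmatch(name) is not None


def to_cif2_name(name: str, aliases=ALIASES) -> str:
    """Map a legacy name to its CIF2 name; unknown names pass through."""
    return aliases.get(name, name)
```

The legacy name fails is_restricted_name, while the name returned by
to_cif2_name passes, so the existing CIF1 file itself never has to be
rewritten.
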
>
>> 4. Does STAR also adopt UTF-8 or go with straight binary? (This may
>> be up to Nick)
>
> I will propose binary. Any other application domain can then choose UTF-8,
> UTF-16, UCS2 or whatever encoding they wish. This will make Herb's imgCIF a
> legitimate STAR application while not a CIF2 application because of his
> binary component being in binUTF? binUCS?.
>
>> 5. Can we use whitespace instead of comma as a list item delimiter?
>>     - not yet tackled seriously but deserves consideration
>
> I will propose it has to be a comma, but make the coercion rule that space
> separated values in a list-type object be coerced into comma separated
> values. That is, read spaces as you want, but don't encourage them.
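
The coercion rule could be sketched in Python as below. This is
deliberately naive: it handles only bare, unquoted items with no nested
lists, and it silently drops empty items, neither of which has been
discussed. split_items and normalise_list are hypothetical names.

```python
import re
from typing import List


def split_items(body: str) -> List[str]:
    """Split the inside of a list on commas and/or whitespace.

    Assumption: items are bare values with no quotes and no nesting.
    """
    return [item for item in re.split(r"[,\s]+", body.strip()) if item]


def normalise_list(body: str) -> str:
    """Re-emit the items in the comma-separated form."""
    return ",".join(split_items(body))
```

For example, normalise_list("a, b c") gives "a,b,c": the space-separated
form is accepted on input but never written back out.
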
>
>
>> 6. Are braces only or square brackets + braces used to delimit lists
>> and associative arrays?
>>     - some consider this decision to be coupled to (3), obvious preference
>>       is for square brackets and braces if other issues are solved
>
> With my proposal for 3 acceptable, then I would propose returning to [] for
> lists and {} for associative arrays, making it possible to distinguish the
> two at the lexical level by reading the first character.
>
>> 7. What is the exact form of the header comment (there was some
>> discussion of adding a second character such as % or !)?
>
> I think it should be the same as Unix shell headers.
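
Whatever form is chosen, the check in a reader stays small. The Python
sketch below accepts "#CIF2" with an optional "!" or "%" as the second
character, the "!" variant being the shell-style form. None of these forms
has been agreed, and the pattern and the name has_cif2_header are
assumptions for illustration only.

```python
import re

# Assumption: "#CIF2", optionally with "!" or "%" as the second character,
# at the very start of the first line. The exact form is still unresolved.
HEADER_RE = re.compile(r"#[!%]?CIF2")


def has_cif2_header(text: str) -> bool:
    """True if the first line of the file starts with a candidate header."""
    first_line = text.split("\n", 1)[0]
    return HEADER_RE.match(first_line) is not None
```

Only the first line is inspected, so a "#CIF2" comment further down the
file does not count as a header.
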
>
>> 8. Usage of triple-quoted strings: (a) do we need them? (b) do we
>> need both of them?
>
> (a) Yes if you want inline multiline strings. (b) Seems superfluous but
> makes encoding a """ in a ''' string much easier (and vice versa) without
> having to elide.
>
>> 9. Are general unicode characters allowed in non-quote-delimited strings?
>
> You know my view on this. I want to discourage non-delimited strings and
> encourage delimited strings. But I can't see (for now) any reason that the
> character sets have to be different.
>
> There is one thing about Unicode we have to clarify. The XML specification
> does not allow ALL Unicode characters because some of them (I think) break
> the parsing process. The exclusion set is small, but probably significant. I
> don't know the details but when we say Unicode characters we had better be
> explicit as to which. Herb, you seem to have a handle on the XML spec maybe
> you can explain what the exclusion set is and why. You can propose to this
> group what the Unicode set should be.
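
For reference, the Char production in XML 1.0 allows tab, line feed,
carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to
U+10FFFF. It therefore excludes the other C0 control characters, the
surrogate code points, and U+FFFE and U+FFFF. A Python sketch of that test
follows. Whether CIF2 should adopt the same set is an open question, and
the name is_xml10_char is mine.

```python
def is_xml10_char(ch: str) -> bool:
    """True if the single character `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # stops short of the surrogates
        or 0xE000 <= cp <= 0xFFFD      # leaves out U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )
```

A validator for our purposes could apply this to every character of the
decoded file, for example with all(is_xml10_char(c) for c in text).
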
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP �M002
>
> CRICOS Provider Code: 00126G
>
> e: [email protected]
>
>
>
>
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
