Discussion List Archives


Re: [ddlm-group] What we have resolved so far

We should split the unresolved issues into separate threads; otherwise
we're going to have a hell of a time tracking it all.  To those
reading this message: if I haven't managed to initiate the relevant
threads by the time you feel moved to reply, please do so yourself.

On Thu, Nov 19, 2009 at 12:58 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> A quick response to JRH's email, to put things in context. My more detailed
> summary is still coming.
>
>
> On 19/11/09 8:22 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>
>> Nick's forthcoming email notwithstanding, here is a quick list of what
>> I think we have resolved and not resolved so far:
>>
>> RESOLVED:
>>
>> 1.  The new standard is called CIF2
>>
>> 2.  All files conforming to the new standard must have a header
>> containing something like the characters "#CIF2"
>>
>> 3.  Non-quote-delimited strings may not contain any
>> syntactically-significant characters (exact character set has been
>> specified by Nick, but before UTF-8 decision)
>>
>> 4.  Quote-delimited strings may not contain instances of the
>> terminating character, regardless of following whitespace.
>
> Unless you do as in (5)
>
>> 5.  In a quote-delimited string, a reverse solidus escapes the
>> following character, if that character is otherwise syntactically
>> meaningful
>>
>> 6.  Files are UTF-8 encoded
>>
>> 7.  No tuples
>>
>> UNRESOLVED (with notes)
>>
>> 1.  Do we maintain the fixed line length restriction?
>>     - I will post something to the relevant thread to provoke a resolution
>
> Currently at 2048 bytes. I will propose maintaining this in deference to
> legacy and future Fortran programs.
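
Since files are to be UTF-8 encoded (resolved item 6), a 2048-byte limit is not the same thing as a 2048-character limit. A minimal sketch of a check, assuming the limit counts encoded bytes, excludes the line terminator, and that lines end in LF or CRLF (none of which has been settled in this thread); `find_long_lines` is a hypothetical helper name:

```python
def find_long_lines(text, limit=2048):
    """Return the 1-based numbers of lines longer than `limit` UTF-8 bytes."""
    too_long = []
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # assumption: the terminator is not counted
        if len(line.encode("utf-8")) > limit:
            too_long.append(number)
    return too_long
```

A line of 1025 two-byte characters such as "é" would be rejected under this reading even though it is well under 2048 characters long.
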
>
>> 2.  Is an escaping reverse solidus part of the datavalue?
>>     - This conversation didn't appear to resolve itself
>
> I will propose yes: the reverse solidus stays in the datavalue, and
> interpreting it is left to a downstream application. This is actually
> consistent with how Python works. My email timestamped
> Mon, 09 Nov 2009 10:35:41 +0800 explained my reasoning.
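
To make this proposal concrete, here is a sketch of a scanner in which the reverse solidus stops a quote from terminating the string but is left in the returned datavalue for the downstream application to interpret. It assumes, for simplicity, that a reverse solidus protects whatever single character follows it; `scan_quoted` is a hypothetical name, not part of any existing CIF library.

```python
def scan_quoted(text, start=0):
    """Scan a quote-delimited string beginning at text[start].

    Returns (raw_value, index_after_closing_quote). The raw value keeps
    every reverse solidus exactly as written in the file.
    """
    quote = text[start]
    if quote not in ("'", '"'):
        raise ValueError("not a quote-delimited string")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # step over the escaped character; nothing is removed
            continue
        if ch == quote:
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quote-delimited string")
```

Python raw strings behave the same way: r'it\'s' is five characters long and still contains the reverse solidus.
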
>
>>
>> 3.  Are square brackets permitted in datanames? (getting close to resolution)
>
> I will propose a character set restricted with only _ and . as allowed
> punctuation characters. All data names can be identifiers in dREL, and even
> those we assume won't be in dREL can be because someone writing a completely
> different dictionary can import our definitions and then add our data names
> to their dREL scripts.
>
> To simplify this issue I suggest avoiding the problem. Legacy CIF1 names
> will be aliased in CIF dictionaries so that when we read a CIF1 data name in
> a CIF1 file we can immediately map it to its CIF2 name (this avoids the need
> to remediate all existing CIF1 files).
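
The aliasing idea can be sketched as a lookup applied when a data name is read. The pattern below assumes a leading underscore followed by ASCII letters, digits, `_` and `.`, which is only one reading of the restricted set proposed above. The alias entry maps an mmCIF-style bracketed name to a CIF2 name that is purely hypothetical; real aliases would come from the dictionaries.

```python
import re

# Assumed CIF2 name pattern: "_" then letters, digits, "_" or "." only.
CIF2_NAME = re.compile(r"_[A-Za-z0-9_.]+\Z")

# Hypothetical alias table; in practice it would be built from the
# alias entries in the CIF dictionaries.
ALIASES = {
    "_atom_sites.fract_transf_matrix[1][1]": "_atom_sites.fract_transf_matrix_11",
}


def to_cif2_name(name):
    """Map a data name as read from a file to its CIF2 form."""
    key = name.lower()  # CIF data names are case-insensitive
    canonical = ALIASES.get(key, key)
    if not CIF2_NAME.match(canonical):
        raise ValueError("no CIF2 alias known for %r" % name)
    return canonical
```

Existing CIF1 files are left untouched under this scheme; only the in-memory name changes.
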
>
>> 4.  Does STAR also adopt UTF-8 or go with straight binary? (This may
>> be up to Nick)
>
> I will propose binary. Any other application domain can then choose UTF-8,
> UTF-16, UCS2 or whatever encoding they wish. This will make Herb's imgCIF a
> legitimate STAR application while not a CIF2 application because of his
> binary component being in binUTF? binUCS?.
>
>> 5.  Can we use whitespace instead of comma as a list item delimiter?
>>     -not yet tackled seriously but deserves consideration
>
> I will propose it has to be a comma, but make the coercion rule that
> space-separated values in a list-type object be coerced into
> comma-separated values. That is, read spaces as you want, but don't
> encourage them.
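
A minimal sketch of such a coercion rule, assuming a flat (non-nested) list delimited by square brackets and items that are either bare tokens or simple quoted strings without escapes. The bracket choice is still unresolved at this point, and `coerce_list` is a hypothetical name.

```python
import re

# One item: a double-quoted string, a single-quoted string, or a bare token.
_ITEM = re.compile(r'''"[^"]*"|'[^']*'|[^\s,]+''')


def coerce_list(text):
    """Rewrite a flat list so that its items are comma separated."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not a list")
    items = _ITEM.findall(text[1:-1])
    return "[" + ", ".join(items) + "]"
```

A reader built this way accepts `[1 2 3]` and `[1, 2, 3]` alike but only ever writes the comma form.
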
>
>
>> 6.  Are braces only or square brackets + braces used to delimit lists
>> and associative arrays?
>>     - some consider this decision to be coupled to (3), obvious preference
>>       is for square brackets and braces if other issues are solved
>
> If my proposal for (3) is acceptable, I would propose returning to [] for
> lists and {} for associative arrays, making it possible to distinguish the
> two at the lexical level by reading the first character.
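
The first-character dispatch described here might look like the following sketch. It assumes that unquoted strings and data names can never begin with `[` or `{`, which is what the proposal for (3) would have to guarantee; `value_kind` is a hypothetical name.

```python
def value_kind(text):
    """Classify a data value by looking at its first character only."""
    if not text:
        raise ValueError("empty value")
    first = text[0]
    if first == "[":
        return "list"
    if first == "{":
        return "associative array"
    if first in ("'", '"'):
        return "quoted string"
    return "unquoted string"
```

With braces used for both structures, the lexer would instead have to read ahead into the value before it could tell the two apart.
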
>
>> 7.  What is the exact form of the header comment (there was some
>> discussion of adding a second character such as % or !)?
>
> I think it should be the same as Unix shell headers.
>
>> 8.  Usage of triple-quoted strings: (a) do we need them? (b) do we
>> need both of them?
>
> (a) Yes if you want inline multiline strings. (b) Seems superfluous but
> makes encoding a """ in a ''' string much easier (and vice versa) without
> having to elide.
>
>> 9.  Are general Unicode characters allowed in non-quote-delimited strings?
>
> You know my view on this. I want to discourage non-delimited strings and
> encourage delimited strings. But I can't see (for now) any reason that the
> character sets have to be different.
>
> There is one thing about Unicode we have to clarify. The XML specification
> does not allow ALL Unicode characters because some of them (I think) break
> the parsing process. The exclusion set is small, but probably significant. I
> don't know the details but when we say Unicode characters we had better be
> explicit as to which. Herb, you seem to have a handle on the XML spec; maybe
> you can explain what the exclusion set is and why. You can propose to this
> group what the Unicode set should be.
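
For reference, the XML 1.0 `Char` production admits only tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. That excludes the remaining C0 control characters, the surrogate range, and U+FFFE and U+FFFF. A sketch of that test follows; whether CIF2 should adopt the same set is exactly the open question, and both function names are hypothetical.

```python
def is_xml10_char(ch):
    """True if `ch` falls within the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_disallowed(text):
    """Return the index of the first character outside the set, or None."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index
    return None
```
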
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

