Re: [ddlm-group] What we have resolved so far
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] What we have resolved so far
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Thu, 19 Nov 2009 09:58:33 +0800
- In-Reply-To: <279aad2a0911181622y4db0076cme34dd63d8353a592@mail.gmail.com>
A quick response to JRH's email, to put things in context. My more detailed summary is still coming.

On 19/11/09 8:22 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> Nick's forthcoming email notwithstanding, here is a quick list of what
> I think we have resolved and not resolved so far:
>
> RESOLVED:
>
> 1. The new standard is called CIF2
>
> 2. All files conforming to the new standard must have a header
> containing something like the characters "#CIF2"
>
> 3. Non-quote-delimited strings may not contain any
> syntactically-significant characters (exact character set has been
> specified by Nick, but before UTF-8 decision)
>
> 4. Quote delimited strings may not contain instances of the
> terminating character, regardless of following whitespace.

Unless you do as in (5).

> 5. In a quote-delimited string, a reverse solidus escapes the
> following character, if that character is otherwise syntactically
> meaningful
>
> 6. Files are UTF-8 encoded
>
> 7. No tuples
>
> UNRESOLVED (with notes)
>
> 1. Do we maintain the fixed line length restriction?
> - I will post something to the relevant thread to provoke a resolution

Currently at 2048 bytes. I will propose maintaining this in deference to legacy and future Fortran programmes.

> 2. Is an escaping reverse solidus part of the datavalue?
> - This conversation didn't appear to resolve itself

I will propose yes: it is left to a downstream application to interpret. This is actually consistent with how Python works. My email timestamped Mon, 09 Nov 2009 10:35:41 +0800 explained my reasoning.

> 3. Are square brackets permitted in datanames? (getting close to resolution)

I will propose a restricted character set in which _ and . are the only permitted punctuation characters. All data names can be identifiers in dREL. Even those we assume won't appear in dREL can end up there, because someone writing a completely different dictionary can import our definitions and then use our data names in their dREL scripts.
To simplify this issue I suggest avoiding the problem. Legacy CIF1 names will be aliased in CIF dictionaries, so that when we read a CIF1 data name in a CIF1 file we can immediately map it to its CIF2 name (this avoids the need to remediate all existing CIF1 files).

> 4. Does STAR also adopt UTF-8 or go with straight binary? (This may
> be up to Nick)

I will propose binary. Any other application domain can then choose UTF-8, UTF-16, UCS-2 or whatever encoding they wish. This will make Herb's imgCIF a legitimate STAR application, though not a CIF2 application, because of its binary component (binUTF? binUCS?).

> 5. Can we use whitespace instead of comma as a list item delimiter?
> -not yet tackled seriously but deserves consideration

I will propose that it has to be a comma, with a coercion rule: space-separated values in a list-type object are coerced into comma-separated values. That is, read spaces if you want, but don't encourage them.

> 6. Are braces only or square brackets + braces used to delimit lists
> and associative arrays?
> - some consider this decision to be coupled to (3), obvious preference
> is for square brackets and braces if other issues are solved

If my proposal for 3 is acceptable, then I would propose returning to [] for lists and {} for associative arrays, making it possible to distinguish the two at the lexical level by reading the first character.

> 7. What is the exact form of the header comment (there was some
> discussion of adding a second character such as % or !)?

I think it should be the same as Unix shell headers.

> 8. Usage of triple-quoted strings: (a) do we need them? (b) do we
> need both of them?

(a) Yes, if you want inline multiline strings. (b) Having both seems superfluous, but it makes encoding a """ in a ''' string much easier (and vice versa) without having to elide.

> 9. Are general unicode characters allowed in non-quote-delimited strings?

You know my view on this.
I want to discourage non-delimited strings and encourage delimited strings. But I can't see (for now) any reason that the character sets have to be different.

There is one thing about Unicode we have to clarify. The XML specification does not allow ALL Unicode characters, because some of them (I think) break the parsing process. The exclusion set is small, but probably significant. I don't know the details, but when we say Unicode characters we had better be explicit as to which. Herb, you seem to have a handle on the XML spec; maybe you can explain what the exclusion set is and why. You can then propose to this group what the Unicode set should be.

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering
The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
MBDP M002
CRICOS Provider Code: 00126G           e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
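A minimal Python sketch of how a reader might apply several of the proposals discussed in this thread. It is illustrative only and is not the CIF2 grammar: all function names are hypothetical, `#CIF2` is just the placeholder header text from resolved item 2, the data-name character set (ASCII letters, digits, `_` and `.`) is an assumption because the exact set was still open, only the delimiter and the reverse solidus itself are treated as escapable, and lists are assumed to be flat with unquoted items.

```python
from __future__ import annotations

import re

# Assumed header text: the thread only says "something like '#CIF2'".
CIF2_MAGIC = "#CIF2"
# Proposed line limit: 2048 bytes (of UTF-8), not 2048 characters.
MAX_LINE_BYTES = 2048
# Assumption: a data name starts with "_" and otherwise uses only ASCII
# letters, digits, "_" and "." (the only punctuation in the proposal).
DATANAME_RE = re.compile(r"_[A-Za-z0-9_.]+")


def has_cif2_header(first_line: str) -> bool:
    """True if the first line carries the (assumed) CIF2 marker."""
    return first_line.startswith(CIF2_MAGIC)


def line_fits(line: str) -> bool:
    """Check the proposed limit on the UTF-8 encoded length of a line."""
    return len(line.encode("utf-8")) <= MAX_LINE_BYTES


def is_valid_dataname(name: str) -> bool:
    """Reject names with punctuation other than '_' and '.', e.g. '[' or ']'."""
    return DATANAME_RE.fullmatch(name) is not None


def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Read a quote-delimited string beginning at text[start].

    An unescaped delimiter always terminates the string, whatever follows it.
    A reverse solidus before the delimiter (or before another reverse solidus)
    escapes it; as proposed, the reverse solidus stays in the value and its
    interpretation is left to downstream applications.
    Returns (value, index just past the closing quote).
    """
    quote = text[start]
    i = start + 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in (quote, "\\"):
            i += 2  # escaped character: keep both characters in the value
            continue
        if text[i] == quote:
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")


def container_kind(text: str) -> str | None:
    """Classify a value by its first character: [ ] list, { } associative array."""
    return {"[": "list", "{": "associative array"}.get(text.lstrip()[:1])


def split_list_items(text: str) -> list[str]:
    """Split a flat list of unquoted items, coercing whitespace to commas.

    Simplification: nested containers and quoted items are not handled.
    """
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("not a list: " + text)
    return [item for item in re.split(r"[,\s]+", body[1:-1]) if item]
```

For example, reading `'it\'s'` yields the value `it\'s`: the reverse solidus stays in the data value, as proposed for unresolved item 2. Likewise `split_list_items` accepts `[1, 2 3]` but a writer following the proposal would always emit commas.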