Re: [ddlm-group] What we have resolved so far
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] What we have resolved so far
- From: James Hester <jamesrhester@gmail.com>
- Date: Thu, 19 Nov 2009 13:55:18 +1100
- In-Reply-To: <C72AC749.124D2%nick@csse.uwa.edu.au>
- References: <279aad2a0911181622y4db0076cme34dd63d8353a592@mail.gmail.com> <C72AC749.124D2%nick@csse.uwa.edu.au>
We should split the unresolved issues into separate threads, otherwise
we're going to have a hell of a time tracking it all. To those reading
this message: if I haven't managed to initiate the relevant threads by
the time you feel moved to reply, please do so yourself.

On Thu, Nov 19, 2009 at 12:58 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> A quick response to JRH's email, to put things in context. My more detailed
> summary is still coming.
>
>
> On 19/11/09 8:22 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>
>> Nick's forthcoming email notwithstanding, here is a quick list of what
>> I think we have resolved and not resolved so far:
>>
>> RESOLVED:
>>
>> 1. The new standard is called CIF2
>>
>> 2. All files conforming to the new standard must have a header
>> containing something like the characters "#CIF2"
>>
>> 3. Non-quote-delimited strings may not contain any
>> syntactically-significant characters (exact character set has been
>> specified by Nick, but before UTF-8 decision)
>>
>> 4. Quote delimited strings may not contain instances of the
>> terminating character, regardless of following whitespace.
>
> Unless you do as in (5)
>
>> 5. In a quote-delimited string, a reverse solidus escapes the
>> following character, if that character is otherwise syntactically
>> meaningful
>>
>> 6. Files are UTF-8 encoded
>>
>> 7. No tuples
>>
>> UNRESOLVED (with notes)
>>
>> 1. Do we maintain the fixed line length restriction?
>> - I will post something to the relevant thread to provoke a resolution
>
> Currently at 2048 bytes. I will propose maintaining this in deference to
> legacy and future Fortran programmes.
>
>> 2. Is an escaping reverse solidus part of the datavalue?
>> - This conversation didn't appear to resolve itself
>
> I will propose yes, that it is left to a downstream application. This is
> actually consistent with how Python works. My email timestamped
>
> Mon, 09 Nov 2009 10:35:41 +0800
>
> explained my reasoning.
>
>>
>> 3. Are square brackets permitted in datanames? (getting close to resolution)
>
> I will propose a character set restricted with only _ and . as allowed
> punctuation characters. All data names can be identifiers in dREL, and even
> those we assume won't be in dREL can be because someone writing a completely
> different dictionary can import our definitions and then add our data names
> to their dREL scripts.
>
> To simplify this issue I suggest avoiding the problem. Legacy CIF1 names
> will be aliased in CIF dictionaries so that when we read a CIF1 data name in
> a CIF1 file we can immediately map it to its CIF2 name (this avoids the need
> to remediate all existing CIF1 files).
>
>> 4. Does STAR also adopt UTF-8 or go with straight binary? (This may
>> be up to Nick)
>
> I will propose binary. Any other application domain can then choose UTF-8,
> UTF-16, UCS2 or whatever encoding they wish. This will make Herb's imgCIF a
> legitimate STAR application while not a CIF2 application because of his
> binary component being in binUTF? binUCS?.
>
>> 5. Can we use whitespace instead of comma as a list item delimiter?
>> -not yet tackled seriously but deserves consideration
>
> I will propose it has to be a comma, but make the coercion rule that space
> separated values in a list-type object be coerced into comma separated
> values. That is, read spaces as you want, but don't encourage them.
>
>
>> 6. Are braces only or square brackets + braces used to delimit lists
>> and associative arrays?
>> - some consider this decision to be coupled to (3), obvious preference
>> is for square brackets and braces if other issues are solved
>
> With my proposal for 3 acceptable, then I would propose returning to [] for
> lists and {} for associative arrays, making it possible to distinguish the
> two at the lexical level by reading the first character.
>
>> 7. What is the exact form of the header comment (there was some
>> discussion of adding a second character such as % or !)?
>
> I think it should be the same as Unix shell headers.
>
>> 8. Usage of triple-quoted strings: (a) do we need them? (b) do we
>> need both of them?
>
> (a) Yes if you want inline multiline strings. (b) Seems superfluous but
> makes encoding a """ in a ''' string much easier (and vice versa) without
> having to elide.
>
>> 9. Are general unicode characters allowed in non-quote-delimited strings?
>
> You know my view on this. I want to discourage non-delimited strings and
> encourage delimited strings. But I can't see (for now) any reason that the
> character sets have to be different.
>
> There is one thing about Unicode we have to clarify. The XML specification
> does not allow ALL Unicode characters because some of them (I think) break
> the parsing process. The exclusion set is small, but probably significant. I
> don't know the details but when we say Unicode characters we had better be
> explicit as to which. Herb, you seem to have a handle on the XML spec maybe
> you can explain what the exclusion set is and why. You can propose to this
> group what the Unicode set should be.
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia t: +61 (0)8 6488 3452
> 35 Stirling Highway f: +61 (0)8 6488 1089
> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
> MBDP M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
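
For readers following the thread, resolved items 2, 4 and 5 and Nick's proposal on unresolved item 6 can be sketched in Python. This is an illustration only, not the agreed CIF2 grammar. The function names (`has_cif2_header`, `read_quoted`, `classify_container`) are hypothetical, and the header test is a guess, since the exact header form is still open (unresolved item 7). Two further assumptions: the reverse solidus is kept in the returned value, as Nick proposes for unresolved item 2, and a reverse solidus is treated as escaping any following character, because the set of "syntactically meaningful" characters is not pinned down in this message.

```python
def has_cif2_header(text):
    """Resolved item 2: the file starts with something like '#CIF2'.

    Assumption: a plain prefix test; the exact header form was unresolved.
    """
    return text.startswith("#CIF2")


def read_quoted(text, start=0):
    """Read one quote-delimited string beginning at text[start].

    Resolved item 4: the first unescaped instance of the opening quote
    terminates the string, regardless of what follows it.
    Resolved item 5: a reverse solidus escapes the following character.
    Assumption (unresolved item 2): the reverse solidus stays in the value
    and is left for a downstream application to interpret.

    Returns (raw_value, index_just_after_closing_quote).
    """
    quote = text[start]
    if quote not in ("'", '"'):
        raise ValueError("not a quote-delimited string")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # skip the reverse solidus and the character it escapes
            continue
        if ch == quote:
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quote-delimited string")


def classify_container(text):
    """Nick's proposal for item 6: '[' opens a list and '{' opens an
    associative array, so one character decides at the lexical level."""
    first = text.lstrip()[:1]
    if first == "[":
        return "list"
    if first == "{":
        return "associative array"
    return "scalar"
```

Calling `read_quoted` on `'ab'cd` stops at the second quote even though no whitespace follows it, which is the behaviour resolved item 4 describes; with an escaped quote inside the string, the scan continues to the next unescaped one.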