Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Mon, 12 Oct 2009 21:24:18 +0800
- Authentication-Results: postfix;
- In-Reply-To: <20091012095043.GA28431@emerald.iucr.org>
On 12/10/09 5:50 PM, "Brian McMahon" <bm@iucr.org> wrote:

> I am leaning increasingly to the viewpoint that - so long as it is
> technically possible to interconvert old and new CIFs without loss
> of information (although there may be loss of functionality in the
> CIF1.2->CIF1.1 direction) - the new specification should be maximally
> disruptive.

By maximally disruptive, you mean strict. With this I strongly agree. This is what I have been insisting on, not because I am trying to be an SOB, but because it is a specification and hence has to be strictly defined. How strictly it is enforced in the short term is a different matter, and that is what has dominated the discussion to date.

I do have a problem with the way the term "we should deprecate" has been thrown around. I think most have grossly underestimated how difficult it will be to parse according to the new specification and then fall back to the older spec if something violates the new one. I can create any number of strings that appear to be in the old spec but are actually in the new spec, and the deprecating parser will be fooled completely.

Needless to say, I will build "maximally disruptive" parsers so that I can hopefully fulfil the full, new specification.

> "Maximally" of course doesn't mean "gratuitously". Only changes
> should be introduced that have clear purpose, and those changes
> should be clearly and cleanly documented on the CIF web site,
> with an accompanying description of the intended purpose.
>
> As Herbert says, there are many communities. They should be reassured
> that there is no need for them to change *anything*, until and unless
> they need to use the new functionality offered by CIF1.2.

But we will need to know in some way that it is not CIF1.2, hence the magic comment etc. It can possibly be done with a parsing phase that detects the context and is able to establish the version, but the magic comment makes it much easier.
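To illustrate why a magic comment makes version detection easy, here is a minimal sketch in Python. It assumes the identifier sits at the very start of the file and has the form "#\#CIF_" followed by a dotted version number, as in the "#\#CIF_1.2" string proposed in this thread. The name detect_cif_version is hypothetical and is not taken from any existing CIF library.

```python
import re

# Assumed form of the identifier: "#\#CIF_" followed by a dotted version
# number, anchored at the start of the first line.
_MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)")


def detect_cif_version(text):
    """Return the version from a leading magic comment, or None if absent."""
    first_line = text.split("\n", 1)[0]
    match = _MAGIC.match(first_line)
    return match.group(1) if match else None
```

Returning None when the comment is missing leaves the policy question open: the caller still has to decide whether absence means "not CIF1.2" or merely "version not declared".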
> We do need to provide interconversion tools, and I think it may
> make life a little easier if we *mandate* a "#\#CIF_1.2" magic
> comment version identifier.

Agreed. Does its absence strictly mean it cannot be CIF1.2?

> Note also that it is very likely that the vast majority of data
> CIFs created by many (if not most) 1.2-capable applications will
> in practice be fully conformant to the existing 1.1 specs.

99%, even to the point that ASCII will be by far the most dominant.

> With this philosophy, therefore, my votes are:
>
> "1.2 Terminate all quoted strings on the occurrence of the
> trailing quoted delimiter without consideration of the
> next character"

Yes.

> BUT with James' proviso: "I would require that there was always
> whitespace between tokens, regardless of the status of embedded
> delimiters."

You are requiring this of CIF1.2 writers.

>
> "2.3 Modify the current convention to forbid the use of
> any characters other than a strictly limited set
> of characters, making it an error to read a non-delimited
> string that does not comply even if the intention
> can be inferred from context"

Yes.

> "3.2 Use UTF-8"

I still have reservations about this. Simon accepts that text editors can't handle these easily (especially in edit-insert mode) but says that soon no-one will be using CIFs in this way; it will all be application driven. Somebody said this nearly 15 years ago. Namely me. I was wrong then, and I think Simon will be proven wrong also - I hope not, but I fear he will be.

People underestimate the power of the Unix philosophy: inter-piped tool sets that deal with non-binary input, editing in text, the dominance of ASCII, etc. I believe James makes the same presumption about the emerging dominance of UTF-8. The UTF-8 spec is 16 years old! It is not as new as you think, and ASCII still dominates.

I am a little biased. Flex and Bison are a little wobbly on UTF-8, so it makes life much harder (I am not about to build a full STAR parser by hand).
JavaCC is cool with UTF-8, but I don't want to build everything in Java, just some of it.

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group