Re: [ddlm-group] CIF-2 changes
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF-2 changes
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Mon, 09 Nov 2009 11:49:43 +0800
- In-Reply-To: <279aad2a0911081839l6be410f4udf47eb3b566e9765@mail.gmail.com>
On 9/11/09 10:39 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> Re datanames: remember that we have made a more or less explicit
> promise that current datanames can be used without change in CIF2
> files, therefore datanames with square brackets will be legitimate in
> CIF2 data files. I don't recall any discussion where we agreed to
> work around this by some sort of reprocessing.

I think the (unachievable) promise was that people would be able to
submit a CIF1 as if it were a CIF2. That is not possible the moment you
mandate (necessarily) that commas are disallowed in a non-delimited
string. That "promise" simply cannot be kept (and with forethought would
never have been made).

The dictionaries in DDLm/dREL have datanames that do not contain any of
the offending characters. If we are going to attempt to support current
names we may need to decide which. I don't recall exactly but there is
some inconsistency and/or duplication between small and macromolecular
dictionaries, isn't there?

> And the only CIF2 parsers that will fail when they see a square
> bracket in a dataname are those that are (incorrectly) prepared to
> accept no spaces between dataname and datavalue. So I repeat: the
> only reason we have moved away from square brackets as list delimiters
> is so that in the specific case that a space is missing between a
> dataname and a datavalue the parser can continue. I see no other
> justification.

Yes it is the reason. But short of re-visiting the long discussion on
whitespace as token separators (as they usually are) versus whitespace
being 1 of the 2 token terminating characters, and the subsequent
problem that there need to be two definitions for every type depending
on their position in recursion, it is a necessary consequence.

>
> On Mon, Nov 9, 2009 at 12:55 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
>> James and Joe are correct on this point. The dropping of [] was for reasons
>> of ease to older CIF1 files. BUT absolutely it introduces problems also,
>> while trying to ease other parts of the parsing process. I don't know if my
>> thinking was mature enough on this issue when I suggested the change.
>>
>> Let me make my position clear. I WOULD MUCH PREFER to have lists defined by
>> square brackets and associative arrays by curly brackets. In this way the
>> parser can determine at the purely lexical level that it is in a list or an
>> associative array on reading the first [ or { when it is in the context.
>>
>> My thinking for making both delimited by { came from the fact that there are
>> existing datanames with embedded [ and a CIF2 parser will take this to be
>> the beginning of a list. To simplify this parsing I suggested removing []
>> from the set of disallowed characters. Joe K quite correctly states that in
>> a CIF2 file there can be no [] in a dataname so it will be safe.
>>
>> After this thread there was discussion on a leading comment identifying a
>> file as CIF2. IF THIS IS present the dilemma is removed. At the first line
>> of the parse we know whether to drop in to the CIF1 or the CIF2 lexical
>> rules of our parser. BUT I am NOT sure if we MANDATED this first line
>> comment. An alternative is to (essentially) require a re-parse to determine
>> whether the file is CIF1 or CIF2. Such a pre-parser cannot assume either
>> rule set, but go through the first X lines, character by character until it
>> can confidently conclude it is one or the other.
>>
>> Either way these approaches remove the problem of CIF1 from the syntactic
>> specification of CIF2 (again something I would prefer to do).
>>
>> We should vote on this since it will make the issue concrete. We can employ
>> square brackets to identify lists if we abstract away the issue of existing
>> CIF1 datanames to a higher level. Which is a moot point anyway because there
>> are other aspects of CIF1 that break CIF2 parsers that we need to deal with.
>>
>> Finally employing [] makes it much easier to cast everything into Python
>> (though this is just a convenience and not a critical reason for employing
>> them).
>>
>> And yes, tuples have been dropped from the CIF2 data types. Immutability of
>> a tuple is an implementation issue and not a representation issue. In terms
>> of representation it makes no difference to call a CIF object a tuple or a
>> list.
>

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G           e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
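
Illustrative sketch (not part of the original message): the two parsing
questions raised above can be made concrete in a few lines of Python. The
first function checks for a leading comment that identifies a file as CIF2;
the literal string used here is an assumption, since the thread does not fix
its form or say whether it is mandated. The second shows how a dataname with
embedded square brackets is read differently depending on whether "[" is
allowed to terminate a dataname. The names detect_cif_version, read_dataname
and _example_U[1][1] are hypothetical and chosen only for illustration.

```python
# Assumed form of the leading CIF2 identification comment (not fixed by this thread).
CIF2_MAGIC = "#\\#CIF_2.0"


def detect_cif_version(text):
    """Return 2 if the first line is the assumed CIF2 comment, else None.

    None means "unknown": the caller would have to fall back to CIF1 rules
    or to some pre-scan of the file, as discussed in the message.
    """
    first_line = text.lstrip("\ufeff").split("\n", 1)[0]
    if first_line.rstrip() == CIF2_MAGIC:
        return 2
    return None


def read_dataname(line, brackets_terminate):
    """Split a line into (dataname, rest).

    A dataname always ends at whitespace. If brackets_terminate is True,
    it also ends at '[' or '{', which is the behaviour that lets a parser
    continue when the space before a list value is missing.
    """
    end = 0
    while end < len(line) and not line[end].isspace():
        if brackets_terminate and line[end] in "[{":
            break
        end += 1
    return line[:end], line[end:].lstrip()


if __name__ == "__main__":
    line = "_example_U[1][1] 0.05"
    # Whitespace-only termination keeps the brackets in the name.
    print(read_dataname(line, brackets_terminate=False))
    # Bracket termination cuts the name short and treats the rest as a value.
    print(read_dataname(line, brackets_terminate=True))
```

With whitespace as the only terminator the bracketed name survives intact;
once "[" may also terminate a dataname, the same line is misread, which is
why knowing the file version on the first line removes the dilemma.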