
Re: [ddlm-group] CIF-2 changes

On 9/11/09 10:39 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> Re datanames: remember that we have made a more or less explicit
> promise that current datanames can be used without change in CIF2
> files, therefore datanames with square brackets will be legitimate in
> CIF2 data files.  I don't recall any discussion where we agreed to
> work around this by some sort of reprocessing.

I think the (unachievable) promise was that people would be able to submit a
CIF1 as if it were a CIF2. That is not possible the moment you mandate
(necessarily) that commas are disallowed in a non-delimited string. That
"promise" simply cannot be kept (and with forethought would never have been

The dictionaries in DDLm/dREL have datanames that do not contain any of the
offending characters. If we are going to attempt to support current names we
may need to decide which. I don't recall exactly but there is some
inconsistency and/or duplication between small and macromolecular
dictionaries, isn't there?

> And the only CIF2 parsers that will fail when they see a square
> bracket in a dataname are those that are (incorrectly) prepared to
> accept no spaces between dataname and datavalue.  So I repeat: the
> only reason we have moved away from square brackets as list delimiters
> is so that in the specific case that a space is missing between a
> dataname and a datavalue the parser can continue.  I see no other
> justification.

Yes it is the reason. But short of re-visiting the long discussion on
whitespace as token separators (as they usually are) versus whitespace being
1 of the 2 token terminating characters, and the subsequent problem that
there need to be two definitions for every type depending on their position
in recursion, it is a necessary consequence.
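To make the two tokenising strategies concrete, here is a minimal sketch in
Python. The function names are hypothetical and do not come from any CIF2
draft; `_atom_sites.fract_transf_matrix[1][1]` is an existing mmCIF dataname,
while `_cell_angles` is an invented name used only for illustration.

```python
import re

def split_on_whitespace(line):
    """Whitespace is the only token separator."""
    return line.split()

def split_with_bracket_terminator(line):
    """Whitespace OR '[' ends a token (illustrative rule, not the spec)."""
    # A token is either a bracketed run "[...]" or a run of characters
    # that contains neither whitespace nor '['.
    return re.findall(r"\[[^\]]*\]|[^\s\[]+", line)

# An existing dataname with embedded brackets survives only under rule 1.
legacy = "_atom_sites.fract_transf_matrix[1][1] 0.01"
print(split_on_whitespace(legacy))
print(split_with_bracket_terminator(legacy))

# A list written with no space after the dataname survives only under rule 2.
no_space = "_cell_angles[90 90 90]"
print(split_on_whitespace(no_space))
print(split_with_bracket_terminator(no_space))
```

Under the whitespace-only rule the bracketed dataname stays in one piece, but
the list with the missing space is split in the wrong places. Under the
terminator rule the reverse happens, which is the trade-off being argued here.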

> On Mon, Nov 9, 2009 at 12:55 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
>> James and Joe are correct on this point. The dropping of [] was for reasons
>> of ease to older CIF1 files. BUT absolutely it introduces problems also,
>> while trying to ease other parts of the parsing process. I don't know if my
>> thinking was mature enough on this issue when I suggested the change.
>>
>> Let me make my position clear. I WOULD MUCH PREFER to have lists defined by
>> square brackets and associative arrays by curly brackets. In this way the
>> parser can determine at the purely lexical level that it is in a list or an
>> associative array on reading the first [ or { when it is in the context.
>>
>> My thinking for making both delimited by { came from the fact that there are
>> existing datanames with embedded [ and a CIF2 parser will take this to be
>> the beginning of a list. To simplify this parsing I suggested removing []
>> from the set of disallowed characters. Joe K quite correctly states that in
>> a CIF2 file there can be no [] in a dataname so it will be safe.
>>
>> After this thread there was discussion on a leading comment identifying a
>> file as CIF2. IF THIS IS present the dilemma is removed. At the first line
>> of the parse we know whether to drop in to the CIF1 or the CIF2 lexical
>> rules of our parser. BUT I am NOT sure if we MANDATED this first line
>> comment. An alternative is to (essentially) require a re-parse to determine
>> whether the file is CIF1 or CIF2. Such a pre-parser cannot assume either
>> rule set, but go through the first X lines, character by character until it
>> can confidently conclude it is one or the other.
>>
>> Either way these approaches remove the problem of CIF1 from the syntactic
>> specification of CIF2 (again something I would prefer to do).
>>
>> We should vote on this since it will make the issue concrete. We can employ
>> square brackets to identify lists if we abstract away the issue of existing
>> CIF1 datanames to a higher level. Which is a moot point anyway because there
>> are other aspects of CIF1 that break CIF2 parsers that we need to deal with.
>>
>> Finally employing [] makes it much easier to cast everything into Python
>> (though this is just a convenience and not a critical reason for employing
>> them).
>>
>> And yes, tuples have been dropped from the CIF2 data types. Immutability of
>> a tuple is an implementation issue and not a representation issue. In terms
>> of representation it makes no difference to call a CIF object a tuple or a
>> list.
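
The first-line check described in the quoted message could be sketched roughly
as follows. The marker string `#\#CIF_2` is an assumption (the thread had not
settled whether such a comment is mandatory), and `detect_cif_version` is a
hypothetical name, not part of any existing parser.

```python
# Assumed marker prefix for a CIF2 identifying comment on the first line.
CIF2_MARKER = "#\\#CIF_2"

def detect_cif_version(text):
    """Return 2 if the first line starts with the assumed CIF2 marker, else 1."""
    # Drop a leading byte-order mark, then look at the first line only.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0].strip()
    return 2 if first_line.startswith(CIF2_MARKER) else 1

with_marker = "#\\#CIF_2.0\ndata_example\n_cell.length_a 5.4\n"
without_marker = "data_example\n_atom_sites.fract_transf_matrix[1][1] 0.01\n"

print(detect_cif_version(with_marker))     # file declares itself CIF2
print(detect_cif_version(without_marker))  # no marker, fall back to CIF1 rules
```

With a check like this the choice of lexical rules is made once, before any
dataname is read, so bracketed CIF1 datanames never reach the CIF2 lexer.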



Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

ddlm-group mailing list
