

Re: [ddlm-group] CIF-2 changes




On 9/11/09 10:39 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> Re datanames: remember that we have made a more or less explicit
> promise that current datanames can be used without change in CIF2
> files, therefore datanames with square brackets will be legitimate in
> CIF2 data files.  I don't recall any discussion where we agreed to
> work around this by some sort of reprocessing.

I think the (unachievable) promise was that people would be able to submit a
CIF1 as if it were a CIF2. That is not possible the moment you mandate
(necessarily) that commas are disallowed in a non-delimited string. That
"promise" simply cannot be kept (and with forethought would never have been
made).

The dictionaries in DDLm/dREL have datanames that do not contain any of the
offending characters. If we are going to attempt to support current names, we
may need to decide which ones. I don't recall exactly, but there is some
inconsistency and/or duplication between the small-molecule and
macromolecular dictionaries, isn't there?
 
> And the only CIF2 parsers that will fail when they see a square
> bracket in a dataname are those that are (incorrectly) prepared to
> accept no spaces between dataname and datavalue.  So I repeat: the
> only reason we have moved away from square brackets as list delimiters
> is so that in the specific case that a space is missing between a
> dataname and a datavalue the parser can continue.  I see no other
> justification.

Yes, it is the reason. But short of revisiting the long discussion on
whitespace as token separators (as they usually are) versus whitespace being
one of the two token-terminating characters, and the subsequent problem that
there would need to be two definitions for every type depending on its
position in the recursion, it is a necessary consequence.
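
To make the ambiguity concrete, here is a minimal Python sketch; it is not a
real CIF parser. The two regular expressions and the helper
`split_name_value` are hypothetical and written only for illustration: the
"strict" rule ends a dataname at whitespace, while the "lenient" rule also
ends it at an opening bracket so that a missing space before a list can be
tolerated. The dataname is an existing mmCIF name that contains square
brackets.

```python
import re

# Hypothetical, deliberately simplified rules (assumptions for illustration,
# not taken from any CIF specification text).
# Strict rule: a dataname runs until the next whitespace character.
STRICT_NAME = re.compile(r"_[^\s]+")
# Lenient rule: a dataname also ends at an opening list/table bracket,
# so that input such as "_name[1 2 3]" (no space) can still be split.
LENIENT_NAME = re.compile(r"_[^\s\[\{]+")


def split_name_value(text, lenient):
    """Split 'dataname value' into (name, rest) under one of the two rules."""
    pattern = LENIENT_NAME if lenient else STRICT_NAME
    match = pattern.match(text)
    if match is None:
        raise ValueError("text does not start with a dataname")
    return match.group(0), text[match.end():].strip()


line = "_atom_site_anisotrop.U[1][1] 0.05"
print(split_name_value(line, lenient=False))
print(split_name_value(line, lenient=True))
```

Under the strict rule the bracketed dataname survives intact; under the
lenient rule the name is cut at the first `[` and the remainder looks like
the start of a list, which is the failure being discussed.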

> 
> On Mon, Nov 9, 2009 at 12:55 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
>> James and Joe are correct on this point. The dropping of [] was for reasons
>> of ease to older CIF1 files. BUT absolutely it introduces problems also,
>> while trying to ease other parts of the parsing process. I don't know if my
>> thinking was mature enough on this issue when I suggested the change.
>> 
>> Let me make my position clear. I WOULD MUCH PREFER to have lists defined by
>> square brackets and associative arrays by curly brackets. In this way the
>> parser can determine at the purely lexical level that it is in a list or an
>> associative array on reading the first [ or { when it is in the context.
>> 
>> My thinking for making both delimited by { came from the fact that there are
>> existing datanames with embedded [ and a CIF2 parser will take this to be
>> the beginning of a list. To simplify this parsing I suggested removing []
>> from the set of disallowed characters. Joe K quite correctly states that in
>> a CIF2 file there can be no [] in a dataname so it will be safe.
>> 
>> After this thread there was discussion on a leading comment identifying a
>> file as CIF2. IF THIS IS present the dilemma is removed. At the first line
>> of the parse we know whether to drop in to the CIF1 or the CIF2 lexical
>> rules of our parser. BUT I am NOT sure if we MANDATED this first line
>> comment. An alternative is to (essentially) require a re-parse to determine
>> whether the file is CIF1 or CIF2. Such a pre-parser cannot assume either
>> rule set, but go through the first X lines, character by character until it
>> can confidently conclude it is one or the other.
>> 
>> Either way these approaches remove the problem of CIF1 from the syntactic
>> specification of CIF2 (again something I would prefer to do).
>> 
>> We should vote on this since it will make the issue concrete. We can employ
>> square brackets to identify lists if we abstract away the issue of existing
>> CIF1 datanames to a higher level. Which is a moot point anyway because there
>> are other aspects of CIF1 that break CIF2 parsers that we need to deal with.
>> 
>> Finally employing [] makes it much easier to cast everything into Python
>> (though this is just a convenience and not a critical reason for employing
>> them).
>> 
>> And yes, tuples have been dropped from the CIF2 data types. Immutability of
>> a tuple is an implementation issue and not a representation issue. In terms
>> of representation it makes no difference to call a CIF object a tuple or a
>> list.
> 
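
The leading-comment approach described in the quoted message can be sketched
as follows. This is an illustration under stated assumptions, not an agreed
design: the thread does not fix the text of the identifying comment, so the
string `#\#CIF_2.0` is assumed here, and the function name `choose_lexer` is
hypothetical. Files without the comment are treated as CIF1; the alternative
pre-parse over the first X lines is not shown.

```python
# Assumed identifying comment; the thread itself does not specify its text.
CIF2_MAGIC = "#\\#CIF_2.0"


def choose_lexer(first_line):
    """Pick a lexical rule set from the first line of a file.

    Returns "CIF2" when the line starts with the assumed identifying
    comment, and "CIF1" otherwise (an assumption: no comment means CIF1).
    """
    # Ignore a leading byte-order mark if the file was decoded with one.
    if first_line.lstrip("\ufeff").startswith(CIF2_MAGIC):
        return "CIF2"
    return "CIF1"


print(choose_lexer("#\\#CIF_2.0"))
print(choose_lexer("data_example"))
```

With this check at the first line, the CIF2 lexical rules never need to
account for CIF1 datanames containing square brackets.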

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au




