Discussion List Archives

Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

On Tue, 1 Mar 2011, James Hester wrote:
> Dear COMCIFS members:
> 
> The DDLm group is currently engaging in developing an elide mechanism
> for the CIF2 standard.  Our deliberations have reached something of an
> impasse due to disagreement around the use of triple quotes as a
> string delimiter.  Python is a popular programming language that also
> uses triple quotes to delimit strings. One side of the discussion
> considers that use of triple quotes as a string delimiter means that
> all escape sequences recognised by Python should also be recognised by
> CIF, in order to avoid confusion and improve consistency with
> mainstream (ie Python) practice.  The other side of the discussion
> sees little benefit to CIF from including the additional ten or so
> escape sequences and advocates leaving them out of the CIF2 standard,
> instead adopting the minimal number of escape sequences to allow
> eliding.
> 
> We would like COMCIFS participants to provide some input as to the
> appropriate policy to be followed in this situation: should we seek
> maximum consistency with other usage of identical syntactical
> constructs, despite the imposition of unnecessary technical baggage?
> Or should we produce a standard as simple and streamlined as possible,
> despite the potential for confusion and unorthodox behaviour?

One CIF feature that no other software language supports natively
is the measurement number with its SU (standard uncertainty). Maybe these
should be encoded as tuples for wider compatibility?
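
To make the tuple idea concrete, here is a minimal sketch in Python. It
assumes the usual CIF shorthand in which the parenthesised digits apply to
the last quoted decimal places, and it only handles plain decimals (no
exponent form). The function name parse_measurement is hypothetical.

```python
import re
from decimal import Decimal

# Assumed input form: a plain decimal followed by SU digits in parentheses,
# e.g. "1.234(5)". Exponent notation is deliberately not handled here.
_MEASUREMENT = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+))\((\d+)\)")


def parse_measurement(text):
    """Return (value, su) as a tuple of Decimals (hypothetical helper)."""
    match = _MEASUREMENT.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a number with an SU: %r" % text)
    value = Decimal(match.group(1))
    # The SU digits are scaled to the last decimal place of the value.
    exponent = value.as_tuple().exponent
    su = Decimal(int(match.group(2))).scaleb(exponent)
    return (value, su)


value, su = parse_measurement("1.234(5)")
```

Decimal is used rather than float so that the number of quoted decimal
places, which determines the scale of the SU, is not lost.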

If you were comfortable with that and Python was all-important, you 
could define CIF2 to be a Python data structure, suck it all in as a 
single string and simply hit eval() (or some wrapped "safe" form of eval).
For that matter, JSON is very similar in structure and is effectively 
standardised.
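
As a rough illustration of both options, the sketch below reads the same
made-up data once as a Python literal and once as JSON. Here
ast.literal_eval stands in for a "safe" form of eval; the data values are
invented for the example and are not taken from any real CIF.

```python
import ast
import json

# Invented example data: a value/SU pair and a plain string.
python_text = "{'_cell_length_a': (5.431, 0.002), '_chemical_name': 'silicon'}"
json_text = '{"_cell_length_a": [5.431, 0.002], "_chemical_name": "silicon"}'

# literal_eval accepts only literals (strings, numbers, tuples, dicts, ...).
from_python = ast.literal_eval(python_text)

# JSON has no tuple type, so the pair comes back as a list.
from_json = json.loads(json_text)

# Anything that is not a plain literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The only structural difference in this small case is tuple versus list for
the value/SU pair.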

Other programming languages support various forms of string expansion.
For instance, in bash/sh/csh/perl/php/tcl, double-quoted strings 
typically expand with various forms of "$substitution ${of} $(variables)". 

In Python there is also string expansion from lists and dictionaries:

   """ %(substitution)s %(of)s %(variables)d""" % \
       {'variables': _my_CIF_data_name's_value?, 
        'of' : 'very silly', 
        'substitution': 'this is'
       } 

Are these strings likely to be a construct that could exist in a CIF, or 
to have a role in the post-processing of parsed CIFs? I could see this 
being useful for ensuring that values referenced in prose stay in sync 
with actual CIF data values.
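
A runnable sketch of that post-processing idea, assuming the parsed CIF is
available as an ordinary dictionary keyed by data name. The dictionary
contents and the sentence are invented for illustration.

```python
# Assumed: the result of parsing a CIF, as a plain dict (values invented).
cif_values = {
    "_cell_length_a": "5.431(2)",
    "_cell_measurement_temperature": 293,
}

# Prose refers to data names rather than repeating the numbers by hand.
template = (
    "The cell edge a = %(_cell_length_a)s was measured "
    "at %(_cell_measurement_temperature)d K."
)

# Expansion happens after parsing, so the text always follows the data.
prose = template % cif_values
```

If a data value is later corrected, re-expanding the template updates the
prose without anyone having to edit it.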

Some CIF dictionaries contain regular expression definitions, which
are generally easier to understand as Python raw strings, r"...". 
That wouldn't have a direct impact on CIF2 string handling, but if the 
handler was already present for the dictionary, then presumably it could
easily be co-opted for the CIF.
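
To show what the raw-string form buys, here is the same made-up pattern
written both ways. The pattern is an illustration only and is not taken
from any CIF dictionary.

```python
import re

# Raw string: backslashes reach the regex engine exactly as written.
raw_pattern = r"\d+\.\d+"

# Ordinary string: every backslash has to be doubled to survive
# the string-literal escape processing.
escaped_pattern = "\\d+\\.\\d+"

# Both literals produce the same pattern text.
same_pattern = raw_pattern == escaped_pattern

matches_decimal = re.fullmatch(raw_pattern, "5.431") is not None
```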

If the primary CIF2 stakeholders were assumed to be the various databases,
then maybe all CIF string values should really be optimised for direct 
injection via SQL (maybe it's just convention but, AFAIK, only single 
quotes seem to be significant)?
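
For what it's worth, here is a small sketch of the single-quote issue using
Python's built-in sqlite3 module. The table name, the helper sql_quote and
the stored value are all hypothetical; doubling the quote is the standard
SQL way of escaping it inside a literal, and parameter binding is the more
usual route in practice.

```python
import sqlite3


def sql_quote(value):
    """Wrap a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cif_item (name TEXT, value TEXT)")

value = "O'Keeffe's structure"  # invented value containing single quotes

# Direct injection: the value is pasted into the statement text.
conn.execute(
    "INSERT INTO cif_item VALUES ('_publ_section_title', "
    + sql_quote(value) + ")"
)

# Parameter binding: the driver handles the quoting instead.
conn.execute("INSERT INTO cif_item VALUES (?, ?)", ("_publ_section_title", value))

rows = [row[0] for row in conn.execute("SELECT value FROM cif_item")]
```

Both rows come back identical to the original string, so in this simple
case the single quote is the only character that needed special treatment.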

As Peter indicated, there's a spectrum of compatibilities that 
could be argued for or against, but where do you draw the line?

My personal preference would be for a lightweight spec that I could 
easily implement myself, at a pinch, in my language of choice
(or better, that someone else had already implemented), or for a 
more complicated spec when there were tools available that 
automatically built the parser and handler.

If I was writing Tcl, I wouldn't really want to wrap and include Python in 
order to handle a string correctly, if that's what the implications are.


Doug
(from the peanut gallery)

