Discussion List Archives


Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
  • From: Nick Spadaccini <nick@csse.uwa.edu.au>
  • Date: Thu, 10 Mar 2011 15:01:38 +0800
  • Conversation: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
From: Doug <doug.duboulay@gmail.com>
>> One CIF feature that no other software language supports natively
>> are measurement numbers, with their SUs. Maybe they should be encoded as
>> tuples for wider compatibility?

I am not sure this does represent wider compatibility in the reporting of
uncertainty. The not-so-arcane approach of number(su), with an implied scale
in the su given the size of the number, is one of the four ISO-accepted
reporting syntaxes for uncertainty. Enshrining them in a data structure like
a tuple is not one of the accepted syntaxes. What's more, we need to separate
the representation in software (which quite reasonably could be a tuple) from
the syntactic representation in a file.
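
To illustrate that separation, here is a minimal Python sketch that keeps
number(su) as the file syntax and produces a tuple only in software. The
function name parse_measurement is hypothetical, exponent notation is
deliberately not handled, and the su digits are assumed to apply to the last
quoted digits of the number.

```python
import re
from decimal import Decimal

# Matches values such as "1.234(5)", "-12.5(12)" or "120(3)".
# Exponent notation (e.g. 1.2e3(4)) is left out of this sketch.
_NUM_SU = re.compile(r"([+-]?(?:\d+(?:\.\d+)?|\.\d+))\((\d+)\)")

def parse_measurement(text):
    """Split a 'number(su)' string into a (value, su) tuple of Decimals."""
    m = _NUM_SU.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number(su) value: {text!r}")
    value = Decimal(m.group(1))
    # The su is expressed in units of the last quoted digit of the number,
    # so shift it by the number's decimal exponent (the implied scale).
    exponent = value.as_tuple().exponent
    su = Decimal(m.group(2)).scaleb(exponent)
    return value, su
```

Decimal is used rather than float so that the number of quoted digits, which
carries the implied scale, is not lost on the way in.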

>> If you were comfortable with that and python was all important, you
>> could define CIF2 to be a python data structure, suck it all in as a
>> single string and simply hit eval() (or some wrapped "safe" form of eval).
>> For that matter, JSON is very similar in structure and effectively
>> standardised.

Python is not more important than any other language. Most other languages
have an eval() or exec() path for evaluating strings. Contrary to popular
misconception, there is nothing in dREL that is Python. The syntax is generic
and the string encapsulation is pure STAR.
>> Other programming languages support various forms of string expansion.
>> For instance in  bash/sh/csh/perl/php/tcl typically double quoted
>> strings expand with various forms of "$substitution ${of} $(variables)".

That is why the current string syntax of the STAR that supports DDLm and
dREL treats strings as raw, save for a single escape rule to protect
(only) the character that delimits the string. This is the way to
incorporate a " into a "-delimited string, and similarly for the ' character.

What to do with the string is left to the dictionary to define, in
particular exploiting the presence of dREL as an operational language.
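
A rough Python sketch of such a raw-string reader follows. The function name
read_delimited is hypothetical, and the use of a backslash as the escape
character is an assumption made for illustration; this message does not say
which character is used. The point is only that nothing other than the
escaped delimiter is interpreted.

```python
def read_delimited(text, escape="\\"):
    """Return the contents of a ' or " delimited string.

    Everything is kept raw except an escaped copy of the delimiter itself.
    The backslash default for `escape` is an assumption of this sketch.
    """
    if len(text) < 2 or text[0] not in "'\"":
        raise ValueError("expected a ' or \" delimited string")
    delim = text[0]
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text) and text[i + 1] == delim:
            out.append(delim)  # escaped delimiter: keep only the delimiter
            i += 2
        elif ch == delim:
            if i != len(text) - 1:
                raise ValueError("unescaped delimiter inside string")
            return "".join(out)
        else:
            out.append(ch)  # anything else, other escapes included, stays raw
            i += 1
    raise ValueError("unterminated string")
```

Sequences such as $HOME or \n pass through untouched, so any expansion is
left to whatever the dictionary later decides to do with the value.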
>> In Python there is also string expansion from lists and dictionaries:
>>    """ %(substitution)s %(of)s %(variables)d""" % \
>>        {'variables': _my_CIF_data_name's_value?,
>>         'of' : 'very silly',
>>         'substitution': 'this is'
>>        } 
>> Are these strings likely to be a construct that could exist in a CIF, or
>> have a role in the post processing of parsed CIFs? I could see it as
>> useful to ensure that values referenced in prose stay in sync with
>> actual CIF data values.

Appropriately delimited, such a string could be supported. What to do with
it (e.g. push it through an eval() method) is a separate matter. However, the
ability to have the above as two separate data items, one being a script and
the other being the data for the script while simultaneously being ordinary
data in a CIF, would be a little harder to achieve functionally. If, of
course, the script is written in dREL, that would be much easier to achieve
because the dictionary engine is always evaluating dREL code.

>> Some CIF dictionaries contain regular expression definitions which
>> generally are easier to understand as python raw strings r"..."
>> That wouldn't have direct impact on CIF2 string handling, but if the
>> handler was already present for the dictionary, then it could presumably
>> be easily co-opted for the CIF, I suspect.

Again, a regex handler does not have to be Python-specific. All that is
necessary is an agreed syntax for the regex.

>> If the primary CIF2 stakeholders were assumed to be the various databases,
>> then maybe all CIF string values should really be optimised for direct
>> injection via SQL (maybe its just convention but AFAIK, only single quotes
>> seem to be significant)?

A minor API issue, but you do highlight misconceptions about stakeholder
uptake. If minor issues from database systems actually force CIF2 to
redefine its syntax to meet such needs, then CIF2 will become highly
restricted and beholden to the vested interests of stakeholders. CIF2 should
remain simple and expressive, and let the API handle everything else.
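
As a sketch of what "let the API handle it" can look like on the database
side, the following Python example passes a string value to SQLite through
parameter binding, so quote characters in the value need no special treatment
in the file syntax. The table name cif_item, its columns and the helper
store_item are hypothetical, chosen only for illustration.

```python
import sqlite3

def store_item(conn, name, value):
    """Insert one data item; the driver binds the value, so no quoting is needed."""
    conn.execute(
        "INSERT INTO cif_item (name, value) VALUES (?, ?)",
        (name, value),
    )

# In-memory database with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cif_item (name TEXT, value TEXT)")

# A value containing both quote characters and SQL-looking text.
raw = "O'Keefe's \"sample\"; DROP TABLE cif_item"
store_item(conn, "_publ_author_name", raw)

stored = conn.execute(
    "SELECT value FROM cif_item WHERE name = ?",
    ("_publ_author_name",),
).fetchone()[0]
```

The value read back is identical to the one stored, and the single quotes
never had to be escaped by the file format or by the caller.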

>> As Peter indicated, there's a spectrum of compatibilities that
>> could be argued for or against, but where do you draw the line?

>> My personal preference would be for a lightweight spec that I could
>> easily implement myself, at a pinch, in my language of choice
>> (or better, that someone else had already implemented), or for a
>> more complicated spec when there were tools available that
>> automatically built the parser and handler.

That is how DDLm and dREL are implemented (as well you know, since you are a
programmer of it).
>> If I was writing Tcl, I wouldn't really want to wrap and include python in
>> order to handle a string correctly, if thats what the implications are.

Again, language choice is not part of the STAR at the syntactic level, and
need not be part of CIF2. It is not necessary, and it is far more general,
not to lock a programming language in at the syntax level. It is more
expressive to allow any sequence of characters within a string and then let
the dictionary indicate how to deal with the internals.



Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au
