Discussion List Archives


Re: [ddlm-group] CIF-2 changes

*Note the request at the end for those that haven't indicated their
agreement to do so now*

I have now had some time to digest the discussion and form an opinion.
First, I agree that David's option 3 is a reasonable, though not
optimal, way forward.  To restate David's proposal in case I or
others have misinterpreted it:

A CIF2 dictionary contains both CIF1 and CIF2 datanames (where these
differ), with the CIF1 datanames aliased to the CIF2 dataname.  A dREL
expression uses only the CIF2 dataname so has no syntactic issues.  A
CIF2 application can read a CIF1 file and understand the translation
between the dataname variants.  As a result of this behaviour:

(1) A CIF1 datafile can make full use of DDLm dictionary methods
(2) There is thus no need to cater for CIF1 datanames in purely CIF2 files
(3) There is thus no need for square brackets in datanames, so we can
return to our original bracket behaviour
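
The alias lookup described above can be sketched roughly as follows. This is only an illustration, not a proposed implementation: the alias table, its two entries and the function names are hypothetical, and in practice the table would be built from the alias attributes in the dictionary. Case-insensitive matching of datanames is assumed.

```python
# Hypothetical alias table (CIF1 dataname -> CIF2 dataname). In a real
# application this would be read from the dictionary's alias attributes;
# the entries here are examples only.
ALIASES = {
    "_atom_site_aniso_U_11": "_atom_site_aniso.u_11",
    "_atom_site_label": "_atom_site.label",
}


def canonical_name(dataname, aliases=ALIASES):
    """Return the CIF2 dataname for a CIF1 alias, else the name unchanged."""
    lowered = {old.lower(): new for old, new in aliases.items()}
    return lowered.get(dataname.lower(), dataname)


def normalise_block(block, aliases=ALIASES):
    """Rewrite the keys of a {dataname: value} mapping to CIF2 datanames."""
    return {canonical_name(name, aliases): value for name, value in block.items()}
```

With such a table in place, a dREL method only ever sees the CIF2 form of a name, whichever form the datafile used.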

It is not an optimal solution for the following reason: hard-coded
datanames are absolutely essential in any scientific application using
CIF, as only the programmer knows what they want in a scientific sense
(e.g. anisotropic parameters to draw an ellipsoid).  Therefore, those
applications that have hardcoded CIF1-style datanames (and don't use
dictionaries) cannot accept certain CIF2 datafiles unless their authors
either (i) edit the source code to add the new datanames, (ii) use a
converter, or (iii) use dictionaries, all of which require extra work.  As
a result, a CIF writing program will tend not to use CIF2 syntax, as
the receiving software is less likely to understand it.  But we have
resigned ourselves long ago to slow CIF2 uptake, and this decision
just makes it that bit slower.

I still think the optimal solution is a dREL 'quote' function, as I
really, really don't see the point of coercion rules in the CIF
context, but I am happy to now accept no square brackets in CIF2
datanames (while secretly hoping that in some distant future standard
we add them back in...)

So, finally, I would invite Brian, Joe, John and Simon to indicate
their acceptance or otherwise of this state of affairs, and we will
mark this as resolved.

On Tue, Nov 17, 2009 at 5:29 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> David’s Option 3 is the simplest way forward, and actually revisits much of
> what was discussed back in 2007-08. Somehow those discussions were locked
> far back in my brain, only to be awakened by David’s summary. Thanks for
> that.
>
> So now I return to the STAR syntax. DDLm is part of STAR and hence
> restrictions on data names, so they can be parsed etc., are a STAR issue. I am
> brought around to Joe’s idea that STAR accepts any 8 bit character sequence
> since that is the most complete set – and that this will be restricted to
> UTF-8 within the CIF specification. Any other adoptee of STAR can choose
> whatever restricted encoding they wish.
>
> I still need to treat data names as programming identifiers within dREL so
> accordingly I propose we restrict the data names in STAR (and all variants)
> to be ASCII [A-Za-z0-9_.] as we have used in the sample dictionaries, DDLm
> and dREL.
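
A check against the proposed character set might look like the sketch below. The leading underscore requirement is an assumption taken from CIF convention rather than from the paragraph above, and the names DATANAME_RE and is_restricted_dataname are made up for illustration.

```python
import re

# A leading underscore followed by one or more characters from the
# proposed set [A-Za-z0-9_.]. The leading underscore is an assumption.
DATANAME_RE = re.compile(r"_[A-Za-z0-9_.]+")


def is_restricted_dataname(name):
    """True if the whole string fits the restricted dataname pattern."""
    return DATANAME_RE.fullmatch(name) is not None
```

Under this pattern a CIF1-style name containing square brackets or a solidus is rejected, which is what lets the brackets go back to being list delimiters.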
>
> The data values will be represented as discussed in previous threads, and
> the reverse solidus and the token delimiters discussed will be ASCII
> characters. We can now return to [] as the list delimiters, and {} as the
> associative array delimiters.
>
> Backward compatibility to CIF1 names is handled by exploiting the _alias
> attributes in the definition. A CIF2 parser with dictionary can handle
> everything. Any CIF1 parser can handle CIF1 data files (also CIF2 data files
> up to a point, but won’t know what the data names mean – unless they have
> hardcoded it).
>
> A CIF2 parser would like a leading comment to tell it what sort of file it
> is parsing. In the absence of that comment, a pre-scan will need to be done.
> The telltale indicators it is a CIF1 data file are multiple occurrences of:
>
> (1) data names that potentially contain [] or /
> (2) unquoted strings with illegal characters
> (3) quoted strings that result in parse failure (typically because they must
> have an embedded [but not elided] quote character as allowed in CIF1).
>
> It needs to be a pre-scan because all 3 of the above in an identified CIF2
> data file would result in something quite different since there are coercion
> rules for when the whitespace separator is missing.
>
> For instance IF I KNOW it is a CIF2 file and I read
>
> _name[1]
>
> Then this can only be an error and I coerce into
>
> _name   [1]
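
The coercion in this example can be sketched as a token split. This assumes a tokenizer has already produced whitespace-separated tokens and handles only the one case shown (a data name run together with an opening square bracket); the function name is hypothetical and the real coercion rules may cover more.

```python
import re

# A restricted data name immediately followed by an opening square bracket.
NAME_THEN_BRACKET = re.compile(r"^(_[A-Za-z0-9_.]+)(\[.*)$")


def coerce_missing_whitespace(token):
    """Split '_name[1]' into ['_name', '[1]']; leave other tokens alone."""
    match = NAME_THEN_BRACKET.match(token)
    if match:
        return [match.group(1), match.group(2)]
    return [token]
```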
>
> IF I DON’T KNOW the file type, the occurrence of _name[1] flags it as
> potentially a CIF1 file. If _name[1] is in an alias list, this reinforces
> the likelihood of CIF1. Multiple instances of these “errors” (or any others
> in the above list) indicate it is a CIF1 file (my only other conclusion
> would be it is a VERY BADLY written CIF2).
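
A very rough sketch of such a pre-scan is given below. It counts only indicator (1), data names containing [, ] or /, and ignores indicators (2) and (3) as well as multi-line text fields. The marker string, the threshold of two hits and the function name are all assumptions made for illustration; the thread does not fix any of them.

```python
import re

# A token that starts like a data name but contains [, ] or /.
CIF1_ONLY_NAME = re.compile(r"^_\S*[\[\]/]")


def guess_cif_version(text, marker="#\\#CIF_2.0", threshold=2):
    """Guess 'CIF2', 'CIF1' or 'unknown' from a leading comment and a pre-scan."""
    lines = text.splitlines()
    # A leading comment, if present, settles the question without a scan.
    if lines and lines[0].startswith(marker):
        return "CIF2"
    hits = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        first_token = stripped.split()[0]
        if CIF1_ONLY_NAME.match(first_token):
            hits += 1
    # Multiple occurrences are required before concluding CIF1.
    return "CIF1" if hits >= threshold else "unknown"
```

A single suspicious name is deliberately not enough to decide, which mirrors the "multiple instances" condition above.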
>
> I think this takes us back to a very simple rule set, and I don’t think the
> restriction in the character set for data names will cause problems. For all
> the excitement of UTF-8 etc., I know of programming languages that support
> reading and writing data in such encodings, but I haven’t seen one that
> allows/encourages one to write programmes declaring identifiers in UTF-8
> character sets. (They may well exist; I just haven’t seen them.)
>
>
> On 17/11/09 12:04 AM, "David Brown" <idbrown@mcmaster.ca> wrote:
>
> James,
>
> There seems to be a lull in the discussions on CIF2 syntax so this would be
> a good time for you, or someone appointed by you, to summarize where we are
> at and propose a set of rules that we can work with as we move forward.  I
> realize that much of the work I have already done on dictionaries will need
> to be revisited, and Herbert also seems anxious to have some decisions on
> the various topics that have been discussed.
>
> I believe we have a consensus on a number of points, but these need to be
> written down clearly and need our formal agreement so we can move ahead.
>
> David
>
> ________________________________
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148


International Union of Crystallography
