Discussion List Archives


Re: [ddlm-group] Relationship among CIF2, STAR, CIF1 and Python. . . .

Following John's lead, but selecting only those features that I have a strong opinion about (good or bad) and
that may benefit my work with CIF as a publication format (i.e. beyond 'crystallographic' content):

* Feature: Unicode support

Good - will require care in implementation, and a strategy to address the current practice of using escape sequences
to represent accents and Greek symbols in CIF, but it's the way forward.

* Feature: triple-quoted strings

In CIF2 as drafted, these are necessary to reduce the potential impact of the restrictions to the contents of single-quoted strings.
I wonder if we will see the use of triple quotes increase as adoption of CIF2 grows, as they might be seen as a
convenient 'catch all' when editing a CIF by hand. Obviously, if the triple-quotes require a different syntax for the values they
delimit, then their use in this manner will not be straightforward. [Please bear in mind that this is purely speculation - and
in the context of preparing a CIF for journal publication]

* Feature: list data type

Very useful - e.g. for listing _id values that link to looped data, such as a loop_ of _author_name and _author_address_id linked to a loop_ of _address_id and _address_institution ...
(though realization will obviously require new dictionary items)
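A minimal sketch of that linkage, assuming the two loops have already been parsed into Python dictionaries and that _author_address_id holds a list of _address_id values. The author names, institution names and the helper institutions_for are hypothetical, made up purely for illustration:

```python
# Hypothetical parsed loops; the item names come from the example above,
# the values are invented for illustration.
authors = [
    {"_author_name": "A. Author", "_author_address_id": ["a", "b"]},
    {"_author_name": "B. Author", "_author_address_id": ["b"]},
]
addresses = [
    {"_address_id": "a", "_address_institution": "Institute One"},
    {"_address_id": "b", "_address_institution": "Institute Two"},
]


def institutions_for(author, addresses):
    """Resolve the list of address ids held by one author row."""
    by_id = {row["_address_id"]: row["_address_institution"] for row in addresses}
    return [by_id[address_id] for address_id in author["_author_address_id"]]


print(institutions_for(authors[0], addresses))
```

Without a list value, an author with two affiliations would need either a repeated row or a separate linking loop.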

* Feature: table data type

Potentially useful beyond the dictionary level - but I wouldn't expect to encounter it at the CIF file level in the short term.
Purely as a matter of taste, I would prefer the separation of the label and value to be by whitespace rather than colon, simply
because the basic lexing would follow the same rules as for all other delimited tokens - but I emphasize that this is solely
a personal quirk.

Please take these comments as intended - purely a personal view of how some of the features might relate to my work.

I am tempted to take this further and provide a picture of what I would like to see, again with
the emphasis on CIF's use as a publication format. As Herbert recognizes, the publishing field is progressive.
The IUCr journals' activities reflect this - how far CIF can be exploited in this respect remains to be seen...



From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Friday, 14 January, 2011 17:54:43
Subject: Re: [ddlm-group] Relationship among CIF2, STAR, CIF1 and Python. . . .

On Friday, January 14, 2011 9:23 AM, Herbert J. Bernstein wrote:
>hearing what others think on the current version of CIF2 feature by feature would help.

I structure these comments after the changes document, and I comment primarily about the changes and additions in CIF2.  If more is desired then I probably need a list of the CIF features about which comments are solicited.

* Feature: Magic Code
This, or something like it, is necessary to enable CIF parsers to adapt automatically to CIF1 vs. CIF2 input.  I consider that a worthwhile thing to be able to do, so I am in favor of this feature.
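A rough sketch of how a parser might use the magic code to pick a grammar, assuming the code appears at the very start of the first line as `#\#CIF_2.0` (or `#\#CIF_1.1` for CIF 1.1) and may be preceded by a byte-order mark. The function name detect_cif_version is hypothetical:

```python
def detect_cif_version(text: str) -> str:
    """Return '2.0', '1.1' or 'unknown' based on the first line of the file."""
    # Drop an optional byte-order mark, then isolate the first line.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0].rstrip("\r")
    if first_line.startswith("#\\#CIF_2.0"):
        return "2.0"
    if first_line.startswith("#\\#CIF_1.1"):
        return "1.1"
    # No recognizable magic code: the caller has to choose a default grammar.
    return "unknown"


print(detect_cif_version("#\\#CIF_2.0\ndata_example\n"))
```

Because the magic code is an ordinary comment to a CIF1 parser, a file that lacks it can only be treated as "unknown", and the caller would most likely fall back to CIF1 rules.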

* Feature: Unicode support
This is my favorite new CIF feature, though I think it could be slightly improved:
1) I think certain Unicode character categories (space and control characters) should be forbidden in data names
2) I think data block and save frame codes should follow the same rules as data names (except for the leading underscore), subject to line length limitations (it's not altogether clear from the changes document whether they do, but I have supposed not)
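To illustrate point (1), here is a small sketch of such a check, assuming "space and control characters" maps onto the Unicode general categories Zs, Zl, Zp (separators) and Cc (controls). That mapping and the function name has_forbidden_chars are my assumptions, not anything taken from the changes document:

```python
import unicodedata


def has_forbidden_chars(data_name: str) -> bool:
    """True if the name contains a Unicode separator or control character."""
    for ch in data_name:
        category = unicodedata.category(ch)
        # Z* covers space, line and paragraph separators; Cc covers controls.
        if category.startswith("Z") or category == "Cc":
            return True
    return False


print(has_forbidden_chars("_atom_site_label"))    # plain ASCII name
print(has_forbidden_chars("_atom\u00a0site"))     # contains a no-break space
```

A check along these lines would reject a name containing U+00A0 even though CIF2 does not treat that character as whitespace, which is the forward-compatibility point made below.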

I do not object to limiting recognized whitespace characters and newlines as CIF2 currently does.  However, if (1) above were adopted then it would provide a clearer path to possible future full adoption of Unicode whitespace semantics.

I remain satisfied with the compromise we reached on encoding.

* Change: treatment of newline
I am on record as favoring this sort of treatment as far back as during the preparation of CIF 1.1, and I still favor it.

* Change: allowed whitespace-delimited data values
I would be happier if it were not necessary to place stronger restrictions on the ASCII content of these values, but if list and table data types are retained then I can live with it.
For forward-looking compatibility, it might be better to forbid Unicode whitespace from appearing in these, even though allowing such characters is not a problem for CIF2.

* Change: string delimiters ('") cannot be embedded in data values they delimit
As unusual as CIF1's rules for quoted data values are, I do not favor changing them in CIF2.  I would find the change more palatable with the addition of some sort of escape or elide by which the delimiter could be embedded, but more than that I would favor backwards compatibility.
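For anyone less familiar with the difference, a simplified sketch of the two behaviours follows. It assumes the CIF1 rule is that a delimiter closes the value only when followed by whitespace or end of line, and that under the CIF2 draft the first matching delimiter always closes it. Both function names are hypothetical and neither does full error handling:

```python
def read_cif1_quoted(line: str, start: int = 0):
    """CIF1-style: the quote closes only before whitespace or end of line."""
    quote = line[start]
    if quote not in "'\"":
        raise ValueError("not a quoted value")
    i = start + 1
    while i < len(line):
        if line[i] == quote and (i + 1 == len(line) or line[i + 1] in " \t"):
            return line[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted value")


def read_cif2_quoted(line: str, start: int = 0):
    """CIF2-draft-style: the first matching quote always closes the value."""
    quote = line[start]
    if quote not in "'\"":
        raise ValueError("not a quoted value")
    end = line.index(quote, start + 1)  # raises ValueError if unterminated
    return line[start + 1:end], end + 1


text = "'a dog's life' next"
print(read_cif1_quoted(text))  # ("a dog's life", 14)
print(read_cif2_quoted(text))  # ('a dog', 7)
```

The same input therefore yields a different value under each rule, which is the backwards-compatibility concern.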

* Restriction: text blocks cannot contain <newline>;
This is not a change from CIF 1.1 as far as I can tell, but our discussions have shown that it differs at least from some older interpretations of CIF.  I am satisfied with it, and I prefer it for compatibility with CIF 1.1.  If there were evidence of a non-trivial body of CIFs relying on a different interpretation, and if the changes to single-quoted string formatting were reversed, then I might be persuaded to change my mind.

* Feature: triple-quoted strings
I have never been especially hot about these, but I saw the advantage they provided when coupled with the changes to single-quoted string syntax, and their use for quoting text block delimiters.  I also see a compatibility advantage to using exclusively these for new syntax and semantics, such as we have recently been discussing.  I am mildly supportive.

* Feature: list data type
Inasmuch as this is supposedly needed for DDLm / dREL, I'm OK with it.  I imagine there are other reasonable use cases as well, though none presently come to mind.  I don't object to it, but neither do I particularly favor it.

* Feature: table data type
My opinion of this is pretty much the same as of the list data type.  I am slightly more negative about it because it requires a restriction of allowable whitespace-delimited data values relative to CIF 1.1, but I still do not object.

* Undocumented Change: updated data model
I have lately realized that many of the CIF2 syntax changes imply changes to the CIF data model, but that I find no documentation of those data model changes.  Really, the data model updates should have been settled first, for the syntax serves the model, not the other way around.  This would answer questions such as the one I recently injected into the elide discussion, about whether elides should be accepted that represent characters outside the revised CIF character set.

I have no problem with extension of the CIF1 data model, but I think we need to better characterize the details of the new model.



John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer

ddlm-group mailing list