[ddlm-group] [John.Bollinger@STJUDE.ORG: Re: Feedback on draft CIF2 specification from John Bollinger]

  • To: ddlm-group@iucr.org
  • Subject: [ddlm-group] [John.Bollinger@STJUDE.ORG: Re: Feedback on draft CIF2 specification from John Bollinger]
  • From: Brian McMahon <bm@iucr.org>
  • Date: Mon, 10 May 2010 09:28:29 +0100
Forwarded on behalf of John Bollinger.

John: Apologies for the delay - I think this came in just after I left
the office for the weekend. To keep traffic flowing smoothly, I've
added you to ddlm-group for now. Feel free to unsubscribe if and
when you wish.


----- Forwarded message from "Bollinger, John C" <John.Bollinger@STJUDE.ORG> -----

From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: "'bm@iucr.org'" <bm@iucr.org>
Date: Fri, 7 May 2010 10:27:01 -0500
Subject: Re: [ddlm-group] Feedback on draft CIF2 specification from
Thread-Topic: Re: [ddlm-group] Feedback on draft CIF2 specification from

Dear Brian,

In view of the fact that the DDLm group appears to want to conduct
discussion of CIF2 on their own mailing list, and pursuant to IUCr's
instructions for non-members offering contributions to restricted
lists, I am sending you these remarks as a proposed contribution to
the ddlm-group list.  (Note: I am entirely willing to be added to the
DDLm list, but I don't think at the moment I'm prepared to contribute
outside the scope of CIF syntax discussion.) These are in response to
some of Joe Krahn's comments:

>> Point 21: (requiring the labels in a CIF2 Table to be quote-delimited.)
>This requirement was useless in the previous revision, because semicolon
>was in the set of characters that must be quoted. That is no longer true
>in the last revision.

The difficulty with regard to semicolon could be resolved, I think, by
allowing \n;-delimited strings as labels as well.  I should be surprised
to see real-world CIFs actually use such a feature, but this particular
question revolves around parsing and avoiding language ambiguity.  It
would appeal to me from a consistency standpoint for *all* the delimited
string representations to be usable as table index labels, though I don't
think it's very important.
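As a rough illustration of how little this would ask of a lexer, here is a minimal Python sketch that reads a \n;-delimited string (text field) and reports whether it is a table index label. It assumes the rule proposed above: a text field whose closing ';' is immediately followed by ':' is a label. The function name lex_text_field is hypothetical, and the sketch ignores CR line endings and any line-folding or prefixing conventions.

```python
from __future__ import annotations


def lex_text_field(src: str, pos: int) -> tuple[str, int, bool]:
    """Lex a text field that starts with ';' at the beginning of a line.

    Returns (content, end, is_label):
      content  -- text between the opening ';' and the closing '\n;'
      end      -- index just past the closing ';'
      is_label -- True when the closing ';' is immediately followed by ':'
                  (the proposed index-label form, an assumption here)
    """
    if src[pos] != ";" or (pos > 0 and src[pos - 1] != "\n"):
        raise ValueError("a text field must start with ';' at the start of a line")
    close = src.find("\n;", pos + 1)
    if close == -1:
        raise ValueError("unterminated text field")
    content = src[pos + 1:close]
    end = close + 2  # skip the newline and the closing ';'
    is_label = end < len(src) and src[end] == ":"
    return content, end, is_label
```

For example, lex_text_field(";label\n;: part1:part2", 0) returns ("label", 8, True), while a text field followed by a newline is an ordinary value with is_label False.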

>I still prefer that index labels be treated as any
>other data element, and not make it a special case. To do this, the
>delimiting colon should have optional whitespace. The index label only
>needs to be quoted by normal quote/whitespace delimiter lexing rules.
>Unquoted index labels are then valid, as long as whitespace separates
>the colon.

If the index labels are required to be delimited strings, and no
whitespace is allowed between the closing delimiter and the colon, then
the language has some nice properties:

1) lexical analysis can be completely insensitive to context, while at the
same time,
2) lexical analysis can reliably identify index labels

The role of a lexer performing such an analysis is completely separated
from the role of the grammar parser it feeds, while the lexer also eases
the parser's job by handling index label identification.  These properties
arise because a string that is not an index label must be separated from
whatever follows it by whitespace, so a delimited string followed
immediately by a colon can only be an index label or an error. In short,
I am not in favor of allowing whitespace
between the index label and its colon.
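The two properties above can be sketched as a context-free tokenizer. This is a minimal Python illustration under stated assumptions, not the CIF2 grammar: single-line quoted strings are assumed to end at the first matching quote, a quoted string that is not a label is assumed to need whitespace (or end of input) after it, and triple-quoted strings and text fields are ignored. The names tokenize, LABEL, STRING, PUNCT and BARE are hypothetical.

```python
from __future__ import annotations

WHITESPACE = " \t\r\n"


def tokenize(src: str) -> list[tuple[str, str]]:
    """Context-free tokenizer sketch for the proposed index-label rule.

    ("LABEL", text)  quoted string immediately followed by ':' (colon consumed)
    ("STRING", text) quoted string followed by whitespace or end of input
    ("PUNCT", c)     one of the bracket characters { } [ ]
    ("BARE", text)   any other run of non-whitespace characters
    """
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in WHITESPACE:
            i += 1
        elif c in "{}[]":
            tokens.append(("PUNCT", c))
            i += 1
        elif c in "'\"":
            close = src.find(c, i + 1)
            if close == -1:
                raise ValueError(f"unterminated string at offset {i}")
            text = src[i + 1:close]
            nxt = src[close + 1] if close + 1 < n else ""
            if nxt == ":":
                # No lookahead beyond one character and no parser state needed.
                tokens.append(("LABEL", text))
                i = close + 2
            elif nxt == "" or nxt in WHITESPACE:
                tokens.append(("STRING", text))
                i = close + 1
            else:
                raise ValueError(f"unexpected {nxt!r} after string at offset {close}")
        else:
            j = i
            while j < n and src[j] not in WHITESPACE:
                j += 1
            tokens.append(("BARE", src[i:j]))
            i = j
    return tokens
```

With this lexer, '"label" :value' yields a STRING followed by the bare word ':value', so whitespace before the colon never produces a label.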

On the other hand, I see no particular advantage to forbidding whitespace
between the colon and the following data value.  Removing that requirement
would allow tables to be written in a somewhat easier to read format, and
also would allow slightly longer values to be used without line folding.

For example, these would all be allowed and equivalent:

        "label":"value"

        "label": "value"


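A minimal sketch of reading table entries when whitespace after the colon is optional, using Python's re module. The pattern ENTRY and the helpers unquote and parse_entries are hypothetical; the sketch keeps the colon attached to the label's closing quote, accepts any whitespace (including a line break) before the value, and does not handle text fields, triple-quoted strings, or nested lists and tables.

```python
from __future__ import annotations

import re

ENTRY = re.compile(
    r"""\s*("[^"]*"|'[^']*'):"""          # quoted label glued to its colon
    r"""\s*"""                            # optional whitespace after the colon
    r"""("[^"]*"|'[^']*'|[^\s'"]\S*)"""   # quoted value or bare word
    r"""(?=\s|$)"""                       # value must end at whitespace or end
)


def unquote(tok: str) -> str:
    """Drop the surrounding quotes from a quoted token; leave bare words alone."""
    return tok[1:-1] if tok[0] in "'\"" else tok


def parse_entries(body: str) -> dict[str, str]:
    """Collect label -> value pairs from the inside of a table (sketch only)."""
    entries: dict[str, str] = {}
    pos = 0
    while body[pos:].strip():
        m = ENTRY.match(body, pos)
        if m is None:
            raise ValueError(f"malformed table entry at offset {pos}")
        entries[unquote(m.group(1))] = unquote(m.group(2))
        pos = m.end()
    return entries
```

Here '"label":"value"', '"label": "value"' and '"label":\n    "value"' all give {'label': 'value'}, whereas '"label" :value' raises ValueError.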

>Alternatively, the colon character could be made a mandatory-quoted
>character within tables. Then, only normal token quoting rules need

True, but I prefer to avoid special rules for particular contexts.
Another alternative, however, would be to _require_ the colon of the
index label to be separated from the value by whitespace.  That seems
to me more consistent with the rest of CIF2, and also could provide
for unambiguous use of unquoted index labels.  Thus all of the following
could be allowed and equivalent:

        "label": 'part1:part2'

        'label': part1:part2

        label: part1:part2

;label
;:              part1:part2

Doing that does make a bit more work for the parser, which must do the
right thing with

        label: part1:

in a Table, and also with

        _tag value:

outside, but that's doable.
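Under that alternative, the parser rather than the lexer decides what a trailing colon means. A minimal Python sketch, assuming whitespace-separated tokens with no embedded whitespace (text fields and nested structures are not handled); unquote, parse_table_body and parse_item are hypothetical names.

```python
from __future__ import annotations


def unquote(word: str) -> str:
    """Strip matching single or double quotes from a whitespace-free token."""
    if len(word) >= 2 and word[0] == word[-1] and word[0] in "'\"":
        return word[1:-1]
    return word


def parse_table_body(src: str) -> dict[str, str]:
    """Parse pairs under the rule that a label's colon is followed by whitespace."""
    words = src.split()
    entries: dict[str, str] = {}
    i = 0
    while i < len(words):
        word = words[i]
        # In label position the token must end with the colon.
        if len(word) < 2 or not word.endswith(":"):
            raise ValueError(f"expected an index label, got {word!r}")
        if i + 1 == len(words):
            raise ValueError(f"index label {word!r} has no value")
        # The next token is always a value, even if it also ends with ':'
        # (the 'label: part1:' case).
        entries[unquote(word[:-1])] = unquote(words[i + 1])
        i += 2
    return entries


def parse_item(src: str) -> tuple[str, str]:
    """Outside a table there are no index labels, so a trailing colon on the
    value (the '_tag value:' case) is simply part of the value."""
    tag, value = src.split(maxsplit=1)
    return tag, unquote(value.strip())
```

In this sketch parse_table_body("label: part1:") gives {'label': 'part1:'} because a value is always expected after a label, and parse_item("_tag value:") gives ('_tag', 'value:').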

>Either way, I think the optional whitespace around the colon allow nicer

I do like the option (at least) of whitespace after the colon, and I
agree that it affords nicer formatting.  I don't think the same is true
for whitespace before the colon, however, and I am particularly unexcited
by the prospect of

        label   :value

being allowed.

Best Regards,

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


----- End forwarded message -----
ddlm-group mailing list
