Re: [ddlm-group] Feedback on draft CIF2 specification from John Bollinger

James Hester wrote:
> Dear all,
> 
> If you haven't seen John Bollinger's useful feedback on our draft
> CIF2 specification, I recommend that you read it at
> 
> http://www.iucr.org/__data/iucr/lists/cif-developers/msg00269.html
...
It is nice to see someone else looking at the syntax details critically.

> Point 14: matching with the global, save, data etc. keywords should
> be case-insensitive.  This should be clarified in the spec
Is the plan for CIF2 to be case-sensitive for data names? If so, I think
it would be easier to make keywords case-sensitive as well, and require
lower case keywords.

> Point 15: (disallowing delimiters in strings).
I always disliked the highly non-standard CIF1 quoting rules. It is
trivial for a CIF2 parser to accept CIF1 embedded quotes with a warning,
because they are otherwise syntax errors for CIF2.
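
As a rough illustration of that fallback, here is a minimal Python sketch.
It assumes that a CIF2 closing quote outside a table must be followed by
whitespace or end of line, and that the CIF1 rule ends a quoted string only
at a quote followed by whitespace. The name read_quoted is hypothetical and
is not part of any CIF library.

```python
import warnings


def read_quoted(line, start=0):
    """Read a quoted token that begins at line[start].

    Returns (value, index_after_closing_quote). Hypothetical helper that
    handles single-line tokens only.
    """
    quote = line[start]
    if quote not in ('"', "'"):
        raise ValueError("token does not start with a quote character")

    first = line.find(quote, start + 1)
    if first == -1:
        raise ValueError("unterminated quoted string")

    def ends_token(pos):
        # Assumption: a closing quote must be followed by whitespace or EOL.
        return pos + 1 == len(line) or line[pos + 1].isspace()

    # CIF2 reading: the first matching quote closes the string.
    if ends_token(first):
        return line[start + 1:first], first + 1

    # That reading is a CIF2 syntax error, so retry with the CIF1 rule:
    # the string ends at the first quote that is followed by whitespace.
    pos = first
    while pos != -1:
        if ends_token(pos):
            warnings.warn("accepted CIF1-style embedded quote")
            return line[start + 1:pos], pos + 1
        pos = line.find(quote, pos + 1)
    raise ValueError("unterminated quoted string")
```

With this sketch, read_quoted("'a dog's life' next") returns the value
a dog's life and emits one warning, while a plain "abc" is read silently.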

I would have much preferred Fortran-style quote escapes, in which a
doubled quote character stands for one literal quote. The same
convention is still used in the common CSV format. I find it more
effective than Python-philic triple quotes. But it is obviously too
late to argue that point.
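
For readers unfamiliar with the doubled-quote convention, a short Python
sketch follows. The helper unescape_doubled is hypothetical and only
illustrates the idea; it is not proposed CIF2 syntax. The csv module from
the standard library is used to show that CSV applies the same rule.

```python
import csv


def unescape_doubled(token, quote='"'):
    """Decode a quoted token in which a doubled quote means one literal quote."""
    if len(token) < 2 or token[0] != quote or token[-1] != quote:
        raise ValueError("token is not quote-delimited")
    body = token[1:-1]
    # Any quote left over after removing the pairs is an unescaped quote.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("unescaped quote inside token")
    return body.replace(quote * 2, quote)


# The hand-written decoder and the CSV reader agree on the same input.
decoded = unescape_doubled('"a ""quoted"" word"')
csv_row = next(csv.reader(['"a ""quoted"" word",b']))
print(decoded)      # a "quoted" word
print(csv_row[0])   # a "quoted" word
```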

> Point 21: (requiring the labels in a CIF2 Table to be quote-delimited.)
This requirement was useless in the previous revision, because semicolon
was in the set of characters that must be quoted. That is no longer true
in the latest revision. I would still prefer that index labels be treated
like any other data element rather than as a special case. To do this,
the delimiting colon should allow optional whitespace around it. The
index label then only needs to be quoted when the normal
quote/whitespace delimiter lexing rules require it. Unquoted index
labels are valid, as long as whitespace separates the label from the
colon.

For example, these are all equivalent:

  "index":a:b:c
  index : a:b:c
  "index" : "a:b:c"
  "index":"a:b:c"

Alternatively, the colon character could be made a mandatory-quoted
character within tables. Then, only normal token quoting rules need
apply. Also, some people might not like the ambiguous appearance of
unquoted colon-containing values as in the first example above, and
prefer the quotation requirement.

Using the requirement for quoting colons, and also keeping the optional
whitespace, the first two examples above are invalid. However, this
would also be valid:

  index:"a:b:c"
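
The two alternatives can be compared with a small Python sketch. Everything
here is an assumption made for illustration: the names split_entry, LENIENT
and STRICT are hypothetical, only simple double-quoted single-line tokens
are handled, and the patterns are not taken from the draft specification.

```python
import re

QUOTED = r'"([^"]*)"'  # simple double-quoted token, no escapes

# Variant 1: an unquoted token is any run of non-whitespace characters, so
# an unquoted label must be followed by whitespace before the colon.
LENIENT = re.compile(
    r'\s*(?:' + QUOTED + r'\s*|([^\s"]\S*)\s+):'
    r'\s*(?:' + QUOTED + r'|([^\s"]\S*))\s*'
)

# Variant 2: a colon may appear only inside quotes, so an unquoted token
# ends at whitespace or at a colon.
STRICT = re.compile(
    r'\s*(?:' + QUOTED + r'|([^\s":]+))\s*:'
    r'\s*(?:' + QUOTED + r'|([^\s":]+))\s*'
)


def split_entry(text, colon_must_be_quoted=False):
    """Split one table entry into (label, value), or raise ValueError."""
    pattern = STRICT if colon_must_be_quoted else LENIENT
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError("not a valid table entry: %r" % text)
    q_label, bare_label, q_value, bare_value = match.groups()
    label = q_label if q_label is not None else bare_label
    value = q_value if q_value is not None else bare_value
    return label, value
```

Under the first variant, all four equivalent forms yield the pair
(index, a:b:c) and the form index:"a:b:c" is rejected. With
colon_must_be_quoted=True the first two forms are rejected and
index:"a:b:c" is accepted.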

Either way, I think optional whitespace around the colon allows nicer
formatting. Take the example in the proposal:

   "description":"""Cubic space group
   and metric cell vectors"""

With whitespace allowed, the value no longer needs to be split onto 2 lines:

   "description":
   """Cubic space group and metric cell vectors"""


Joe Krahn