Discussion List Archives


Re: Draft JSON specification for CIF

Dear all,

We at the Crystallography Open Database (COD) have been experimenting with CIF <-> JSON conversion for a while. We have decided to map the internal representation of CIF files used by our software package 'cod-tools' (http://wiki.crystallography.net/cod-tools/) directly into JSON. We have been using this internal representation (described in Merkys et al., 2016, http://dx.doi.org/10.1107/S1600576715022396) for some time, and although it includes some redundant information, it has proven useful for automated handling of CIFs at the COD.

I have converted an example CIF2 file (from James's proposal, https://github.com/merkys/CIF2JSON/blob/master/test.cif) to JSON and put the result here: https://github.com/merkys/CIF2JSON/blob/master/test-pp.json. I believe our conversion method addresses at least some of the concerns voiced in this discussion:

1) The top-level container is an array instead of an object, so the order of CIF datablocks (represented as objects) in the input file is retained; uniqueness of datablock names is not enforced;

2) Data items are keys of the 'values' sub-object, so uniqueness of data names within a data block / save frame is enforced;

3) Values are *always* represented as strings exactly as given in the CIF file, without any loss of precision;

4) Types of values are stored alongside them in the 'types' sub-object of the datablock. Thus, a '?' value of type 'unquoted string' is easily distinguishable from a '?' value of type 'quoted string' or 'textfield', which reduces the need for ways to escape the '?' and '.' values that carry special meanings;

5) Values of a tag are always put in an array; the 'inloop' sub-object of the datablock tells whether a tag is looped or not;

6) Loops, their tags and their order are described in the 'loops' sub-array of the datablock (same purpose as "loop tags").
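The layout described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cod-tools implementation: the "name" key, the exact type labels, and the form of 'inloop' (here, a map from each looped tag to the index of its loop in 'loops') are guesses made for the example, and the helper names is_unknown and loop_rows are hypothetical. Only the sub-object names 'values', 'types', 'inloop' and 'loops' come from the description above.

```python
import json

# Type label is a placeholder taken from the wording of this message;
# the actual strings used by cod-tools may differ.
UNQUOTED = "unquoted string"

# Hypothetical document: an array of datablock objects. The "name" key and
# the exact form of "inloop" (tag -> index into "loops") are assumptions.
DOC = json.loads("""
[
  {
    "name": "example",
    "values": {
      "_chemical_name_common": ["?"],
      "_publ_section_comment": ["?"],
      "_atom_site_label": ["Si1", "Si2"],
      "_atom_site_fract_x": ["0.0", "0.25"]
    },
    "types": {
      "_chemical_name_common": ["unquoted string"],
      "_publ_section_comment": ["quoted string"],
      "_atom_site_label": ["unquoted string", "unquoted string"],
      "_atom_site_fract_x": ["unquoted string", "unquoted string"]
    },
    "inloop": {"_atom_site_label": 0, "_atom_site_fract_x": 0},
    "loops": [["_atom_site_label", "_atom_site_fract_x"]]
  }
]
""")


def is_unknown(block, tag, index=0):
    """True only for a bare '?', i.e. the CIF 'unknown' marker.

    A quoted '?' is an ordinary string, so the type decides the meaning.
    """
    return (block["values"][tag][index] == "?"
            and block["types"][tag][index] == UNQUOTED)


def loop_rows(block, loop_index):
    """Rebuild the rows of one loop from the per-tag value arrays."""
    tags = block["loops"][loop_index]
    columns = [block["values"][tag] for tag in tags]
    return [dict(zip(tags, row)) for row in zip(*columns)]
```

Because every value stays a string and its type is kept next to it, a reader of the JSON does not need any escaping convention to tell a missing value from the literal text '?'.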

In addition, standard uncertainties (SUs) are extracted and presented alongside the values in the 'precisions' sub-object. The version of CIF that was parsed is stored in the 'cifversion' sub-object, and the version of the cod-tools internal representation is put in 'version'. Backwards conversion JSON -> CIF is currently possible for CIF1.1 only, but we are going to implement it for CIF2 in the near future.
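To illustrate what extracting an SU involves, here is a minimal sketch that splits a CIF numeric string such as "5.4310(2)" into its value and its standard uncertainty, using the usual CIF convention that the digits in parentheses apply to the last quoted decimal places. The function name split_su is hypothetical, exponent notation is deliberately not handled, and how cod-tools actually encodes the entries of 'precisions' is not stated in this message, so the return format below is an assumption.

```python
import re
from decimal import Decimal

# A plain decimal number followed by SU digits in parentheses, e.g. "5.4310(2)".
# Exponent notation (e.g. "1.2e3(4)") is left out of this sketch on purpose.
_SU_PATTERN = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))\((\d+)\)$")


def split_su(raw):
    """Return (value, su) as Decimals, or (None, None) if raw carries no SU."""
    match = _SU_PATTERN.match(raw)
    if match is None:
        return None, None
    number, su_digits = match.groups()
    value = Decimal(number)
    # The exponent records how many decimal places were quoted,
    # e.g. -4 for "5.4310"; the SU digits are scaled to the same place.
    exponent = value.as_tuple().exponent
    su = Decimal(su_digits).scaleb(exponent)
    return value, su
```

Decimal is used instead of float so that trailing zeros, and with them the quoted precision, survive the conversion.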

Best wishes,

On 12/04/17 14:36, Robert Hanson wrote:
I'd like to see an actual CIF or mmCIF file turned into JSON by this spec before judging.
But I'm pretty sure this is exactly what Jmol does already.

My first reaction is that "dataname.a" is problematic. The presence of the period demands that ["dataname.a"] syntax be used. And it suggests poor parsing of the data labels. But it might be the simplest option.

An important aspect, it seems to me, is how to parse data names that include "." vs. "_". Jmol ignores this difference completely, partially because mmCIF and CIF seem to differ some on this. (I can't remember what the IUCr spec says on "." vs. "_".)

The CIF file as shown is invalid. I presume there is an implied CIF2 header?

I would recommend REQUIRING a CIF2 header.

Bob Hanson

cif-developers mailing list

Andrius Merkys
PhD student at Vilnius University Institute of Biotechnology, Saulėtekio al. 7, V325
LT-10257 Vilnius, Lithuania
Lecturer at Vilnius University Faculty of Mathematics and Informatics, Naugarduko g. 24
LT-03225 Vilnius, Lithuania

International Union of Crystallography
