Discussion List Archives


Re: Draft JSON specification for CIF

I forgot to say thank you for thinking hard about how to do this. Lots of good ideas there.

On Wed, Apr 12, 2017 at 7:28 PM, James Hester <jamesrhester@gmail.com> wrote:
In reply to Bob:

(1) CIF2 header: Sorry, my mistake, there should be a CIF2 header for the example CIF file (it is required).  I have corrected the draft at Github.
(2) ["dataname.a"] vs "dataname.a". I don't understand why JSON would require that a string containing a period should be put inside square brackets.

Nothing requires it. When people use JSON, they typically parse it into JavaScript objects or the equivalent structures in other languages. In that case we would have to write xxx["dataname.a"] to reference this field, because xxx.dataname.a would reference the property a of xxx.dataname, which does not exist. It just seems odd to me.

(3) Substituting "_" for ".". The proposed standard simply preserves the dataname as it appears in the CIF file (whether it contains "." or not). Strictly speaking, any equivalence between datanames can only be determined by consulting a CIF dictionary.  Although COMCIFS naming policy leads to some useful shortcuts, they are not part of the CIF syntax standard so should not be included in the JSON standard.

I guess I was thinking about mmCIF and its structured category.item keys such as _atom_site.auth_atom_id, and wondered whether one might want this JSON to create an _atom_site object with an auth_atom_id key/value pair when there is a period in the name. But then again, we have CIF keys such as _atom_site.aniso_B[2][3]. As I think about it now, that's probably not worth the trouble, since the numbers in mmCIF keys like that are 1-based, not 0-based, and that would be a headache. I guess it is what it is. And besides, mmCIF is not CIF2, is it?
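
To make the flat-versus-nested question concrete, here is a minimal Python sketch. The JSON fragment and the helper nest_by_category are hypothetical and are not part of the draft; the draft keeps data names exactly as they appear in the CIF file, which corresponds to the flat form here.

```python
import json

# Hypothetical fragment: a data block whose keys are flat, dotted data names.
text = '{"_atom_site.auth_atom_id": "CA", "_atom_site.label_seq_id": "12"}'
flat = json.loads(text)


def nest_by_category(block):
    """Split each 'category.item' key at its first period into nested dicts.

    Illustration only: assumes no data name is both a bare key and a
    category prefix (e.g. "_a" and "_a.b" in the same block).
    """
    nested = {}
    for name, value in block.items():
        category, sep, item = name.partition(".")
        if sep:
            nested.setdefault(category, {})[item] = value
        else:
            nested[name] = value  # no period in the name: keep it unchanged
    return nested


nested = nest_by_category(flat)

print(flat["_atom_site.auth_atom_id"])       # flat lookup by the full data name
print("_atom_site" in flat)                  # False: the flat form has no such key
print(nested["_atom_site"]["auth_atom_id"])  # nested lookup after the transformation
```

Names such as _atom_site.aniso_B[2][3] would still end up as a single item key ("aniso_B[2][3]") under this split, which is part of why the nested form buys little.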
(4) Agree that a test implementation that can digest a range of valid CIF files would be necessary before acceptance.  Once we have some consensus I'm happy to produce something in Python (although one using the CIFAPI would be great as well).
I guess it's no problem that a JSON reader might completely scramble the order of tags in a CIF file, right?
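
As a small illustration of the ordering point: JSON objects are unordered collections of name/value pairs, so a reader is free to ignore or change key order. The sketch below uses Python's standard json module with two made-up blocks that differ only in key order.

```python
import json

# Two hypothetical blocks with the same tags written in a different order.
a = json.loads('{"_cell.length_a": "5.4", "_cell.length_b": "6.1"}')
b = json.loads('{"_cell.length_b": "6.1", "_cell.length_a": "5.4"}')

same_content = (a == b)            # dict equality ignores key order
same_order = (list(a) == list(b))  # Python happens to keep the document order

print(same_content, same_order)    # True False
```

Python's dict keeps the order in which keys were read, but other JSON libraries may not, so code that consumes such files should not rely on tag order.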

What about capitalization? While it is true that CIF data names are case insensitive -- and maybe for that reason -- it seems to me that normalizing them to all lower case would be a good move. It would make reading these files immensely easier and faster. I think if I wrote a reader, I would have to go through all the keys and change them to lower case before doing anything with them. Otherwise I couldn't reliably check for a key.
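
The normalization pass described above might look like the following Python sketch. The function name lowercase_keys and the sample block are hypothetical. It uses plain str.lower() as suggested in this message; a reader that has to handle non-ASCII data names may need Unicode case folding instead, which is an assumption to check against the CIF specification.

```python
def lowercase_keys(block):
    """Return a copy of a data block with every data name lower-cased.

    Raises ValueError if two names differ only by case, since they would
    refer to the same data name under case-insensitive matching.
    """
    out = {}
    for name, value in block.items():
        key = name.lower()
        if key in out:
            raise ValueError(f"duplicate data name after lowercasing: {name!r}")
        out[key] = value
    return out


# Hypothetical block with mixed-case data names.
block = {"_Atom_Site.Label_Atom_ID": ["C1", "C2"], "_cell.length_a": "5.4"}
normalized = lowercase_keys(block)

print("_atom_site.label_atom_id" in normalized)  # True
```

If the JSON were written with lower-case keys in the first place, every reader could skip this pass and look up keys directly.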

One of the trickier aspects of reading CIF files is that whenever you have a single object -- say, a single atom, or a single operator -- you can represent it as a loop or individually. This will lead to complications in JSON as well, because for many fields one will still have to check each time whether the value is an array or a simple value. I can see how the "loop tags" idea might help there. Personally, I would prefer "loop_tags", not "loop tags", because, again, it's just a pain to always have to use ["xxxx"] to reference elements with keys like that.
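
The per-field check mentioned above can be hidden behind a small helper. This is only a sketch: the name as_list and the two sample blocks are hypothetical, and it assumes that a JSON array always means looped values. That assumption breaks down if a single data value can itself be a list, in which case extra information (such as the proposed loop tags) is needed to tell the two cases apart.

```python
def as_list(value):
    """Return looped values unchanged and wrap a single value in a list.

    Assumption: an array always represents a loop, never one list-valued item.
    """
    return value if isinstance(value, list) else [value]


# The same item given once as a single value and once as a loop column.
single = {"_atom_site.label": "C1"}
looped = {"_atom_site.label": ["C1", "C2"]}

for block in (single, looped):
    for label in as_list(block["_atom_site.label"]):
        print(label)
```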


cif-developers mailing list

International Union of Crystallography
