Discussion List Archives


Re: CIF-JSON draft 2017-05-15


Bob writes:

 

OK, I think I am confused by all these terms. I just hadn't thought about all these names yet. I'm using the language of http://journals.iucr.org/j/issues/2016/01/00/aj5269/index.html#SEC3.9 here.

Here's what I suggest:

the JSON equivalents of the CIF data block headers must be lower case (not sure that matters if "data_" precedes them.)
the JSON equivalents of the CIF data block item names must be lower case (this is critical)
the JSON equivalents of the CIF save block names must be lower case (not sure that matters if "save_" precedes them.)
the JSON equivalents of the CIF save block item names must be lower case (this is critical)
the JSON equivalents of the CIF table data item names may be any case (no restriction here)

I will just point out that the CIF2 spec does not use the word "dataname" and instead uses the more natural language "data name", and I wish this draft would, too.

Absolutely, I will insert the spaces and keep everything consistent with the CIF specs as you suggest.



On 16 May 2017 at 04:16, Robert Hanson <hansonr@stolaf.edu> wrote:

    Two points:

    1. I do not understand the stripping of "data_" and "save_" from the names we have for these.


These are stripped because the data_ and save_ parts of the names perform a syntactical role in CIF, and the JSON curly brace essentially replaces their syntactical function of encapsulation.  Put another way, requiring that datablock names start with 'data_' would be unnecessary additional baggage.

Yes, OK, I see that. Still, I would suggest it doesn't add a significant amount of baggage to add "data_" or "save_" once or twice in a file, and it significantly improves readability both by humans and machines, at least in my opinion. It provides a visual cue to the original CIF reference. Also, it is common for a JSON data reader to first get a list of keys without values, and, starting with that, know what to do with them.

Well, when "save_" occurs it is more like 1000 (core CIF) to 7,500 (pdbx/mmCIF dictionary) times, so there may be some marginal benefit. Also, there is no problem naming a datablock "data_something" if you wish; that is not forbidden by the spec.  It is also possible to give datablocks completely different names when translating from a CIF format file, as the names really should be arbitrary.
 

I just noticed that in several of the files I have the _audit_link_block_code values reference data block names, but I think those are actually supposed to be referencing _audit_block_code entries. So I think these may be broken. Pretty sure they were hand-made:

data_comp1012814988
_exptl_crystal_type_of_structure comp
...
loop_
_audit_link_block_code
_audit_link_block_description
? 'common experimental and publication data'
1012814988_0_MOD 'modulated structure (Global data)'
1012814988_1_MOD 'modulated structure (subsystem 1)'
1012814988_2_MOD 'modulated structure (subsystem 2)'
1012814988_0_REFRNCE 'reference structure (Global data)'
1012814988_1_REFRNCE 'reference structure (subsystem 1)'
1012814988_2_REFRNCE 'reference structure (subsystem 2)'
...
data_1012814988_1_MOD
_cell_length_a  4.905(2)
...

Yes, the block name should be completely arbitrary and not referenced from within any block.
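
A rough sketch of how the mismatch described above could be detected once such a file is held as CIF-JSON-like data. It assumes each block is a dict whose values are lists of strings, and that '?' and '.' appear as plain strings; the function name unresolved_links and the sample data are hypothetical, not part of the draft:

```python
def unresolved_links(blocks):
    """Return link codes that match no _audit_block_code in any block."""
    declared = set()
    for items in blocks.values():
        declared.update(items.get("_audit_block_code", []))
    missing = []
    for items in blocks.values():
        for code in items.get("_audit_link_block_code", []):
            # '?' and '.' are CIF's unknown / inapplicable placeholders
            if code not in ("?", ".") and code not in declared:
                missing.append(code)
    return missing


# Sample data modelled on the excerpt above: the link code matches a
# block *name*, but no block declares it via _audit_block_code.
blocks = {
    "comp1012814988": {
        "_audit_link_block_code": ["?", "1012814988_1_MOD"],
    },
    "1012814988_1_MOD": {"_cell_length_a": ["4.905(2)"]},
}

print(unresolved_links(blocks))
```

Here the link code is reported as unresolved because it only coincides with a block name, which is the situation suspected in the hand-made files.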

    2. Save frames.  What is the problem with just doing this?


         {
         "data_another_block":{
            "_abc":["xyz"],
            "save_internal":{"_abc":["yzx"],
                            "_r.fruit":["apple","pear"],
                            "_r.colour":["red","green"]}
            }
         }

    That is, why the special "frames" list?


Well, we could do it the way you suggest, perhaps with a capital 'S'.  Is there any reason to prefer one over the other? I would have thought that putting all save frames under a single name would make processing a datablock slightly easier, as you don't have to check every block entry for the 'save_' sequence when running through a datablock, especially as these frames only occur in dictionaries.  To get all save frames under the current spec you would just have to go something like:

defs = myjson['blockname']['frames']

but if you need them under your proposal you'd have to go something vaguely like:

defs = myjson['blockname']
save_names = [key for key in defs if key.startswith("save_")]

The former approach is faster and seems simpler, but perhaps that's just me?

BH: There are no capital/lower case issues for the first character of data block item names. These in CIF all start with "_". So anything that doesn't do that could be a save name. I don't think speed is any issue here. These are very high-level operations -- only a handful per file. All data block item names have to be checked for "_" anyway. Right?

Like I said, dictionaries may have 7,500 save frames.  If they are all in 'frames' there is no need to check for leading underscores.
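
To make the two layouts under discussion concrete, here is a minimal sketch of retrieving save frames from each. It assumes the 'frames' entry is an object keyed by frame name; the helper names and sample data are illustrative only:

```python
# Layout in the current draft: all save frames grouped under 'frames'.
frames_layout = {
    "blockname": {
        "_abc": ["xyz"],
        "frames": {"internal": {"_abc": ["yzx"]}},
    }
}

# Alternative layout: save frames mixed in with data names, marked by 'save_'.
prefixed_layout = {
    "blockname": {
        "_abc": ["xyz"],
        "save_internal": {"_abc": ["yzx"]},
    }
}


def frames_from_list(doc, block):
    """Single lookup; no scan over the block's keys is needed."""
    return doc[block].get("frames", {})


def frames_from_prefix(doc, block):
    """Scan every key in the block and keep those starting with 'save_'."""
    prefix = "save_"
    return {
        key[len(prefix):]: value
        for key, value in doc[block].items()
        if key.startswith(prefix)
    }


print(frames_from_list(frames_layout, "blockname"))
print(frames_from_prefix(prefixed_layout, "blockname"))
```

Both calls yield the same frames; the difference is one dictionary lookup versus a pass over every entry in the block.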


Yes, of course. I wasn't thinking. Maybe something in there about standard JSON quoting of \" and \n. I wonder if it should require that "new line" be a UNIX new line, \n, not \r or \r\n.

We follow JSON principles for encoding string values, so there is no need for us to second-guess that.  Translators from CIF files would have to follow the CIF spec in working out the actual string contents.  Worth adding this as a comment, though.
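
As an illustration of leaving string encoding to JSON itself, the sketch below round-trips a value containing a double quote and both kinds of line ending through Python's standard json module. The data name _demo.text is made up for the example:

```python
import json

# A value with an embedded double quote, a UNIX newline and a CR LF pair.
value = 'He said "hi"\nsecond line\r\nthird line'

encoded = json.dumps({"_demo.text": [value]})
decoded = json.loads(encoded)["_demo.text"][0]

print(encoded)           # quotes and line endings appear as escape sequences
print(decoded == value)  # the original string is recovered unchanged
```

The encoder escapes the quote and the line endings, and the decoder restores them exactly, so any restriction to \n would have to be a rule about the string contents rather than about JSON encoding.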

2. data names in case-normal form: I can see no problems with this, does anybody else have any thoughts?  Such a restriction would handily enforce the need for datanames to be canonically-caselessly unique as JSON requires all object names to be unique within their parent object.
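
A minimal sketch of the uniqueness check that case-normal form would make trivial. The case_normal function below is only an approximation (Unicode normalisation plus case folding); the CIF2 specification should be consulted for the exact definition, and the function names are hypothetical:

```python
import unicodedata


def case_normal(name):
    """Approximate a caseless canonical form (assumed, not taken from the spec)."""
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFC", folded)


def find_collisions(names):
    """Return pairs of distinct names that coincide once case is ignored."""
    seen = {}
    clashes = []
    for name in names:
        key = case_normal(name)
        if key in seen and seen[key] != name:
            clashes.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return clashes


print(find_collisions(["_cell_length_a", "_Cell_Length_A", "_cell_length_b"]))
```

If all data names are already stored in case-normal form, two names that differ only by case become identical keys, and JSON's requirement of unique object names covers the check.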

I'm having problems identifying what "data name" means. Is that the CIF data block header names?  or the CIF data block item names? I think that's what got me confused.

A data name is the thing defined in a CIF dictionary, what I think you are referring to as "CIF data block item name".

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers
