
Re: CIF-JSON draft 2017-05-15

From: Robert Hanson <hansonr@xxxxxxxxxx>
Date: Mon, 15 May 2017 22:09:03 -0500

ps -- these are the two documents I am working from for CIF terminology:

https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax
http://journals.iucr.org/j/issues/2016/01/00/aj5269/index.html#SEC3.9

I think it's important to be precise and consistent with those in referencing CIF concepts.

Bob

On Mon, May 15, 2017 at 10:05 PM, Robert Hanson <[email protected]> wrote:

Sorry for the fragmented messages -- probably should have waited until I was done with that. Here are my responses to James all in one document. Thanks very much for the clarification, James.

Hi again Bob,

On 16 May 2017 at 03:53, Robert Hanson <[email protected]> wrote:

    Two questions arising:

    1) CIF2 mentions byte order. This would be the (optional) first two characters of the data stream?

Byte order marks are a JSON syntax issue and thus not relevant to this specification. https://tools.ietf.org/html/rfc7159#section-8.1 states that JSON implementations must not add BOMs to streams, but parsers may ignore BOMs in the interests of interoperability.

OK, I think the real answer is that unlike UTF-16, UTF-8 does not have little- or big-endian byte order. Perfect.
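A minimal sketch of the "parsers may ignore BOMs" side of this, assuming Python's standard json module. The helper name load_cif_json_bytes is hypothetical and not part of any CIF-JSON draft. Decoding with the "utf-8-sig" codec drops a leading UTF-8 BOM if one is present and otherwise behaves like plain UTF-8:

```python
import json


def load_cif_json_bytes(raw: bytes):
    """Parse UTF-8 encoded JSON, tolerating an optional leading BOM.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" strips the bytes EF BB BF at the start, if they are there.
    text = raw.decode("utf-8-sig")
    return json.loads(text)


payload = b'{"block": {"_abc": ["xyz"]}}'
with_bom = b"\xef\xbb\xbf" + payload
without_bom = payload
```

Both byte strings parse to the same object, so a reader written this way does not care whether the producer added a BOM.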

    2) list item names need not be lower-case, right? Nothing I see in CIF says that they conform to the requirements of data names. Thus, CIF2 could have upper- and lower-case names in list items.

Currently lower case is only enforced for JSON datablock names. The CIF syntaxes require that datanames appearing in a datablock must have canonical caseless forms that do not match, but the datanames themselves do not have to be presented in canonical caseless form. As case is, in general, significant for datavalues, list item names and any other datavalues may be in upper case, capitalised etc.

OK, I think I am confused by all these terms. I just hadn't thought about all these names yet. I'm using the language of http://journals.iucr.org/j/issues/2016/01/00/aj5269/index.html#SEC3.9 here.

Here's what I suggest:

the JSON equivalents of the CIF data block headers must be lower case (not sure that matters if "data_" precedes them.)
the JSON equivalents of the CIF data block item names must be lower case (this is critical)
the JSON equivalents of the CIF save block names must be lower case (not sure that matters if "save_" precedes them.)
the JSON equivalents of the CIF save block item names must be lower case (this is critical)
the JSON equivalents of the CIF table data item names may be any case (no restriction here)
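The five suggestions above can be sketched as a small check. This assumes the layout proposed in this message rather than the current draft: block keys keep their "data_" prefix, and save frames sit directly in the block under "save_" keys. The function name case_violations and the sample data are made up for illustration:

```python
def case_violations(doc: dict) -> list:
    """Return the names that break the suggested lower-case rules.

    Assumed layout: {"data_x": {"_item": [...], "save_y": {"_item": [...]}}}
    """
    bad = []
    for block_name, block in doc.items():
        # Data block headers must be lower case.
        if block_name != block_name.lower():
            bad.append(block_name)
        for name, value in block.items():
            # Data block item names and save block names must be lower case.
            if name != name.lower():
                bad.append(name)
            if name.lower().startswith("save_"):
                # Save block item names must be lower case.
                bad.extend(k for k in value if k != k.lower())
            # Data values, including keys inside tables, are not checked.
    return bad


example = {
    "data_Block1": {
        "_cell_length_a": ["4.905(2)"],
        "_My_Table": [{"Key": "v"}],
        "save_frame": {"_Abc": ["yzx"]},
    }
}
```

Here case_violations(example) reports "data_Block1", "_My_Table" and "_Abc", while the table key "Key" passes because table data item names are unrestricted.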

I will just point out that the CIF2 spec does not use the word "dataname" and instead uses the more natural language "data name", and I wish this would, too.

On 16 May 2017 at 04:16, Robert Hanson <[email protected]> wrote:

    Two points:

    1. I do not understand the stripping of "data_" and "save_" from the names we have for these.

These are stripped as the data_ and save_ parts of the names perform a syntactical role in CIF, and the JSON curly brace essentially replaces their syntactical function of encapsulation. Put another way, requiring that datablock names started with 'data_' would be unnecessary additional baggage.

Yes, OK, I see that. Still, I would suggest it doesn't add a significant amount of baggage to add "data_" or "save_" once or twice in a file, and it significantly improves readability both by humans and machines, at least in my opinion. It provides a visual cue to the original CIF reference. Also, it is common for a JSON data reader to first get a list of keys without values, and, starting with that, know what to do with them.

I just noticed that in several of the files I have the _audit_link_block_code values reference data block names, but I think those are actually supposed to be referencing _audit_block_code entries. So I think these may be broken. Pretty sure they were hand-made:

data_comp1012814988
_exptl_crystal_type_of_structure comp
...
loop_
_audit_link_block_code
_audit_link_block_description
? 'common experimental and publication data'
1012814988_0_MOD 'modulated structure (Global data)'
1012814988_1_MOD 'modulated structure (subsystem 1)'
1012814988_2_MOD 'modulated structure (subsystem 2)'
1012814988_0_REFRNCE 'reference structure (Global data)'
1012814988_1_REFRNCE 'reference structure (subsystem 1)'
1012814988_2_REFRNCE 'reference structure (subsystem 2)'
...
data_1012814988_1_MOD
_cell_length_a 4.905(2)
...

    2. Save frames. What is the problem with just doing this?

         "data_another_block":{
            "_abc":["xyz"],
            "save_internal":{
               "_abc":["yzx"],
               "_r.fruit":["apple","pear"],
               "_r.colour":["red","green"]
            }
         }

    That is, why the special "frames" list?

Well, we could do it the way you suggest, perhaps with a capital 'S'. Is there any reason to prefer one over the other? I would have thought that putting all save frames under a single name would make processing a datablock slightly easier, as you don't have to check every block entry for the 'save_' sequence when running through a datablock, especially as these frames only occur in dictionaries. To get all save frames under the current spec you would just have to go something like:

defs = myjson['blockname']['frames']

but if you need them under your proposal you'd have to go something vaguely like:

defs = myjson['blockname']
save_names = [key for key in defs if key.startswith("save_")]

The former approach is faster, and seems simpler but perhaps that's just me?

BH: There are no capital/lower case issues for the first character of data block item names. These in CIF all start with "_". So anything that doesn't do that could be a save name. I don't think speed is any issue here. These are very high-level operations -- only a handful per file. All data block item names have to be checked for "_" anyway. Right?
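A rough sketch of that first-character test, assuming the flat layout from the save-frame example earlier in this message, with save frames stored directly in the block. The function name split_block is hypothetical:

```python
def split_block(block: dict):
    """Split a block into data items and save frames by the first character.

    Assumes every data item name starts with "_" and that any other key
    in the block is a save frame.
    """
    items = {k: v for k, v in block.items() if k.startswith("_")}
    frames = {k: v for k, v in block.items() if not k.startswith("_")}
    return items, frames


block = {
    "_abc": ["xyz"],
    "save_internal": {
        "_abc": ["yzx"],
        "_r.fruit": ["apple", "pear"],
        "_r.colour": ["red", "green"],
    },
}
items, frames = split_block(block)
```

This is one pass over the keys of a block, which is the same pass a reader already makes to pick out the item names.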

1. Triple-quoted CIF strings: these are purely a CIF syntactical device for encapsulating a string value, there is no meaning attached to these that is not captured by a normal JSON string.

Yes, of course. I wasn't thinking. Maybe something in there about standard JSON quoting of \" and \n. I wonder if it should require that "new line" be a UNIX new line, \n, not \r or \r\n.
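To make the quoting point concrete, here is a small sketch using Python's standard json module. The newline normalisation is only the suggestion floated above, not something the draft requires, and the helper name to_json_string is made up:

```python
import json


def to_json_string(cif_text: str) -> str:
    """Encode a CIF text value as a JSON string literal."""
    # Assumption: line ends are normalised to "\n" before encoding.
    normalised = cif_text.replace("\r\n", "\n").replace("\r", "\n")
    # json.dumps escapes the double quote as \" and the newline as \n.
    return json.dumps(normalised)


value = 'line one with "quotes"\r\nline two'
encoded = to_json_string(value)
decoded = json.loads(encoded)
```

The encoded form is a single-line JSON string, and decoding it gives back the text with UNIX line ends, whatever delimiter the CIF file used.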

2. data names in case-normal form: I can see no problems with this, does anybody else have any thoughts? Such a restriction would handily enforce the need for datanames to be canonically-caselessly unique as JSON requires all object names to be unique within their parent object.

I'm having problems identifying what "data name" means. Is that the CIF data block header names, or the CIF data block item names? I think that's what got me confused.
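On the uniqueness part of point 2, a reader could enforce it while parsing. The sketch below assumes Python's json module and its object_pairs_hook parameter. The caseless() function is only an approximation of a canonical caseless form (normalise, case-fold, normalise), and all the names are hypothetical:

```python
import json
import unicodedata


def caseless(name: str) -> str:
    # Approximation only: NFD, then Unicode case folding, then NFC.
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFC", folded)


def reject_caseless_duplicates(pairs):
    """object_pairs_hook that raises when two keys differ only by case."""
    seen = {}
    for key, _ in pairs:
        folded = caseless(key)
        if folded in seen:
            raise ValueError(f"{key!r} clashes with {seen[folded]!r}")
        seen[folded] = key
    return dict(pairs)


def load_strict(text: str):
    # The hook runs for every JSON object, so nested objects are checked too.
    return json.loads(text, object_pairs_hook=reject_caseless_duplicates)
```

With this, '{"b": {"_abc": [1], "_ABC": [2]}}' is rejected with a ValueError. Because the hook fires on every object, it would also reject table keys that differ only by case, which may be stricter than intended if those keys are allowed to be any case.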

Bob

Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

_______________________________________________
cif-developers mailing list
[email protected]
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers
