RE: CIF-JSON draft 2017-05-08
- Subject: RE: CIF-JSON draft 2017-05-08
- From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
- Date: Mon, 8 May 2017 18:13:38 +0000
On Monday, May 08, 2017 11:46 AM, Robert Hanson wrote:

> John, I'm still not clear what you mean by this:
>
>> It is in service to both points that I raised the issue of the ambiguity between CIF lists as values on one hand and multiple values in a loop on the other.
>> I am satisfied to have loop tags be required so as to enable that to be disambiguated, but the more I think about it, the more convinced I become that the best solution would be to present every item’s values, whether one or many, in a JSON array.
>
> Could you give an example?

There are two overlapping issues here. I sent an example of the first to the group on May 4th, showing how the same CIF-JSON could be interpreted in two different ways. That depends on the fact that as (still) specified, JSON arrays are used both as containers for the multiple values that a looped item takes, and also directly to represent individual values that are CIF2 lists. That can be disambiguated under the latest draft by referring to the relevant "loop tags" to determine whether a given item is presented as part of a loop. Of course, that disambiguation depends on the values of items so identified being presented in an array, even when there is only one loop packet, and on List values that do not belong to a looped item being presented directly. James remarked that it could also be disambiguated by the program having prior knowledge of the expected data type for a given item.

I won't repeat the whole message, but here's the ambiguous JSON:

    { "block": { "_xyz": [ 0.1, 0.2, 0.3 ] } }

The other issue is indeed the one you describe, that CIF overall does not draw an inherent distinction between unlooped items and items presented in a single-packet loop.
The mmCIF and other DDL2 dictionaries in fact explicitly disclaim any semantic distinction between those alternatives, so your example:

> _chem_comp_atom.comp_id CA
> _chem_comp_atom.atom_id CA
> _chem_comp_atom.alt_atom_id CA
> _chem_comp_atom.type_symbol CA
> _chem_comp_atom.charge 2
> _chem_comp_atom.pdbx_align 0
> _chem_comp_atom.pdbx_aromatic_flag N
> _chem_comp_atom.pdbx_leaving_atom_flag N
> _chem_comp_atom.pdbx_stereo_config N
> _chem_comp_atom.model_Cartn_x 0.000

... is completely equivalent to:

    loop_
    _chem_comp_atom.comp_id
    _chem_comp_atom.atom_id
    _chem_comp_atom.alt_atom_id
    _chem_comp_atom.type_symbol
    _chem_comp_atom.charge
    _chem_comp_atom.pdbx_align
    _chem_comp_atom.pdbx_aromatic_flag
    _chem_comp_atom.pdbx_leaving_atom_flag
    _chem_comp_atom.pdbx_stereo_config
    _chem_comp_atom.model_Cartn_x
    CA CA CA CA 2 0 N N N 0.000

As you say, the current CIF-JSON allows both of these:

    "_chem_comp_atom.model_Cartn_x" : "0.000"
    "_chem_comp_atom.model_Cartn_x":["-23.107","-22.157","-23.424"]

Moreover, it *also* allows this:

    "_chem_comp_atom.model_Cartn_x" : ["0.000"]

I would prefer that there not be two different ways to represent semantically equivalent data.

> John, are you suggesting that perhaps every JSON entry should be an array so that no array test has to be made? So, for example,
>
> _chem_comp.id HOH
> _chem_comp.name WATER
> _chem_comp.type NON-POLYMER
> _chem_comp.pdbx_type HETAS
> _chem_comp.formula "H2 O"
>
> would become:
>
> "_chem_comp.id":["HOH"],
> "_chem_comp.name":["WATER"],
> "_chem_comp.type":["NON-POLYMER"],
> "_chem_comp.pdbx_type":["HETAS"],
> "_chem_comp.formula":["H2 O"],
>
> equivalence in CIF between scalars and items in single-packet loops.

Yes, that's exactly what I'm suggesting. No option for an item's values being presented outside an array.
That solves both problems: we have only one representation for item values (contained in an array), and therefore don't have to perform an array test, AND there is no longer any ambiguity about whether the outermost array is a container for values, or the value itself.

> [...]
>
>>(3) Made loop tags compulsory
>
> You lost me on this one. What's an example of what we are after here? Is it as in this example, from a magnetic CIF file:

That was a quote from James's preceding message, describing one of the changes in the latest draft. He was talking about point (6), which now reads, in part, "A JSON datablock object *must* contain a special name: loop tags" (emphasis added). That change was one of my suggestions for disambiguating loops from lists, albeit not the one I currently favor. This is an improvement because if the values of syntactically-unlooped items are expected to not be presented inside an array, as seems to be the case with the present draft, then you can rely on checking whether an item's name is present among the loop tags to determine how to interpret it.

And you seem indeed to have gotten it, as your magCIF example is on target. It seems clear that this:

> loop_
> _parent_propagation_vector.id
> _parent_propagation_vector.kxkykz
> k1 [-0.75 0.75 -0.75]

is intended to be represented like so:

> "_parent_propagation_vector.id": ["k1"]
> "_parent_propagation_vector.kxkykz": [[-0.75 0.75 -0.75]]

but consider this alternative CIF:

    _parent_propagation_vector.id k1
    _parent_propagation_vector.kxkykz [-0.75 0.75 -0.75]

Since it is semantically equivalent to the preceding one, it should be acceptable to transform it to the same CIF-JSON representation. But suppose we instead translate it to this:

> "_parent_propagation_vector.id": "k1"
> "_parent_propagation_vector.kxkykz": [-0.75 0.75 -0.75]

... then we have just the kind of ambiguity I've been going on about. I am proposing that we allow only the first form.
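The two reading strategies discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the set of loop tag names is passed in separately rather than read from any particular place in the datablock, and nothing here is taken from the draft text itself.

```python
import json


def values_with_loop_tags(block, name, loop_tag_names):
    """Current-draft style: consult the loop tags to interpret an item.

    `loop_tag_names` is a hypothetical set of item names known to be looped.
    Always returns a list with one entry per loop packet.
    """
    raw = block[name]
    if name in loop_tag_names:
        # Looped item: the outermost array is the container of packet values.
        if not isinstance(raw, list):
            raise ValueError(f"looped item {name!r} is not in an array")
        return raw
    # Unlooped item: whatever is present (scalar or array) is the one value.
    return [raw]


def values_always_array(block, name):
    """Proposed style: every item's values sit in an array, no lookup needed."""
    raw = block[name]
    if not isinstance(raw, list):
        raise ValueError(f"item {name!r} is not in an array")
    return raw


# The ambiguous datablock from the May 4th example.
block = json.loads('{"_xyz": [0.1, 0.2, 0.3]}')

# Same JSON, two readings, depending only on the loop tags supplied.
as_loop = values_with_loop_tags(block, "_xyz", {"_xyz"})  # three scalar values
as_list = values_with_loop_tags(block, "_xyz", set())     # one List value

# Under the always-array rule the List value must be wrapped explicitly,
# so the outermost array is never mistaken for the value itself.
wrapped = values_always_array({"_xyz": [[0.1, 0.2, 0.3]]}, "_xyz")
```

Under the first function the interpretation of `_xyz` changes with the loop tags; under the second, a single List value can only be written as an array containing one array.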
> Thus, if we happened upon the kxkykz entry first, we might presume we had a loop in the second case. We would have to know the context -- that kxkykz is always an array. But how/why would we know that context? And in some imaginable case, we might have:
>
> "_parent_propagation_vector.kxkykz": [-0.75 0.75 -0.75]
> "_parent_propagation_vector.gxgygz": [-0.75 0.75 -0.75]
>
> In which case without any context, we would decode these as loops.

Yes, just so.

Cheers,

John
- Follow-Ups:
- Re: CIF-JSON draft 2017-05-08 (Robert Hanson)
- Re: CIF-JSON draft 2017-05-08 (Marcin Wojdyr)
- References:
- CIF-JSON draft 2017-05-08 (James Hester)
- RE: CIF-JSON draft 2017-05-08 (Bollinger, John C)
- Re: CIF-JSON draft 2017-05-08 (Robert Hanson)