RE: CIF-JSON draft 2017-05-08



On Monday, May 08, 2017 11:46 AM, Robert Hanson wrote:

> John, I'm still not clear what you mean by this:
>
>> It is in service to both points that I raised the issue of the ambiguity between CIF lists as values on one hand and multiple values in a loop on the other.
>> I am satisfied to have loop tags be required so as to enable that to be disambiguated, but the more I think about it, the more convinced I become that the best solution would be to present every item’s values, whether one or many, in a JSON array.
>
> Could you give an example?

There are two overlapping issues here.  I sent an example of the first to the group on May 4th, showing how the same CIF-JSON could be interpreted in two different ways.  That depends on the fact that, as (still) specified, JSON arrays are used both as containers for the multiple values that a looped item takes and directly to represent individual values that are CIF2 lists.  That can be disambiguated under the latest draft by referring to the relevant "loop tags" to determine whether a given item is presented as part of a loop.  Of course, that disambiguation depends on the values of items so identified being presented in an array, even when there is only one loop packet, and on List values that do not belong to a looped item being presented directly.  James remarked that it could also be disambiguated by the program having prior knowledge of the expected data type for a given item.  I won't repeat the whole message, but here's the ambiguous JSON:

{
  "block": {
    "_xyz": [ 0.1, 0.2, 0.3 ]
  }
}
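
To make the two readings concrete, here is a minimal Python sketch. The function names as_loop_column and as_list_value are hypothetical, chosen only for illustration; nothing in the draft defines them. Both accept the same parsed JSON value and both succeed, which is the whole problem.

```python
import json

# The ambiguous CIF-JSON from above, parsed with the standard library.
doc = json.loads('{"block": {"_xyz": [0.1, 0.2, 0.3]}}')
value = doc["block"]["_xyz"]


def as_loop_column(v):
    """Reading A (hypothetical helper): the array is a container,
    so each element is the item's value in one loop packet."""
    return {"packets": len(v), "values": list(v)}


def as_list_value(v):
    """Reading B (hypothetical helper): the array is itself a single
    CIF2 List value belonging to an unlooped item."""
    return {"packets": 1, "values": [list(v)]}


reading_a = as_loop_column(value)  # three packets, each holding a number
reading_b = as_list_value(value)   # one packet, holding a three-element List
```

Without the loop tags or prior knowledge of the item's data type, nothing in the JSON itself says which reading is the intended one.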

The other issue is indeed the one you describe, that CIF overall does not draw an inherent distinction between unlooped items and items presented in a single-packet loop.  The mmCIF and other DDL2 dictionaries in fact explicitly disclaim any semantic distinction between those alternatives, so your example:

> _chem_comp_atom.comp_id                    CA
> _chem_comp_atom.atom_id                    CA
> _chem_comp_atom.alt_atom_id                CA
> _chem_comp_atom.type_symbol                CA
> _chem_comp_atom.charge                     2
> _chem_comp_atom.pdbx_align                 0
> _chem_comp_atom.pdbx_aromatic_flag         N
> _chem_comp_atom.pdbx_leaving_atom_flag     N
> _chem_comp_atom.pdbx_stereo_config         N
> _chem_comp_atom.model_Cartn_x              0.000

... is completely equivalent to:

loop_
 _chem_comp_atom.comp_id
 _chem_comp_atom.atom_id
 _chem_comp_atom.alt_atom_id
 _chem_comp_atom.type_symbol
 _chem_comp_atom.charge
 _chem_comp_atom.pdbx_align
 _chem_comp_atom.pdbx_aromatic_flag
 _chem_comp_atom.pdbx_leaving_atom_flag
 _chem_comp_atom.pdbx_stereo_config
 _chem_comp_atom.model_Cartn_x
CA  CA  CA  CA  2  0  N  N  N  0.000

As you say, the current CIF-JSON allows both of these:
"_chem_comp_atom.model_Cartn_x" : "0.000"
"_chem_comp_atom.model_Cartn_x":["-23.107","-22.157","-23.424"]

Moreover, it *also* allows this:
"_chem_comp_atom.model_Cartn_x" : ["0.000"]

I would prefer that there not be two different ways to represent semantically equivalent data.

> John, are you suggesting that perhaps every JSON entry should be an array so that no array test has to be made? So, for example,
> _chem_comp.id                                    HOH
> _chem_comp.name                                  WATER
> _chem_comp.type                                  NON-POLYMER
> _chem_comp.pdbx_type                             HETAS
> _chem_comp.formula                               "H2 O"
> would become:
> "_chem_comp.id":["HOH"],
> "_chem_comp.name":["WATER"],
> "_chem_comp.type":["NON-POLYMER"],
> "_chem_comp.pdbx_type":["HETAS"],
> "_chem_comp.formula":["H2 O"],
> equivalence in CIF between scalars and items in single-packet loops.

Yes, that's exactly what I'm suggesting.  No option for an item's values being presented outside an array.  That solves both problems: we have only one representation for item values (contained in an array), and therefore don't have to perform an array test, AND there is no longer any ambiguity about whether the outermost array is a container for values, or the value itself.
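
A rough Python sketch of that rule, under the assumption that a writer holds each item's values as a list with one entry per packet (an unlooped item being treated as a one-packet loop). The names encode_item and decode_item are hypothetical and are not part of any draft.

```python
import json


def encode_item(values):
    """Hypothetical encoder: always emit an array with one element per
    packet, whether the item was looped or not in the source CIF."""
    return list(values)


def decode_item(json_value):
    """Hypothetical decoder: the outermost array is always the container
    of values, so no array test is needed to tell the two cases apart."""
    if not isinstance(json_value, list):
        raise ValueError("item values must be presented in an array")
    return json_value


# A scalar item and an item whose single value is a CIF2 List.
scalar_item = encode_item(["HOH"])
list_item = encode_item([[0.1, 0.2, 0.3]])

encoded = json.dumps({"_chem_comp.id": scalar_item, "_xyz": list_item})
decoded = json.loads(encoded)
```

Under this scheme a List value always appears one level down, as an element of the outer array, so the outer array can only ever be read as the container.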

> [...]
>
>>(3) Made loop tags compulsory
>You lost me on this one. What's an example of what we are after here? Is it as in this example, from a magnetic CIF file:

That was a quote from James's preceding message, describing one of the changes in the latest draft.  He was talking about point (6), which now reads, in part, "A JSON datablock object *must* contain a special name: loop tags" (emphasis added).  That change was one of my suggestions for providing for disambiguating loops from lists, albeit not the one I currently favor.  This is an improvement because if the values of syntactically-unlooped items are expected to not be presented inside an array, as seems to be the case with the present draft, then you can rely on checking whether an item's name is present among the loop tags to determine how to interpret it.
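
For illustration, the loop-tags check might look like the Python sketch below. It assumes, and this is only an assumption, that the special name is spelled "loop tags" and that its value is an array of arrays of data names, one inner array per loop. The helper name values_of is hypothetical.

```python
def values_of(block, name):
    """Hypothetical helper: return the item's values as a list with one
    entry per packet, using the loop tags to decide how to read the JSON.

    Assumes block["loop tags"] is a list of lists of data names.
    """
    looped = {tag for loop in block.get("loop tags", []) for tag in loop}
    value = block[name]
    if name in looped:
        # Looped item: the outer array is the container of packet values.
        return list(value)
    # Unlooped item: the JSON value is the item's single value, even if
    # that value is itself an array (a CIF2 List).
    return [value]


looped_block = {
    "loop tags": [["_parent_propagation_vector.id",
                   "_parent_propagation_vector.kxkykz"]],
    "_parent_propagation_vector.id": ["k1"],
    "_parent_propagation_vector.kxkykz": [[-0.75, 0.75, -0.75]],
}

unlooped_block = {
    "loop tags": [],
    "_parent_propagation_vector.id": "k1",
    "_parent_propagation_vector.kxkykz": [-0.75, 0.75, -0.75],
}

name = "_parent_propagation_vector.kxkykz"
from_loop = values_of(looped_block, name)
from_scalar = values_of(unlooped_block, name)
```

Both blocks decode to the same list of values, but only because the reader consulted the loop tags first; the item's JSON value alone would not have been enough.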

And you seem indeed to have gotten it, as your magCIF example is on target.  It seems clear that this:

> loop_
>  _parent_propagation_vector.id
>  _parent_propagation_vector.kxkykz
>  k1 [-0.75 0.75 -0.75]

is intended to be represented like so:

>  "_parent_propagation_vector.id": ["k1"]
>  "_parent_propagation_vector.kxkykz": [[-0.75 0.75 -0.75]]

but consider this alternative CIF:

_parent_propagation_vector.id           k1
_parent_propagation_vector.kxkykz [-0.75 0.75 -0.75]

Since it is semantically equivalent to the preceding one, it should be acceptable to transform it to the same CIF-JSON representation.  But suppose instead we translate it to this:

>  "_parent_propagation_vector.id": "k1"
>  "_parent_propagation_vector.kxkykz": [-0.75 0.75 -0.75]

... then we have just the kind of ambiguity I've been going on about.  I am proposing that we allow only the first form.

> Thus, if we happened upon the kxkykz entry first, we might presume we had a loop in the second case. We would have to know the context -- that kxkykz is always an array. But how/why would we know that context? And in some imaginable case, we might have:
>
>  "_parent_propagation_vector.kxkykz": [-0.75 0.75 -0.75]
>  "_parent_propagation_vector.gxgygz": [-0.75 0.75 -0.75]
>In which case without any context, we would decode these as loops.

Yes, just so.


Cheers,

John

