Discussion List Archives

Re: CIF-JSON new draft

Hello again,

"loop tags" is optional on the basis that dictionaries will provide the needed information on which datanames belong together in a loop, as well as on the type of dataitem to expect.  In other words, the programmer manipulating JSON-derived data already knows which dataitems belong together when writing the program, and so never needs the "loop tags" entry (counterexample?).  A potential issue does arise with DDLm child loops, which can be presented either separately from the parent loop, or together in a single loop with their parent datanames.  However, in this case the presence or absence of the child loop key dataname(s) is sufficient to determine which datanames go into which loops.

That said, I support John's suggestion to put all datanames in lists, as it nicely unifies datavalue representation, making programming easier.  mmCIF is a case in point: in CIF syntax every mmCIF dataname can be looped, or, if it has only a single value, can instead appear unlooped.  Semantically (i.e. in the mmCIF dictionary) this just corresponds to "everything looped", so every single mmCIF dataname in JSON should be list-valued in any case.  I suppose you could also assert that a non-list-valued dataname is just syntactic sugar for 'a single-element potentially looped dataname', but how useful is this syntactic sugar?
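
To make the "syntactic sugar" reading concrete, here is a minimal Python sketch of what a reader would have to do if bare values were still allowed.  The function name `values_of` is hypothetical, and the sketch assumes mmCIF-style data in which no single value is itself a list; with CIF2 list values that assumption fails, which is exactly the ambiguity John describes.

```python
def values_of(block, name):
    """Return the values of `name` as a list, whether or not it was looped.

    Assumption: a bare (non-list) value is shorthand for a one-element
    loop column, and no single value is itself a CIF2 list.
    """
    value = block[name]
    return value if isinstance(value, list) else [value]


block = {"_cell.length_a": 5.43, "_atom_site.id": ["C1", "C2"]}
lengths = values_of(block, "_cell.length_a")   # bare value wrapped in a list
ids = values_of(block, "_atom_site.id")        # already a list, returned as-is
```

If every dataname is list-valued to begin with, this helper is unnecessary, which is the sense in which the sugar buys little.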

James.

On 4 May 2017 at 23:48, Bollinger, John C <John.Bollinger@stjude.org> wrote:
On Wednesday, May 03, 2017 5:42 PM, Robert Hanson wrote:
> On Wed, May 3, 2017 at 4:03 PM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>> Dear CIF Developers,
>>
>> I additionally think that the current version of the specification is too lax about use of the "loop tags" item.  If the CIF version is unspecified among the metadata or if it is specified with value 2.0, and if the data block being represented contains either at least one loop or at least one unlooped item whose value is a CIF2 list, then the meaning of some values is ambiguous without the "loop tags".  It is undesirable to allow such ambiguity.
>
>
> John, I don't see that. Can you give a concrete example?

Sure.  Consider this JSON:

{
  "block": {
    "_xyz": [ 0.1, 0.2, 0.3 ]
  }
}

It appears to me that it corresponds both to

#\#CIF_2.0
data_block
_xyz [ 0.1 0.2 0.3 ]
# end of CIF

and to

#\#CIF_2.0
data_block
loop_
_xyz
 0.1
 0.2
 0.3
# end of CIF

Of course, the latter could trivially be converted to CIF 1.1, with that result also corresponding to the given JSON.

On the other hand, only the latter corresponds to this JSON:

{
  "block": {
    "_xyz": [ 0.1, 0.2, 0.3 ],
    "loop tags": [["_xyz"]]
  }
}

and only the former corresponds to this JSON:

{
  "block": {
    "_xyz": [ 0.1, 0.2, 0.3 ],
    "loop tags": []
  }
}

The problem arises from the similarity between items 5.v and 5.vii in the draft spec: CIF 2.0 list values are presented as JSON lists / arrays, and the multiple values of a looped item are also presented as JSON lists / arrays.  As I read the draft, however, the values of unlooped items are presented bare, as opposed to, for example, as single-element arrays.  Therefore, some form of metadata is required to distinguish whether an array presented as an item's value represents a single CIF list value or multiple distinct values taken by the item in a loop.  The appropriate "loop tags" field can provide the needed metadata if it is present, given the requirement that if it appears at all then it must describe all loops in the data block.  I had supposed that the ability to use it for that purpose was the reason for that constraint.
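
The three JSON cases above can be sketched in Python as follows.  This is only an illustration under stated assumptions: the data block has already been parsed into a dict, the CIF version is 2.0 or unspecified, unlooped values are presented bare, and "loop tags" (when present) describes every loop in the block.  The function name `classify_items` is hypothetical and not part of the draft.

```python
def classify_items(block):
    """Label each dataname as 'looped', 'unlooped' or 'ambiguous'.

    Assumption: `block` is the dict for one data block; if "loop tags"
    is present it lists every loop, so any dataname it omits is unlooped.
    """
    loop_tags = block.get("loop tags")
    looped = None
    if loop_tags is not None:
        looped = {name for loop in loop_tags for name in loop}

    result = {}
    for name, value in block.items():
        if name == "loop tags":
            continue
        if looped is not None:
            # Metadata present: the answer is unambiguous either way.
            result[name] = "looped" if name in looped else "unlooped"
        elif isinstance(value, list):
            # No metadata: a CIF2 list value and a loop column look alike.
            result[name] = "ambiguous"
        else:
            # Bare values can only be unlooped items.
            result[name] = "unlooped"
    return result
```

Applied to the three examples, this gives "ambiguous" for the block without "loop tags", "looped" when "loop tags" is [["_xyz"]], and "unlooped" when "loop tags" is an empty array.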

However, inasmuch as CIF does not make an inherent semantic distinction between items presented as scalars and the same items presented in a single-packet loop (and mmCIF in particular denies any significance to such differences), an alternative to requiring that loop tags be provided would be to present every item as if it were looped -- i.e. regardless of whether an item is presented syntactically in a loop in the CIF format, its one or many values are presented in an array in CIF-JSON.  In that case, the "loop tags" field would not need to carry the burden, and the constraints on it could even be relaxed a bit.  Such a representation is consistent with CIF's underlying data model as I conceive it (http://forums.iucr.org/viewtopic.php?f=27&t=77).
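
A minimal sketch of that alternative encoding, in Python.  The function name `to_all_looped` and its input shape (a dict of unlooped items plus a list of loops, each a dict of columns) are assumptions made for illustration, not anything defined by the draft.

```python
def to_all_looped(unlooped, loops):
    """Encode a data block so that every dataname maps to a list of values.

    unlooped: dict of dataname -> single value (a scalar, or a Python
              list standing in for a CIF2 list value)
    loops:    list of dicts, each mapping dataname -> column of values
    """
    block = {}
    for name, value in unlooped.items():
        # One value becomes a one-element column, even if that value
        # is itself a list.
        block[name] = [value]
    for loop in loops:
        for name, column in loop.items():
            block[name] = list(column)
    return block


as_list_value = to_all_looped({"_xyz": [0.1, 0.2, 0.3]}, [])
as_loop = to_all_looped({}, [{"_xyz": [0.1, 0.2, 0.3]}])
```

Here the unlooped CIF2 list comes out as [[0.1, 0.2, 0.3]] and the three-packet loop as [0.1, 0.2, 0.3], so the two CIF inputs no longer share a JSON form and "loop tags" is not needed to tell them apart.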


John


_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers


