Discussion List Archives


Re: CIF-JSON draft 2017-05-08

Dear All,

John has made a clear and, I think, persuasive case for enclosing all data values in list brackets. The case against appears to be simply that the non-list presentation of unlooped data is more familiar to CIF programmers. That argument is weakened by the fact that CIF data files using datanames from DDL2 and DDLm dictionaries already allow single-valued datanames to be presented either looped or unlooped with no change in meaning, so on balance I think John's suggestion is best.
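A minimal Python sketch of why the all-lists convention simplifies readers. The block layout and the helper names values_mixed and values_uniform are hypothetical illustrations, not part of the draft. Under the mixed convention every reader has to branch on the JSON type; under the all-lists convention an unlooped item is simply a one-element array.

```python
from typing import Any, Dict, List

# Hypothetical CIF-JSON-like data blocks; the layout is illustrative only.
block_without_lists: Dict[str, Any] = {
    "_cell.length_a": "5.4309",               # unlooped item: bare string
    "_atom_site.label": ["Si1", "O1"],        # looped item: array
}

block_with_lists: Dict[str, List[str]] = {
    "_cell.length_a": ["5.4309"],             # unlooped item: one-element array
    "_atom_site.label": ["Si1", "O1"],
}


def values_mixed(block: Dict[str, Any], name: str) -> List[str]:
    """Reader for the mixed convention: must inspect the JSON type first."""
    value = block[name]
    return value if isinstance(value, list) else [value]


def values_uniform(block: Dict[str, List[str]], name: str) -> List[str]:
    """Reader for the all-lists convention: every item is already an array."""
    return block[name]
```

The one-element array carries the same meaning whether the source CIF presented the item looped or unlooped, which is the point made above about DDL2 and DDLm datanames.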

As far as numbers go, representation of numbers as strings clearly has to be allowed in order to support translation from CIF files. John has argued, on the basis of simplicity, that we stick with strings only. My counterargument is that we are trying to produce a lightweight working format, and if users *want* to work with numbers and bear the extra complexity that entails, we should consider it. If we allow only strings, then those unwilling to repeatedly bear the cost of parseFloat remain free to adopt ad-hoc optimisations (such as custom JSON names), and multiple incompatible optimisations could appear, something we could have avoided by providing a standard now. Of course, it would be ideal to wait until it was clear whether or not CIF-JSON users were passing around "optimised" CIF-JSON objects, and only then standardise on one particular mechanism. However, by then it will be too late: code will have been written to the original spec, potentially assuming incompatible behaviour, *and* the optimisers will have their own custom code as well.

Given the above, I suggest the following course of action regarding representation of numbers:

(1)  We leave the current draft behaviour as is (i.e. strings only)
(2)  We reserve all capitalised names for future expansion
(3)  We agree to bump the CIF-JSON schema major version number if we introduce JSON numbers in the future
(4)  We warn CIF-JSON programmers to check the schema version for backwards compatibility (and use semantic versioning)
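Items (2) to (4) could be honoured by a reader along the following lines. This is a sketch under stated assumptions: that the schema version is a plain "major.minor.patch" string, that "capitalised" means a name beginning with an uppercase letter, and that SUPPORTED_MAJOR, parse_version, is_compatible and is_reserved are hypothetical names rather than anything the draft defines.

```python
from typing import Tuple

# Hypothetical: the major schema version this reader was written against.
SUPPORTED_MAJOR = 1


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an assumed "major.minor.patch" version string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(version: str, supported_major: int = SUPPORTED_MAJOR) -> bool:
    """Under semantic versioning only a major-version bump signals breaking
    changes, so accept any minor or patch release of the supported major."""
    major, _, _ = parse_version(version)
    return major == supported_major


def is_reserved(name: str) -> bool:
    """Treat names beginning with an uppercase letter as reserved (item 2)."""
    return name[:1].isupper()
```

A reader written today would then refuse a "2.x.y" document (for instance one introducing native JSON numbers) instead of silently misreading it, and would skip or reject any reserved name it does not understand.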

With the above in place, we release the spec and wait to see whether or not there is significant support in practice for native JSON numbers.  We then have the option of specifying any of the behaviours that we have discussed in previous threads.

Finally, it seems that JSON 'false' is favoured over '\uFFFF'. I will alter the next draft accordingly.

all the best,
James.


On 10 May 2017 at 00:20, Bollinger, John C <John.Bollinger@stjude.org> wrote:

On Tuesday, May 09, 2017 5:42 AM, Marcin Wojdyr wrote:
> On 8 May 2017 at 21:47, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>
>> Interpreting that as a backward step depends on asserting some kind of inherent significance to whether an item or category is presented looped.  There isn't any.  The important characteristic is not the *presentation* of the category but its *multiplicity* and key structure.
>
> Well, "_list no" guarantees multiplicity 1. I cannot see another multiplicity control in DDL1/2.

I do not dispute that DDL1 can require one syntactic form vs. the other.  I am saying that the syntactic distinction has no bona fide semantic significance.  The underlying point of DDL1's "_list no" option is to specify that the item with that attribute shall take at most one value in any given data block.  Requiring that it not be looped is the mechanism, not the objective.  We don't need that in CIF-JSON (nor is it needed more generally in CIF), because we have the option of simply limiting ourselves to presenting at most one value for the item, or even of lifting the restriction altogether.
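With every value held in an array, the objective behind "_list no" (at most one value per data block) reduces to a length check instead of a syntactic rule. A minimal sketch, assuming the set of single-valued items would come from a dictionary (here it is simply passed in); single_valued_violations is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def single_valued_violations(
    block: Dict[str, List[str]], single_valued: Iterable[str]
) -> List[str]:
    """Return names of items that should take at most one value but carry more.

    Whether the item was looped in the source CIF is irrelevant here:
    only the length of its value array is checked.
    """
    return [name for name in single_valued if len(block.get(name, [])) > 1]


# Illustrative data: the second item carries two values.
block = {
    "_cell.length_a": ["5.4309"],
    "_cell.length_b": ["5.4309", "5.4310"],
}
```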

> But if DDL2 was designed to be like an RDBMS schema, then an unlimited number of entries in a block indeed fits this design.
>
> Regarding "loop tags": the original motivation was to keep track of which columns are in the same loop, so that JSON can be converted back to valid CIF without knowing the context. Both:
>
> loop_
> _x
> _y
> 1 2
> 3 4
>
> and:
>
> loop_
> _x
> 1
> 3
> loop_
> _y
> 2
> 4
>
> will have the same JSON representation, but a dictionary may allow only one of the two.
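The loss described here can be demonstrated with a toy converter. The sketch below is an illustration under heavy assumptions, not a CIF parser: it handles only whitespace-separated loop_ constructs (no quoting, comments or unlooped items), and parse_loops is a hypothetical name. Both inputs from the example above produce identical item-to-array mappings, so nothing in that mapping records which items shared a loop.

```python
from typing import Dict, List


def parse_loops(text: str) -> Dict[str, List[str]]:
    """Toy parser for whitespace-separated loop_ constructs only."""
    tokens = text.split()
    result: Dict[str, List[str]] = {}
    i = 0
    while i < len(tokens):
        if tokens[i] != "loop_":
            raise ValueError(f"unexpected token {tokens[i]!r}")
        i += 1
        # Collect the loop's datanames (tokens starting with "_").
        names: List[str] = []
        while i < len(tokens) and tokens[i].startswith("_"):
            names.append(tokens[i])
            i += 1
        # Collect values up to the next loop_ or the end of input.
        values: List[str] = []
        while i < len(tokens) and tokens[i] != "loop_":
            values.append(tokens[i])
            i += 1
        # Values fill the loop row by row, so column k is every len(names)-th value.
        for k, name in enumerate(names):
            result[name] = values[k::len(names)]
    return result


one_loop = "loop_ _x _y 1 2 3 4"
two_loops = "loop_ _x 1 3\nloop_ _y 2 4"
```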

I am not at all concerned with full-fidelity round-tripping of CIF data from its native serialization format through CIF-JSON and back.  Nor am I particularly concerned with conveying the "in the same loop" property via CIF-JSON, though I do not object to providing "loop tags" for the purpose. The primary reason to require items to be presented in the same loop is to ensure that within the group of items presented in that loop, each value of each item can be associated with a corresponding value of each of the other items.  There are additional reasons related to convenience of processing and maybe of human reading, but we've already thrown those out by providing looped items' values in separate JSON arrays instead of in a single array of packets.

If one wanted to perform validation of CIF data presented in CIF-JSON format then one would have to reconstitute the data into logical loops anyway.  Though one could use the "loop tags" for that purpose, I see little advantage to that relative to going by items' categories as defined in their dictionary (and we are already assuming a dictionary if we propose to validate).  If one does not care to validate, on the other hand, then it makes no difference whether associated items are presented in the same loop or in different ones, as long as we can make the needed associations between corresponding item values.  CIF-JSON provides only one way to do that (by matching index in items' value arrays), and it depends on the loop tags only for discovering associations about which we have no foreknowledge.  I anticipate that many applications of CIF-JSON will not need that, and those that do might be better off with COD-JSON.
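Associating values "by matching index in items' value arrays", and regrouping items by category as a validator would, might look like the following sketch. The category_of mapping is a hypothetical stand-in for a dictionary lookup, and packets_by_category is an illustrative name only; no loop tags are consulted.

```python
from collections import defaultdict
from typing import Dict, List


def packets_by_category(
    block: Dict[str, List[str]],
    category_of: Dict[str, str],
) -> Dict[str, List[Dict[str, str]]]:
    """Regroup item value arrays into per-category packets (rows).

    Values belong to the same packet when they share an index in their arrays.
    """
    columns: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for name, values in block.items():
        columns[category_of[name]][name] = values

    packets: Dict[str, List[Dict[str, str]]] = {}
    for category, items in columns.items():
        lengths = {len(values) for values in items.values()}
        if len(lengths) != 1:
            raise ValueError(f"items of category {category!r} have unequal lengths")
        (n,) = lengths
        packets[category] = [
            {name: values[i] for name, values in items.items()} for i in range(n)
        ]
    return packets


# Illustrative data and a hypothetical name-to-category mapping.
block = {
    "_atom_site.label": ["Si1", "O1"],
    "_atom_site.type_symbol": ["Si", "O"],
    "_cell.length_a": ["5.4309"],
}
category_of = {
    "_atom_site.label": "atom_site",
    "_atom_site.type_symbol": "atom_site",
    "_cell.length_a": "cell",
}
```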

If we really wanted to convey the original loop structure then we would be better off with a JSON structure that reflected it directly.  I don't see a need for that, and if we indeed reject that idea, as our current draft does, then let's agree that it's because conveying the loop structure is not always essential.


John


_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

International Union of Crystallography
