Re: Draft JSON specification for CIF

Replies in-line.

On 13 April 2017 at 00:57, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear CIF-developers,

 

A JSON representation of CIF is absolutely something that can and should be done, and the proposal seems viable and fairly complete.  Nevertheless, there are some aspects of the proposal that bear discussion:

 

1. Which JSON are we talking about?  The one described within ECMA-262?  The one described by ECMA-404?  Presumably not RFC4627, but maybe its successor, RFC7159?  In fact, my vote would be for I-JSON, as described by RFC7493 (a restricted profile of RFC7159 JSON).   I-JSON handles several interoperability issues that otherwise we would either suffer or have to handle explicitly.


I agree that I-JSON should be the target.

 

2. It would be useful to explicitly consider which CIF structural and semantic constraints are to be mapped to JSON language constraints.  In particular,

 

  a) The proposed representation may inherently enforce the uniqueness of data names within a data block / save frame (supposing that we’re using I-JSON; see also below), but it does not inherently enforce the constraint that every data name in the same loop has the same number of values.  I raise these together because it would be relatively straightforward to flip that, so that it is having the same number of values that is inherently enforced.  On the other hand, I think there are alternatives that would allow both to be enforced, at the cost of a more complex and / or opaque structure.  (I reserve further comment on such alternatives, pending interest from the group.)


I think it would be reasonable to remind implementers to check that datanames belonging to the same loop have the same number of values.  If the number of values is inconsistent, the loop is essentially meaningless as it is not clear which entries match up.  Beyond this reminder, I would prefer that the JSON version is not made more complex.
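As an illustration of the check implementers would be reminded to make, here is a minimal Python sketch. It assumes (hypothetically, since the draft layout is not reproduced in this message) that a loop arrives as a JSON object mapping each dataname to an array of values; the function name `check_loop_lengths` is invented for the example.

```python
def check_loop_lengths(loop):
    """Return sorted (dataname, length) pairs if the columns disagree, else []."""
    # Assumed layout: {dataname: [value, value, ...], ...} for one loop.
    lengths = {name: len(values) for name, values in loop.items()}
    if len(set(lengths.values())) <= 1:
        return []  # zero or one distinct length: the loop is consistent
    return sorted(lengths.items())


good = {"_atom_site_label": ["C1", "O1"], "_atom_site_fract_x": [0.1, 0.2]}
bad = {"_atom_site_label": ["C1", "O1"], "_atom_site_fract_x": [0.1]}

print(check_loop_lengths(good))
print(check_loop_lengths(bad))
```

A reader or writer could run such a check once per loop and refuse, or warn, when the returned list is non-empty.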

 

 b) CIF requires not just that block names, frame names, and data names be unique within their respective scopes, but that their *normalized forms* be unique within their scopes.  That stronger constraint is alien to JSON; do we care about JSON-CIF enforcing that as an (I-)JSON constraint?


I like Bob Hansen's suggestion that all datanames be lower-cased, which in the CIF2 context means converting them to normalised form.  This would seem to involve minimal effort on the part of CIF2->JSON converters.
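A rough sketch of what this might look like in a converter, in Python. The normalisation recipe used here (NFD, then case-folding, then NFC) is an assumption made for illustration; the CIF2 specification remains the authority on the exact procedure. The helper names `normalize_name` and `normalize_block` are hypothetical.

```python
import unicodedata


def normalize_name(name):
    # Assumed recipe: decompose, case-fold, recompose. Check against the CIF2 spec.
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFC", folded)


def normalize_block(block):
    """Return a copy of a {dataname: value} mapping keyed by normalised names."""
    out = {}
    for name, value in block.items():
        key = normalize_name(name)
        if key in out:
            # Two names that differ only by case or normalisation are the same CIF name.
            raise ValueError(f"data names collide after normalisation: {key!r}")
        out[key] = value
    return out


print(normalize_block({"_Cell_Length_A": 5.0}))
```

With names normalised on output, the plain uniqueness of JSON object member names would then also cover the stronger CIF constraint.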

 

 c) I don’t see a particular value in placing stronger character encoding constraints on JSON-CIF than the underlying JSON standard places.  Selection of I-JSON would make this moot, however, for I-JSON in fact requires UTF-8 encoding, just as is proposed for JSON-CIF.


Let's go with I-JSON.
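For readers who want to see what two of the I-JSON restrictions mentioned in this thread mean in practice, here is a minimal Python sketch that insists on UTF-8 input and rejects duplicate member names. It does not attempt the rest of RFC 7493 (for example its guidance on numbers), and `load_strict` is a hypothetical name, not part of any library.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with repeated member names."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate member name: {key!r}")
        seen[key] = value
    return seen


def load_strict(raw: bytes):
    # Decoding explicitly makes non-UTF-8 input fail with UnicodeDecodeError.
    text = raw.decode("utf-8")
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(load_strict(b'{"_cell_length_a": 5.0}'))
```

By default Python's `json.loads` silently keeps the last of several duplicate names, which is why the hook is needed.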

 

3. We have again run into the CIF quirk of having two distinct null values, whereas JSON has only one.  Although the two-character string "\\?" is certainly reminiscent of the special CIF ? value (as "\\." would be reminiscent of the . value), why not instead choose a value such as "\u0001" or "\uFFFF", which will never appear in a conforming instance of CIF’s native serialization?


Sounds good to me.  Is there any particular value that would be better?
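To make the two-null mapping concrete, here is a small Python sketch. It assumes, purely for illustration, that JSON null stands for the CIF "." value and that the one-character string "\u0001" stands for "?"; the draft may assign them differently. The names `CifNull`, `encode_value` and `decode_value` are invented here. One point that may be worth checking against RFC 7493: as far as I can tell, I-JSON forbids noncharacter code points in strings, which would count against "\uFFFF" as the sentinel.

```python
import json
from enum import Enum


class CifNull(Enum):
    UNKNOWN = "?"        # CIF ? value
    INAPPLICABLE = "."   # CIF . value


UNKNOWN_SENTINEL = "\u0001"  # assumed choice; cannot occur in native CIF text


def encode_value(value):
    if value is CifNull.INAPPLICABLE:
        return None               # assumed: JSON null represents "."
    if value is CifNull.UNKNOWN:
        return UNKNOWN_SENTINEL
    return value


def decode_value(value):
    if value is None:
        return CifNull.INAPPLICABLE
    if value == UNKNOWN_SENTINEL:
        return CifNull.UNKNOWN
    return value


column = [CifNull.UNKNOWN, CifNull.INAPPLICABLE, "?", 1.5]
wire = json.dumps([encode_value(v) for v in column])
restored = [decode_value(v) for v in json.loads(wire)]
print(restored == column)
```

The quoted string "?" survives the round trip as an ordinary string, distinct from the special ? value.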

 

4. The aspect of the proposal that I like least is the separate table of uncertainties.  CIF data conforming to a dictionary, such as mmCIF, that provides separate data names for SUs does not need it, and I don’t immediately see why the native serialization format (presented as a string) is not a suitable choice for uncertainty-bearing values of items without a separate SU item.


The separate uncertainties object is one way of allowing JSON numbers to represent CIF numbers, as any appended uncertainty then needs to be preserved separately somewhere. If we disallow JSON numbers as an option, then we can drop the uncertainties object.  I included JSON numbers because I suspected that, when JSON is used as an interprocess interchange format, formatting and reparsing values that are handled internally as numbers would carry an undesirable efficiency cost for large files (e.g. mmCIF).  I have not benchmarked this cost, however. Perhaps some programmers with JSON experience can comment?
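For comparison, here is a minimal Python sketch of the string alternative: the value is kept in its native CIF form, such as "1.234(5)", and split into a number and a standard uncertainty only when needed. It deliberately handles only plain decimals without exponents, and `split_su` is a hypothetical helper name.

```python
import re
from decimal import Decimal

# Simplified pattern: optional sign, plain decimal, optional "(digits)" suffix.
_NUMERIC = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?$")


def split_su(text):
    """Split a CIF-style number string into (value, su); su is None if absent."""
    match = _NUMERIC.match(text)
    if match is None:
        raise ValueError(f"not a simple CIF number: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2) is None:
        return value, None
    # The SU digits apply to the last quoted digits of the value.
    su = Decimal(match.group(2)).scaleb(value.as_tuple().exponent)
    return value, su


print(split_su("1.234(5)"))
```

This is the per-value parsing step whose cost on large files has not been measured; the sketch is meant to show the work involved, not to settle the question.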

all the best,
James.

 



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
