RE: Draft JSON specification for CIF

Dear CIF Developers,

I think I have come around to a position similar to Andrius's (below), but I'm going to approach it from a different direction.

One of the longtime CIF gotchas is that strings can be presented unquoted, which leaves some ambiguities.  In particular, it is easy to misread a string that has the form of a number but is meant to be interpreted as text.  Treating such a value as a number can corrupt or lose information, as Bob also observed.  One ultimately has to rely on item definitions to know how to interpret CIF data -- whether by reading definitions dynamically from a dictionary file, or simply by writing knowledge of specific items of interest directly into the program.  It is to be expected that JSON-CIF will frequently be handled with the help of general-purpose JSON libraries or even directly by JavaScript, neither of which will be prepared to make this kind of distinction.
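
To make the risk concrete, here is a minimal Python sketch of what can happen when a number-like token is converted eagerly to a native float and then written back out as JSON.  The function name and the sample tokens are mine, purely for illustration; they are not taken from any CIF file or from the draft.

```python
import json

def naive_round_trip(cif_text: str) -> str:
    """Parse an unquoted CIF token as a float, then re-serialize it as JSON."""
    return json.dumps(float(cif_text))

# Illustrative tokens that look numeric but might be meant as text.
for token in ["1.10", "007", "1e2"]:
    print(f"{token!r} -> {naive_round_trip(token)}")
```

The trailing zero of 1.10 and the leading zeros of 007 do not survive the trip, which is harmless for a true number but destructive if the item was actually text.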

Overall, whether a value is presented in quoted or unquoted form is significant in CIF, and therefore must be represented in some way in JSON-CIF (though the type of quoting used is unimportant).  Only some of that significance can be determined from a CIF itself, so in the general case, a program serializing CIF to JSON-CIF cannot be expected to know enough to convey it via data types alone.  If we solve that problem by choosing a JSON form that preserves information about whether values are quoted, then we get natural representations of the two null values (? and .) as well.  This does make JSON-CIF a slightly lower-level representation, but I think that's appropriate.
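
As a rough sketch of the idea only, one could carry the original text together with a flag saying whether it was quoted.  The key names "text" and "quoted" and both helper functions below are hypothetical; I am not proposing them as the JSON-CIF layout, only showing that the two null values fall out naturally once quotedness is preserved.

```python
import json

def encode_value(text: str, quoted: bool) -> dict:
    """Hypothetical encoding: keep the raw text plus a 'was it quoted' flag."""
    return {"text": text, "quoted": quoted}

def interpret(entry: dict) -> str:
    """Classify a value using only what the serialized form preserves."""
    if not entry["quoted"]:
        if entry["text"] == "?":
            return "unknown"
        if entry["text"] == ".":
            return "inapplicable"
        # Could be a number or text; the item definition has to decide.
        return "unquoted"
    return "quoted text"

# Serialize with an ordinary JSON library, then read the values back.
payload = json.dumps([encode_value("?", False), encode_value("?", True)])
print([interpret(entry) for entry in json.loads(payload)])
```

The unquoted ? and the quoted '?' stay distinct after passing through a general-purpose JSON library, and the decision about whether an unquoted value is a number is deferred to code that knows the item definition.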

Additional fun fact: CIF can express numeric values that cannot be represented in any native numeric format available on a given machine.
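
A small Python illustration, assuming the usual IEEE 754 double behind Python's float.  The helper name and the value 1e400 are just examples I picked; the Decimal type stands in for "keep the digits as written".

```python
import math
from decimal import Decimal

def parse_both(cif_text: str) -> tuple[float, Decimal]:
    """Parse the same token as a native float and as an exact decimal."""
    return float(cif_text), Decimal(cif_text)

# 1e400 is syntactically a fine number but exceeds the range of a double.
as_float, as_decimal = parse_both("1e400")
print(math.isinf(as_float))    # the native float has overflowed to infinity
print(as_decimal.is_finite())  # the decimal form still holds the real value
```

Keeping the original text around (or an arbitrary-precision type) is the only way such a value survives, which is one more argument for not forcing everything into JSON numbers.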



Dear Marcin,

that's why I suggest retaining CIF value types. A CIF parser knows the type of each value it reads (this is where ? and '?' differ), and I suggest storing this bit of information explicitly in the JSON. An alternative would be to use the JSON boolean datatype and/or the null value, or "\u0001" and "\uFFFF" as suggested by John. I personally recommend against any escaping, as it would add another layer of complexity.


On 13/04/17 18:13, Marcin Wojdyr wrote:
> Bob,
>> - This adds the unnecessary complication of what to do with "." and "?"
> I think you imply that . and ? should be expressed in JSON as "." and "?".
> But this would be ambiguous: JSON "?" could mean either unknown or string "?".
> Marcin
