Re: CIF-JSON draft 2017-05-08

The problem with requiring that CIF numbers = JSON numbers is that a CIF->JSON parser cannot, in general, know whether a CIF value is a number or simply a non-delimited string that looks like a number.  The only way to get this right is to have access to the CIF dictionary or dictionaries that define all the datanames appearing in the datablock, which is both a considerable overhead (especially with the 7500 definitions in pdbx/mmCIF) and not foolproof, since local datanames may not be defined in any available dictionary.
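To illustrate the ambiguity, here is a minimal Python sketch. The regular expression is a simplified approximation of the CIF numeric syntax, not the full grammar, and `_local_sample_code` is a hypothetical local dataname invented for this example. Both values pass the syntactic test, but converting the second one to a number would silently lose its leading zeros.

```python
import re

# Simplified approximation of CIF numeric syntax, with an optional
# standard uncertainty in parentheses, e.g. 5.4307(2) or -2.5e3.
# This is an illustrative assumption, not the complete CIF grammar.
NUMB_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")


def looks_like_number(raw: str) -> bool:
    """Return True if an undelimited CIF value is syntactically numeric."""
    return NUMB_RE.match(raw) is not None


# Hypothetical datablock contents: the first value is a real measurement,
# the second is a text code that merely looks like an integer.
values = {
    "_cell_length_a": "5.4307(2)",
    "_local_sample_code": "0012",
}

for name, raw in values.items():
    print(name, raw, looks_like_number(raw))

# Without a dictionary, a converter sees no difference between the two,
# yet treating the code as a number changes it: "0012" becomes 12.
print(int(values["_local_sample_code"]))
```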

The consumer of the JSON, on the other hand, will know which of the datanames it cares about are numeric and can perform the conversion itself (as per CIF rules; I don't know whether the C++17 standard is relevant here). The one optimisation I can see is that, once this conversion has been performed, it would be nice to be able to pass the JSON on to another script written by a different author and preserve the conversion work already done.
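A rough sketch of what I mean, in Python. The flat name-to-string JSON layout, the `NUMERIC_NAMES` set and the function name are all assumptions made for illustration and are not taken from the draft; the standard uncertainty is simply dropped to keep the example short.

```python
import json

# Datanames this particular consumer knows to be numeric (an assumption
# standing in for the consumer's own domain knowledge).
NUMERIC_NAMES = {"_cell_length_a", "_cell_length_b"}


def convert_known_numeric(block: dict) -> dict:
    """Convert string values to floats only for datanames known to be numeric."""
    converted = {}
    for name, value in block.items():
        is_special = value in ("?", ".")  # CIF unknown / inapplicable
        if name in NUMERIC_NAMES and isinstance(value, str) and not is_special:
            # Drop any standard uncertainty such as "(2)" before converting.
            converted[name] = float(value.partition("(")[0])
        else:
            converted[name] = value
    return converted


# Hypothetical JSON in which every CIF value arrives as a string.
incoming = '{"_cell_length_a": "5.4307(2)", "_local_sample_code": "0012"}'
block = convert_known_numeric(json.loads(incoming))

# Re-serialising keeps the converted values as JSON numbers, so a second
# script does not have to repeat the work.
outgoing = json.dumps(block)
print(outgoing)
```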

On 11 May 2017 at 17:16, Marcin Wojdyr <wojdyr@gmail.com> wrote:
> As far as numbers go, it is clear that representation of numbers as strings
> should be allowed in order to support translation from CIF files.

Translation from CIF is *easier* when the numbers are written as
strings, because one doesn't even need to parse numbers. But
compared with the complexity of parsing the CIF format, parsing numbs and
writing them as possibly two separate numbers is not that difficult.
The downside of the quoted representation in JSON is, of course, that
the recipient of such a JSON file, after presumably using a third-party
JSON parser, needs to finish the parsing themselves.
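As a rough illustration of the "two separate numbers" idea, here is a
Python sketch that splits a numb into a value and a standard
uncertainty. The regular expression is a simplified approximation of
the numb syntax, and the function name and the {"value", "su"} layout
are hypothetical, not something taken from the draft.

```python
import json
import re
from decimal import Decimal

# Simplified numb pattern: mantissa, optional exponent, optional
# standard uncertainty in parentheses. An approximation for illustration.
NUMB_RE = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:[eE]([+-]?\d+))?(?:\((\d+)\))?$"
)


def parse_numb(raw: str):
    """Split a numb such as '5.4307(2)' into (value, su); su may be None."""
    match = NUMB_RE.match(raw)
    if match is None:
        raise ValueError(f"not a numb: {raw!r}")
    mantissa, exponent, su_digits = match.groups()
    exp = int(exponent) if exponent else 0
    value = Decimal(mantissa).scaleb(exp)
    if su_digits is None:
        return float(value), None
    # The uncertainty digits apply to the last digits of the mantissa,
    # so scale them by the number of decimal places (and the exponent).
    decimals = len(mantissa.partition(".")[2])
    su = Decimal(int(su_digits)).scaleb(exp - decimals)
    return float(value), float(su)


value, su = parse_numb("5.4307(2)")
print(json.dumps({"value": value, "su": su}))
```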

John reasonably argued that a single representation is better than
two. After thinking about it, I agree, but not with the choice of
representation. Parsing numbers on the reading side should be done entirely by
a JSON parser. Usually there are more consumers of file formats than
producers, and the extra complexity is preferable in the CIF->JSON
step rather than when working with JSON.

If anyone thinks that parsing numbs is trivial and no extra complexity
is involved, I propose that someone familiar with C or C++ writes here
a (thread-safe) function that can parse the numb format. As a hint:
functions to parse numbers in a locale-independent way are available
only in C++17 which is not widely adopted yet.

Marcin
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148