RE: CIF-JSON draft 2017-05-08

On Thursday, May 11, 2017 3:18 AM, Marcin Wojdyr wrote:

> When I started experimenting with CIF parser a few months ago I noticed that some programs (I think cif2cif was one of them) just treat what looks like a number as a number, ignoring the ambiguity left in the CIF spec. But this indeed would not be fully spec conforming. Forget about my previous email.

Historical Notes:

The 1991 CIF paper specifies that data values should be interpreted as numbers if they are presented unquoted in a form that _starts_ like a number.  Much CIF software from around that time applied that rule, and some of that software persists today.  I've always supposed that that scheme was chosen for the convenience of software authors, and I've long wondered whether it might have been primarily a characterization of the actual behavior of a particular early parser.  Certainly line-length limits and the text-block syntax cater to the characteristics of Fortran I/O.

Regardless, although that approach puts the onus on CIF writers to quote string values where necessary to avoid them being misinterpreted as numbers or almost-numbers, doing so did not emerge as a consistent practice.  Moreover, data items came into common use whose values are strings that can take numeric-like form.  That combination made the implicit data typing approach unfeasible.  That left relying on prior or external information about data items to determine how to interpret their values, and that is basically where we are today.  The needed information is a key part of items' definitions in the relevant dictionary, and CIF 1.1 in fact specifies that that is the source from which it should be drawn.  ITvG section contains a discussion of this topic.

As ITvG observes, however, it may be that CIF software has to handle values for items for which it has no definition to draw upon.  ITvG suggests a strategy in such cases of interpreting values as numbers if they can successfully be parsed as numbers, but although that's a stronger condition than the one given in the 1991 paper, it is not fundamentally any better.  This is one of CIF's historic weak areas, rearing its ugly head again in our present discussion.
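To make the difference between the two rules concrete, here is a minimal sketch in Python. The function names and the regular expression are my own illustration, not taken from the 1991 paper, ITvG, or any existing CIF software, and the pattern is only an approximation of CIF numeric syntax (optional sign, digits with an optional decimal point, optional exponent, optional standard uncertainty in parentheses).

```python
import re

# Approximate CIF numeric syntax. This pattern is an assumption made for
# illustration, not the normative grammar.
_NUMERIC = r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?(?:\(\d+\))?"


def starts_like_number(value: str) -> bool:
    """1991-style rule: the value merely *begins* in numeric form."""
    return re.match(_NUMERIC, value) is not None


def parses_as_number(value: str) -> bool:
    """Stricter rule: the *whole* value must be in numeric form."""
    return re.fullmatch(_NUMERIC, value) is not None


if __name__ == "__main__":
    for v in ["1.234(5)", "2017-05-08", "4a", "P21", "12"]:
        print(f"{v!r}: starts={starts_like_number(v)} full={parses_as_number(v)}")
```

Both rules agree on `1.234(5)` and `P21`, and the stricter rule correctly rejects `2017-05-08` and `4a`. Neither rule can tell whether `12` is a count or a string-valued label that happens to consist of digits, which is why the stricter rule is not fundamentally better.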

>> The consumer of the JSON, on the other hand, will know which of the
>> datanames that it cares about are numeric and perform the conversion
>> (as per CIF rules, I don't know if the C++17 standard is relevant here).
> I meant that there is no simple way to write a number parser in C/C++ otherwise.

I suppose you were observing that conventional CIF numeric format is locale-insensitive, but the C standard library's `strtod()` is locale-sensitive, and the current locale on which it relies is a process-wide property.  Indeed, if you want your C program to parse CIF numeric format in a manner that is both thread-safe and correctly locale-insensitive then you need either to do at least a bit more work or to rely on a suitable third-party library, such as the CIF API.  How simple that is is a matter of opinion, I guess.  Before the CIF API, I had written parse functions for CIF numeric format at least twice, in C and Fortran, and maybe in Java as well.
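For a sense of what such a parse function involves, here is a rough sketch. It is written in Python rather than C, so it sidesteps the `strtod()` problem instead of solving it: Python's `float()` does not consult the process locale. The name `parse_cif_number`, the pattern, and the choice to return a `(value, su)` pair are all hypothetical choices of mine, and the pattern is only an approximation of CIF numeric syntax.

```python
import re
from typing import Optional, Tuple

# Approximate CIF numeric syntax, split into named parts.
# This pattern is an assumption for illustration, not the normative grammar.
_CIF_NUMBER = re.compile(
    r"(?P<mantissa>[+-]?(?:\d+\.?\d*|\.\d+))"
    r"(?P<exponent>[eE][+-]?\d+)?"
    r"(?:\((?P<su>\d+)\))?"
)


def parse_cif_number(text: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (value, standard uncertainty) or None if text is not numeric.

    The decimal point is always '.', whatever the current locale is.
    """
    m = _CIF_NUMBER.fullmatch(text)
    if m is None:
        return None
    mantissa = m.group("mantissa")
    exponent = m.group("exponent") or ""
    value = float(mantissa + exponent)

    su = None
    if m.group("su") is not None:
        # The s.u. digits apply to the last digits of the mantissa, so scale
        # them by the number of fractional digits and by the exponent.
        frac_digits = len(mantissa.partition(".")[2])
        exp_value = int(exponent[1:]) if exponent else 0
        su = float(f"{m.group('su')}e{exp_value - frac_digits}")
    return value, su


if __name__ == "__main__":
    for s in ["1.234(5)", "1.5e3(2)", "10", "1,5", "abc"]:
        print(s, "->", parse_cif_number(s))
```

A C version would need to do the digit handling itself, or otherwise avoid depending on the process-wide locale, which is the "bit more work" mentioned above.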



