Discussion List Archives

Re: Draft JSON specification, round 2

Dear All,

John raises an important and perhaps inevitable point: what exactly are we doing this for?  I have been assuming that we want to use JSON in order to have a lightweight, well-supported serialisation format for working with crystallographic data within client-based browser scripts and across the internet.  Thus we save programming time, processing time and transmission time.  Please comment on how accurate a description this is of common use cases.

The sticking point in the above description might be "crystallographic data".  I see two ways of looking at this:

(1): crystallographic data are the values taken by datanames defined in CIF dictionaries ("semantic approach")
(2): crystallographic data are the values taken by datanames defined in CIF dictionaries in files with CIF syntax ("syntactic approach")

The pleasant consequence of the semantic approach is that idiosyncrasies of the CIF syntax are irrelevant. We must still find a way to express uncertainties, <?>, and <.> in JSON. Draft 2 stores uncertainties using the CIF syntax convention, and discussion of how to represent <?> and <.> is ongoing.
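
As an illustration of what a consumer has to do when uncertainties are kept in the CIF syntax convention (a value such as 1.234(5), where the parenthesised digits apply to the last quoted digits), here is a minimal Python sketch. The function name split_su is hypothetical and not part of any draft, and the sketch deliberately does not handle <?> or <.>, since their representation is still under discussion.

```python
import re
from decimal import Decimal

# Number with optional exponent, followed by an optional "(digits)" uncertainty.
_CIF_NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?$"
)

def split_su(text):
    """Split a CIF-style numeric string into (value, standard uncertainty).

    Hypothetical helper: returns Decimals so that the number of quoted
    digits is preserved; the uncertainty is None when none is given.
    """
    match = _CIF_NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a CIF-style number: {text!r}")
    number, su_digits = match.groups()
    value = Decimal(number)
    if su_digits is None:
        return value, None
    # The uncertainty digits are scaled to the last quoted digit of the value.
    last_digit_exponent = value.as_tuple().exponent
    return value, Decimal(int(su_digits)).scaleb(last_digit_exponent)

value, su = split_su("1.234(5)")
print(value, su)
```

Using Decimal rather than float is only one possible choice; it avoids silently changing the quoted precision.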

The consequence of the syntactic approach is that, by attributing significance to the syntax, we are led to preserve even those parts of the CIF syntax that are either irrelevant to meaning or cause ambiguity, such as ordering, block name and delimiter type. 

I am strongly in favour of having the dictionaries as the central and only arbiter of meaning.  Once we have a JSON draft that is capable of expressing crystallographic information as expected by the DDL (as Draft 2 does), we can add features that improve interoperability with CIF syntax files (because most JSON will be generated from such files), such as the "Meta Data" entry and a stipulation that block names match when JSON is generated from CIF.  I don't think round-tripping is a worthwhile goal (but please provide a use case if you think it is), nor do I think we should compromise the advantages that JSON gives us by making the JSON unwieldy.

To answer John's specific examples of what we might want to toss out relative to the CIF syntax: anything relied upon by the DDL stays, everything else is extra. So uncertainties, <?> and <.> stay, but delimiter type, block ordering and dataname ordering go (yes, I know that <?> and <.> are not always explicit in DDL dictionaries - but it is easy to argue that they are inevitable consequences of a relational data model).
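
To make the point about dataname ordering concrete, here is a small Python sketch. The block layout (a block name mapping to an object of datanames) is only an assumed stand-in for illustration, not the Draft 2 layout, and same_data is a hypothetical helper. JSON objects are unordered, so two serialisations that differ only in dataname order compare equal once parsed.

```python
import json

def same_data(json_text_a, json_text_b):
    """Return True when two JSON texts hold the same names and values.

    Parsed JSON objects become Python dicts, whose equality ignores key order.
    """
    return json.loads(json_text_a) == json.loads(json_text_b)

# Assumed layout for illustration only: {block name: {dataname: value}}.
first = '{"example": {"_cell_length_a": "5.40(1)", "_cell_length_b": "6.12(2)"}}'
second = '{"example": {"_cell_length_b": "6.12(2)", "_cell_length_a": "5.40(1)"}}'

print(same_data(first, second))  # prints True
```

Note that JSON arrays still compare in order, so this simple comparison would treat reordered loop values as different.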

The situation was not this clear to me at the beginning.


On 21 April 2017 at 01:14, Bollinger, John C <John.Bollinger@stjude.org> wrote:
On Wednesday, April 19, 2017 1:32 AM, James Hester wrote:
> Reviewing last week's discussion, there is a clear bifurcation in the approaches to CIF-JSON that have arisen in practice: (1) the 'high fidelity' approach of COD-JSON (2) the 'low overhead' approach of JMol and Marcin.  This suggests that a single JSON is unlikely to satisfy all users.  Given that COD-JSON is available, implemented and complete, with open-source tools available, I propose we continue to explore here the 'low overhead' approach to see whether it can be brought to a similar state.

I'm game, but I think we may be jumping the gun, as indeed the observed bifurcation seems to indicate: in order to design an appropriate JSON serialization for CIF data, it seems we need to first consider the question, "Appropriate for what?"

> First let me summarise the points where I see consensus arising out of the discussions last week:
> (1)  No allowance needs to be made for expressing CIF numbers as JSON numbers, and therefore no "uncertainties" object is necessary
> (2)  To round-trip a CIF, information about which datavalues were quoted must be preserved
> (3)  Using an escape mechanism for CIF '?' is undesirable, instead \uFFFF or \u0001 would be suitable

But those points of consensus arose from a somewhat different set of assumptions, most importantly that we were looking for a single, presumably general-purpose JSON form.  If we're now rejecting that premise then it's not safe to assume that the apparent consensus on those points remains intact.  Again, what do we want this for?  What properties of this representation are important?  For example, if we are now focusing on minimizing overhead (which kind?) and are willing to sacrifice some fidelity, then perhaps we embrace the fidelity loss, and say,
(1) Uncertainties are not conveyed by CIF2-Low_Overhead-JSON, therefore (a) there is no "uncertainties" object, (b) any uncertainties expressed in data values are not meaningful, and perhaps even (c) known-numeric data values *may* be expressed as JSON numbers;
(2) We don't support full-fidelity round tripping anyway, so we assume that all data values presented as JSON strings and interpreted as strings can be treated as quoted CIF values;
(3) We don't care about the difference between the two flavors of null, so we represent both via JSON null.

As it stands now, I'm not sure on what grounds to evaluate those alternatives relative to the other proposed set.
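
For concreteness, the three fidelity-sacrificing alternatives above might look roughly like this in Python. This is a sketch under stated assumptions, not a proposal: the function name low_overhead_value and its quoted flag are hypothetical, and "known-numeric" is approximated here by the value's syntax rather than by a dictionary lookup.

```python
import json
import re

# Number with optional exponent; a trailing "(digits)" uncertainty is matched
# but deliberately not captured, because alternative (1) discards it.
_NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?$"
)

def low_overhead_value(text, quoted=False):
    """Map one CIF data value to a JSON-ready Python value.

    Hypothetical helper. `quoted` says whether the value was delimited in
    the source CIF; that information is used here and then thrown away.
    """
    if not quoted:
        if text in ("?", "."):
            return None                    # (3) both nulls become JSON null
        match = _NUMBER.match(text)
        if match:
            return float(match.group(1))   # (1) JSON number, uncertainty dropped
    return text                            # (2) every string reads as quoted

values = [low_overhead_value(t) for t in ["5.40(1)", ".", "?", "abc"]]
print(json.dumps(values))  # prints [5.4, null, null, "abc"]
```

What is lost is visible directly in the output: the (1) on 5.40, the trailing zero, and the distinction between <?> and <.>.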



cif-developers mailing list
