Discussion List Archives

Re: Draft JSON specification, round 2

I think Andrius's comments are relevant to the goal of 'high fidelity' conversion of CIFs.  I've added some comments inline and some thoughts at the end.

On 20 April 2017 at 16:18, Andrius Merkys <andrius.merkys@gmail.com> wrote:
Dear all,

On 19/04/17 20:20, Robert Hanson wrote:
> Do you see these as having a serial array [...]  at the top level or
> an associative array {....}?

I would recommend the use of a serial array [...] to preserve the order
of datablocks of input CIF. Some arguments for this approach:

1) Serial array removes the ambiguity of the datablock order in JSON ->
CIF conversion, thus different programs will by default produce
diff-able output;

Diffable output seems like a rather high bar to set and, again, more suitable for a 'high fidelity' approach. Perhaps a similar effect could be obtained by sorting on datablock name and then dataname?
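
To illustrate the sorting suggestion, here is a minimal Python sketch. It assumes a top-level JSON object keyed by datablock name, with each datablock an object keyed by dataname; that layout, the function name canonical_json and the sample values are illustrative only and are not taken from the draft.

```python
import json


def canonical_json(cif_json: dict) -> str:
    """Serialise with keys sorted at every nesting level.

    With the assumed layout, this sorts first by datablock name and
    then by dataname, so the text no longer depends on input order.
    """
    return json.dumps(cif_json, sort_keys=True, indent=2)


# Two hypothetical documents holding the same information in different order.
doc_a = {
    "block_b": {"_cell.length_a": "5.4", "_cell.length_b": "5.4"},
    "block_a": {"_cell.length_a": "3.1"},
}
doc_b = {
    "block_a": {"_cell.length_a": "3.1"},
    "block_b": {"_cell.length_b": "5.4", "_cell.length_a": "5.4"},
}

text_a = canonical_json(doc_a)
text_b = canonical_json(doc_b)
```

Under these assumptions text_a and text_b are identical strings, so a plain textual diff of the two outputs would be empty even though the inputs were written in different orders.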

2) Input CIFs *may* contain two or more datablocks with the same name
(i.e. concatenated files, incorrect files), thus CIF -> JSON converters
with top-level object approach must have explicit guidelines which of
these datablocks will get overwritten or discarded;

Datablock names are arbitrary, and so the CIF input routine is free to choose different names or complain, as appropriate.  Again, the approach in the draft is not a high fidelity approach but one that is consistent with CIF semantics.
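
As a rough sketch of what "choose different names or complain" might look like in a CIF input routine, the Python below takes datablocks as (name, content) pairs in file order. The function name blocks_to_object, the on_duplicate parameter and the numeric-suffix renaming scheme are all hypothetical choices for illustration, not something the draft prescribes.

```python
def blocks_to_object(blocks, on_duplicate="rename"):
    """Collect (name, content) pairs into a dict keyed by datablock name.

    on_duplicate="rename": give a repeated name a numeric suffix.
    on_duplicate="error":  refuse the input instead.
    """
    result = {}
    for name, content in blocks:
        key = name
        if key in result:
            if on_duplicate == "error":
                raise ValueError(f"duplicate datablock name: {name!r}")
            # Find the first unused suffixed name (hypothetical scheme).
            suffix = 2
            while f"{name}_{suffix}" in result:
                suffix += 1
            key = f"{name}_{suffix}"
        result[key] = content
    return result


# Hypothetical input resembling two concatenated files that reuse a name.
parsed = [
    ("x", {"_cell.length_a": "3.1"}),
    ("x", {"_cell.length_a": "5.4"}),
    ("y", {}),
]
renamed = blocks_to_object(parsed)
```

Here renamed has the keys "x", "x_2" and "y", so nothing is silently overwritten, while on_duplicate="error" would reject the same input.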

3) I am aware of at least one CIF usage that makes use of datablock
order (Toby et al. 2003, https://doi.org/10.1107/S0021889803016819):
"The first block in the CIF contains information used in a publication";

I interpret this to mean that the CIF as output by GSAS2CIF will have publication information in the first datablock. This sentence does not have to be read as a requirement on input software to preserve this order, as the particular tags used for publication will be present only in this datablock, wherever it occurs.  I accept that a 'high fidelity' approach will seek to preserve order, either in the expectation that it might be assigned semantic meaning at some point, or to preserve a layout that is helpful to a human reader.

4) One can always construct an associative array out of serial array,
whereas vice versa is ambiguous.
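
Point 4 can be sketched in Python as follows. The per-block layout with "name" and "data" keys, and both function names, are assumptions made for this example only. Going from the serial array to an object needs no extra information, whereas the reverse direction has to be given an order or invent one.

```python
def array_to_object(blocks):
    """Serial array -> object. A later block with a repeated name
    overwrites an earlier one."""
    return {block["name"]: block["data"] for block in blocks}


def object_to_array(obj, order=None):
    """Object -> serial array. JSON objects carry no order, so the
    caller must supply one; sorting by name is only a fallback."""
    names = order if order is not None else sorted(obj)
    return [{"name": name, "data": obj[name]} for name in names]


# Hypothetical two-block file with publication information first.
serial = [
    {"name": "publ", "data": {"_journal.name_full": "?"}},
    {"name": "phase_1", "data": {"_cell.length_a": "3.1"}},
]

as_object = array_to_object(serial)
fallback = object_to_array(as_object)
restored = object_to_array(as_object, order=["publ", "phase_1"])
```

In this sketch fallback lists phase_1 before publ, which differs from the original layout, while restored matches it only because the order was supplied separately.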


I understand that the top-level object approach should be sufficient for
most use cases, but I would argue against the loss of order information.

Andrius's comments underline the different needs of CIF-JSON users, and I suggest that we should expect to have two variants.

I'd also like to offer a different perspective: that CIF-JSON is an alternative instance of the ontology described in CIF dictionaries. That is, rather than seeing a CIF-JSON instance as always derived from a pre-existing CIF file, we could instead view it as one of a variety of alternative containers for crystallographic information described by the CIF dictionaries (HDF5 is another possibility).  In this perspective, the requirements imposed by trying to capture the detailed CIF layout and syntax are irrelevant, and round-tripping is not a goal.  Note that CIF syntax files are still the preferred format for archival purposes.

If we adopt the above perspective, COD-JSON is a JSON that is always linked to a particular CIF file, and CIF-JSON is simply an alternative packaging of crystallographic information that may or may not be related to a file (e.g. it could simply be an interchange format internal to a set of scripts within a browser window).  What do you think?

all the best,

cif-developers mailing list

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
International Union of Crystallography
