Re: Draft JSON specification, round 2
- Subject: Re: Draft JSON specification, round 2
- From: Robert Hanson <hansonr@xxxxxxxxxx>
- Date: Wed, 19 Apr 2017 21:43:26 -0500
OK, well, let's make sure the META_DATA has the CIF header in it - please not exactly like that, but something notable like:

cif_version: "....."

On Wed, Apr 19, 2017 at 9:35 PM, James Hester <jamesrhester@gmail.com> wrote:
On 20 April 2017 at 10:38, Robert Hanson <hansonr@stolaf.edu> wrote:

On Wed, Apr 19, 2017 at 7:25 PM, James Hester <jamesrhester@gmail.com> wrote:

I'm still not sure why you are keen to emphasise the CIF2 nature of the JSON - the concept is that this JSON handles all possible CIF data values. Any JSON parser will be happy with the syntax of the JSON, and any CIF application that plans to handle this JSON will be written in full knowledge of the existence of Unicode, list and array datavalues (because CIF2 already exists). Any datanames that take CIF1 data types (ASCII, no tables, no arrays) should still be restricted to these datatypes in the JSON.

Hi Bob,

We can't reserve a key at the top level unless we somehow distinguish block names (otherwise we can't distinguish between a data block called "metadata" and the actual "metadata" object). So if we were to prefix every block name with an underscore "_" then we could have informational top-level keys. Does that sound reasonable?
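As a rough illustration of the underscore-prefix suggestion, a reader could separate datablocks from informational keys as sketched below. The function name `split_top_level` and the contents of the example objects are assumptions made for this sketch; nothing here is part of the draft.

```python
import json

def split_top_level(doc):
    """Separate datablocks from informational keys at the top level.

    Assumes the convention suggested above: every datablock name is stored
    with a leading underscore, so any key without one is informational.
    """
    blocks, info = {}, {}
    for key, value in doc.items():
        if key.startswith("_"):
            blocks[key[1:]] = value  # strip the prefix to recover the block name
        else:
            info[key] = value
    return blocks, info

# A datablock that happens to be called "metadata" no longer collides
# with the informational "metadata" object (contents are illustrative).
text = '{"metadata": {"version": "1.0"}, "_metadata": {"_cell.length_a": "5.43"}}'
blocks, info = split_top_level(json.loads(text))
print(blocks)  # {'metadata': {'_cell.length_a': '5.43'}}
print(info)    # {'metadata': {'version': '1.0'}}
```

The cost of this convention is that every writer must add the prefix and every reader must strip it, even when no informational keys are present.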
I see it as a JSON object at the top level, in keeping with the lack of significance of datablock ordering in CIF.

I'm trying to remember my argument. I think it mainly had to do with all the crazy character business in CIF that was cleaned up in CIF2.

All of that wacky 20th century craziness will be invisible to JSON and entirely internal to the CIF parser.

That said, I think it is very sensible to have versioning information available in a metadata tag (as COD-JSON does). As it is so easy to add extra keys to datablocks (e.g. "uncertainties", "ordering", "might-be-a-number", "original_text") no doubt there will arise useful additions amongst the user community that could be added in future updates.

How does this look:

"metadata":{"schema_name":"CIF-JSON",
"version":"1.0",
"schema_uri":"http://www.iucr.org/cif/cif-json/version_1.0.schema"}

where the "schema_uri" would point to a JSON schema that could be used to validate the JSON.

That's the sort of idea. I was also thinking about

"numberTreatment": "asString"
"uncertaintyTreatment": null

or something like that.

The trouble with such options is that they involve more programming for the input routines, which must then be able to cope with every combination of options (lots of if statements), and, being optional, the code to read numbers from strings still has to exist anyway. If there is a demand for 'pre-parsed' numbers, then we can define extra (optional) objects in a datablock that contain the results of the pre-parsing, and then software is free to ignore or take advantage of these numbers as it sees fit. What, if any, optional objects are considered useful right now?

I'm more into camelCase than xxx_yyy, but you can do what you prefer. These don't have to be lower case.

Come to think of it, all upper case keys could be non-cif keys since we specify all lower case for CIF keys. Are data names case sensitive? Thus, this could be

META_DATA

perhaps?

Ooh, nifty idea. Loosely speaking, datablock names are caseless (may not canonically caseless match in Unicode speak) so we could stipulate that all datablock names are lower case (or the Unicode equivalent). Let's see what the rest of the developers here think about this.
On 20 April 2017 at 03:20, Robert Hanson <hansonr@stolaf.edu> wrote:

I think you have it, James. One suggestion if we wanted to be flexible would be to have a key that is reserved for indicating CIF-JSON ("JCIF2"? -- I prefer seeing that "2" there for emphasis) that indicates how numbers are handled and then let the reader beware. I recommend a key called metaData and have that as a place where encoding information could be placed.

Do you see these as having a serial array [...] at the top level or an associative array {....}?

Bob

On Wed, Apr 19, 2017 at 1:32 AM, James Hester <jamesrhester@gmail.com> wrote:

Dear CIF developers,

Reviewing last week's discussion, there is a clear bifurcation in the approaches to CIF-JSON that have arisen in practice: (1) the 'high fidelity' approach of COD-JSON (2) the 'low overhead' approach of JMol and Marcin. This suggests that a single JSON is unlikely to satisfy all users. Given that COD-JSON is available, implemented and complete, with open-source tools available, I propose we continue to explore here the 'low overhead' approach to see whether it can be brought to a similar state.

First let me summarise the points where I see consensus arising out of the discussions last week:

(1) No allowance needs to be made for expressing CIF numbers as JSON numbers, and therefore no "uncertainties" object is necessary
(2) To round-trip a CIF, information about which datavalues were quoted must be preserved
(3) Using an escape mechanism for CIF '?' is undesirable, instead \uFFFF or \u0001 would be suitable

So: I propose changing the draft so that (1) all datavalues are strings (2) an unquoted question mark is replaced by \uFFFF (3) the 'uncertainties' object is removed.
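A minimal sketch of what the proposed change could mean for a CIF-JSON writer follows. The function `encode_value`, its parameters and the dataname `_example.note` are hypothetical names chosen for illustration; only the rules themselves (every datavalue a string, an unquoted '?' becoming \uFFFF) come from the proposal.

```python
import json

UNKNOWN = "\uffff"  # stand-in for an unquoted CIF '?', as proposed above

def encode_value(text, was_delimited):
    """Return the JSON string for one CIF datavalue.

    `text` is the value as read from the CIF with any delimiters removed;
    `was_delimited` says whether it was quoted in the CIF. Both parameter
    names are hypothetical.
    """
    if text == "?" and not was_delimited:
        return UNKNOWN
    return text  # everything else, numbers included, stays a string

block = {
    "_cell.length_a": encode_value("5.4307(2)", False),  # number kept as text
    "_cell.length_b": encode_value("?", False),          # unknown value
    "_example.note": encode_value("?", True),            # a literal question mark
}
print(json.dumps(block))
```

Because the delimiter flag is discarded after encoding, the output cannot by itself tell a reader whether "5.4307(2)" was quoted in the source CIF.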
The resulting JSON would have the following properties:

(1) It would not be possible to (re)create conformant input CIFs unless dictionary definitions are available for all datanames
(2) CIF-JSON readers must parse numeric values as needed
(3) CIF-JSON writers must explicitly format newly-inserted numeric values as JSON strings
(4) A CIF containing numeric datavalues in delimited strings would be processed through a JSON application without detection of non-conformity. For example, JSON generated from the fragment

loop_
_atom_site.fract_x
_atom_site.fract_y
_atom_site.fract_z
"0.0" "0.5" "0.1234(4)"
"0.0" "0.0" "0.7500"

would be processed without error by a JSON application that plots atomic positions (e.g. JMol).
Point (1) should be spelled out in all documentation and COD-JSON suggested as an alternative
Point (2) imposes extra work compared to parsing the values once before passing around the resulting object, although as Bob points out, the difference between the built-in JSON parser parsing a non-delimited string and your custom parser parsing a number with optional uncertainty is slight
Point (3) is very little extra work as formatting of numbers with uncertainties is not a typical JSON library operation
Point (4) is not a problem for any software that manipulates specific datanames, as they must know what datatype to expect and thus will parse numbers as needed. Any generic software that does not rely on the meaning of specific datanames (e.g. a pretty printer) may have issues, although a suitable example doesn't come to mind.

If the consensus is that we would actually like to keep delimiter information, we can add a new object to the CIF JSON: "might-be-a-number" whose value is a list of datanames that were not delimited in the CIF file *and* match the regexp for a number. If there is a mixture in a loop column, that is not a number.

Thoughts?

James.
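To make Point (2) and the "might-be-a-number" idea concrete, here is a hedged sketch of a reader-side number parser and of how a writer might compute that list. The regular expression is a simplified assumption rather than the exact CIF grammar, and the function names and the `columns` input layout are hypothetical.

```python
import re

# Simplified numeric pattern: sign, digits, optional exponent, optional
# parenthesised uncertainty. An assumption, not the exact CIF grammar.
NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:[eE]([+-]?\d+))?(?:\((\d+)\))?$"
)

def parse_number(text):
    """Return (value, uncertainty) for a numeric string, else None."""
    m = NUMBER.match(text)
    if m is None:
        return None
    mantissa, exponent, su = m.groups()
    scale = 10.0 ** int(exponent) if exponent else 1.0
    value = float(mantissa) * scale
    if su is None:
        return value, None
    # The uncertainty applies to the last quoted digits of the mantissa.
    decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    return value, int(su) * 10.0 ** -decimals * scale

def might_be_a_number(columns):
    """List datanames whose values were all undelimited and numeric-looking.

    `columns` maps each dataname to a list of (text, was_delimited) pairs,
    one pair per loop row. A column mixing delimited and undelimited
    values is excluded, following the rule stated above.
    """
    return [
        name
        for name, values in columns.items()
        if values
        and all(not delimited and NUMBER.match(text) for text, delimited in values)
    ]

# Hypothetical input: one delimited column, one mixed, one clean, one text.
columns = {
    "_atom_site.fract_x": [("0.0", True), ("0.0", True)],
    "_atom_site.fract_y": [("0.5", False), ("0.0", True)],
    "_atom_site.fract_z": [("0.1234(4)", False), ("0.7500", False)],
    "_atom_site.label": [("C1", False), ("O1", False)],
}
print(might_be_a_number(columns))  # ['_atom_site.fract_z']
print(parse_number("0.7500"))      # (0.75, None)
```

Since `parse_number` returns None for anything that is not numeric, the \uFFFF placeholder and ordinary text fall through without special handling.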
_________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers
--
Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
If nature does not answer first what we want,
it is better to take what answer we get.
-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900