Discussion List Archives

Re: CIF-JSON new draft

For those of us not used to working with JavaScript: you can paste Bob's code into the Firefox developer scratchpad (in the developer tools accessible from the right-hand menu icon) and run it there.  Other browsers presumably have similar functionality.

Anyway, in the spirit of incremental improvement I've adjusted Bob's code (code at end of post) to (i) include 3 significant figures only (for realism), (ii) drop the assignment in the JSON.parse case (the numbers are already ready for use), and (iii) use an iterator over the string arrays (map).  The justification for (iii), instead of accessing individual elements, is that software which needs to deal with a lot of numbers is likely to mostly input looped data (think atomic positions for a protein), and a built-in iterator should be more efficient. I get, for one million elements (Firefox 45.0 on Linux, 16 GB memory, 2.3 GHz laptop):

600 ms parseFloat
130 ms JSON.parse

(Note that these are the results of the first pass; subsequent runs are noticeably shorter, perhaps due to some caching.) These results seem to be broadly in line with the Python 2 numbers.  The parseFloat version may be faster if the float array can be preallocated. In any case, I think it is plain that storing JSON numbers in the JSON object is the more efficient way of getting numerical values ready for calculations.
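
The preallocation remark could be tried along the following lines. This is only a sketch: toFloatArray is a hypothetical helper name, the input is a tiny made-up array rather than the million-element benchmark, and whether a typed array actually beats map would have to be measured in the browser in question.

```javascript
// Hypothetical helper: convert an array of numeric strings into a
// Float64Array that is allocated once, up front, instead of letting
// map() build a new ordinary array.
function toFloatArray(strings) {
  const out = new Float64Array(strings.length);
  for (let i = 0; i < strings.length; i++) {
    out[i] = parseFloat(strings[i]);
  }
  return out;
}

// Made-up sample data standing in for JSON.parse of a large string array.
const parsed = JSON.parse('["1.25", "37.5", "0.5"]');
const values = toFloatArray(parsed);

let sum = 0;
for (const v of values) {
  sum += v;
}
console.log(values.length, sum);
```

A typed array also stores the values as packed doubles, which may matter for later numerical work, but that is a side benefit I have not timed.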

As John B implies, how noticeable this extra overhead is will depend on how much other work the program is doing.  Adding even a few calculations rapidly diminishes the significance of the difference: simply summing the numbers leads to the following results:

1100 ms parseFloat
650 ms JSON.parse
 
So the advantage has come down to less than 2x.  On my computer I am paying roughly an extra 1 second per 2 million floats for using strings.  Compared with the cost of, e.g., sending the JSON over the network, this would be minor.
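
For anyone wanting to check the arithmetic, the figures follow directly from the two summation timings quoted above (the variable names are just for illustration):

```javascript
// Timings for one million elements, including the summation (from the post).
const parseFloatMs = 1100;
const jsonNumberMs = 650;

// Ratio of the two approaches: about 1.7, i.e. less than 2x.
const ratio = parseFloatMs / jsonNumberMs;

// Extra cost of using strings, per million floats, in milliseconds.
const extraMsPerMillion = parseFloatMs - jsonNumberMs;

// The same cost expressed in seconds per two million floats.
const extraSecondsPerTwoMillion = (2 * extraMsPerMillion) / 1000;

console.log(ratio.toFixed(2), extraMsPerMillion, extraSecondsPerTwoMillion);
```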

The second issue with mandating JSON numbers, as John B said, is that any application that is creating JSON from a CIF file will need to know, for every dataname in the file, which values are numbers.  While we are not targeting such conversion as a primary goal, it will still be a common operation that we should try to make easy and efficient.  And presumably the program constructing the JSON will be paying the cost of converting CIF strings to floats. So I think that mandating numbers is not worth it.

On the other hand, it would be a time saver if this conversion only needed to be done once rather than by each script within a context. What do we think about defining an optional object in each datablock as follows:

10. Each JSON datablock object may optionally contain the following names:
    (a) "as_numbers".  "as_numbers" is a JSON object. Each entry in this object is a dataname matching a dataname in the JSON datablock object.  Each of these names has a value with the same structure as the corresponding name in the parent JSON datablock object, except that a JSON number is present wherever a JSON string value occurs in the corresponding JSON datablock object's value.
    (b) "uncertainties".  This is structured identically to the "as_numbers" object. Each primitive value expresses the uncertainty for the corresponding "as_numbers" value as a JSON number.
In both (a) and (b), JSON null may be used where no number is available.
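
To make the proposal concrete, here is a rough sketch of what such a datablock might look like, written as a JavaScript object. The datanames and values are made up for illustration, and I am assuming that the string values keep the usual CIF parenthesised uncertainty; mirrorsStructure is a hypothetical check that handles only strings and lists, not any other structure the draft may allow.

```javascript
// Illustrative datablock; datanames and values are invented for this example.
const datablock = {
  "_cell.length_a": "11.52(3)",
  "_atom_site.label": ["C1", "O1"],
  "_atom_site.fract_x": ["0.1234(5)", "0.5"],
  "as_numbers": {
    "_cell.length_a": 11.52,
    "_atom_site.fract_x": [0.1234, 0.5]
  },
  "uncertainties": {
    "_cell.length_a": 0.03,
    "_atom_site.fract_x": [0.0005, null] // null: no uncertainty available
  }
};

// Hypothetical check: does `numeric` mirror the structure of `original`,
// with a number or null wherever `original` has a string?
function mirrorsStructure(original, numeric) {
  if (Array.isArray(original)) {
    return Array.isArray(numeric) &&
      numeric.length === original.length &&
      original.every((item, i) => mirrorsStructure(item, numeric[i]));
  }
  if (typeof original === "string") {
    return numeric === null || typeof numeric === "number";
  }
  return false; // anything else is outside the scope of this sketch
}

for (const key of ["as_numbers", "uncertainties"]) {
  for (const name of Object.keys(datablock[key])) {
    console.log(key, name, mirrorsStructure(datablock[name], datablock[key][name]));
  }
}
```

Note that "_atom_site.label" has no entry in "as_numbers": only datanames that have actually been converted need to appear there.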

Scripts could then check whether or not the appropriate dataname has already been converted.  If transfer time of a file that is twice as large is an issue, these optional entries can be deleted before transmission.
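
A script-side check might look something like the sketch below. getNumbers and stripOptional are hypothetical helper names, the sample datablock is made up, and the fallback conversion only handles plain values and flat lists.

```javascript
// Hypothetical helper: return the numeric value(s) for a dataname, using the
// "as_numbers" entry when it exists and converting the strings otherwise.
function getNumbers(datablock, name) {
  const cached = datablock.as_numbers;
  if (cached && Object.prototype.hasOwnProperty.call(cached, name)) {
    return cached[name]; // already converted, nothing to do
  }
  // parseFloat stops at the first non-numeric character, so a trailing
  // "(3)" is ignored; values that are not numbers at all become null.
  const convert = (s) => {
    const v = parseFloat(s);
    return Number.isNaN(v) ? null : v;
  };
  const raw = datablock[name];
  return Array.isArray(raw) ? raw.map(convert) : convert(raw);
}

// Hypothetical helper: copy of the datablock without the optional entries,
// e.g. before transmission.
function stripOptional(datablock) {
  const { as_numbers, uncertainties, ...rest } = datablock;
  return rest;
}

// Made-up sample data.
const block = {
  "_cell.length_a": "11.52(3)",
  "_atom_site.fract_x": ["0.1234(5)", "0.5", "?"],
  "as_numbers": { "_cell.length_a": 11.52 }
};

console.log(getNumbers(block, "_cell.length_a"));      // taken from as_numbers
console.log(getNumbers(block, "_atom_site.fract_x"));  // converted on the fly
console.log(Object.keys(stripOptional(block)));
```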

James.

(dodgy JavaScript code used for benchmarking - note the summation is included)

var c = 0;
// The options object is the second argument of Intl.NumberFormat (the first
// is the locale). Grouping is switched off so the output stays valid JSON.
var n = new Intl.NumberFormat("en-US", {maximumSignificantDigits: 3, useGrouping: false});

// JSON text for an array of numeric strings
var a = "[0";
for (var i = 0; i < 1000000; i++)
  a += ',"' + n.format(Math.random() * 100) + '"';
a += "]";

// JSON text for an array of JSON numbers
var x = "[0";
for (var i = 0; i < 1000000; i++)
  x += "," + n.format(Math.random() * 100);
x += "]";

var t, b, d;

// Case 1: parse the strings, convert with parseFloat via map, then sum
for (var j = 1; j < 5; j++) {
    c = 0;
    t = +new Date;
    b = JSON.parse(a);
    var q = b.map(function(s){return parseFloat(s)});
    for (let i of q) {
      c = c + i;
    }
    t = +new Date - t;
    document.write(t + " ms parseFloat(JSON.parse([str,str,str...]))<br>");
    document.write(c + " sum<br>");
}

document.write("<br>");

// Case 2: parse JSON numbers directly, then sum
for (var j = 1; j < 5; j++) {
    c = 0;
    t = +new Date;
    d = JSON.parse(x);
    for (let i of d) {
      c = c + i;
    }
    t = +new Date - t;
    document.write(t + " ms JSON.parse([num,num,num,num])<br>");
    document.write(c + " sum<br>");
}

document.write("<br>");



On 2 May 2017 at 06:09, Robert Hanson <hansonr@stolaf.edu> wrote:
Marcin,

Right, OK. Here it is with a 1,000,000 element array of random strings or numbers.

Does this code look better? Each includes JSON parsing of the array, which either has strings or numbers. The first also requires, then, parsing of the floats upon use.

var c = 0
a = "[0"
for (var i = 0; i < 1000000; i++)
 a += ',"' + Math.random() + '"';
a +=  "]"

x = "[0"
for (var i = 0; i < 1000000; i++)
 x+= "," + Math.random();
x += "]"

for (j = 1; j < 5; j++) {
    t = +new Date;
    b = JSON.parse(a);
    for (var i = 0; i < 1000000; i++) {   
      c = parseFloat(b[i]);
    }
    t = +new Date - t;
    document.write(t + " ms parseFloat(JSON.parse([str,str,str...]))<br>");
}
document.write("<br>")
for (j = 1; j < 5; j++) {
    t = +new Date;
    d = JSON.parse(x);
    for (var i = 0; i < 1000000; i++) {
      c = d[i];
    }
    t = +new Date - t;
    document.write(t + " ms JSON.parse([num,num,num,num])<br>");
}


652 ms parseFloat(JSON.parse([str,str,str...]))
621 ms parseFloat(JSON.parse([str,str,str...]))
620 ms parseFloat(JSON.parse([str,str,str...]))
618 ms parseFloat(JSON.parse([str,str,str...]))

601 ms JSON.parse([num,num,num,num])
570 ms JSON.parse([num,num,num,num])
578 ms JSON.parse([num,num,num,num])
569 ms JSON.parse([num,num,num,num])

In my mind this is not a significant difference.

Bob

_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers




--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148