Discussion List Archives


Re: mmJSON (CIF-JSON-like format from PDBj)

  • Subject: Re: mmJSON (CIF-JSON-like format from PDBj)
  • From: James Hester <jamesrhester@xxxxxxxxx>
  • Date: Thu, 29 Jun 2017 13:05:01 +1000
  • In-Reply-To: <CACaHzQXDA5RR+Tsv3jSFTK6A28RJitYX1Vyk-CZ1WJzXNZ9vNg@mail.gmail.com>
  • References: <CACaHzQXDA5RR+Tsv3jSFTK6A28RJitYX1Vyk-CZ1WJzXNZ9vNg@mail.gmail.com>
It is interesting to see the different choices made here.  As far as I can tell (I couldn't find any formal spec), the values for each data name are put into an array, and each array is then attached to its data name in an associative array (much like CIF-JSON). The key difference is that these per-loop associative arrays are then attached to category names in a higher-level associative array.

The GitHub site for this project (https://github.com/gjbekker/cif-parsers) has an example:
  {
    "data_PDBID": {
      "pdbx_category1": {
        "field1": [1, 2, 3, 4],
        "field2": ["one", "two", "three", "four"],
        "field3": [1.0, 2.0, 3.0, 4.0]
      },
      "pdbx_category2": {
        "field1": [1, 2],
        "field2": ["one", "two"],
        "field3": [1.0, 2.0]
      },
      "pdbx_category3": {
        "field1": [1],
        "field2": ["one"],
        "field3": [1.0]
      }
    }
  }
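
To make the layout concrete, here is a minimal Python sketch of reading such a structure and turning one category's column arrays back into loop rows. It assumes only the shape shown in the example; the helper name category_rows and the embedded sample text are hypothetical and are not part of the cif-parsers project.

```python
import json

# Sample text mirroring the first category of the example (assumed shape only).
MMJSON_TEXT = """
{
  "data_PDBID": {
    "pdbx_category1": {
      "field1": [1, 2, 3, 4],
      "field2": ["one", "two", "three", "four"],
      "field3": [1.0, 2.0, 3.0, 4.0]
    }
  }
}
"""

def category_rows(block, category):
    """Hypothetical helper: convert one category's columns into row dicts."""
    columns = block[category]
    names = list(columns)
    # zip(*...) walks the parallel column arrays one loop packet at a time.
    return [dict(zip(names, packet))
            for packet in zip(*(columns[name] for name in names))]

doc = json.loads(MMJSON_TEXT)
block = doc["data_PDBID"]          # top level: data block name
rows = category_rows(block, "pdbx_category1")
```

Each element of rows is then one loop packet, keyed by field name, while the original document keeps the column-per-field layout.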

What is also interesting is that the parser that produces this (found at the above website) appears to optionally use the pdbx/mmCIF dictionary (in JSON form) and apply all the type information found there to produce the correctly typed arrays that appear in the above example. As far as I know, the wwPDB enforces strict use of quotes around any non-numerical value, so such double-checking is not strictly required.
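
As a rough illustration of dictionary-driven typing, the following Python sketch casts string columns using a per-item type table. Everything here is an assumption made for illustration: the table format, the names TYPE_CASTS, apply_types and item_types, and the choice to map the CIF placeholders "?" and "." to None. The real pdbx/mmCIF dictionary is far richer, and this is not the cif-parsers implementation.

```python
# Hypothetical mapping from a simplified type code to a Python constructor.
TYPE_CASTS = {"int": int, "float": float}

def apply_types(category, columns, item_types):
    """Cast each column of one category according to a type table.

    item_types maps "category.field" to "int" or "float"; anything
    not listed is left as a string (an assumption of this sketch).
    """
    typed = {}
    for field, values in columns.items():
        cast = TYPE_CASTS.get(item_types.get(f"{category}.{field}"), str)
        # Assumption: unknown ("?") and inapplicable (".") values become None.
        typed[field] = [None if v in ("?", ".") else cast(v) for v in values]
    return typed

raw = {"field1": ["1", "2"], "field2": ["one", "two"], "field3": ["1.0", "?"]}
item_types = {"pdbx_category2.field1": "int", "pdbx_category2.field3": "float"}
typed = apply_types("pdbx_category2", raw, item_types)
```

With a table like this, the caster never has to guess a type from the token's appearance, which is the double-checking referred to above.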

On 28 June 2017 at 20:46, Marcin Wojdyr <wojdyr@gmail.com> wrote:
Hi All,
I just came across mmJSON format:
and just wanted to share it. It's similar to CIF-JSON but tailored for
mmCIF files.

The paper about it says[1]:
An analysis showed that the compressed mmJSON is on average
approximately 33 or 56 % smaller than a compressed mmCIF or PDBML
formatted file, respectively, making it more suitable for web

[1] https://jcheminf.springeropen.com/articles/10.1186/s13321-016-0155-1
cif-developers mailing list


International Union of Crystallography
