Discussion List Archives

Re: core_cif review note: _diffrn.crystal_id removed

  • To: Distribution list of the IUCr COMCIFS Core Dictionary Maintenance Group <coredmg@iucr.org>
  • Subject: Re: core_cif review note: _diffrn.crystal_id removed
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Mon, 12 Sep 2016 13:41:58 +1000
  • Cc: Saulius Grazulis <grazulis@ibt.lt>
Dear David and Core DMG,

I naturally agree that we must maintain definitions in order for our archived data to be interpretable, and also because those tags with definitions are now enshrined in software that will continue to produce those tags into the future. The question is simply whether or not some or all of these tags should be placed in a separate dictionary and associated with a different _audit.schema setting. 

To assess the practical significance of adjusting these tags, I have grepped through 380,000 or so Crystallography Open Database CIF files, and found _exptl_crystal_id used 584 times, but looped in only 3 cases, all from a single series of high-pressure experiments (COD 7104407, 7104408 and 7104409; link: http://www.crystallography.net/7104407.cif). In each of these three files, the _exptl_crystal category is split between a looped and an unlooped section, which is formally incorrect (see Vol G p 235: _exptl_crystal_F_000 must appear in a list with _exptl_crystal_id). I conclude that the entire _exptl_crystal category can be defined as a Set category with no impact on currently-written software (which is clearly outputting unlooped exptl_crystal information).

_diffrn_refln_crystal_id appears once (COD 4023466), in a file in which _exptl_crystal_id is not looped, so _diffrn_refln_crystal_id provides no new information there. Likewise, I am advised by the IUCr office that the _exptl_crystal_id tag has appeared a total of 46 times over the last 14 years in the IUCr archives (around 1% of files), and none of the other _crystal_id tags are present at all.

Given the above, I propose that we retain only _exptl_crystal.id in the exptl_crystal category, which becomes a Set category (i.e. one value per dataname) in keeping with current practice as established above. In addition, we define a 'multi-crystal' schema in a separate dictionary in which exptl_crystal is looped, and _diffrn_refln.crystal_id and _refln.crystal_id are defined (and perhaps others). The only datafiles in the COD corpus whose interpretations are affected are the 3 files above, which are already formally incorrect (but of course do contain perfectly useful information for the human reader and in other categories).

I will edit the current core CIF dictionary draft accordingly, and expand the 'Legacy' section of the looping proposal (https://github.com/COMCIFS/comcifs.github.io/blob/master/looping_proposal.md) to include exptl_crystal. This editing is purely to expedite the process (rather than waiting for comments that may never come), and of course further discussion is welcome.

James.

FYI, the commands I used to obtain the above numbers are (in the COD download top directory):
(1) all occurrences of _exptl_crystal_id etc.:
pcregrep -r '_exptl_crystal_id' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*
(2) all looped occurrences of _exptl_crystal_id:
pcregrep -rM '(_[[:graph:]]+|loop_)[^[:graph:]]+_exptl_crystal_id' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*

and

pcregrep -rM '_exptl_crystal_id[^[:graph:]]+_[[:graph:]]+' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*
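
For anyone without pcregrep, the looped/unlooped distinction can also be approximated in a few lines of Python. This is only a sketch and is not what produced the numbers above: the names classify_tag and scan_tree are hypothetical, the *.cif file suffix is an assumption about the download layout, and the tokenising is naive (whitespace splitting plus semicolon text fields, so a data name quoted inside a string value would be miscounted). It also counts occurrences only and does not detect a category split between looped and unlooped sections.

```python
from pathlib import Path


def classify_tag(cif_text: str, tag: str) -> dict:
    """Count looped and unlooped occurrences of `tag` in one CIF's text."""
    counts = {"looped": 0, "unlooped": 0}
    wanted = tag.lower()        # CIF data names are case-insensitive
    in_loop_header = False      # True between `loop_` and the first value
    in_text_field = False       # True inside a ;-delimited text field
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A semicolon in column 1 opens or closes a multi-line value.
            in_text_field = not in_text_field
            in_loop_header = False
            continue
        if in_text_field:
            continue
        for token in line.split():
            low = token.lower()
            if low.startswith("#"):
                break           # rest of the line is a comment
            if low == "loop_":
                in_loop_header = True
            elif low.startswith("_"):
                if low == wanted:
                    counts["looped" if in_loop_header else "unlooped"] += 1
            else:
                in_loop_header = False  # a value ends any loop header
    return counts


def scan_tree(root: str, tag: str) -> dict:
    """Sum classify_tag() over every *.cif file below `root` (assumed suffix)."""
    totals = {"looped": 0, "unlooped": 0}
    for path in Path(root).rglob("*.cif"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for kind, n in classify_tag(text, tag).items():
            totals[kind] += n
    return totals
```

Run from the COD download top directory, scan_tree(".", "_exptl_crystal_id") would return the two totals in one dictionary; any disagreement with the pcregrep counts would most likely come from the simplified tokenising.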


On 9 September 2016 at 00:18, Brown, David <idbrown@mcmaster.ca> wrote:
I should point out that the multiple crystal feature was added because of requests from users. It may not be needed for routine work, but it is needed for archival purposes and in some cases for submission of papers to journals. Not every program needs to be able to handle multiple crystal files, but the feature needs to be present for those cases where the structure determination is dependent on the use of different crystals.

David

I. David Brown
Professor Emeritus
Department of Physics and Astronomy
McMaster University
Hamilton, Ontario, Canada
From: coreDMG [coredmg-bounces@iucr.org] on behalf of James Hester [jamesrhester@gmail.com]
Sent: September 7, 2016 21:28
To: Distribution list of the IUCr COMCIFS Core Dictionary Maintenance Group
Subject: core_cif review note: _diffrn.crystal_id removed

Dear Core DMG,

In the course of removing unneeded keys (as per a previous email), I noted that the draft core dictionary is inconsistent as to whether or not multiple crystals are supported.   Referring to Vol G, the original DDL1 core allowed multiple crystals to be listed in exptl_crystal, and these crystal ids could be included in the diffrn_refln and refln lists.  The new draft core adds these crystal ids to the exptl_crystal_face category (not too problematic) and to the diffrn category. The latter is nominally a set category (one value per dataname in DDL1) and so couldn't refer to more than a single crystal id.  I have therefore removed crystal_id from this category in the updated draft, which now accurately reflects the state of the DDL1 dictionary.

Looking to the future, we do now have an elegant solution for handling multiple crystals, by making use of the new _audit.schema arrangement. In an ideal world, the core dictionary would assume only one crystal, and a small expansion dictionary associated with a non-default value of _audit.schema would define exptl_crystal.id and the keys listed in the previous paragraph.  In this ideal world, software that did not want to deal with multiple crystals could happily stick to the default schema.

It's not clear to me how much the multiple crystal definitions in the DDL1 core are actually used. It would be great to have some comments, especially from software authors, as to whether or not they input/output _exptl_crystal_id as defined in the current DDL1 core dictionary.  For example, would your software input and process CIFs correctly if the reflection list contained multiple instances of the same h,k,l, each from different crystals? Do you actually output _refln_crystal_id in the reflection list?

I am currently preparing a core CIF draft containing the various small revisions described in this and previous emails and should have it available for you by the end of the week.

all the best,
James.
--

_______________________________________________
coreDMG mailing list
coreDMG@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/coredmg




--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148