Re: Purely calculated structural data in CIF
- To: Distribution list of the IUCr COMCIFS Core Dictionary Maintenance Group<firstname.lastname@example.org>
- Subject: Re: Purely calculated structural data in CIF
- From: Antanas Vaitkus via coreDMG <email@example.com>
- Date: Mon, 27 Jun 2022 01:09:36 +0300
- Cc: Antanas Vaitkus <firstname.lastname@example.org>
- In-Reply-To: <CH2PR04MB69504FE911B865ED27BC2490E03E9@CH2PR04MB6950.namprd04.prod.outlook.com>
- References: <email@example.com><CH2PR04MB69504FE911B865ED27BC2490E03E9@CH2PR04MB6950.namprd04.prod.outlook.com>
The Crystallography Open Database (COD) maintainers have also encountered
a similar problem of identifying and marking purely calculated (theoretical) entries
that accidentally make it into the COD. Our approach is similar to the one proposed
by John: we use a set of heuristics to semi-automatically identify potentially theoretical
entries and then manually mark them using the '_cod_struct_determination_method'
data item from the COD CIF dictionary. This data item currently takes one of three enumerated
values ('single crystal', 'powder diffraction', 'theoretical'), so in a sense it can be viewed as
a rudimentary, COD-specific version of the '_exptl.method' data item. Having a more
standardised approach would be extremely helpful.
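Because the COD item is enumerated, deciding whether an entry is marked as theoretical reduces to validating a single value. A minimal Python sketch, assuming the value has already been read from the data block and that values are compared exactly as spelled in the enumeration; the helper `cod_method_is_theoretical` is hypothetical and not part of any COD tool:

```python
# The three values allowed for _cod_struct_determination_method, as listed above.
COD_METHODS = {"single crystal", "powder diffraction", "theoretical"}


def cod_method_is_theoretical(value: str) -> bool:
    """Return True for a 'theoretical' entry; reject values outside the enumeration.

    Hypothetical helper: surrounding CIF quotes are stripped, then the value is
    compared exactly (case-sensitively) with the enumerated set.
    """
    method = value.strip().strip("'\"")
    if method not in COD_METHODS:
        raise ValueError(
            f"{value!r} is not an allowed _cod_struct_determination_method value")
    return method == "theoretical"
```

A free-text item offers no such closed set to validate against, which is why an enumerated form is easier to check automatically.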
form than the one in the mmCIF dictionary. The main difference is that the
CIF CORE version is a free-form text field, while the mmCIF version is an
enumerated set with 13 different values, such as "X-RAY DIFFRACTION" and
"ELECTRON MICROSCOPY", one of which is "THEORETICAL MODEL".
I think that converting the CIF CORE version to an enumerated set would also
make sense, especially for the application discussed in this thread.
   (e.g. with yes/no values).
b) Introduce a new data item that specifies the *theoretical* method
   that was used (e.g. with values such as "Ab initio optimization",
   "Geometric modelling", "Molecular dynamics", etc.). This data item
   geometric modelling", "powder diffraction experiment calculated using
Files of theoretically calculated structures often show the following features:
* Lattice parameters are given with a very precise decimal part
  (more than four digits) and without standard uncertainties (no trailing
  parentheses with the s.u. values).
* The Z number ('_cell_formula_units_Z') is not provided.
* Atomic displacement parameters are either not provided at all or
  all values are set to 0 ('_atom_site_U_iso_or_equiv',
  'ATOM_SITE_ANISO' loop).
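The three features above can be checked mechanically. The following Python sketch is a minimal illustration, not the COD's actual tooling: it uses a deliberately simplified CIF reader (one data block per call, simple quoting rules, text fields reduced to a placeholder), inspects only U-type displacement parameters, and the function `theoretical_hints` and its hint strings are hypothetical names made up for this example. Its output is meant only to flag entries for manual review.

```python
import re

# One CIF token: a quoted string, a comment, or a bare word (simplified rules).
TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|#.*|\S+")
# Unit-cell data names examined by the first heuristic.
CELL_TAGS = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
             "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
NULLS = (None, "?", ".")


def tokenize(text):
    """Split CIF text into tokens; a ;-delimited text field becomes one placeholder."""
    tokens, in_text = [], False
    for line in text.splitlines():
        if line.startswith(";"):
            if not in_text:
                tokens.append(";text;")
            in_text = not in_text
        elif not in_text:
            tokens += [t.strip("'\"") for t in TOKEN_RE.findall(line)
                       if not t.startswith("#")]  # drop comments
    return tokens


def parse(tokens):
    """Return (items, loops): tag -> value pairs and (tags, rows) loop tables."""
    items, loops, i = {}, [], 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok == "loop_":
            i += 1
            tags, values = [], []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i].lower())
                i += 1
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i].lower() != "loop_"
                   and not tokens[i].lower().startswith("data_")):
                values.append(tokens[i])
                i += 1
            if tags:
                rows = [values[k:k + len(tags)] for k in range(0, len(values), len(tags))]
                loops.append((tags, rows))
        elif tok.startswith("_") and i + 1 < len(tokens):
            items[tok] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return items, loops


def split_su(value):
    """Split '5.4309(2)' into ('5.4309', '2'); the s.u. part is None when absent."""
    if value.endswith(")") and "(" in value:
        return tuple(value[:-1].split("(", 1))
    return value, None


def decimals(number):
    return len(number.split(".", 1)[1]) if "." in number else 0


def as_float(value):
    try:
        return float(split_su(value)[0])
    except ValueError:
        return None  # '?', '.', or other non-numeric values


def column(items, loops, prefix):
    """All values whose data name starts with prefix, from single items and loops."""
    values = [v for tag, v in items.items() if tag.startswith(prefix)]
    for tags, rows in loops:
        for j, tag in enumerate(tags):
            if tag.startswith(prefix):
                values += [row[j] for row in rows if j < len(row)]
    return values


def theoretical_hints(text):
    """Report which of the three heuristics a single CIF data block triggers."""
    items, loops = parse(tokenize(text))
    hints = []
    cell = [v for v in (items.get(t) for t in CELL_TAGS) if v not in NULLS]
    if (cell and all(split_su(v)[1] is None for v in cell)
            and any(decimals(split_su(v)[0]) > 4 for v in cell)):
        hints.append("precise cell parameters without s.u.")
    if items.get("_cell_formula_units_z") in NULLS:
        hints.append("Z not provided")
    adps = [as_float(v) for p in ("_atom_site_u_iso_or_equiv", "_atom_site_aniso_u_")
            for v in column(items, loops, p)]
    if all(x == 0 for x in adps if x is not None):  # also true when no ADPs exist
        hints.append("ADPs absent or all zero")
    return hints
```

For a real pipeline, a full CIF parser would replace `tokenize` and `parse`; the three checks themselves only need the cell values, Z, and the ADP columns.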
the strange features might have been stripped out; however, all of
the files contain references to the original publication in case you
would like to take a more purist approach. The full list of theoretical
structures can be retrieved using the following MySQL query:
mysql -u cod_reader -h www.crystallography.net cod -e 'SELECT `file` FROM `data` WHERE `method`="theoretical"';
 https://github.com/COMCIFS/cif_core/blob/master/cif_core.dic, commit 306cd53
As far as I am aware, we have no convention for this in Core CIF, but in mmCIF it appears that one would be expected to use …
_exptl.method 'theoretical model'
… to flag a computed structure. Other values of that data name supported by mmCIF provide for identifying various kinds of diffraction and NMR experiments by which the associated structure was determined. We could consider adding a corresponding item to Core CIF to support such marking going forward, but of course that does not help with recognizing existing CIFs describing computed structures.
As for identifying existing Core CIFs describing structures determined ab initio or from molecular modeling, I don't see a better approach than heuristics such as those you are already using. Additional characteristics that such heuristics might check, especially in the context of checkCIF, would be the absence of non-null values for substantially all data names in the _diffrn*, _exptl*, _refine*, _refln* and _reflns* categories. Exceptions that might be expected to be present include the proposed _exptl_method item; *_details items; and a handful of items, such as _exptl_crystal_absorpt_coefficient_mu, that are actually computed from the structure rather than measured.
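The category-based check suggested above might be sketched in Python as follows. It assumes the data block has already been read, with any CIF parser, into a dict mapping data names to lists of values. The function names, the 0.9 threshold standing in for "substantially all", and the short exemption list (only the examples named above) are illustrative assumptions, not checkCIF behaviour.

```python
# Category prefixes named above; DDLm names ("_exptl.method") are normalised
# to DDL1 style ("_exptl_method") before matching.
EXPERIMENTAL_PREFIXES = ("_diffrn", "_exptl", "_refine", "_refln", "_reflns")

# Items that may legitimately be present for a computed structure. Only the
# examples mentioned above; a real check would need a curated list.
EXEMPT = {"_exptl_method", "_exptl_crystal_absorpt_coefficient_mu"}
NULLS = {"?", "."}


def normalise(name):
    return name.lower().replace(".", "_")


def lacks_experimental_data(block, threshold=0.9):
    """Return True if at least `threshold` of the relevant items are null.

    `block` maps data names to lists of values (loop columns hold several).
    *_details items and the EXEMPT items are ignored. A block with no
    relevant items at all also counts as lacking experimental data.
    """
    relevant = [name for name in block
                if normalise(name).startswith(EXPERIMENTAL_PREFIXES)
                and normalise(name) not in EXEMPT
                and not normalise(name).endswith("_details")]
    if not relevant:
        return True
    null = sum(all(v in NULLS for v in block[name]) for name in relevant)
    return null / len(relevant) >= threshold
```

Treating an item as null only when every value in its column is '?' or '.' keeps a partly filled reflection loop from being counted as missing.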
John C. Bollinger, Ph.D., RHCSA
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]
We are currently working on improving the checkCIF handling of powder diffraction CIFs, and have coincidentally fallen across an issue with handling purely calculated structural data, e.g. by DFT calculation. So far we have relied on finding the use of "DFT" within various datanames, e.g.
There is no guarantee of course that it would be present in this form.
Therefore, I would like to ask if anyone has any thoughts about how we would be able to simply identify or mark a particular structural datablock as containing calculated rather than experimental data.
With thanks for any thoughts or suggestions,
Life Sciences Center,
Institute of Biotechnology,
room C521, Saulėtekio al. 7,
LT-10257 Vilnius, Lithuania
_______________________________________________ coreDMG mailing list coreDMG@iucr.org http://mailman.iucr.org/cgi-bin/mailman/listinfo/coredmg