Discussion List Archives


Specifying values 'less than something' in CIFs?

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Specifying values 'less than something' in CIFs?
  • From: Saulius Grazulis <grazulis@ibt.lt>
  • Date: Sun, 29 Apr 2012 11:09:09 +0300
  • Organization: IBT
Dear COMCIFS members,

I am currently trying to validate all Crystallography Open Database CIFs
against the IUCr core dictionary.

A large number of value-type violations come from data items like this:

_refine_ls_shift/su_mean <0.001

(see e.g. http://www.crystallography.net/2232747.cif)

The data type in the core dictionary is specified as 'numb', but many
CIFs contain string ('char') values because of the attached "less than" sign.
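To make the violation concrete, the type check a validator performs can be approximated with a regular expression. The sketch below is a simplified, hypothetical approximation of the 'numb' syntax (the pattern and the `is_numb` name are illustrative, not taken from any existing CIF library); a real validator should follow the formal CIF grammar.

```python
import re

# Simplified pattern for a CIF 'numb' value: optional sign, a mantissa,
# an optional exponent and an optional s.u. in parentheses,
# e.g. 0.0003, 1.23(4), -5e-3. The formal CIF grammar has more detail.
NUMB_RE = re.compile(
    r"""[+-]?                  # optional sign
        (?:\d+\.?\d*|\.\d+)    # mantissa: 12, 12., 12.5 or .5
        (?:[eE][+-]?\d+)?      # optional exponent
        (?:\(\d+\))?           # optional standard uncertainty
    """,
    re.VERBOSE,
)


def is_numb(value: str) -> bool:
    """Return True if value looks like a valid 'numb' value."""
    # '?' (unknown) and '.' (inapplicable) are allowed for any data type.
    if value in ("?", "."):
        return True
    return NUMB_RE.fullmatch(value) is not None


for v in ["0.0000(3)", "<0.001", "1.23e-4", "?"]:
    print(v, is_numb(v))
```

With this check, '<0.001' is rejected because the leading '<' cannot start a number, while '0.0000(3)' is accepted.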

For a human reader, the message in these data items seems more or less
clear: I interpret it as the authors wanting to convey that they are
"pretty sure that the value is negligible and can be treated as 0 for all
practical purposes; with very high probability it is less than 0.001".

How do we express this in a CIF dictionary-consistent way?

One possibility would be to put in the value 0 (this is the lowest
possible value for _refine_ls_shift/esd_mean and other such tags),
denoting that in computations the values (shifts) can be neglected.
We could then reason that since the authors wrote '<0.001', they are pretty
sure about it, so the probability of this being true is above 99%.
Therefore, if the measured values were normally distributed around a
mean of 0, 0.001 would be roughly 3*sigma ("the three-sigma rule"),
and thus the esd would be 0.001/3 ≈ 0.0003. This would yield the
CIF encoding:

_refine_ls_shift/su_mean 0.0000(3)
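The arithmetic above can be sketched as a small conversion function, assuming the value has the form '<limit' with a positive limit and that the limit corresponds to k standard deviations (k = 3 by default, per the three-sigma reasoning). The function name `less_than_to_numb` and the parameter `k` are hypothetical; `decimal` is used so that limits like 0.003 are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP


def less_than_to_numb(text: str, k: int = 3) -> str:
    """Encode a '<limit' value as '0(su)' with su = limit / k.

    The limit is treated as roughly k standard deviations of a distribution
    centred on 0 (k = 3 is the "three-sigma rule"); the s.u. is rounded to
    the decimal place of its first significant digit.
    """
    text = text.strip()
    if not text.startswith("<"):
        raise ValueError(f"not a 'less than' value: {text!r}")
    limit = Decimal(text[1:])
    if limit <= 0:
        raise ValueError(f"limit must be positive: {text!r}")
    su = limit / k
    # adjusted() is the exponent of the first significant digit,
    # e.g. Decimal("0.000333").adjusted() == -4, so 4 decimals are needed.
    decimals = max(0, -su.adjusted())
    # s.u. expressed in units of the last printed decimal place.
    units = int(su.scaleb(decimals).to_integral_value(rounding=ROUND_HALF_UP))
    return f"{0:.{decimals}f}({units})"


print(less_than_to_numb("<0.001"))  # 0.0000(3)
```

Keeping k as a parameter makes the confidence assumption explicit, so a different reading of the authors' intent only changes one argument.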

Of course, the values cannot be negative, we are not sure about
normality, and we do not know how precisely the authors estimated the
shifts or what confidence intervals they had in mind. But since we have
no more reliable estimate of the standard deviation for this value, the
above notation should convey about the same message as '<0.001', but in
a CIF-consistent way.

I think such an encoding should not confuse any correctly implemented CIF
reader. What do you think? Do you have any other suggestions for how facts
such as 'the value is less than ...' could or should be recorded?

I would like to run automatic conversion on COD and replace all similar
data items in a consistent and transparent way, so that the validation
messages for these data items do not obscure more serious problems.
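A minimal sketch of such a batch conversion, assuming only simple one-line `tag value` entries (loops, quoted values and multi-line text fields are not handled). The tag whitelist, the `# was <...>` comment that keeps the original value visible for transparency, and the function names are illustrative assumptions, not an existing COD tool.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical whitelist: only tags known to be of type 'numb' are rewritten.
NUMERIC_TAGS = {"_refine_ls_shift/su_mean", "_refine_ls_shift/esd_mean"}

# Simplified: a data name and an unquoted '<number' value on one line.
LINE_RE = re.compile(r"^(\s*)(_\S+)(\s+)<(\d*\.?\d+)\s*$")


def su_from_limit(limit: Decimal, k: int = 3) -> str:
    """Encode '<limit' as 0(su) with su = limit / k (three-sigma reasoning)."""
    su = limit / k
    decimals = max(0, -su.adjusted())
    units = int(su.scaleb(decimals).to_integral_value(rounding=ROUND_HALF_UP))
    return f"{0:.{decimals}f}({units})"


def convert_cif_text(text: str) -> tuple[str, int]:
    """Rewrite matching lines, keeping the original value as a CIF comment.

    Returns the new text and the number of lines changed.
    """
    out, changed = [], 0
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) in NUMERIC_TAGS:
            indent, tag, sep, limit = m.groups()
            new_value = su_from_limit(Decimal(limit))
            out.append(f"{indent}{tag}{sep}{new_value} # was <{limit}")
            changed += 1
        else:
            out.append(line)
    return "\n".join(out) + "\n", changed


sample = "data_x\n_refine_ls_shift/su_mean <0.001\n_cell_length_a 10.0\n"
new_text, n = convert_cif_text(sample)
print(n)
print(new_text, end="")
```

Restricting the rewrite to a whitelist of numeric tags avoids touching 'char' items where a '<' may be meaningful text.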

Sincerely yours,

Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
comcifs mailing list
