
Re: Specifying values 'less than something' in CIFs?

Hi,

On 04/29/2012 02:11 PM, Peter Murray-Rust wrote:

>     b) make software that detects and, when possible, corrects the most
>     common 'mistakes'; e.g. it is probably safe to change '100C' to '373.15'
>     (Kelvins), with a benign warning (COD deposition tools do this on
>     the fly).
> 
> This is really difficult. I don't like changing things and certainly not
> without metadata. Maybe an additional local COD_ field that gives the
> heuristic value. That way we don't corrupt the past but also allow
> people to search and compute on "better" information

I acknowledge your concern. In short, we do have metadata and we do keep
track of changes. Since a detailed discussion would probably be
off-topic here, I will write to you privately in more detail; or we can
move the discussion to the Cod-users, Cod-dev or Opencryst mailing lists at
http://lists.crystallography.net/cgi-bin/mailman/listinfo...
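For illustration only, here is a minimal Python sketch of the kind of
heuristic discussed above. It follows Peter's suggestion of never
overwriting the original value: the raw text is returned unchanged,
next to a separate heuristic value in kelvins that could go into a
locally prefixed field. The function name normalise_temperature is
hypothetical, and this is not the code used by the COD deposition tools.

```python
from __future__ import annotations

import re
import warnings

# Hypothetical pattern: a plain decimal number, optionally followed by
# a Celsius ('C') or Kelvin ('K') suffix.
_TEMP_RE = re.compile(r"^([+-]?\d+(?:\.\d*)?|[+-]?\.\d+)\s*(C|K)?$")


def normalise_temperature(raw: str) -> tuple[str, float | None]:
    """Return (original_text, heuristic_value_in_kelvins).

    The original text is passed through untouched so the deposited data
    are not corrupted; the second element is meant for a separate field.
    """
    m = _TEMP_RE.match(raw.strip())
    if m is None:
        warnings.warn(f"cannot interpret temperature {raw!r}")
        return raw, None
    number, unit = float(m.group(1)), m.group(2)
    if unit == "C":
        kelvin = round(number + 273.15, 2)
        # Benign warning, so the change is visible to the depositor.
        warnings.warn(f"interpreted {raw!r} as Celsius: {kelvin} K")
        return raw, kelvin
    # No suffix or an explicit 'K': already in kelvins.
    return raw, number
```

For example, normalise_temperature("100C") keeps "100C" and proposes
373.15 K, while an unparseable string yields no heuristic value at all.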

>     _chemical_melting_point -173.15C
> 
>     Easy?
> 
> Afraid not. CIF is a flat syntax and cannot easily manage complex objects
> with facets and attributes.

Well, CIFs already have values with a more or less complex internal
structure: author names, _chemical_formula_*, etc. For
_chemical_formula_*, we would even need a formal grammar in BNF/Yacc
syntax to describe them precisely! Even floating-point numbers with
ESUs are fairly complex. Adding an optional non-numeric unit suffix
would not add much to the complexity that already exists.
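To make the point concrete, a small Python sketch of a parser for a
numeric value with an optional ESU and an optional unit suffix. The unit
suffix is the hypothetical extension proposed in this thread, not
current CIF syntax; exponents (e.g. 1.5e3) are deliberately left out to
keep the sketch short, and the names parse_value and Measured are
illustrative only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Decimal number, optional ESU in parentheses, optional unit suffix
# (the suffix is the proposed extension, not part of CIF today).
_VALUE_RE = re.compile(
    r"^(?P<num>[+-]?(?:\d+(?:\.(?P<frac1>\d*))?|\.(?P<frac2>\d+)))"
    r"(?:\((?P<esu>\d+)\))?"
    r"(?P<unit>[A-Za-z]+)?$"
)


@dataclass
class Measured:
    value: float
    esu: float | None
    unit: str | None


def parse_value(text: str) -> Measured:
    m = _VALUE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a numeric value: {text!r}")
    frac = m.group("frac1") or m.group("frac2") or ""
    esu = None
    if m.group("esu") is not None:
        # The ESU counts units of the last quoted decimal place:
        # 1.234(5) -> 0.005, 12(2) -> 2.
        esu = int(m.group("esu")) / 10 ** len(frac)
    return Measured(float(m.group("num")), esu, m.group("unit"))
```

The unit is just one more optional group after the ESU group, which is
the sense in which it adds little to the existing number grammar.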

>     Backwards compatibility is granted, along with the automatic
>     conversion possibility of CIFs to be readable by older programs :) And
>     chemists could just cut-n-paste values with units from their papers.
> 
> The harder challenge is backwards compatibility for the data. Examples
> are disorder which - say 10 years old - is much harder to interpret than
> modern CIF. 

In general, yes, but for numeric values with units, backwards
compatibility is ensured. Of course, permitting units would add extra
complexity to CIF processors...
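As a sketch of the automatic conversion for older programs mentioned
above, the Python function below rewrites a "_tag <number><unit>" line
as a plain number in the tag's default unit. The small tag-to-unit table
and the conversion table are illustrative assumptions (kelvins for
temperatures and kilopascals for ambient pressure, as in the core
dictionary); downconvert_line is a hypothetical name. Anything it does
not recognise, including values with ESUs, is left unchanged.

```python
from __future__ import annotations

import re

# Illustrative assumption: default units for a few core tags.
DEFAULT_UNIT = {
    "_chemical_melting_point": "K",
    "_diffrn_ambient_temperature": "K",
    "_diffrn_ambient_pressure": "kPa",
}

# (from_unit, to_unit) -> conversion of the numeric value.
CONVERT = {
    ("C", "K"): lambda x: x + 273.15,
    ("K", "K"): lambda x: x,
    ("MPa", "kPa"): lambda x: x * 1000.0,
    ("kPa", "kPa"): lambda x: x,
}

_LINE_RE = re.compile(
    r"^(?P<tag>_\S+)\s+(?P<num>[+-]?\d+(?:\.\d*)?)(?P<unit>[A-Za-z]+)$"
)


def downconvert_line(line: str) -> str:
    """Rewrite '_tag <number><unit>' as '_tag <number>' in the default unit.

    Lines without a unit suffix, or with an unknown tag/unit pair, are
    returned unchanged, so an older reader sees what it always saw.
    """
    m = _LINE_RE.match(line.strip())
    if m is None:
        return line
    tag, unit = m.group("tag"), m.group("unit")
    convert = CONVERT.get((unit, DEFAULT_UNIT.get(tag)))
    if convert is None:
        return line
    return f"{tag} {convert(float(m.group('num'))):.2f}"
```

With this, "_chemical_melting_point -173.15C" becomes
"_chemical_melting_point 100.00", which a units-unaware program can read.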

>     I would interpret data item with the '?' the same way as if this data
>     item was missing altogether...
> 
> I have spent years trying to see if there is semantic consensus on these
> two happy characters. My best guess is:
> 
> "?" can be ignored (or better deleted). It is simply there for human
> authors and readers, perhaps to prompt them that they should notice
> there is nothing there
> 
> "." exists to pad out tables/loops and stop the reader failing

This is exactly how I understand and treat the "?" and "." values.
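In code, that reading might look like the following minimal Python
sketch: unquoted "?" is mapped to an "unknown" marker and such items are
simply dropped, while unquoted "." is kept as an "inapplicable"
placeholder. The names Special, classify and drop_unknown are
hypothetical; note that in CIF only the *unquoted* "?" and "." are
special, so the sketch assumes the caller has already removed quotes
only from tokens that were not quoted.

```python
from __future__ import annotations

from enum import Enum


class Special(Enum):
    UNKNOWN = "?"        # treat as if the data item were missing
    INAPPLICABLE = "."   # placeholder that keeps tables/loops aligned


def classify(token: str) -> Special | str:
    """Map an unquoted CIF token to a special marker, or return it as is."""
    if token == "?":
        return Special.UNKNOWN
    if token == ".":
        return Special.INAPPLICABLE
    return token


def drop_unknown(items: dict[str, str]) -> dict[str, str]:
    """Remove items whose value is '?', as if they had never been given."""
    return {
        tag: value
        for tag, value in items.items()
        if classify(value) is not Special.UNKNOWN
    }
```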

A trickier question is how to treat loops where only *some* values are
"?". If, for instance, some occupancies are given and others are "?",
should we assume a default of 1.0? Sure, it depends on the task, and
sure, this should not happen, but still...
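One cautious way to handle this, sketched in Python under the
assumption that a full-occupancy default of 1.0 is acceptable for the
task at hand: fill the "?" entries, but report back which rows were
guessed so the caller can decide whether the guess is tolerable. The
function name fill_occupancies is hypothetical, and ESUs such as
"0.5(1)" are simply stripped.

```python
from __future__ import annotations


def fill_occupancies(
    values: list[str], default: float = 1.0
) -> tuple[list[float], list[int]]:
    """Return numeric occupancies and the row indices where `default` was assumed."""
    filled: list[float] = []
    guessed: list[int] = []
    for i, raw in enumerate(values):
        if raw == "?":
            # Unknown occupancy: assume `default`, but remember the row.
            filled.append(default)
            guessed.append(i)
        else:
            # Drop any ESU, e.g. '0.5(1)' -> '0.5'.
            filled.append(float(raw.split("(", 1)[0]))
    return filled, guessed
```

Returning the list of guessed rows keeps the decision with the caller
instead of silently baking a default into the data.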

"." could also be used to indicate that "the data item is not applicable
in this data block", e.g. a crystal unit cell in an NMR-determined
structure (for macromolecules). But this usage is very rare, if it
occurs at all.

I would appreciate COMCIFS comments on this issue as well, if possible
(especially given that the question is on the agenda anyway...).

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/mailman/listinfo/comcifs

