Re: Backus-Naur Form for CIF

  • Subject: Re: Backus-Naur Form for CIF
  • From: Nick Spadaccini <nick@xxxxxxxxxxxxx>
  • Date: Tue, 17 Oct 2000 06:07:00 +0100 (BST)

On Wed, 4 Oct 2000, Peter Murray-Rust wrote:

> What characters does CIF allow? is it 8-13, 32-127? what about Unicode? It 
> needs to be clear. [I haven't got the latest CIF spec so forgive me if this 
> is obvious.]

The specification currently supports ASCII 9-13 and 32-126, though the
longer-term question will have to be, "how will we adopt the extended
character sets supported by UTF-8 and UTF-16?"
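
As a rough illustration of that restriction, a checker along the following
lines could flag characters outside the allowed set. This is only a sketch:
the ranges are the ones quoted in this message (9-13 and 32-126), not copied
from the specification text, and the name disallowed_characters is
hypothetical.

```python
# Code points allowed according to the ranges quoted in this message
# (9-13 and 32-126); an assumption, not taken from the specification text.
ALLOWED = set(range(9, 14)) | set(range(32, 127))


def disallowed_characters(text):
    """Return (line, column, character) for each character outside ALLOWED."""
    problems = []
    # Splitting on "\n" removes the line feeds (code 10), which are allowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) not in ALLOWED:
                problems.append((line_no, col_no, ch))
    return problems
```

Anything outside 7-bit ASCII, such as an accented letter, is reported along
with its position, which is the case the UTF-8/UTF-16 question would have to
address.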

> applied, and additional semantics such as the above are processed. [NB I 
> have never felt happy about the *semantics* of dot and query - it certainly 
> used to be possible to interpret them so that CIF processors were required 
> to expand ? in DDLs into default values in data files.]

The semantics of . and ? are often confused and interchanged, but David
(below) sums them up correctly. However, his use of the word "substitute"
is open to interpretation and needs some clarification.

On Wed, 4 Oct 2000, I. David Brown wrote:

> The 'Guide to CIF for Authors' makes it clear that '?' is used when the
> value is not known (i.e. it may or may not be the default).  '.' is
> interpreted as the default (if a default exists).  
> 
> In my understanding '.' is also used to indicate that the value is
> irrelevant in the current context in which case substituting the default
> value should not cause any problem.  However, '?' should never be
> substituted by the default.

So the presence of ? indicates there is a value, BUT it is unknown. What
you do after this is application specific (as far as STAR is concerned).
You may choose to use a default value that may be present in the
dictionary, or you may choose to leave the value as "unknown". In the
newest version of the prototype STAR dictionary the presence of ? actually
invokes the methods associated with the data item: a series of evaluations
occurs that results in the calculated value. Hence bond distances can be
recorded as ?, and the value is determined from the method attached to the
_bond_length_whatever dictionary item.
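
The idea of ? triggering a dictionary method might be sketched as below. This
is not the prototype STAR dictionary's mechanism, only an illustration of the
behaviour described; the item name _bond_distance, the coordinate keys and
the functions bond_distance and evaluate are all hypothetical.

```python
import math

UNKNOWN = "?"


def bond_distance(record):
    """Hypothetical dictionary method: Cartesian distance between two sites."""
    return math.dist(record["site_1_xyz"], record["site_2_xyz"])


# Hypothetical dictionary fragment mapping an item name to its method.
DICTIONARY_METHODS = {"_bond_distance": bond_distance}


def evaluate(record, name):
    """Return the recorded value, or a calculated one when it is '?'."""
    value = record[name]
    if value != UNKNOWN:
        return value
    method = DICTIONARY_METHODS.get(name)
    if method is None:
        return UNKNOWN  # no method available: the value stays unknown
    return method(record)  # the record itself is left unchanged
```

Note that evaluate never writes the result back, so the recorded value of the
item remains ?.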

David correctly states the two independent interpretations of . in a data
file. The first is as a shorthand indicating that the default value, as
defined in the dictionary, should be used. Why write out a
"." rather than the actual default value? Historical reasons. When files
were written and read by eye it was deemed easier to see the exception to
the rule in a "sea of dots". In CIF this is frequently used for the symop
associated with an atom in a bond calculation. Rather than have the
majority of them recorded as 1_555, they are often stored as . with the
default value defined as 1_555. In this way it is much easier to see an
exception like 1_556 (which stands out). As tools have developed and fewer
of these things are done by eye, we may see a deprecation of the use of .
to mean default.

Its other use is where . is present but there is no dictionary
default. This is semantically different and is interpreted to mean that the
quantity is not relevant. This occurs in CIF where a list of atom coords
and Uijs may have the 6 U11->U23 values listed for non-hydrogen atoms but
have only a single Uiso for hydrogen atoms. However, CIF being CIF, the 6
values for U11->U23 must be listed even for hydrogens. Hence one uses . to
mean this quantity is not relevant here (since no default exists for a
Uij).
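
The two readings of . could be told apart by a lookup like the one below.
This is a sketch under stated assumptions: the item names _bond_symop and
_atom_U11 are made up for illustration (they are not real dictionary names),
and DEFAULTS stands in for whatever defaults a real dictionary defines.

```python
# Sentinel meaning "this quantity is not relevant here".
NOT_RELEVANT = object()

# Hypothetical dictionary defaults; no entry exists for the Uij item.
DEFAULTS = {"_bond_symop": "1_555"}


def interpret_dot(name, value):
    """Interpret a value, treating '.' according to the dictionary."""
    if value != ".":
        return value  # an explicit value such as 1_556
    if name in DEFAULTS:
        return DEFAULTS[name]  # '.' is shorthand for the default
    return NOT_RELEVANT  # '.' with no default: not relevant


print(interpret_dot("_bond_symop", "."))
print(interpret_dot("_bond_symop", "1_556"))
print(interpret_dot("_atom_U11", ".") is NOT_RELEVANT)
```

Using a distinct sentinel rather than None or an empty string keeps "not
relevant" from being confused with "unknown", which is what ? records.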

Peter MR follows up with:

On Wed, 4 Oct 2000, Peter Murray-Rust wrote:

> My worry about defaults is that they can become regarded as "hard facts" 
> after they have been inserted for the first time. XML has tools for 
> adding default attribute values, e.g.:
>          <!ELEMENT molecule ANY>
>          <!ATTLIST molecule convention CDATA "CML">
> says that any <molecule> without a convention attribute:
>          <molecule>
>   is processed as if it were:
>          <molecule convention="CML">
> This is fine for metadata and styling, but is perhaps dangerous for 
> "experimental values".  When the file is emitted from an XML or CIF tool 
> which adds defaults there will be no way of knowing whether some data were 
> actually measured or whether they were defaulted. The same problem occurs 
> with null values - "." is often a null-like value.

I contend that the values . and ? ARE THE VALUES and should NEVER be
physically replaced. The fact that, pragmatically and operationally,
you can use defaults etc. does not mean these become the recorded values.
The recorded values were and should always be . and ?. Every time you
access them you go through whatever process it is you need to.
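
One way a tool might honour this is to keep the recorded values untouched and
apply defaults only at access time, so that writing the file back out never
turns a . or ? into a "hard fact". The following is a minimal sketch, not any
existing CIF or STAR library; the class DataBlock, its methods and the item
names are hypothetical.

```python
class DataBlock:
    """Keeps recorded values as written; defaults apply only on access."""

    def __init__(self, items, defaults=None):
        self._recorded = dict(items)
        self._defaults = dict(defaults or {})

    def recorded(self, name):
        """The value exactly as it appears in the file."""
        return self._recorded[name]

    def resolved(self, name):
        """The working value; the stored value is never overwritten."""
        value = self._recorded[name]
        if value == "." and name in self._defaults:
            return self._defaults[name]
        return value

    def dump(self):
        """Write out the recorded values, never the resolved ones."""
        return "\n".join(f"{n} {v}" for n, v in self._recorded.items())


block = DataBlock({"_bond_symop": ".", "_cell_volume": "?"},
                  defaults={"_bond_symop": "1_555"})
print(block.resolved("_bond_symop"))
print(block.dump())
```

Because dump reads only the recorded values, a reader of the emitted file can
still tell which values were given explicitly and which were left as . or ?.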

cheers

Nick

--------------------------------
Dr Nick Spadaccini
Department of Computer Science              voice: +(61 8) 9380 3452
University of Western Australia               fax: +(61 8) 9380 1089
Nedlands, Perth,  WA  6907                 email: nick@cs.uwa.edu.au
AUSTRALIA                        web: http://www.cs.uwa.edu.au/~nick


