Discussion List Archives

Re: A formal specification for CIF version 1.1 (Draft)

  • Subject: Re: A formal specification for CIF version 1.1 (Draft)
  • From: Brian McMahon <bm@xxxxxxxx>
  • Date: Thu, 11 Jul 2002 11:47:31 +0100 (BST)

On Thu, Jul 11, 2002 at 10:44:00AM +0100, Doug du Boulay wrote:
> Firstly syntax-33  
> "use of the global_ feature of STAR is expressly forbidden at this revision"
> I recall seeing a significant number of CIFs on the journals site containing
> data_global. This is just an interim placeholder, right? And everything that
> inevitably ought to go into a global_ block should temporarily be placed in
> such a data_global block for the interim?

No. The name of the data block has *no meaning*. In the earliest days Acta
used the convention that the text of a paper that described several
structures should be in one separate data block. We called it "data_global"
(this was before the global_ construct was introduced in STAR); subsequently we
preferred "data_text". In either case, it's just a label, and there is no
formal way to deduce its purpose. That has begun to be addressed by the
_audit_link_ data items introduced in version 2.0 of the Core dictionary,
but I feel much more work needs to be done on metadata relating to the
purpose and structure of the CIF and relations between its data blocks (and
data blocks in other files).

The issues with the STAR global_ construct have to do with scoping and
inheritance of values, and our feeling is that introducing it at this stage
would greatly ramp up the demands on a general-purpose CIF parser without
clear benefit.

> Is there some protocol envisaged for treating CIF comments in order to
> preserve intact the structure of the file between reading a CIF in and 
> writing it intact, back out again? Should comments be 
> associated in any formal manner with neighbouring pre or post data items or 
> in the case of comments between data blocks, with a pre or post data block
> or data_global placeholder(?). Alternatively, since it seems ambiguous, is 
> there any thought about deprecating  # delimited comments in favour of more 
> formal tag value constructs?

The prevailing view has been that comments should not be *relied upon* to
transfer any information between CIF applications. They're useful as
visual cues to readers in a text editor, and applications may preserve them
if they wish.

> Version identification (syntax 34) and Dictionary compliance (semantics 26-28)
> If you are going down the version identification path and adding an 11
> byte header, why don't you go all the way and tack on a dictionary
> compliance URI à la HTML/XML/SGML? Instead any generic CIF reading program
> that wants to read the CIF, and possibly associate it with some dictionary 
> specific data structure, has to do an initial scan, probably of the entire 
> file to find the dictionary conformance tags. Sure it can be done, but it is 
> not optimal.

In line with the general philosophy on comments, useful information to CIF
applications is constrained to formal tags. The initial header is a nod in
the direction of rather generic applications that want a quick
identification of a file type (e.g. to associate a pretty icon in a
filemanager browser); its use will be considered polite but entirely

> Kind of related, can a CIF contain a data_block that is version #\#CIF_1.1
> compliant, as well as another data_block that is version #\#CIF_1.0 compliant?

We're working very hard to maintain upward compatibility. Any data block
that is 1.0 compliant will be 1.1 compliant. Of course a CIF headed
#\#CIF_1.0 with a data block that contains the extensions in 1.1 will
actually be compliant with 1.1, in which case the usefulness of the 
header comment can well be questioned.
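
A minimal Python sketch of the quick file-type check that the header is
meant to support. It assumes the identifier is the string #\#CIF_ followed
by a version number at the very start of the file; the function name
sniff_cif_version is hypothetical. Given the upward compatibility described
above, the result is only a hint about what the file claims, not proof of
which revision its contents actually use.

```python
import re

# Header comment of the form "#\#CIF_<major>.<minor>" at the start of the file.
_MAGIC = re.compile(rb"#\\#CIF_(\d+\.\d+)")


def sniff_cif_version(head: bytes):
    """Return the version claimed by the header comment, or None if absent."""
    m = _MAGIC.match(head)
    return m.group(1).decode("ascii") if m else None


# A file that carries the header, and one that starts directly with a data block.
with_header = sniff_cif_version(b"#\\#CIF_1.1\ndata_text\n")
without_header = sniff_cif_version(b"data_global\n_cell_length_a 5.4\n")
```

A caller would typically pass only the first few bytes of the file, so no
full scan is needed for this step.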

> And can a CIF (file or data_block?) really be totally conformant with more 
> than one dictionary, i.e. why the need for item 27 loop_?  Would it not be

Yes. I could introduce a few local data names and provide a pointer to my
local dictionary so that a generic dictionary-driven validator could check
all the data items against two non-overlapping dictionaries.
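
As a rough illustration of checking one data block against two
non-overlapping dictionaries, here is a hedged Python sketch. Each
dictionary is reduced to a plain set of defined data names, which is a
simplifying assumption; the local name _mylab_sample_id and the function
check_names are invented for the example.

```python
def check_names(data_names, dictionaries):
    """Map each data name to the labels of the dictionaries that define it."""
    return {
        name: [label for label, defined in dictionaries.items() if name in defined]
        for name in data_names
    }


# Stand-ins for the defined names of a core and a local dictionary.
core = {"_cell_length_a", "_cell_length_b"}
local = {"_mylab_sample_id"}

report = check_names(
    ["_cell_length_a", "_mylab_sample_id", "_unknown_item"],
    {"core": core, "local": local},
)

# Names that no listed dictionary defines.
unknown = [name for name, hits in report.items() if not hits]
```

A real validator would also check types and enumeration ranges; this only
shows that every name can be resolved against the union of the listed
dictionaries.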

> more accurate to specify a single dictionary against which the data contained 
> is completely conformant? (the dictionary in question could specify its own 
> conformance with other dictionaries more explicitly)
> Also concerning Dictionary compliance item  28,
> where and what is the detailed dictionary locating merging and overlaying 
> protocol?

Look for the moment at

This was an internal report that was approved by COMCIFS and has been around
for a while, but it needs much more prominence and a proper URL. It also
needs a reference implementation.

> Also regarding the Character set (syntax 22-23) I was under the impression 
> that by using ASCII you were already conformant with the UTF8 character set.
> Any other unicode characters are instantly available, encoded using the 7bit 
> ASCII character set  (try man -7 utf-8 on linux box). So why the need for 
> restrictions?

So of course you can import ASCII into a Unicode environment, but travelling
the other way requires acceptance of the UTF-8 encoding convention as another
permissible encoding in CIF (presumably as an alternative to the existing \a
for alpha scheme in CIF). Is this a development the community would welcome?
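
The asymmetry can be seen in a short Python sketch: pure ASCII text has the
same bytes in ASCII and UTF-8, whereas a non-ASCII character such as Greek
alpha is encoded in UTF-8 with bytes above 0x7F, so it is no longer 7-bit
data. The helper is_seven_bit is a name chosen for the example.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


# Pure ASCII text: the ASCII and UTF-8 encodings are byte-for-byte identical.
plain = "alpha = 90.0"
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# Greek small letter alpha (U+03B1) needs two bytes in UTF-8, both above 0x7F.
greek = "\u03b1 = 90.0"
utf8_bytes = greek.encode("utf-8")

# The existing CIF convention keeps the file 7-bit by spelling alpha as \a.
cif_escape = r"\a = 90.0".encode("ascii")
```

Here is_seven_bit(utf8_bytes) is false while is_seven_bit(cif_escape) is
true, which is why accepting UTF-8 would be a change to the permitted
encodings rather than something ASCII already provides.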


International Union of Crystallography
