Discussion List Archives


Comments on Pflugrath's comments

Jim Pflugrath asks a number of questions about cif rules in his response to 
Andy's draft dictionary.

The STAR file structure implies the existence of lines in the file, since
the hash mark # starts a comment which continues to the end of the line,
and the semicolon ; string delimiter for multiline character strings must
be the first character on a line.  I suppose if there were no lines one
would not need to use the semicolon convention, but in that case it would be
necessary to have a second delimiter to mark the end of comments.  The 
rule about a line not exceeding 80 characters is a cif rule.  Shorter 
lines are allowed.  This rule may eventually be relaxed, but not in the 
foreseeable future.
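
As a rough illustration of the two line-based rules above, a checker might
look like the following.  This is only a sketch: the function names are
hypothetical, and the 80-character limit is the cif rule described above,
not part of STAR.

```python
MAX_LINE_LENGTH = 80  # the cif limit discussed above; STAR itself sets none


def find_long_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


def is_text_delimiter(line):
    """A semicolon delimits a multiline string only as the first character."""
    return line.startswith(";")
```

Shorter lines pass unchanged, and a semicolon preceded by white space is
not reported as a delimiter.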

The restriction of data names to 32 characters is also a cif rule (there 
is no restriction in the STAR definition).  There have been a number of 
requests to relax this rule and this may happen for some of the 
dictionaries currently being written.  The problem with relaxing a rule 
of this kind is that existing software may be written only to look at the 
first 32 characters of the name.  There are a number of implications 
that need to be considered before this rule can be abandoned.
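
To make the risk concrete, the sketch below groups names by their first 32
characters, which is what a program written against the current rule might
effectively do.  The function name and the sample names in the usage note
are made up for illustration.

```python
from collections import defaultdict

NAME_LIMIT = 32  # the current cif limit on data names


def truncation_collisions(names, limit=NAME_LIMIT):
    """Map each truncated name to the full names that collapse onto it.

    Only truncated names shared by two or more distinct full names are
    returned, since those are the cases where older software would be
    unable to tell the items apart.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name[:limit]].add(name)
    return {
        prefix: sorted(full_names)
        for prefix, full_names in groups.items()
        if len(full_names) > 1
    }
```

Two hypothetical names that differ only after the 32nd character, for
example a long common stem ending in _a and _b, would be reported together
under the same truncated name.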

The only character that must be the first character of a line is the 
semicolon delimiting a multiline text string.  Otherwise white space at 
the beginning of a line is ignored.  More than one dataname may appear on 
a line and the datavalue need not be on the same line as its name, 
provided that the two are separated only by blanks, end-of-lines and 
comments (which are to be treated as white space by the reading program).
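
A much simplified reader along these lines is sketched below.  It assumes
the input is the body of a single datablock with no data_ header, no
loops, no quoted strings and no semicolon-delimited text, so it is far
from a complete parser.  The function names and the datanames in the usage
note are hypothetical.

```python
def tokens(text):
    """Yield white-space-separated tokens, treating comments as white space."""
    for line in text.splitlines():
        for token in line.split():  # leading white space is ignored
            if token.startswith("#"):
                break  # the rest of the line is a comment
            yield token


def read_pairs(text):
    """Pair each dataname (a token starting with '_') with the next value."""
    pairs = []
    pending_name = None
    for token in tokens(text):
        if token.startswith("_"):
            if pending_name is not None:
                raise ValueError(f"no value given for {pending_name}")
            pending_name = token
        elif pending_name is not None:
            pairs.append((pending_name, token))
            pending_name = None
        else:
            raise ValueError(f"value {token!r} has no dataname")
    if pending_name is not None:
        raise ValueError(f"no value given for {pending_name}")
    return pairs
```

With this sketch, a line holding "_example_a 1.5 _example_b # note"
followed by a comment line, a blank line and a line holding "42" yields
the two pairs (_example_a, 1.5) and (_example_b, 42).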

The rule about having only one occurrence of a dataname in a given datablock 
is required so that one knows where the datavalue is to be stored.  If 
multiple occurrences of a dataname were allowed, the corresponding 
datavalues would have to be stored in different parts of the memory, but 
how would the program know where to store them?  Would it overwrite the 
information previously stored?  What may be an obvious construction for 
us may not be as easy to program for the general case without a lot of 
implied rules about the context in which the dataitem appears.  Each 
dataitem should be context independent.  This does lead to the need to 
define a lot of additional keywords and may mean using several 
datablocks within a given file since the same dataname can be used in 
different datablocks.  The dictionary (which uses the Dictionary 
Definition Language, which is not the same as cif) stores each definition 
in a separate datablock and thus can repeat names like _*_name and 
_*_definition for each definition.
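
The storage argument can be illustrated with a small sketch in which each
datablock is a separate dictionary.  The function name, the block names
and the datanames are all hypothetical, and raising an error on a repeated
name is just one possible policy.

```python
def store_items(blocks):
    """Store datavalues by datablock and dataname.

    `blocks` is an iterable of (block_name, [(dataname, value), ...]).
    A dataname may recur in different datablocks, but a repeat within
    one datablock is rejected because there is no unambiguous place to
    store the second value.
    """
    stored = {}
    for block_name, items in blocks:
        block = stored.setdefault(block_name, {})
        for name, value in items:
            if name in block:
                raise ValueError(
                    f"{name} occurs more than once in datablock {block_name}"
                )
            block[name] = value
    return stored
```

Two blocks that each contain _example_name are stored without conflict,
whereas a single block containing _example_name twice is rejected.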

The correct cif convention for expressing dates and times is the 
international convention, namely:

	yyyy-mm-ddThh:mm:ss+zz

The name of such a string should be _*_datetime.  The capital letter T is 
the divider between date and time and zz is the value that needs to be 
subtracted from hh to get international time (formerly Greenwich Mean 
Time).  This string may be truncated at any level.
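
A sketch of a reader for such strings follows.  It accepts the full form
and any truncation of it from the right.  The names are hypothetical,
accepting a minus sign as well as a plus sign before zz is an assumption
(only the plus sign appears above), and no range checking is done on the
individual fields.

```python
import re

# Each nested optional group corresponds to one level of truncation.
DATETIME_PATTERN = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})"
    r"(?P<zone>[+-]\d{2})?"  # assumption: either sign may precede zz
    r")?)?)?)?)?"
)


def parse_datetime(text):
    """Return the fields present in `text` as integers, keyed by name."""
    match = DATETIME_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid datetime string: {text!r}")
    return {
        field: int(value)
        for field, value in match.groupdict().items()
        if value is not None
    }
```

For a full string the result includes a zone field, which is the value to
be subtracted from the hour to obtain international time.  A truncated
string such as a year and month alone returns only those two fields.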

The cif standard was established to provide two functions: a standard 
interchange file structure and an archival file structure.  Clearly any file 
designed to be archival (i.e. readable on any equipment that may be 
created in the future as far as we are able to foresee) should perform the 
first function, but may not be optimally designed for file transfer.  
Part of the difficulty in the present discussion is that these two 
functions are being confused.  The primary concern of cif is to ensure 
that the files are safely archived, even if this is at the expense of 
efficient handling for file transfer.  The CBF is more concerned with 
file transfer.  I am not sure to what extent archiving is even a 
question, since it seems that these files are mostly deleted once the 
information has been extracted from them.  It might help the discussion 
if the archival question were addressed as distinct from the file transfer 
question, which seems to be the most urgent.  Syd Hall's concerns about 
maintaining the integrity of the cif definition as an archival format 
need to be kept in mind.

			David Brown

			Chair of Comcifs


*****************************************************
Dr.I.D.Brown
Brockhouse Institute for Materials Research, 
McMaster University, Hamilton, Ontario, Canada
1-(905)-525-9140 ext 24710
*****************************************************

