Discussion List Archives

Re: CIF development

  • Subject: Re: CIF development
  • From: Brian McMahon <bm@xxxxxxxx>
  • Date: Mon, 8 May 2000 12:25:13 +0100 (BST)
Dear Mario

Thank you for your comments and suggestions. Although many of these are
specific to Acta Cryst. C, there are some general points that may be of
particular interest to this list.
 
     >    (1) The names of the items should never be 
     > changed. If new items are added, the possibility 
     > of accepting the old ones should be kept all the 
     > time. CIF must be stable over time!   


It has always been a principle of CIF that old data names should
remain accessible. However, sometimes a new data name may be proposed that 
differs slightly in its meaning from an older one, and use of the older one
is then discouraged. ("Discouraged" might mean, for example, that the old
data name is no longer cited in Notes for Authors.) Nonetheless, a file
containing the "old" data name should still be able to be read and
understood. The way in which this is assured at present is to retain the old
data name in the CIF Core dictionary, with a suitable annotation. 

For example, to reflect the shift in nomenclature from estimated
standard deviations to standard uncertainties, the new data name
_diffrn_refln_intensity_u is preferred to the old
_diffrn_refln_intensity_sigma. This is expressed in version 2.1 of
the Core dictionary in the following way:

data_diffrn_refln_intensity_sigma
    _name                      '_diffrn_refln_intensity_sigma'
    _category                    diffrn_refln
    _type                        numb
    _related_item              '_diffrn_refln_intensity_u'
    _related_function            replace
    _list                        yes
    _list_reference            '_diffrn_refln_index_'
    _enumeration_range           0:
    _definition
;              Standard uncertainty (e.s.d.) of the net intensity calculated
               from the diffraction counts after the attenuator and standard
               scales have been applied.
;
 
data_diffrn_refln_intensity_u
    _name                      '_diffrn_refln_intensity_u'
    _category                    diffrn_refln
    _type                        numb
    _related_item              '_diffrn_refln_intensity_sigma'
    _related_function            alternate
    _list                        yes
    _list_reference            '_diffrn_refln_index_'
    _enumeration_range           0:
    _definition
;              Standard uncertainty of the net intensity calculated from
               the diffraction counts after the attenuator and standard
               scales have been applied.
;

The _related_item/_related_function values indicate that the _u data name
should be perceived as replacing the _sigma one (essentially a directive to
CIF writing routines), but that the _sigma value is a permitted alternative
to the _u one (a directive to CIF reading routines).
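
As an illustration only, the two directives could be honoured by software
along the following lines. This is a minimal sketch: the RELATED table is
typed in by hand from the two entries above rather than parsed from the
dictionary, and the function names are hypothetical, not part of any
existing CIF library.

```python
# Hypothetical table distilled by hand from the two dictionary entries above:
# data name -> (related item, related function).
RELATED = {
    "_diffrn_refln_intensity_sigma": ("_diffrn_refln_intensity_u", "replace"),
    "_diffrn_refln_intensity_u": ("_diffrn_refln_intensity_sigma", "alternate"),
}


def preferred_name(name):
    """Return the name a CIF writer should emit for `name`."""
    related, function = RELATED.get(name, (None, None))
    # 'replace' means the related item supersedes this one.
    return related if function == "replace" else name


def accepted_names(name):
    """Return every name a CIF reader should accept for `name`."""
    names = [name]
    related, function = RELATED.get(name, (None, None))
    # 'alternate' means the related item is a permitted alternative.
    if function == "alternate":
        names.append(related)
    return names


def read_value(block, name):
    """Look up `name` in a parsed data block (a plain dict), trying alternates."""
    for candidate in accepted_names(preferred_name(name)):
        if candidate in block:
            return block[candidate]
    return None
```

With this sketch, a reader asking for either name in an old file that
contains only _diffrn_refln_intensity_sigma still obtains the value, while a
writer calling preferred_name() always emits the _u form.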

To prevent the Core dictionary from becoming too cluttered over time
with "obsolete" (though still valid) data names, I propose that the
older data names be moved to a separate dictionary. Software to read
all prior CIFs must then have access to both dictionaries. COMCIFS 
will soon release a protocol that will establish the location and access
instructions for multiple dictionaries.
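
How a reader might consult two dictionaries in turn can be sketched as
follows. Everything here is an assumption made for illustration: the
dictionaries are tiny in-memory tables rather than real dictionary files,
the labels "core" and "superseded" are invented, and nothing is implied
about the form of the forthcoming COMCIFS protocol.

```python
# Hypothetical, heavily abbreviated dictionary contents; a real reader would
# parse the dictionary files themselves.
CORE = {
    "_diffrn_refln_intensity_u": {"type": "numb"},
}
SUPERSEDED = {
    "_diffrn_refln_intensity_sigma": {
        "type": "numb",
        "related_item": "_diffrn_refln_intensity_u",
        "related_function": "replace",
    },
}


def find_definition(name, dictionaries):
    """Search the dictionaries in order; return (source label, definition)."""
    key = name.lower()  # CIF data names are case-insensitive
    for label, entries in dictionaries:
        if key in entries:
            return label, entries[key]
    return None, None


# Search order: current Core names first, then the superseded names.
SEARCH_ORDER = [("core", CORE), ("superseded", SUPERSEDED)]
```

Calling find_definition("_diffrn_refln_intensity_sigma", SEARCH_ORDER)
reports that the name was found in the "superseded" table, so a program can
both accept the value and learn which current name replaces it.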

Traditionally, many software authors have read the CIF dictionaries
closely and implemented their content by hand. As the number and size
of dictionaries increase, it will become more difficult to do that,
and I think we shall need a set of community tools (in all the popular
programming languages) to validate CIFs against the evolving registry
of dictionaries. When the dictionary location and retrieval protocol is
published, I shall certainly be looking for volunteers to implement it
in standalone utilities and subroutine libraries.


     >    (2) Data in loops are frequently given in an
     > unformatted way (see e.g. SHELXL), so it is 
     > difficult to check for the presence of improper
     > characters or uncorrect sequences of data. It
     > would be advisable to have, from the checkcif
     > procedure, an ASCII output giving CIF with all 
     > its items in an ordered format so the authors 
     > can check in a easier way their files for grammar
     > errors, like improper dots, semicolons, tildes,
     > circumflexes, blanks, etc.

That's not a bad idea, and we shall have a think in Chester about
whether we can set up something along those lines. However, perhaps
a better tool for doing this is the CIF editor that Owen Johnson and his
colleagues are working on at the Cambridge Crystallographic Data
Centre. This will run on a user's own computer and indicate syntax
errors through a variety of pop-up alerts, icons, text colouring and
other visual aids. It is hoped that this will be available for beta
testing later this year.

 
     >    (3) Tables should be made possible where data
     > from different structures can be easily compared.
     > This could be particularly useful for polystructure 
     > CIF's.

Certainly that is highly desirable - it has been a longstanding project
for the Chester office that we have never been able to devote sufficient
time and effort to investigating. However, the general point here is
the need for a formalism within a CIF to relate the data blocks, so that
a program can understand the relationship between the different data blocks.

The AUDIT_LINK category (_audit_link_block_code, _audit_link_block_description)
allows this to be done in a loose way, by associating textual descriptions
with a list of related datablock names. I think we also need a machine-readable
set of codes to express the types of relationships that can be handled by
a program, and any suggestions as to the relevant types of relationships that
we could tag in this way will be welcome. 
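
To show what a program can recover from AUDIT_LINK as it stands, here is a
minimal sketch that reads a simple _audit_link loop into a table of block
codes and descriptions. The CIF fragment, its block codes and its
descriptions are invented for the example, and the parser is deliberately
naive: it assumes one loop, values on single lines, and no semicolon-
delimited text fields. The standard shlex module is used only as a
convenient way to split quoted values; it is not a CIF tokenizer.

```python
import shlex

# Hypothetical CIF fragment; the block codes and descriptions are invented.
EXAMPLE = """\
loop_
_audit_link_block_code
_audit_link_block_description
 publ_text   'publication details'
 compound_1  'structure of the first compound'
 compound_2  'structure of the second compound'
"""


def read_audit_links(text):
    """Return {block code: description} from a simple _audit_link loop."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pos = lines.index("loop_") + 1
    # Collect the loop header (the data names) ...
    names = []
    while pos < len(lines) and lines[pos].startswith("_"):
        names.append(lines[pos].lower())
        pos += 1
    # ... then every value that follows, splitting on blanks outside quotes.
    values = []
    for line in lines[pos:]:
        values.extend(shlex.split(line))
    width = len(names)
    code_col = names.index("_audit_link_block_code")
    desc_col = names.index("_audit_link_block_description")
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    return {row[code_col]: row[desc_col] for row in rows}
```

The result is only a mapping from block code to free text, which is the
limitation discussed above: a program learns that the blocks are related,
but not how, until a set of machine-readable relationship codes is agreed.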

(Certain types of relationship are handled by the powder and draft modulated
structures dictionaries, but it might be preferable to have a general
mechanism.)
 
     >    (4) Unnecessary and disturbing comments should
     > be avoided.

Noted - essentially as an editorial matter; but it might be useful to remind
developers that there is no obligation to parse or copy comments within 
CIFs, and comments should not be used to convey information between
applications.
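
For developers who choose to discard comments on input, a minimal sketch of
a per-line comment stripper follows. It assumes the usual reading of the CIF
syntax, namely that a '#' starts a comment only at the beginning of a token
and outside a quoted string, and that a quoted string ends at its quote
character followed by white space. It does not handle semicolon-delimited
text fields, which span several lines and would need to be tracked by the
caller. The function name is hypothetical.

```python
def strip_comment(line):
    """Remove a trailing '#' comment from one CIF line, respecting quotes."""
    quote = None  # quote character of the string we are inside, if any
    for i, ch in enumerate(line):
        prev_is_space = i == 0 or line[i - 1].isspace()
        next_is_space = i + 1 == len(line) or line[i + 1].isspace()
        if quote:
            # A quoted string ends at its quote character followed by blank.
            if ch == quote and next_is_space:
                quote = None
        elif ch in "'\"" and prev_is_space:
            quote = ch
        elif ch == "#" and prev_is_space:
            return line[:i].rstrip()
    return line.rstrip()
```

For example, strip_comment("_cell_length_a 10.5  # measured at 293 K")
returns "_cell_length_a 10.5", whereas a '#' inside a quoted value such as
'Phase #2 study' is left untouched.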

     >    (5) Fortran routines connecting CIF with 
     > PARST97 and viceversa are available from my web page:
     >    http://www.unipr.it/~nardelli/software.html


Best wishes
Brian
