Background to our discussion

  • To: coreCIFchem <corecifchem@iucr.org>
  • Subject: Background to our discussion
  • From: "I. David Brown" <idbrown@mcmail.cis.mcmaster.ca>
  • Date: Wed, 15 Oct 2003 15:55:43 -0400 (EDT)
Dear Colleagues,

     Thank you for agreeing to be part of the coreCIFchem
project.  This email gives an outline of the work we have to do,
some information on the way CIF currently handles chemical
information and suggestions on what we need to do to get started.
Please send any comments to the 'reply to' address before
November 15.

Contents
--------
1. Purpose of the discussion
2. Background
3. Some important considerations
4. Chemical properties that need to be defined
5. The current situation in CIF
6. Getting started

1. Purpose
     The task of this group is to propose a series of coreCIF
data items that describe chemical (as opposed to
crystallographic) properties.  These will be submitted to the
coreCIF Dictionary Maintenance Group (DMG) for inclusion in the
next version of the cif_core dictionary.  In particular we need
to propose appropriate categories for these items and identify
the relationships between them.  We should also consider whether
such a chemical description can simplify the way in which rigid
groups and disorder are described in coreCIF.

2. Background
     Most of the items defined in coreCIF describe the properties
of the crystal.  The atoms listed in the _atom_site loops are
assumed to define some sort of chemical unit, but strictly
speaking the coordinates listed in the CIF only define points in
the unit cell.  The properties of these points are intrinsically
crystallographic (site symmetry, multiplicity, etc.), but we have
chosen to associate each of these points with an atom even when
no atom occupies the site!  Atomless sites are needed, for
example, to identify the center of mass of a molecule or to
define a local coordinate system.  We currently get around this
problem by the artificial and curious expedient of occupying the
site with a dummy atom.
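     To illustrate what such an atomless site represents, here is a
minimal Python sketch that computes the centre of mass of a
molecule from its atom sites, which is the point where a dummy atom
is usually placed.  The function name, the (label, element,
coordinates) tuples and the small mass table are hypothetical.  The
sketch also assumes that the atoms of the molecule have already been
brought together into one contiguous group, with no wrapping across
cell edges.

```python
# Hypothetical sketch: locate the centre of mass of one molecule, i.e. the
# atomless point at which a dummy atom would normally be placed.

# Rounded standard atomic masses (u) for a few elements; illustrative only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def centre_of_mass(atoms):
    """Mass-weighted mean of fractional coordinates.

    atoms: list of (label, element, (x, y, z)) tuples for a single,
    contiguous molecule.  Fractional and Cartesian coordinates are related
    by a linear transformation, so a weighted mean taken in fractional
    coordinates is the same point as one taken in Cartesian coordinates.
    """
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for _label, element, xyz in atoms:
        mass = ATOMIC_MASS[element]
        total_mass += mass
        for i in range(3):
            weighted[i] += mass * xyz[i]
    return tuple(w / total_mass for w in weighted)


# A hypothetical C-O fragment: the centre lies nearer the heavier O atom.
co = [("C1", "C", (0.10, 0.20, 0.30)), ("O1", "O", (0.20, 0.20, 0.30))]
print(centre_of_mass(co))
```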

     CoreCIF makes no provision for associating atoms
with particular molecules, complexes or other chemical entities,
because there is currently no way to specify these entities (the
moiety formula lists the moieties present, and the entities are
implicit in the connectivity tables, but the two are not linked).

     Recent requests for the inclusion of molecular information,
such as Z', in CIF mean that it is time to consider how to
identify molecules and other chemical entities, and how to
associate them with the positions they occupy in the unit cell of
the crystal.

3. Some important considerations.

3.1 Chemical
     Our discussion will focus on the identification of one or
more molecules and their properties as listed in 4. below.  While
such an identification is important for small molecule chemists,
the needs of inorganic chemists must be addressed as well.
Atoms, bonds and molecules are all chemical rather than
crystallographic entities.

3.2 File structures
     It is essential that CIF be kept on a convergent, rather
than a divergent, track.  We therefore need to know how mmCIF
includes chemical information (John Westbrook, who is a member of
this discussion list, can help us here).  We also need to work towards
the time when all CIF dictionaries will use the same advanced
Dictionary Definition Language (DDL).  This will include methods,
which are machine-readable algorithms that a program can use to
calculate the value of an item missing from the CIF.  By
carefully defining the items we can simplify the eventual
conversion from DDL1 to an advanced DDL (Hall and Westbrook will
keep us on track here).
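     To make the idea of a method concrete, the following minimal
sketch shows how a program might derive _cell_volume from the cell
lengths and angles when the CIF does not give it.  It is written in
Python, not in any DDL method language, and the METHODS table and
the get_item function are hypothetical.  Only the data names and
the standard triclinic cell-volume formula are taken from existing
practice.

```python
import math

# Hypothetical sketch of a dictionary "method": a rule that computes an
# item from other items when the CIF itself does not supply a value.


def cell_volume(cif):
    """Volume of a general (triclinic) cell from its lengths and angles."""
    a, b, c = (cif["_cell_length_" + k] for k in "abc")
    ca, cb, cg = (math.cos(math.radians(cif["_cell_angle_" + k]))
                  for k in ("alpha", "beta", "gamma"))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Hypothetical table mapping a data name to the method that derives it.
METHODS = {"_cell_volume": cell_volume}


def get_item(cif, name):
    """Return the value given in the CIF, otherwise evaluate its method."""
    if name in cif:
        return cif[name]
    if name in METHODS:
        return METHODS[name](cif)
    raise KeyError(f"{name} is absent and has no method")


cif = {"_cell_length_a": 10.0, "_cell_length_b": 10.0, "_cell_length_c": 10.0,
       "_cell_angle_alpha": 90.0, "_cell_angle_beta": 90.0,
       "_cell_angle_gamma": 90.0}
print(get_item(cif, "_cell_volume"))  # close to 1000.0
```

A value given explicitly in the file always takes precedence, and
the method is consulted only when the item is missing.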

     We must also keep in touch with the chemical community
represented by an IUPAC group which is currently defining items
for a universal XML chemistry schema (the XML equivalent of a CIF
dictionary).  There is much to be gained from maintaining
transparency between the different electronic forms in which
chemical information is transmitted and archived.  This requires
that CIF dictionaries and XML schema adopt compatible
definitions.  (I will be attending an IUPAC workshop on this
topic in November and will liaise with this group.  I hope to
find a member of the IUPAC group willing to help us in our
discussions.)

4. Chemical entities and properties we may wish to define:

4.1 Properties related solely to chemistry:

4.1.1 Atoms:  elemental constitution, atomic mass, oxidation
state (formal charge), electronegativity, elemental coordinates
in the periodic table (i.e., period and group), positional
coordinates in a molecular (rather than crystallographic)
coordinate system, atomic radii of various kinds.

4.1.2 Bonds:  Terminal atoms, bond order (bond valence, bond
number etc.), length, maximum bonding length (calculated, e.g.,
from the atomic radii), bond angles, torsion angles.

4.1.3 Molecules: Composition, chirality, optical rotation, formal
charge, pKa, symmetry, dipole moment.
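     As an example of deriving one of the bond properties in 4.1.2
from another, the bond valence s of a bond is commonly estimated
from its length R as s = exp((R0 - R)/b), where R0 is a parameter
tabulated for each pair of atom types and b is usually taken as
0.37 Angstrom.  The sum of the bond valences around an atom is
expected to lie close to its atomic valence (oxidation state).  The
sketch below is illustrative only: the function names are
hypothetical, and the R0 value is an assumed figure, not a
recommended one.

```python
import math

# Hypothetical sketch of the bond valence relation s = exp((R0 - R) / b).
# R0 depends on the pair of atom types and must come from a published
# table; b = 0.37 Angstrom is the commonly adopted value.
B = 0.37


def bond_valence(length, r0, b=B):
    """Bond valence (valence units) of a bond of the given length (Angstrom)."""
    return math.exp((r0 - length) / b)


def valence_sum(lengths, r0, b=B):
    """Sum of bond valences around one atom; compare with its oxidation state."""
    return sum(bond_valence(r, r0, b) for r in lengths)


# Illustrative only: an assumed R0 and a regular tetrahedron of four bonds,
# each exactly R0 long, so each bond has valence 1 and the sum is 4.
R0_ASSUMED = 1.624
print(valence_sum([1.624] * 4, R0_ASSUMED))  # 4.0
```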

4.2. Properties related to both the chemistry and the
crystallography:

4.2.1 Atoms: scattering factors, disorder, occupation number.

4.2.2 Bonds: Angles defined by crystallographic symmetry.

4.2.3 Rigid Groups: Scattering factors, crystallographic
symmetry, disorder.

4.2.4 Molecules: Crystallographic symmetry, crystallographic
multiplicity, Z'.
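     Z' can be derived from two quantities that most CIFs already
contain: Z (_cell_formula_units_Z) and the number of general
equivalent positions, that is, the number of symmetry operators
listed for the space group.  The minimal sketch below assumes that
the operator list includes any centring translations, so that its
length equals the multiplicity of the general position.  The
function name is hypothetical.  A fractional result indicates that
molecules lie on special positions.

```python
from fractions import Fraction

# Hypothetical sketch: Z' = Z / (number of general equivalent positions).


def z_prime(z, symops):
    """z is _cell_formula_units_Z; symops lists every symmetry operator,
    including centring translations, so len(symops) is the multiplicity
    of the general position."""
    return Fraction(z, len(symops))


# The four general positions of P2_1/c (unique axis b).
P21C = ["x,y,z", "-x,y+1/2,-z+1/2", "-x,-y,-z", "x,-y+1/2,z+1/2"]
print(z_prime(8, P21C))  # 2: two independent molecules
print(z_prime(2, P21C))  # 1/2: the molecule sits on a special position
```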

4.3 Should we keep the intrinsic chemical properties logically
separate from properties, such as colour and atom size, that are
required for a molecular display?  Or should the display
parameters all be set in the application?

4.4 Should we include items that might be used in modelling, or
should these all be set in the application?  We need to keep in
mind that refinement against the diffraction pattern is a form of
modelling which is increasingly being combined with refinement
against various chemical criteria, so we may wish to consider
refinement in the context of modelling.

In all the above cases the atoms used in defining the chemical
properties must be linked to the crystallographic positions they
occupy.

5. Current CIF practice.

5.1 Interatomic distances are currently reported in the geom_*
categories using the _atom_site_label values defined in the
atom_site category.  These report distances found in the crystal;
because such distances usually occur between atoms, they are
(inappropriately) divided into bonds and contacts.  Since the
names 'bond' and 'contact' are chemical
concepts, a given distance must arbitrarily be assigned to one or
the other category even when one of the atoms is a dummy, or when
the atoms are well separated and form neither bonds nor contacts.
While this may not present a serious problem in current CIF
usage, it might cause problems as we sharpen the definitions and
rely more heavily on computer manipulation of CIF.
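     The purely crystallographic nature of these distances can be
seen from how they are calculated: only the fractional coordinates
from atom_site and the cell parameters are needed, whether the pair
is later called a bond or a contact, and whether or not one of the
sites holds a dummy atom.  The minimal sketch below uses the metric
tensor G, with d^2 = dx^T G dx.  The function name and the tuple
layout of the cell are assumptions, and symmetry operations and
standard uncertainties are ignored.

```python
import math


def distance(cell, p, q):
    """Distance in Angstrom between fractional positions p and q.

    cell = (a, b, c, alpha, beta, gamma), lengths in Angstrom and angles
    in degrees.  Computes d^2 = dx^T G dx with the metric tensor G.
    """
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    g = [[a * a, a * b * cg, a * c * cb],
         [a * b * cg, b * b, b * c * ca],
         [a * c * cb, b * c * ca, c * c]]
    dx = [q[i] - p[i] for i in range(3)]
    d2 = sum(dx[i] * g[i][j] * dx[j] for i in range(3) for j in range(3))
    return math.sqrt(d2)


# Hexagonal cell: the sites (0,0,0) and (1,1,0) are one a-length apart.
print(distance((1.0, 1.0, 5.0, 90.0, 90.0, 120.0), (0, 0, 0), (1, 1, 0)))
```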

5.2 Chemical information about atoms is given in the atom_type
category but the information in this category is diverse and not
logically organized, e.g.:

5.2.1 File management:
_*_symbol is the link with the atom_site category

5.2.2 Chemical:
_*_description
_*_oxidation_number has an open definition which can be interpreted
          in many different ways
_*_radius is defined as the 'effective intra- and intermolecular
          bonding radius'.  We should be able to come up with a
          better definition and maybe define other kinds of radii
          suitable for other purposes.

5.2.3 Chemical and crystallographic
_*_analytical_mass_% is closely related to
          chemical_formula_analytical.  It, and the next item,
          refer to the whole cell, while other items in this
          category refer to individual atoms.
_*_number_in_cell
_*_scat_* consists of a variety of items that give information on
          the scattering factors assumed during the refinement.  Do
          these belong more logically in the refine categories,
          since they are part of the refined model rather than
          intrinsic properties of an atom?
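     The link between the whole-cell items and the chemical
composition can be illustrated by computing the mass percentage of
each atom type from _atom_type_number_in_cell and the atomic masses.
A program could compare these values with those obtained by
chemical analysis.  This is a minimal sketch: the function name and
the small table of rounded masses are hypothetical.

```python
# Hypothetical sketch: mass percentage of each atom type in the cell,
# 100 * n_i * M_i / sum_j(n_j * M_j), from _atom_type_number_in_cell.

# Rounded standard atomic masses (u); illustrative only.
ATOMIC_MASS = {"Na": 22.990, "Cl": 35.45}


def mass_percent(number_in_cell):
    """number_in_cell: {atom type symbol: number of atoms in the cell}."""
    masses = {s: n * ATOMIC_MASS[s] for s, n in number_in_cell.items()}
    total = sum(masses.values())
    return {s: 100.0 * m / total for s, m in masses.items()}


# Rock-salt NaCl: four Na and four Cl atoms per cell.
print(mass_percent({"Na": 4, "Cl": 4}))  # Na about 39.3 %, Cl about 60.7 %
```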

5.3 The chemical_conn_atom and chemical_conn_bond categories can
be used to describe the chemical connectivity.  There is no
provision for defining more than one moiety except by defining
disconnected units.  The atoms in these categories are children
of the parent _atom_site_chemical_conn_number in the atom_site
category.  Is this the logically correct hierarchy?
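     Because disconnected units are the only way to mark separate
moieties, a program must recover the moieties from the bond list.
The sketch below finds the connected components of the graph
defined by the chemical_conn_bond pairs.  The function name and the
use of plain integer atom numbers are assumptions; in a CIF the
pairs would come from _chemical_conn_bond_atom_1 and
_chemical_conn_bond_atom_2.

```python
from collections import defaultdict


def moieties(atom_numbers, bonds):
    """Group atoms into disconnected units (connected components).

    atom_numbers: every atom number, so isolated atoms (e.g. ions) are kept.
    bonds: (atom_1, atom_2) pairs from the connectivity table.
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    groups = []
    for start in atom_numbers:
        if start in seen:
            continue
        # Depth-first search collects every atom reachable from start.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for other in neighbours[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(component))
    return groups


print(moieties([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]))
# [[1, 2, 3], [4, 5], [6]]
```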

5.4 _chemical_optical_rotation is an elaborate parsable string
that refers explicitly to 'the optical rotation in solution of
the compound', the implication being that the crystal contains
only a single molecular species.  It is the only item in the
chemical category that is not related to the crystal.

6. Getting started
     John Westbrook may be able to give us a quick introduction to
the way this kind of information is handled in mmCIF.  Although
the molecules described in mmCIF are of a different nature, we
might benefit from mmCIF experience, and we need to maintain
coherence with mmCIF.

     I will find out about the IUPAC project and report back.

     Once we have this information we should be in a better
position to move forward.

                    Best wishes

                         David


*****************************************************
Dr.I.David Brown,  Professor Emeritus
Brockhouse Institute for Materials Research,
McMaster University, Hamilton, Ontario, Canada
Tel: 1-(905)-525-9140 ext 24710
Fax: 1-(905)-521-2773
idbrown@mcmaster.ca
*****************************************************

_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
