Discussion List Archives

coreCIFchem #5

  • To: Chemical information in core CIF <corecifchem@iucr.org>
  • Subject: coreCIFchem #5
  • From: David Brown <idbrown@mcmaster.ca>
  • Date: Fri, 25 Jun 2004 10:13:24 -0400
Dear Colleagues,

     Even with an extended deadline I did not get many responses from the
last mailing to coreCIFchem (Discussion Paper #4) though the ones I did
receive had perceptive comments that have been incorporated into the present
revised proposal.  We are making considerable progress.  The framework for
reporting the chemical structure is becoming clear and we now need to turn
attention to some of the details.  Reading through the previous proposal (#4)
I can understand why the response was low.  This present email contains a
simplified and more flexible proposal which I hope you will find easier to
understand, even though it occupies 21 printed pages.  Much of this is,
however, explanatory notes accompanying the two sample CIFs.

     For the sake of keeping the project moving I am setting AUGUST 31, 2004
as the deadline for responses, but this deadline can be extended if you need
more time.  PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org

                    David
                                                           
          COMMENTS RECEIVED ON DISCUSSION PAPER #4

I received substantive comments from two people:

GREG SHIELDS wrote:
-------------------
Sorry to bring up the issue of disorder again, but I have been thinking about
the latest proposals, in particular in how they relate to more complex
examples of disordered molecules. Whilst I think I can understand how they
would be applied to relatively simple cases of disorder, I am not sure how
this would be extended to larger molecules with a number of independently
disordered groups. In such cases, there could be a large number of possible
combinations of the submappings for each of the disordered groups in one
molecule, and hence a large total number of mappings for the whole molecule. I
am concerned that the description could become unwieldy in such cases, and
perhaps we may need to consider only storing the mapping of the atoms common
to all configurations once, along with one configuration of each of the
disordered groups, using sub-mappings to describe the other configurations of
the disordered groups.

I expect you have already considered these problems, and I think it would be
useful to know how you were considering dealing with more complex cases in
these proposals. I have attached a portion of a CIF describing a molecule with
twofold disorder in two propylene bridges (with occupancies 0.62/0.38 and
0.55/0.45 respectively) as an example of a slightly more complex case (this is
taken from a CIF deposited at the CCDC - Journal: Dalton Trans. (0222), P:
2872, Y: 2003;
Authors: L.Salmon,P.Thuery,E.Riviere,J.-J.Girerd,M.Ephritikhine).
I would be interested to hear how this would be described in the proposed
framework.

IDB Reply
---------
You are right to raise this problem and it was one that had occurred to me.  I
think you will find the latest proposal more satisfactory since it does not
require the author to specify how the disordered atom sites are combined in
the individual molecules, something that cannot be determined from the
diffraction experiment.  If desired, these combinations can be specified using
the existing items _atom_site_disorder_assembly and _atom_site_disorder_group
or using the proposed new molecular_geom loops.

HOWARD FLACK wrote
------------------
  I see things a little differently from David concerning

 >>GREG:
 >> - is it possible that a molecule may have a lower symmetry
 >> order than the site symmetry which it occupies in such cases ?

For me the answer is definitely yes. La coupe du Roi (a way of cutting an
apple into two homochiral halves) is just such a case. A structure containing
molecules which are enantiopure but disordered may well show up as an average
disordered 'molecule' which is achiral.

IDB response
------------
While symmetry is an interesting subject and is important in any analysis of a
crystal structure, the CIF proposals as they are developing do not currently
include a hierarchical specification of symmetry.  I suggest that we do not
distract ourselves by spending time on the nature of the relationship between
crystal and molecular symmetry.

HOWARD FLACK continued
----------------------
  Somehow or other I was worried that the proposed scheme did not allow for
alternate descriptions of the topology of the same structure and that for the
disordered TNT case we ended up with almost duplicating information in the
FORMULA_UNIT_ATOM and FORMULA_UNIT_BOND loops for the two molecules and then
being forced to map them on to each other in the FU2FU loop. My suggestion is
to remove all topology information from the FORMULA group and incorporate it
into the MOLECULAR group as the highest (first) level of the topology
description. The FORMULA group would be reduced to just the FORMULA_UNIT_ATOM
loop and would be an atomic description only of the formula unit. The
MOLECULAR_UNIT group remains essentially the same but contains the topology
information of the highest level which in David's text is incorporated in the
FORMULA unit. In the MAPPING section FU2FU is not needed but FU2MU needs an
index to identify each mapping (i.e. each set of atomic mappings) so that a
loop containing the molecular information which David put in the FORMULA_UNIT
loop can be implemented. It seems to me that these changes allow all of the
information that David correctly wishes to capture but in a simplified form.
In the disordered TNT example there would be just one MOLECULAR_UNIT of index
1 being one complete TNT molecule. There would be two mappings (each with its
own index), mapping this one TNT molecule onto two different sets of atoms in
the FORMULA unit although these two sets need not be disjoint. These FU2MU
mappings each need an index because although the topologies of the two
molecules are identical they may have different conformations or chirality.

I suspect that these changes also make it easier to work this system in
conjunction with a database of molecules and molecular fragments which is the
only way that it would be used in practice. It also allows distinct
descriptions of the topology of one structure to be captured.


In a private email Howard added:
In #4 you write "The second question was whether the geometry should be
combined with the description of the graph of the molecular units and this too
has been adopted." I managed to convince myself that combining the topology
and the geometry information was NOT the best way to go. It seems to me that
combining the two sets of information is perfectly adequate for the
'one-structure-at-a-time' boys. However when one is engaged in a study of
several crystal structures of several related compounds (molecules), I can see
great merit in keeping the topology and the geometry information separate. This
is because to one topological map there may correspond several distinct types
of geometry (e.g. side chain rotation, rotation about a crucial C - C bond so
one geometry is eclipsed, another skew, gauche or staggered etc).

IDB Response
------------------
I have tried to incorporate these ideas into the new proposal.  FORMULA_UNIT
has now disappeared entirely.  The topologies are given in the molecular_unit
categories and the geometries in the molecular_geom categories.  Either of
these can be mapped onto the crystal.

                 WHAT IS NEW IN VERSION #5?

Others besides Greg have also pointed out that version #4 was unwieldy and
would likely be rarely used if we approved it in that form.  Version #5 given
below is much simpler and more flexible.  The formula_unit categories that
were included in version 4 have been deleted, making the overall structure
simpler and easier to follow. Of course the crystallographic description of
the structure will continue to be given in the _atom_site and _geom
categories.  The chemical topology is given in the molecular_unit categories
in the form of a graph with atoms connected by bonds.  These categories are
very flexible: It is possible to define one or more molecules, submolecular
units such as functional groups or complex ions, and the graphs may include
part or all of the contents of the crystal.  It is also possible to describe
infinitely connected structures by specifying the graph of a finite formula
unit.  A separate set of categories, _molecular_geom, is used to give the
geometries of the molecular units as recommended by Howard.  Mapping the
molecular units onto the crystallographic description however presents some
complications.  The obvious way is to include pointers to the corresponding
atom in the molecular_unit (or molecular_geom) categories in the atom_site
loop (or vice versa) but this is not always possible.  For example the
asymmetric unit in the crystallographic description may contain only a part of
the molecule, disorder may require that the chemical graph be mapped onto more
than one set of atoms in the crystal, and an infinite graph contains bonds
that link the (finite) molecular units together as well as bonds that link
atoms within the molecular unit.  This means that it is not possible in
general to include the molecular mapping directly in the atom_site category
nor is it possible to point to the atoms of the crystal from the
molecular_unit categories.  Therefore a separate set of mapping categories,
mol2xtl_map, is needed to provide the required flexibility.

--------------
PREAMBLE TO THE REPORT TO THE CORE DICTIONARY MAINTENANCE GROUP

The crystallographic information given in a CIF consists of the atomic
coordinates of the asymmetric unit, the symmetry operations needed to generate
all the atoms in the crystal, the lattice parameters and the interatomic
distances.  The present proposal provides a means of giving a chemical
description of the structure in the form of the bonding topology (given in the
proposed molecular_unit categories) and the ideal geometry (given in the
proposed molecular_geom categories).  The molecular and crystallographic
descriptions are mapped on to each other using items in the mol2xtl_map
categories.

A chemical description of the contents of the crystal, distinct from the
crystallographic description, serves a variety of uses.  It describes the
contents of the crystal in the language of chemistry rather than the language
of crystallography thus allowing the structure determinations of crystals
containing particular molecules or fragments to be searched out and the atomic
coordinates of the relevant atoms to be retrieved.  Further the crystal
structure determination does not itself identify which atoms are bonded.  A
topological description of the bonding network supplements the information
given in the _geom_bond loop which, in spite of its name, gives interatomic
distances without necessarily requiring that they correspond to chemical
bonds.  The topological description is useful to identify different molecules
in a crystal, or crystallographically distinct copies of the same molecule.
Finally the proposed chemical description allows the ideal geometry and
conformation of the molecular units to be specified - information which can be
used during the refinement of the crystal structure or for validating the
experimental bond distances and angles.

The set of molecular_unit categories gives a description of one or more ideal
chemical structures (molecular units) in the form of a topological graph,
i.e., a list of atoms and the connections that the bonds form between them.

  MOLECULAR_UNIT         Lists the different molecular units described
  MOLECULAR_UNIT_ATOM    Lists the atoms in each molecular unit
  MOLECULAR_UNIT_BOND    Lists the bonds between the atoms

This allows a flexible description for defining one or more molecular units
(e.g., molecules, formula units, charged or neutral complexes).  Chemical
properties can be assigned to each atom, and bonds can be assigned as links
between two atoms.  However, the topological description given here does not
include the geometry which is given in the molecular_geom categories.  None of
the information given in the molecular_unit categories is derived directly
from the crystal structure: it is supplied by the author by way of a chemical
interpretation.  It is not necessary that the molecular units account for all
the atoms found in the crystal structure, nor that the crystal structure
contain all the atoms specified in the molecular units.  The contents of the
crystal may be described in terms of more than one molecular unit, and a
hierarchy of molecular units may be defined, with, for example, one molecular
unit describing a functional group such as a carboxyl group while another
molecular unit specifies a complex, such as an acetate ion, that contains the
functional group.
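
As a rough illustration of the topological graph described above, the
following Python sketch holds a molecular unit as a set of atoms plus a list
of bonds, with no geometry.  The class, its field names and the acetate
connectivity are assumptions made for this illustration only; they are not
proposed data names.

```python
from dataclasses import dataclass, field


@dataclass
class MolecularUnit:
    """A topological graph: atoms plus the bonds connecting them (no geometry)."""
    name: str
    atoms: dict = field(default_factory=dict)  # atom id -> element symbol
    bonds: list = field(default_factory=list)  # (atom id 1, atom id 2) pairs

    def add_bond(self, a, b):
        # A bond may only link atoms that belong to this unit.
        if a not in self.atoms or b not in self.atoms:
            raise KeyError(f"bond {a}-{b} refers to an atom not in {self.name}")
        self.bonds.append((a, b))

    def coordination_number(self, atom_id):
        # Number of bonds in which the atom takes part.
        return sum(atom_id in bond for bond in self.bonds)


# Hypothetical sample data: an acetate ion, CH3-COO(-)
acetate = MolecularUnit(
    "acetate",
    {"C1": "C", "C2": "C", "O1": "O", "O2": "O",
     "H1": "H", "H2": "H", "H3": "H"},
)
for pair in [("C1", "C2"), ("C2", "O1"), ("C2", "O2"),
             ("C1", "H1"), ("C1", "H2"), ("C1", "H3")]:
    acetate.add_bond(*pair)

# A carboxyl group defined as a second, smaller unit.
carboxyl_atoms = {"C2", "O1", "O2"}
# The smaller unit is contained in the larger one if its atoms are a subset.
is_contained = carboxyl_atoms <= set(acetate.atoms)
```

Because the coordination number follows from the bond list alone, a check of
this kind could be used to cross-validate a stated coordination number
against the bonds actually listed.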

The decision as to what constitutes a MOLECULAR UNIT is left entirely to the
author but in the case of the infinitely bonded solids typically found in
inorganic compounds the molecular unit would normally be the formula unit.
This is the smallest group of atoms that contains all the chemical elements in
the same proportions as they are found in the crystal.  The chemical formula
of this unit will normally contain only integer multipliers, but in cases
where this is not possible, e.g., in non-stoichiometric crystals such as
minerals, the size of the formula unit is necessarily arbitrary.  It must be
at least as large as the asymmetric unit of the crystal and normally no larger
than the primitive unit cell.

The molecular_unit categories give only the topology of the molecules.  The
conformation and geometry are given in the molecular_geom categories because
it is possible for a given topology to correspond to more than one conformer,
e.g., cis and trans isomers:

  MOLECULAR_GEOM
  MOLECULAR_GEOM_ATOM
  MOLECULAR_GEOM_BOND
  MOLECULAR_GEOM_ANGLE
  MOLECULAR_GEOM_TORSION

The geometry may be given by specifying atomic coordinates in a rectangular
Cartesian coordinate system of arbitrary orientation, or by giving bond
lengths, angles and torsion angles.  The atoms and bonds of the molecular
units are mapped directly onto the descriptions of the geometry.
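
The two ways of giving the geometry are related in a simple way: bond
lengths and angles can always be derived from Cartesian coordinates.  The
Python sketch below shows that derivation.  The function names and the three
sample coordinates are invented for the illustration and are not taken from
the sample CIFs.

```python
import math


def bond_length(coords, a, b):
    """Distance between atoms a and b, in the units of the coordinates."""
    return math.dist(coords[a], coords[b])


def bond_angle(coords, a, centre, b):
    """Angle a-centre-b in degrees."""
    va = [p - q for p, q in zip(coords[a], coords[centre])]
    vb = [p - q for p, q in zip(coords[b], coords[centre])]
    cos_t = sum(x * y for x, y in zip(va, vb)) / (
        math.hypot(*va) * math.hypot(*vb)
    )
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Hypothetical orthogonal coordinates in Angstroms; orientation is arbitrary.
coords = {
    "C1": (0.0, 0.0, 0.0),
    "C2": (1.39, 0.0, 0.0),
    "H1": (0.0, 1.08, 0.0),
}
length_c1_c2 = bond_length(coords, "C1", "C2")
angle_c2_c1_h1 = bond_angle(coords, "C2", "C1", "H1")
```

The reverse direction (coordinates from lengths, angles and torsion angles)
needs enough internal coordinates to fix the geometry uniquely, which is why
the choice between the two forms is left to the author.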

The MOL2MOL_MAP_ATOM category allows the atoms of different molecular_units to
be mapped onto each other.  This feature will likely not be used often.

Mapping the graph of a molecular unit onto the crystal structure is less
straightforward because of problems that result from disorder,
crystallographic symmetry and infinitely bonded graphs.  For this reason a
special set of MOL2XTL_MAP categories is defined to allow some or all of the
atoms in the molecular_unit or molecular_geom categories to be mapped onto the
atom_site categories.

  MOL2XTL_MAP   
  MOL2XTL_MAP_ATOM   Maps the atoms of the molecular unit to the crystal
  MOL2XTL_MAP_BOND   Maps the bonds of the molecular unit to the crystal
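
The need for a separate mapping table can be seen in a small Python sketch.
With disorder, one atom of the molecular unit corresponds to several atom
sites; with crystallographic symmetry, one atom site serves several atoms of
the molecular unit.  Neither relationship fits in a single pointer column.
The row layout, the atom and site labels and the symmetry strings below are
all hypothetical and are not the proposed MOL2XTL_MAP data items.

```python
from collections import defaultdict

# Each row: (molecular-unit atom, atom_site label, symmetry operation).
# All labels are invented for illustration.
rows = [
    ("O21", "O2a", "x,y,z"),
    ("O21", "O2b", "x,y,z"),   # disorder: a second site for the same atom
    ("O61", "O2a", "-x,y,z"),  # symmetry: the same site reached via a mirror
    ("O61", "O2b", "-x,y,z"),
    ("C1",  "C1",  "x,y,z"),
]


def sites_by_atom(rows):
    """Group the rows by molecular-unit atom."""
    table = defaultdict(list)
    for mol_atom, site, symop in rows:
        table[mol_atom].append((site, symop))
    return dict(table)


def atoms_by_site(rows):
    """Group the rows by atom_site label."""
    table = defaultdict(set)
    for mol_atom, site, _symop in rows:
        table[site].add(mol_atom)
    return dict(table)


by_atom = sites_by_atom(rows)
by_site = atoms_by_site(rows)
```

Both lookups are many-valued in this example, which is the situation a
stand-alone mapping category is meant to accommodate.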

Details of the definitions of the molecular units and their mappings are
illustrated by two sample CIFs, one of an organic molecule, the other an
infinitely connected inorganic solid.

                            SAMPLE CIFS
                            -----------
The first CIF describes the structure of the molecule trinitrotoluene, TNT.
It shows how a finite molecular graph is handled when the molecule lies on a
Wyckoff special position and two of the nitro groups are disordered.  By way
of illustration, several subunits of the molecule are also defined and are
mapped onto the molecule itself. 

The second example describes the structure of CaCrF5 which has an infinite
bond graph, and a formula unit that includes more than one asymmetric unit.

[Editorial comment: The sample CIFs are intended only to show the organization
of the information.  Data names may be changed in the final report and
dictionary definitions will eventually be needed but it is simpler to discuss
the CIF structure in terms of annotated examples.  Suggestions for better
names are welcome.  Items marked as 'list reference' are required for the
management of the relational file structure and must be unique for each line
in the list.  The list reference item in one loop is frequently parent to
similarly named items in other loops.]

                        FIRST EXAMPLE
                        -------------

                       TRINITROTOLUENE

                         O    CH3  O
                         |    |    |
                   O --- N2   C1   N6 --- O
                          \  /  \ /
                            C2  C6
                            |    |
                       H -- C3  C5 -- H
                             \   /
                              C4
                               |
                               N4
                              / \
                             O   O

In the fictitious crystal structure I have invented for the purposes of this
illustration, there is a crystallographic mirror plane perpendicular to the
benzene ring that includes the methyl group and the N4 nitro group.  The N2
and N6 nitro groups are related by the mirror plane and are disordered with
the two components having occupation numbers of 0.5.  Because of the disorder,
the crystallographic result does not define the point group of an individual
molecule.  By choosing one combination of the disordered nitro groups the
molecule would have Cs symmetry, but by choosing the opposite combination the
individual molecules would have C1 symmetry.  Both or either combination may
of course be present in the real crystal but x-ray diffraction cannot
distinguish between them.
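
The number of possible combinations is the product of the number of
components of each independently disordered group, which is the source of
the concern raised in the comments on Discussion Paper #4.  A short Python
sketch of the enumeration follows; the function name and the group labels
are illustrative assumptions only.

```python
from itertools import product


def enumerate_combinations(groups):
    """groups maps a disordered group name to its list of component labels.

    Returns every way of picking one component from each group.
    """
    names = list(groups)
    return [
        dict(zip(names, choice))
        for choice in product(*(groups[name] for name in names))
    ]


# The fictitious TNT crystal: two nitro groups, each with components a and b.
tnt = {"N2 nitro": ["a", "b"], "N6 nitro": ["a", "b"]}
tnt_combinations = enumerate_combinations(tnt)

# A larger hypothetical molecule with five independent twofold-disordered groups.
larger = {f"group{i}": ["a", "b"] for i in range(1, 6)}
larger_count = len(enumerate_combinations(larger))
```

For the TNT case this gives the four combinations a-a, a-b, b-a and b-b; for
five independent twofold-disordered groups it already gives 32, which is why
the proposal does not require the author to list them.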

############# Beginning of first CIF #############
#
#
data_disordered_TNT
#
# The first set of loops define the topology of the TNT molecule
# (molecular_unit 1) and two fragments of the molecule (molecular_units 2 and
# 3).  The fragment definitions likely would not often be used but are
# included here by way of illustration.
#
# If a crystal contained molecules of more than one compound, each would be
# described as a separate molecular unit.  If the crystal contained more than
# one copy of the same molecule in the asymmetric unit (Z'>1) the topology of
# the molecular unit would be given only once.
#
# The items in each loop belong to the same category whose name forms the
# first part of the datanames of all items in the loop.
#
# The list reference items in each loop are unique for each line and are here
# given sequential numbers which is satisfactory for computer analysis but
# makes a visual inspection of the mappings more difficult.  However, the list
# reference items could be constructed from, e.g., the _molecular_unit_id and
# the _molecular_unit_atom_label since the contents of the _*_id character
# string may have any value so long as it is unique within the list.
#
# The part of the CIF describing the crystallographic structure is omitted,
# but its description should be self-evident in the mapping loops.
#
############################################################
#          DEFINING THE MOLECULAR UNITS
#
# The first loop lists the different molecular units that are being defined
# together with their properties.  We may wish to define other properties
# besides those shown here.
#
loop_
_molecular_unit_id          # List reference
_molecular_unit_name
_molecular_unit_formula
_molecular_unit_point_group
_molecular_unit_Zprime
_molecular_unit_details
 1  'trinitrotoluene' 'C7 H5 N3 O6' m   1 'This is the whole molecule'
 2  'benzene ring'    'C6 H2'       mm2 1 'A portion of the TNT molecule'
 3  'nitro group'     'N O2'        mm2 3
               'A group that appears three times in the TNT molecule'
#
# The atoms that form the molecular units are listed in the
# MOLECULAR_UNIT_ATOM category.  Atomic properties relevant to the topology
# are listed here, but properties related to the geometry are listed in the
# molecular_geom_atom category. 
#
# What I have called atom_valence represents the number of valence electrons
# used in bonding.  ******* What is the best name for this?  Formal oxidation
# state? **********
#
# The dictionary already contains instructions for drawing a 2-D molecular
# diagram in the set of chemical_conn categories.  Although the chemical_conn
# categories also describe the topology of a molecule they are not a
# substitute for the molecular_unit categories because they are restricted to
# organic molecules, they are designed only to display a molecular diagram and
# the atoms are not mapped onto the atom sites in the crystal.  It would,
# however, be possible to include an item
# _molecular_unit_atom_conn_atom_number as a child of
# _chemical_conn_atom_number in the following loop to allow the molecular unit
# to be mapped to chemical_conn and hence plotted as a 2-D diagram.
#
loop_
_molecular_unit_atom_id              # List reference
_molecular_unit_atom_mu_id           # Child of _molecular_unit_id
_molecular_unit_atom_label
_molecular_unit_atom_atom_type_symbol     # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_details
1  1   C1    C    4   3  ?
2  1   C2    C    4   3  ?
3  1   C3    C    4   3  ?
4  1   C4    C    4   3  ?
5  1   C5    C    4   3  ?
6  1   C6    C    4   3  ?
7  1   C7    C    4   4  ?
8  1   H71   H    1   1  ?
9  1   H72   H    1   1  ?
10 1   H73   H    1   1  ?
11 1   N1    N    3   3  ?
12 1   O1    O    2   1  ?
13 1   O2    O    2   1  ?
14 1   N1    N    3   3  ?
15 1   O1    O    2   1  ?
16 1   O2    O    2   1  ?
17 1   N1    N    3   3  ?
18 1   O1    O    2   1  ?
19 1   O2    O    2   1  ?
#
# The above items define all the atoms in the molecule.  The remaining items
# in this list show how parts of the molecule might be described as separate
# molecular units.
#
20 2   C1    C    4   3  Benzene
21 2   C2    C    4   3  Benzene
22 2   C3    C    4   3  Benzene
23 2   C4    C    4   3  Benzene
24 2   C5    C    4   3  Benzene
25 2   C6    C    4   3  Benzene

26 3   N1    N    3   3  'Nitro group'
27 3   O1    O    2   1  'Nitro group'
28 3   O2    O    2   1  'Nitro group'
#
# The next loop defines the bonds in each of the molecular units, again giving
# just the topological properties of the bond, not the geometry.  In some
# cases, e.g., in polar compounds, the order of the atoms may be important
# (see the next example).  This problem has not been addressed in the current
# proposal.  ******** How would one indicate that the direction was or was not
# important? ******
#
# In the MOLECULAR_UNIT_BOND categories the atoms are referred to by their
# _*_atom_id which in this example are sequential numbers. 
# However, _molecular_unit_bond_atom_id can be
# composed of any characters and the user could choose to construct
# _molecular_unit_atom_id out of the _molecular_unit_id and the atom
# label to make the lists easier for humans to understand (the computer does
# not care which system is used).  In this case the first three rows of the
# previous table might look like:
#   1C1  1   C1    C    4   3  ?
#   1C2  1   C2    C    4   3  ?
#   1C3  1   C3    C    4   3  ?
# and the first three rows of the following table might look like:
#    1C1C2  1C1   1C2   1.5  delocalized     # TNT Benzene ring
#    1C2C3  1C2   1C3   1.5  delocalized
#    1C3C4  1C3   1C4   1.5  delocalized
#    etc.
# The definition of bond order is left to the user, but we may wish to
# define other items corresponding to particular definitions of bond order
# based on the method by which the bond order is determined.  For
# example bond orders derived using Kirchhoff-like network equations can be
# derived directly from the topology and would therefore be appropriate to
# include here.  Other definitions are based on quantum mechanical
# calculations which normally require a knowledge of the geometry and are
# therefore less appropriate for inclusion here but still useful.
#
loop_
_molecular_unit_bond_id            # List reference
_molecular_unit_bond_atom_id_1     # Child of _molecular_unit_atom_id
_molecular_unit_bond_atom_id_2     # Child of _molecular_unit_atom_id
_molecular_unit_bond_order
_molecular_unit_bond_type
1   1   2   1.5  delocalized     # TNT Benzene ring
2   2   3   1.5  delocalized
3   3   4   1.5  delocalized
4   4   5   1.5  delocalized
5   5   6   1.5  delocalized
6   6   1   1.5  delocalized
7   1   7   1.0  single          # TNT Methyl group
8   7   8   1.0  single
9   7   9   1.0  single
10  7  10   1.0  single
11  2  11   1.5  delocalized     # TNT N2 nitro group
12  11 12   2.0  double
13  11 13   2.0  double
14  4  14   1.5  delocalized     # TNT N4 nitro group
15  14 15   2.0  double
16  14 16   2.0  double
17  6  17   1.5  delocalized     # TNT N6 nitro group
18  17 18   2.0  double
19  17 19   2.0  double
#
# The rest of this loop lists the bonds in the benzene ring (20-25) and nitro
# group (26-27) molecular units.
#
20  20 21   1.5  delocalized      # Benzene ring
21  21 22   1.5  delocalized
22  22 23   1.5  delocalized
23  23 24   1.5  delocalized
24  24 25   1.5  delocalized
25  25 20   1.5  delocalized
26  26 27   2.0  double           # Nitro group
27  26 28   2.0  double
#
###########################################################
#       DEFINING THE MOLECULAR CONFORMERS AND GEOMETRY
#
# The disordered nitro groups can be combined in four different ways, a-a and
# b-b (both with Cs symmetry), and a-b and b-a (both having C1 symmetry, one
# being the enantiomer of the other).  For illustrative purposes only the a-b
# and a-a conformers are described here.  It is, of course, not necessary to
# identify which conformers are present if this is not known.  The crystal can
# be mapped directly to molecular_unit rather than the conformer given
# in molecular_geom.
#
# The ideal geometries of the conformers differ only in the torsion angles,
# but the complete ideal geometry for each conformer is defined here. 
#
# The geometries of molecular units 2 and 3 are not given.
#
# The source of the bond lengths and angles could be given in the
# _molecular_geom_details field.
#
loop_
_molecular_geom_id                   # List reference
_molecular_geom_point_group               # Point group of molecule
_molecular_geom_mu_id                # Child of _molecular_unit_id
_molecular_geom_details
1  C1 1 'Expected geometry of TNT C1'
2  Cs 1 'Expected geometry of TNT Cs'
#
# In this example the geometry of the benzene ring is defined by atomic
# coordinates, the remaining geometries are defined by their bonds and
# angles.
#
# The basis for the orthogonal coordinates is given in Angstroms but its
# orientation is arbitrary.  It is up to the programmer to decide the best way
# to use this information.
#
loop_
_molecular_geom_atom_id                   # List reference
_molecular_geom_atom_geom_id              # Child of _molecular_geom_id
_molecular_geom_atom_mu_atom_id      # Child of _molecular_unit_atom_id
_molecular_geom_atom_mu_atom_coord_x # Coordinates of atom in Angstrom
_molecular_geom_atom_mu_atom_coord_y
_molecular_geom_atom_mu_atom_coord_z
_molecular_geom_atom_mu_atom_details

1  1 1 0.037  0.146  -0.124 Benzene
2  1 2 1.378  0.562   0.134 Benzene
3  1 3 1.846  1.421   0.204 Benzene
4  1 4 2.567  1.834   0.304 Benzene
5  1 5 1.745  1.563   0.245 Benzene
6  1 6 0.962  0.498   0.103 Benzene
7  1 7 ? ? ? methyl
8  1 8 ? ? ? methyl
9  1 9 ? ? ? methyl
10 1 10 ? ? ? methyl
11 1 11 ? ? ? N2_nitro
12 1 12 ? ? ? N2_nitro
13 1 13 ? ? ? N2_nitro
14 1 14 ? ? ? N4_nitro
15 1 15 ? ? ? N4_nitro
16 1 16 ? ? ? N4_nitro
17 1 17 ? ? ? N6_nitro
18 1 18 ? ? ? N6_nitro
19 1 19 ? ? ? N6_nitro
#
# The geometry of the second conformer follows
#
20 2 1 0.037  0.146  -0.124 Benzene
21 2 2 1.378  0.562   0.134 Benzene
22 2 3 1.846  1.421   0.204 Benzene
23 2 4 2.567  1.834   0.304 Benzene
24 2 5 1.745  1.563   0.245 Benzene
25 2 6 0.962  0.498   0.103 Benzene
26 2 7 ? ? ? methyl
27 2 8 ? ? ? methyl
28 2 9 ? ? ? methyl
29 2 10 ? ? ? methyl
30 2 11 ? ? ? N2_nitro
31 2 12 ? ? ? N2_nitro
32 2 13 ? ? ? N2_nitro
33 2 14 ? ? ? N4_nitro
34 2 15 ? ? ? N4_nitro
35 2 16 ? ? ? N4_nitro
36 2 17 ? ? ? N6_nitro
37 2 18 ? ? ? N6_nitro
38 2 19 ? ? ? N6_nitro
#
# Ideal bond lengths are given for each of the conformers defined above.
# Since the benzene rings are defined by their coordinates, their bond lengths
# are not given here.  The distances here are not those derived from the
# crystal structure determination but are those expected by the author.  As
# above, the definition of the bond orders would be left to the author's
# discretion or could be omitted if given in the description of the topology
# (They are, of course, optional in both places).
#
loop_
_molecular_geom_bond_id              # List reference
_molecular_geom_bond_atom1_id        # Child of _molecular_geom_atom_id
_molecular_geom_bond_atom2_id        # Child of _molecular_geom_atom_id
_molecular_geom_bond_distance        # Bond distance in Angstroms
_molecular_geom_bond_order
_molecular_geom_bond_details
1  1   2 ?     1.5  delocalized     # TNT1 Benzene ring
2  2   3 ?     1.5  delocalized
3  3   4 ?     1.5  delocalized
4  4   5 ?     1.5  delocalized
5  5   6 ?     1.5  delocalized
6  6   1 ?     1.5  delocalized
7  1   7 1.54  1.0  single          # TNT1 Methyl group
8  7   8 1.05  1.0  single
9  7   9 1.05  1.0  single
10 7  10 1.05  1.0  single
11 2  11 1.43  1.5  delocalized     # TNT1 N2 nitro group
12 11 12 1.18  2.0  double
13 11 13 1.18  2.0  double
14 4  14 1.43  1.5  delocalized     # TNT1 N4 nitro group
15 14 15 1.18  2.0  double
16 14 16 1.18  2.0  double
17 6  17 1.43  1.5  delocalized     # TNT1 N6 nitro group
18 17 18 1.18  2.0  double
19 17 19 1.18  2.0  double
#
# The bonds in the second conformer follow.  The bond lengths are the same as
# in the first conformer.
#
20 20 21 ?     1.5  delocalized     # TNT2 Benzene ring
21 21 22 ?     1.5  delocalized
22 22 23 ?     1.5  delocalized
23 23 24 ?     1.5  delocalized
24 24 25 ?     1.5  delocalized
25 25 20 ?     1.5  delocalized
26 20 26 1.54  1.0  single          # TNT2 Methyl group
27 26 27 1.05  1.0  single
28 26 28 1.05  1.0  single
29 26 29 1.05  1.0  single
30 21 30 1.43  1.5  delocalized     # TNT2 N2 nitro group
31 30 31 1.18  2.0  double
32 30 32 1.18  2.0  double
33 23 33 1.43  1.5  delocalized     # TNT2 N4 nitro group
34 33 34 1.18  2.0  double
35 33 35 1.18  2.0  double
36 25 36 1.43  1.5  delocalized     # TNT2 N6 nitro group
37 36 37 1.18  2.0  double
38 36 38 1.18  2.0  double
#
# The bond angles follow.  Again these are the same for both conformers.
# Sufficient angles should be given to define the geometry uniquely.  Probably
# not enough angles are given in this example.
#
loop_
_molecular_geom_angle_id             # List reference
_molecular_geom_angle_bond1_id       # Child of _molecular_geom_bond_id
_molecular_geom_angle_bond2_id       # Child of _molecular_geom_bond_id
_molecular_geom_angle_angle          # Bond angle in degrees
1  8  9  109             # TNT1 Methyl group
2  8  10 109
3  9  10 109
4  7  8  109
5  7  9  109
6  7  10 109
7  11 12 117             # TNT1 N2 nitro group
8  11 13 117
9  12 13 125
10 14 15 117             # TNT1 N4 nitro group
11 14 16 117
12 15 16 125
13 17 18 117             # TNT1 N6 nitro group
14 17 19 117
15 18 19 125
#
# The second conformer follows
#
16 27 28 109             # TNT2 Methyl group
17 27 29 109
18 28 29 109
19 26 27 109
20 26 28 109
21 26 29 109
22 30 31 117             # TNT2 N2 nitro group
23 30 32 117
24 31 32 125
25 33 34 117             # TNT2 N4 nitro group
26 33 35 117
27 34 35 125
28 36 37 117             # TNT2 N6 nitro group
29 36 38 117
30 37 38 125
#
# The two conformers, formed by taking different combinations of the two
# disordered nitro groups, differ in their torsion angles.  I have defined
# the torsion angles in terms of the three bonds.  ******* Would it be
# better to define them in terms of the four atoms?  Is there an ambiguity
# about the direction of the bonds? *******
#
loop_
_molecular_geom_torsion_id           # List reference
_molecular_geom_torsion_bond1_id     # Child of _molecular_geom_bond_id
_molecular_geom_torsion_bond2_id     # Child of _molecular_geom_bond_id
_molecular_geom_torsion_bond3_id     # Child of _molecular_geom_bond_id
_molecular_geom_torsion_angle        # Torsion angle in degrees
1 12 11  1   10.5          # TNT1 N2-N6 C1 conformer
2 11  1  2    0
3  1  2 17    0
4  2 17 18   10.5
5 31 30 20   10.5          # TNT2 N2-N6 Cs conformer
6 30 20 21    0 
7 20 21 36    0
8 21 26 37  -10.5
#
############################################################
#           MAPPING THE TOPOLOGIES ON TO EACH OTHER
#
# The next loop maps the atoms of molecular units 2 and 3 onto molecular
# unit 1.  This will not often be needed but is included to show that it is
# possible.
# It is only necessary to map the atoms, since there is no ambiguity
# about where the bonds occur as long as the molecular_units are finite.
# See the second example for an infinitely connected crystal.
#
# Since this mapping simply declares two atoms to be equivalent, the order
# of the _*_ids is not important.
#
loop_
_mol2mol_map_id_1              # List reference
_mol2mol_map_atom_id_1         # Child of _molecular_unit_atom_id
_mol2mol_map_atom_id_2         # Child of _molecular_unit_atom_id
#
1  1  20       # mapping TNT onto the benzene ring
2  2  21
3  3  22
4  4  23
5  5  24
6  6  25
7  11 26       # mapping the TNT N2 group onto the nitro group
8  12 27
9  13 28
10 14 26       # mapping the TNT N4 group onto the nitro group
11 15 27
12 16 28
13 17 26       # mapping the TNT N6 group onto the nitro group
14 18 27
15 19 28
#
##############################################################
#          MAPPING THE CRYSTAL TO THE MOLECULAR UNITS
#
# The next loop maps the crystal to the two conformers.  If the actual
# conformer were not known the crystal structure could be mapped
# directly to the molecular_unit using the item _mol2xtl_map_atom_mu_atom_id
# in place of _mol2xtl_map_atom_mg_atom_id (See example 2).
#
# The crystallographic mirror operation that relates the two halves of the
# TNT molecule is assumed to have the _space_group_symop_id of 2.  Lattice
# translations of the symmetry operations are not needed and are therefore
# not included here (but see example 2).  The letters a and b distinguish
# the two disordered nitro groups in the crystal, each having an occupancy
# of 0.5.  The mapping of two conformers onto the crystal is allowed
# provided their occupation numbers do not exceed 1.0.  However, in this
# version of the proposal the occupation number is not given.  ****** Should
# it go in the molecular_geom loop? ******
#
loop_
_mol2xtl_map_atom_id               # List reference
_mol2xtl_map_atom_mg_atom_id       # child of _molecular_geom_atom_id
_mol2xtl_map_atom_site_label       # child of _atom_site_label
_mol2xtl_map_atom_symop_id         # child of _space_group_symop_id
1   1   C1   1
2   2   C2   1
3   3   C3   1
4   4   C4   1
5   5   C3   2
6   6   C2   2
7   7   C7   1
8   8   H71  1
9   9   H72  1
10 10   H71  2
11 14   N4   1
12 15   O41  1
13 16   O42  1
#
# In the next six lines the N2 nitro group of the molecular_unit is mapped
# onto the two disordered crystallographic nitro groups.  Conformer 1 (C1)
# is obtained by combining 'N2a 1' with 'N2b 2'.
#
14  11  N2a  1
15  12  O21a 1
16  13  O22a 1
17  17  N2b  2
18  18  O21b 2
19  19  O22b 2
#
# The second conformer (Cs) follows
#
20  20   C1  1
21  21   C2  1
22  22   C3  1
23  23   C4  1
24  24   C3  2
25  25   C2  2
26  26   C7  1
27  27   H71 1
28  28   H72 1
29  29   H71 2
30  33   N4  1
31  34   O41 1
32  35   O42 1
#
# In the next six lines the N2 and N6 nitro groups of conformer 2 are mapped
# onto the crystal by combining 'N2a 1' with 'N2a 2'
#
33 30   N2a  1
34 31   O21a 1
35 32   O22a 1
36 36   N2a  2
37 37   O21a 2
38 38   O22a 2
#
#
############ End of first CIF ################
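
On the question raised in the torsion-angle comments of the first CIF (three
bonds or four atoms, and whether the direction of the bonds matters), the
following Python sketch may help the discussion.  It is an illustration only:
the function names chain_from_bonds and torsion_deg, the atom labels and the
coordinates are hypothetical and are not part of the proposal.  It assumes
that each bond is an unordered pair of atoms, recovers the four-atom chain
from three consecutive bonds, and evaluates the torsion angle from Cartesian
coordinates.

```python
import math


def chain_from_bonds(bond1, bond2, bond3):
    """Return (a, b, c, d) such that bond1 = a-b, bond2 = b-c, bond3 = c-d.

    Each bond is an unordered pair of atom identifiers.
    """
    s1, s2, s3 = set(bond1), set(bond2), set(bond3)
    shared12, shared23 = s1 & s2, s2 & s3
    if len(shared12) != 1 or len(shared23) != 1 or shared12 == shared23:
        raise ValueError("bonds are not three consecutive links a-b-c-d")
    (b,), (c,) = shared12, shared23
    (a,), (d,) = s1 - shared12, s3 - shared23
    if a == d:
        raise ValueError("bonds close a three-membered ring")
    return a, b, c, d


def _sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (Cartesian coordinates)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates built so that the torsion angle is 10.5 degrees.
t = math.radians(10.5)
xyz = {
    "O21": (1.0, 0.0, 0.0),
    "N2": (0.0, 0.0, 0.0),
    "C2": (0.0, 0.0, 1.0),
    "C1": (math.cos(t), math.sin(t), 1.0),
}

# The atoms within each bond are deliberately given in arbitrary order.
chain = chain_from_bonds(("N2", "O21"), ("C2", "N2"), ("C1", "C2"))
forward = torsion_deg(*(xyz[atom] for atom in chain))
backward = torsion_deg(*(xyz[atom] for atom in reversed(chain)))
print(chain, round(forward, 3), round(backward, 3))
```

Under these assumptions three bonds define the chain only up to reversal, and
reversing the chain leaves the torsion angle unchanged, so an unordered triple
of bonds would be unambiguous provided the three bonds really are consecutive.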

                        SECOND EXAMPLE
                        --------------
CaCrF5 consists of chains of corner-linked CrF6 octahedra running along the
c axis of a crystal belonging to space group C2/c.  The Cr and the linking F
atom (F3) reside on 2-fold axes that are perpendicular to c.  The Ca atoms
lie between the chains, also on 2-fold axes.
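
The connectivity of this structure can be checked against the finite bond
graph recorded in the second CIF that follows.  The Python sketch below is an
illustration only.  It transcribes the _molecular_unit_atom and
_molecular_unit_bond loops of that CIF into Python literals (the names ATOMS,
BONDS and summarize are mine, not part of the proposal) and assumes that the
bond valences around each atom should sum to the magnitude of its valence.

```python
from collections import Counter, defaultdict

# atom id -> (label, valence, coordination number), from _molecular_unit_atom
ATOMS = {
    1: ("Ca1", 2, 7), 2: ("Cr1", 3, 6), 3: ("F1", -1, 3), 4: ("F2", -1, 2),
    5: ("F3", -1, 3), 6: ("F4", -1, 3), 7: ("F5", -1, 2),
}

# (atom id 1, atom id 2, bond valence), from _molecular_unit_bond
BONDS = [
    (2, 3, 0.48), (2, 6, 0.48), (2, 4, 0.61), (2, 7, 0.61),
    (2, 5, 0.41), (2, 5, 0.41),
    (1, 3, 0.26), (1, 3, 0.26), (1, 6, 0.26), (1, 6, 0.26),
    (1, 4, 0.39), (1, 7, 0.39), (1, 5, 0.18),
]


def summarize(bonds):
    """Count bonds per atom, sum bond valences per atom, count parallel bonds."""
    degree = Counter()
    valence_sum = defaultdict(float)
    multiplicity = Counter()
    for a, b, s in bonds:
        for atom in (a, b):
            degree[atom] += 1
            valence_sum[atom] += s
        multiplicity[frozenset((a, b))] += 1
    return degree, valence_sum, multiplicity


degree, valence_sum, multiplicity = summarize(BONDS)
doubled = sorted(tuple(sorted(pair)) for pair, n in multiplicity.items() if n > 1)

for atom_id, (label, valence, coord) in ATOMS.items():
    print(label, degree[atom_id], coord,
          round(valence_sum[atom_id], 2), abs(valence))
print("atom pairs joined by more than one bond:", doubled)
```

With the values listed in the CIF, the number of bonds at each atom reproduces
its coordination number, the valence sums reproduce the atomic valences to two
decimal places, and three pairs of atoms are joined by two bonds each.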
 
######### Beginning of second CIF #############
#
#
# EXAMPLE OF A STRUCTURE WITH AN INFINITE BOND GRAPH
#
# CaCrF5 is chosen to illustrate how infinite graphs are treated.
#
# The crystal structure of CaCrF5 is represented by an array of atoms linked
# by bonds into an infinitely connected network with translational symmetry.
# A finite graph, which retains all the local properties of the atoms, can
# be extracted from the infinite graph as follows: one first extracts one
# formula unit (in this case the seven atoms in the chemical formula).  This
# requires that fourteen bonds linking the formula unit to the rest of the
# infinite network be broken, but such broken bonds always occur in pairs
# since they are necessarily related in pairs by one of the translational
# symmetry operations of the space group (translations, glides or screws).
# Each pair of broken bonds is then connected together, adding (in this
# case) seven further bonds to the finite bond graph.  Therefore in some
# cases a pair of atoms in the graph will be linked by more than one bond,
# indicated in the graph by a double or triple line, etc.  In CaCrF5 three
# such pairs of atoms are linked by two bonds as shown in the bond graph
# below.  The inclusion of two lines between a pair of atoms in the graph
# does NOT indicate a double bond (a bond of order 2), but rather two
# different bonds whose bond order is not specified.  Where two (or more)
# bonds are shown as linking the same two atoms in the finite graph, they
# connect two different pairs of atoms in the infinite graph and the
# crystal structure.
#
# Information on the long-range order is lost when the infinite graph is
# reduced to a finite graph, but the short-range order, i.e., the nearest
# neighbour environment that contains the chemical bonds, is preserved.
# A crude representation of the finite graph showing the bonds between Cr
# and F, and between Ca and F, is given below.  In the crystal F1 and F4 are
# related by crystallographic symmetry, as are F2 and F5.
#
#           |------------ F2 -------------|
#           |                             |
#           |------------ F1 =============|
#           |                             |
#      Cr1 -|============ F3 -------------|- Ca
#           |                             |
#           |------------ F4 =============|
#           |                             |
#           |------------ F5 -------------|
#
#
data_Ca_Cr_F5
#
# In this example a complete CIF is given including the symmetry operations
# and the atomic coordinates.  The description of the molecular unit is
# followed by the mapping between the molecular unit and the crystal
# structure.  As there is only one conformer, the crystal structure is
# mapped directly to the topology in molecular_unit.
#
###############################################################
#      DEFINITION OF THE CRYSTALLOGRAPHIC STRUCTURE
#
#   Based on Wu and Brown (1973) Mat. Res. Bull. 8, 593-8.
#
_chemical_formula_sum   'Ca Cr F5'
_cell_length_a                      9.0050
_cell_length_b                      6.4720
_cell_length_c                      7.5330
_cell_angle_alpha                    90.00
_cell_angle_beta                    115.85
_cell_angle_gamma                    90.00
_cell_formula_units_Z                 8
_space_group_name_H-M_alt            C2/c
_space_group_name_Hall              -C 2yc
loop_
         _space_group_symop_id
         _space_group_symop_operation_xyz
1         ' X, Y, Z'
2         '-X, Y,-Z+1/2'
3         '-X,-Y,-Z'
4         ' X,-Y, Z+1/2'
5         ' X+1/2, Y+1/2, Z'
6         '-X+1/2, Y+1/2,-Z+1/2'
7         '-X+1/2,-Y+1/2,-Z'
8         ' X+1/2,-Y+1/2, Z+1/2'

loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_U_iso_or_equiv
_atom_site_adp_type
Ca1      0.50000   0.04260   0.25000   0.10000  Uiso
Cr1      0.00000   0.00000   0.00000   0.10000  Uiso
F1       0.00970  -0.29340  -0.02910   0.10000  Uiso
F2      -0.22730  -0.02300  -0.11740   0.10000  Uiso
F3       0.00000  -0.07210   0.25000   0.10000  Uiso
#
loop_
 _geom_bond_atom_site_label_1
 _geom_bond_atom_site_label_2
 _geom_bond_distance
 _geom_bond_site_symmetry_1
 _geom_bond_site_symmetry_2
Ca1   F1    2.391   1_555   5_555
Ca1   F1    2.391   1_555   6_555
Ca1   F1    2.292   1_555   7_545
Ca1   F1    2.292   1_555   8_545
Ca1   F2    2.215   1_555   3_555
Ca1   F2    2.215   1_555   4_655
Ca1   F3    2.494   1_555   5_555
Cr1   F1    1.918   1_555   1_555
Cr1   F1    1.918   1_555   3_555
Cr1   F2    1.848   1_555   1_555
Cr1   F2    1.848   1_555   3_555
Cr1   F3    1.940   1_555   1_555
Cr1   F3    1.940   1_555   3_555
#
#############################################################
#       DEFINITION OF THE FORMULA UNIT (IN MOLECULAR_UNIT)
#
# The next loop lists the molecular units.  In this case the formula unit
# is the only molecular unit defined.
#
loop_
_molecular_unit_id          # list reference
_molecular_unit_formula
_molecular_unit_details
1 'Ca Cr F5' 'The formula unit'
#
# The next loop lists the seven atoms that compose the molecular unit and
# gives their chemical properties.  Note that the atom_site_ list in the
# crystallographic items given above only contains five atoms because the
# molecular unit occupies two asymmetric units.
#
loop_
_molecular_unit_atom_id                # List reference
_molecular_unit_atom_mu_id             # Child of _molecular_unit_id
_molecular_unit_atom_label             # Optional
_molecular_unit_atom_atom_type_symbol  # Child of _atom_type_symbol
_molecular_unit_atom_valence
_molecular_unit_atom_coord_number
_molecular_unit_atom_details
1 1 Ca1 Ca  2 7  ?
2 1 Cr1 Cr  3 6  ?
3 1 F1  F  -1 3  ?
4 1 F2  F  -1 2  ?
5 1 F3  F  -1 3  ?
6 1 F4  F  -1 3  ' Related to F1 by crystallographic symmetry'
7 1 F5  F  -1 2  ' Related to F2 by crystallographic symmetry'
#
# The next loop lists the bonds in the molecular unit.  Some bonds appear
# twice (e.g. bonds numbered 5 and 6).  The atoms of the molecular unit
# specified in these cases (e.g., atoms 2 and 5 for bonds 5 and 6) map onto
# different atom pairs in the crystal (see the bond mapping loop below).
#
# The bond order (more strictly the bond valence) given here is calculated
# from the topology and is used to calculate the ideal bond lengths in
# molecular_geom.
#
loop_
_molecular_unit_bond_id        # list reference
_molecular_unit_bond_atom_id_1 # Child of _molecular_unit_atom_id
_molecular_unit_bond_atom_id_2 # Child of _molecular_unit_atom_id
_molecular_unit_bond_order     # Predicted bond valence
_molecular_unit_bond_type
1   2 3  0.48  ?
2   2 6  0.48  ?
3   2 4  0.61  ?
4   2 7  0.61  ?
5   2 5  0.41  ?
6   2 5  0.41  ?
7   1 3  0.26  ?
8   1 3  0.26  ?
9   1 6  0.26  ?
10  1 6  0.26  ?
11  1 4  0.39  ?
12  1 7  0.39  ?
13  1 5  0.18  ?
#
############################################################
#        DEFINITION OF THE GEOMETRY OF THE MOLECULAR UNIT
#
# The atoms in the molecular_geom section are now defined.  In this case the
# definition is trivial and atomic coordinates are omitted as they are not
# used to define the geometry of the molecular unit.
#
loop_
_molecular_geom_atom_id                   # List reference
_molecular_geom_atom_mu_atom_id      # Child of _molecular_unit_atom_id
_molecular_geom_atom_mu_atom_details      # Optional

1 1  Ca
2 2  Cr
3 3  F1
4 4  F2
5 5  F3
6 6  F4
7 7  F5
#
# The bond distances predicted from the bond orders are given here. 
# They can be compared with the distances given in the crystallographic
# _geom_bond list above.
#
loop_
_molecular_geom_bond_id              # List reference
_molecular_geom_bond_atom1_id        # Child of _molecular_geom_atom_id
_molecular_geom_bond_atom2_id        # Child of _molecular_geom_atom_id
_molecular_geom_bond_distance        # Ideal bond distance in Angstroms
_molecular_geom_bond_order
_molecular_geom_bond_details
1   2 3  1.93  0.48  ?
2   2 6  1.93  0.48  ?
3   2 4  1.84  0.61  ?
4   2 7  1.84  0.61  ?
5   2 5  1.99  0.41  ?
6   2 5  1.99  0.41  ?
7   1 3  2.34  0.26  ?
8   1 3  2.34  0.26  ?
9   1 6  2.34  0.26  ?
10  1 6  2.34  0.26  ?
11  1 4  2.19  0.39  ?
12  1 7  2.19  0.39  ?
13  1 5  2.48  0.18  ?
#
# Similar angle and torsion loops could also be given but are omitted here
# for brevity.
#
############################################################
#       MAPPING THE ATOMS ONTO THE CRYSTAL
#
# The next loop maps the atoms of the molecular unit onto the atoms of the
# crystal.  Note that atoms 6 and 7 (F4 and F5) in the molecular unit map
# onto symmetry-generated copies of F1 and F2 in the crystal.
# Alternatively the atoms of the molecular_geom, rather than the
# molecular_unit, could have been mapped onto the crystal.
#
loop_
_mol2xtl_map_atom_id               # List reference
_mol2xtl_map_atom_mu_atom_id       # child of _molecular_unit_atom_id
_mol2xtl_map_atom_site_label       # child of _atom_site_label
_mol2xtl_map_atom_symop_id         # child of _space_group_symop_id
_mol2xtl_map_atom_trans_x
_mol2xtl_map_atom_trans_y
_mol2xtl_map_atom_trans_z
1   1  Ca1 1 0 0 0
2   2  Cr1 1 0 0 0
3   3  F1  1 0 0 0
4   4  F2  1 0 0 0
5   5  F3  1 0 0 0
6   6  F1  3 0 0 0
7   7  F2  3 0 0 0

#
# The next loop maps the bonds from the molecular unit onto the crystal.
# This is only needed for infinitely connected structures because these are
# the only structures in which there can be more than one bond between the
# same pair of atoms in the molecular unit.
#
# The bond in the molecular unit is identified here by its
# _molecular_unit_bond_id, but the bond in the crystal must be defined fully
# in terms of atom_site_labels and symmetry operations.  The listing of
# bonds in geom_bond cannot be used to identify the crystal bonds because
# the molecular unit assumed in geom_bond is not necessarily the same as the
# molecular unit assumed in molecular_geom_bond.
#
# For the same reasons, even though the observed bond distances are given in
# geom_bond, they should be repeated here.  They can be recalculated using
# the _atom_site_labels and symmetry operations given in this loop.
#
# Note that the bonds numbered 5 and 6 map onto different pairs of atoms in
# the crystal (see the bond list of the molecular_unit above).  The bonds
# labelled 'link' are those that link the chosen molecular unit (formula
# unit) to other molecular units in the infinite graph; the remaining bonds
# are formed between the atoms belonging to the molecular unit.
#
loop_
_mol2xtl_map_bond_id                # List reference
_mol2xtl_map_bond_mu_bond_id_1      # Child of _molecular_unit_bond_id
_mol2xtl_map_bond_atom_site_label_1 # Child of _atom_site_label
_mol2xtl_map_bond_symop_1           # Child of _space_group_symop_id
_mol2xtl_map_bond_trans_x_1
_mol2xtl_map_bond_trans_y_1
_mol2xtl_map_bond_trans_z_1
_mol2xtl_map_bond_atom_site_label_2 # Child of _atom_site_label
_mol2xtl_map_bond_symop_2           # Child of _space_group_symop_id
_mol2xtl_map_bond_trans_x_2
_mol2xtl_map_bond_trans_y_2
_mol2xtl_map_bond_trans_z_2
_mol2xtl_map_bond_distance          # Experimental distance in the crystal
_mol2xtl_map_bond_details
1   1   Cr1 1 0 0 0  F1 1 0 0 0    1.918   ?
2   2   Cr1 1 0 0 0  F4 1 0 0 0    1.918   ?
3   3   Cr1 1 0 0 0  F2 1 0 0 0    1.848   ?
4   4   Cr1 1 0 0 0  F5 1 0 0 0    1.848   ?
5   5   Cr1 1 0 0 0  F3 1 0 0 0    1.940   ?
6   6   Cr1 1 0 0 0  F3 3 0 0 0    1.940   link

7   7   Ca1 1 0 0 0  F1 5 0 0 0    2.391   link
8   8   Ca1 1 0 0 0  F1 6 0 0 0    2.292   link
9   9   Ca1 1 0 0 0  F4 5 0 -1 0   2.391   link
10  10  Ca1 1 0 0 0  F4 6 0 -1 0   2.292   link
11  11  Ca1 1 0 0 0  F5 1 0 0 0    2.215   ?
12  12  Ca1 1 0 0 0  F2 4 1 0 0    2.215   link
13  13  Ca1 1 0 0 0  F3 5 0 0 0    2.494   link
#
################# End of second CIF ####################
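
The comments in the second CIF note that the observed bond distances can be
recalculated from the _atom_site labels and the symmetry operations.  A
minimal Python sketch of that calculation follows.  It is an illustration,
not a reference implementation: the names apply_symop and distance are mine,
the small parser only handles operators written like those in the
_space_group_symop loop above, and the metric assumes a monoclinic cell
(alpha = gamma = 90 degrees).

```python
import math
import re
from fractions import Fraction

# Cell, symmetry operators and sites transcribed from the second CIF.
CELL = {"a": 9.0050, "b": 6.4720, "c": 7.5330, "beta": 115.85}
SYMOPS = {
    1: "X,Y,Z", 2: "-X,Y,-Z+1/2", 3: "-X,-Y,-Z", 4: "X,-Y,Z+1/2",
    5: "X+1/2,Y+1/2,Z", 6: "-X+1/2,Y+1/2,-Z+1/2",
    7: "-X+1/2,-Y+1/2,-Z", 8: "X+1/2,-Y+1/2,Z+1/2",
}
SITES = {
    "Ca1": (0.50000, 0.04260, 0.25000),
    "Cr1": (0.00000, 0.00000, 0.00000),
    "F1": (0.00970, -0.29340, -0.02910),
    "F2": (-0.22730, -0.02300, -0.11740),
    "F3": (0.00000, -0.07210, 0.25000),
}


def apply_symop(op, xyz, trans=(0, 0, 0)):
    """Apply an operator such as '-X+1/2,Y+1/2,-Z+1/2' plus a lattice shift."""
    coords = dict(zip("XYZ", xyz))
    result = []
    for expr, shift in zip(op.replace(" ", "").split(","), trans):
        value = float(shift)
        for term in re.findall(r"[+-]?[^+-]+", expr):
            sign = -1.0 if term.startswith("-") else 1.0
            body = term.lstrip("+-")
            if body in coords:
                value += sign * coords[body]
            else:
                value += sign * float(Fraction(body))
        result.append(value)
    return tuple(result)


def distance(p, q, cell=CELL):
    """Distance in Angstroms between fractional points p and q (monoclinic)."""
    dx, dy, dz = (qi - pi for pi, qi in zip(p, q))
    a, b, c = cell["a"], cell["b"], cell["c"]
    cos_beta = math.cos(math.radians(cell["beta"]))
    d2 = (a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2
    d2 += 2.0 * a * c * dx * dz * cos_beta
    return math.sqrt(d2)


# (site 1, site 2, symop applied to site 2), taken from the _geom_bond loop.
rows = [("Cr1", "F1", 1), ("Cr1", "F3", 1), ("Ca1", "F2", 3), ("Ca1", "F3", 5)]
results = {}
for label1, label2, symop in rows:
    moved = apply_symop(SYMOPS[symop], SITES[label2])
    results[(label1, label2, symop)] = distance(SITES[label1], moved)
    print(label1, label2, symop, round(results[(label1, label2, symop)], 3))
```

For these four rows the results agree with the _geom_bond_distance values
(1.918, 1.940, 2.215 and 2.494) to within rounding.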


COMPARISON OF THE ABOVE PROPOSAL WITH mmCIF:

mmCIF has a chemical description which is designed for biological molecules.
The contents of the crystal are divided into a small number of ENTITIES which
are classified as either polymers (e.g. a protein molecule), non-polymers, or
water.  A category called struct_asym describes which entities are found in
the asymmetric unit.

Polymeric entities are typically composed of monomers or COMPONENTS which are
described in the category CHEM_COMP.  The definitions in this set of
categories are very similar to our definitions in the molecular_unit and
molecular_geom categories.  chem_comp is designed to give the contents and
geometries of the individual monomers that compose the macromolecules.  It
describes the ideal geometry of the monomers either in terms of Cartesian
coordinates or in terms of bond lengths and angles.  Unlike our proposal,
which uses mol2xtl_map to map the molecular units onto the crystal structure,
mmCIF places pointers to the corresponding atom in chem_comp in the atom_site
loop itself, an arrangement that does not work for small molecules where a
given atom in the crystal may map onto more than one atom in the molecular
unit, e.g., if the molecular unit contains crystallographic symmetry.
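
The one-to-many mapping can be seen in the second example above.  The short
Python sketch below inverts the _mol2xtl_map_atom loop of that example to
list, for each atom site, the molecular-unit atoms that map onto it.  It is
only an illustration; the names MAP_ROWS and unit_atoms_by_site are mine.

```python
from collections import defaultdict

# (molecular-unit atom id, atom_site label, symop id), transcribed from the
# _mol2xtl_map_atom loop of the second example.
MAP_ROWS = [
    (1, "Ca1", 1), (2, "Cr1", 1), (3, "F1", 1), (4, "F2", 1),
    (5, "F3", 1), (6, "F1", 3), (7, "F2", 3),
]


def unit_atoms_by_site(rows):
    """Group the molecular-unit atoms by the atom site they map onto."""
    by_site = defaultdict(list)
    for mu_atom, label, symop in rows:
        by_site[label].append((mu_atom, symop))
    return dict(by_site)


by_site = unit_atoms_by_site(MAP_ROWS)
shared = sorted(label for label, atoms in by_site.items() if len(atoms) > 1)
print(by_site)
print("sites that map onto more than one molecular-unit atom:", shared)
```

Sites F1 and F2 each correspond to two atoms of the molecular unit, so a
single pointer stored with each atom site could not record the full mapping.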

We should make the definitions of items in the molecular_unit and
molecular_geom categories correspond exactly to those used in chem_comp to
allow direct translation between the two categories.  chem_comp defines a
very large number of additional properties such as the chirality of
individual atoms and planes of atoms, as well as properties that are of
interest only in biological structures.  We may wish to add some of these to
our lists.


PLEASE SEND YOUR COMMENTS TO coreCIFchem@iucr.org before AUGUST 31, 2004

################# END OF PROPOSAL #5 ###############################

-- 
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton
Ontario, Canada


_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
