Discussion List Archives

Re: CoreCIFchem Discussion #6

At 17:37 21/12/2004 +0100, Howard Flack wrote:
>Season's Greetings

Also greetings,

I have been quiet but not completely idle since the last emails...

First some responses... and then another mail

><snip/>
>
>#6L83>CIF.  Every list (i.e., loop) in a CIF must have a list-reference
>
>    I think that is not correct. For proof, from #6
>L1844>loop_
>L1845> _geom_bond_atom_site_label_1
>L1846> _geom_bond_atom_site_label_2
>L1847> _geom_bond_distance
>L1848> _geom_bond_site_symmetry_1
>L1849> _geom_bond_site_symmetry_2
>     and the current cif_core.dic version 2.3, and many CIFs on
> journals.iucr.org concur. This means that the following data items and 
> the corresponding data values are NOT required as they serve no purpose.
>   _tecton_conformer_id
>   _tecton_geom_dist_id
>   _tecton_geom_angle_id
>   _tecton_geom_torsion_id
>   _map_tecton_atom_map_id
>   _map_tecton2crystal_atom_id
>   _map_tecton2crystal_bond_id
>
>   _tecton_topology_bond_id is NOT required in the TNT example but IS 
> required in the CaCrF5 example. I hope my list is complete and correct.
>
>
>
>#6L98>The implications of these two views are brought out clearly in the
>examples we
>
>   This paragraph reads like just one more battle in the war of
> crystallographers versus IT specialists. Essentially what is designed to
> be easy to programme is difficult to use for the crystallographer and what
> is designed to be easy to use by the crystallographer is difficult to
> programme. Certainly we have to find an implementable middle ground.

I agree with this analysis! FWIW I am gently introducing a link/reference 
syntax into parts of CML. But this is based on the likelihood of having 
tools that can help with this. Things that looked hard 10 years ago are now 
easier.

A DOM-like approach to CIF will help considerably in resolving links and we 
can at least help with the Java implementation. Note that use of links 
comes very close to involving save_ frames. If the community can agree that 
software will become available to support save_ then much of the current 
discussion becomes more positive. (A save_ frame can be represented as an
additional node in a DOM and referenced by id-based addressing.)
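
To make the id-based addressing concrete, here is a minimal sketch in Python. It is not an existing CIF DOM API: the Node class, its fields and the two helper functions are hypothetical names chosen for illustration, and the only assumption is that every node (including a save_ frame) carries a unique id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a hypothetical DOM-like tree built from a CIF."""
    kind: str                     # e.g. "data", "save" or "item"
    node_id: str                  # unique id used for addressing
    value: Optional[str] = None   # payload; for a link, the id it points at
    children: List["Node"] = field(default_factory=list)


def index_by_id(root: Node) -> Dict[str, Node]:
    """Walk the tree once and build an id -> node lookup table."""
    index: Dict[str, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id in index:
            raise ValueError(f"duplicate id: {node.node_id}")
        index[node.node_id] = node
        stack.extend(node.children)
    return index


def resolve(index: Dict[str, Node], ref: str) -> Node:
    """Resolve a link; a dangling link is reported, not silently ignored."""
    try:
        return index[ref]
    except KeyError:
        raise KeyError(f"unresolved reference: {ref}") from None


# A save_ frame is just one more addressable node in the tree.
frame = Node("save", "tnt_topology")
link = Node("item", "link1", value="tnt_topology")
root = Node("data", "example", children=[frame, link])

index = index_by_id(root)
target = resolve(index, link.value)
```

With an index like this, resolving a link is a single lookup, and a link to a missing frame fails loudly at the point of use.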




<snip/>


>#6L289>(N.B.
> >isomers differ at the topological level, conformers have the same 
> topology but
> >differ at the geometry level).
>also
>#6L375>The topological description does not include any information on the 
>geometry
> >of the tecton but it does distinguish between isomers.
>
>    Here and in quite a few other places in the text we have mention of 
> molecules, isomers and conformers. This is due to the nature of the TNT 
> example. However one needs also to be precise on where and how the 
> following sub-categories of molecules fit in:
>     (1) enantiomers (coming originally from 'enantiomorphous isomer')
> including enantiomers of known absolute configuration, enantiomers of 
> unknown or relative configuration and racemates.
>     (2) diastereoisomers
>     (1) and (2) above have the same topology but are not considered as 
> being conformers.
>
>    The question arises as to the best place and method to specify the 
> chirality of the molecules. I recommend that we do it the way that things 
> are set up in the IUPAC dictionary of stereochemistry. Chiral molecules
> of unknown or relative configuration and racemates are treated as an 
> extension of the nomenclature of enantiopure compounds.
>    A very common case is of chiral molecules containing chiral centres. 
> Clearly the best place to include the specification of the chirality of 
> these atom-based centres is either in _tecton_topology_atom_chirality or 
> in the _tecton_geom_atom_ loop as a data value with name 
> _tecton_geom_atom_chirality taking one of the following values,
> taken from the IUPAC dictionary: R or S for an enantiopure enantiomer of
> known absolute configuration, R* or S*  for an enantiopure enantiomer of 
> unknown or relative absolute configuration, RS or SR for a racemate. What 
> should one do if an atom is not a chiral centre i.e. it is achiral? 
> Clearly one needs a data value meaning 'this atom is not a chiral 
> centre'. This value does not mean the same as 'chirality unknown'.
>   Another very common case occurs where a single symbol is used to 
> indicate the chirality of a molecule with or without chiral centres. The 
> ones one sees all the time are D or L for carbohydrates and amino acids. 
> There are others as well. [HDF should make a list of possible values]. 
> These indications of chirality go naturally in _tecton_chirality and 
> _tecton_conformer_chirality as values D, L, DL, rac, rel, 
> {traditionalists might like to add +, - and +or- which personally I 
> detest totally}, and maybe some others for values like 'enantiopure', 
> 'unknown' [HDF needs to think more about this].
>   The information that is currently coded in 
> _chemical_absolute_configuration needs to be included in the _tecton_* 
> and _tecton_conformer _loop_s. _chemical_absolute_configuration needs to 
> be deprecated.
>   [Need to think more about the racemate, as this may still be a
> problem because the IUPAC stereochemistry dictionary insists on having R
> as the first CIP symbol. The crystallographer may not have chosen the
> opposite enantiomer in the asymmetric unit. Also the chemical diagram
> needs careful attention. I think there are specific ways of drawing a
> chemical diagram to indicate a racemate rather than just one of its 
> enantiopure components.]
>
>[NB to PMR: As you are a member of the IUPAC stereochemistry committee, 
>I'm depending on you informing us of any relevant proposed changes to the 
>nomenclature.]

Indeed. The group is primarily working on structure representation and the 
stereochemistry is as annotations to diagrams rather than as nomenclature 
as such (actually that is probably at least as valuable). Also the InChI is 
tackling the computer formalisation of stereochemistry in a systematic and 
responsible way. So I think IUCr will benefit greatly from following these 
two efforts.



>#6L355>and conformation of a the tectons to be specified
>   and to:
>        and conformation of the tectons to be specified
>
>
>
>#6L467>illustration, the molecule contains a crystallographic mirror plane 
>that
>   should better be:
>    illustration, the average disordered molecule contains a 
> crystallographic mirror plane that
>
>
>
>#6L471>numbers of 0.5.  Because of the disorder the crystallographic 
>structure does
>   should better be:
>numbers of 0.5.  Because of the disorder the average crystallographic 
>structure does
>
>
>
>#6L483># The first set of loops define the topology of the TNT molecule
>   should be:
># The first set of loops defines the topology of the TNT molecule
>
>
>
>#6L488># If a crystal contained molecules of more than one compound, or 
>more than one
> ># isomer of a compound, each would be described by a separate tecton.
> ># If the crystal contained more than one copy of the same molecule in the
> ># asymmetric unit (Z'>1) the topology of the tecton would be given only once
> ># but it would be mapped onto all the crystallographically distinct 
> copies.
>   should better be
># If a crystal contains different types of molecules (isomers or
># diastereoisomers or enantiomers other than the racemate)
># each would be described by a separate tecton.
># If the crystal contained more than one copy of the same molecule in the
># asymmetric unit (Z'>1) the topology of the tecton would be given only once
># and then this single topology would be mapped onto all the
># crystallographically distinct copies.
>
>
>
>#6L511># together their properties.  We may wish to define other 
>properties, such as
>    should better be:
># together with their properties.  We may wish to define other properties, 
>such as
>
>
>
>#6L526># _tecton_chirality   # time-averaged if no geometry given
>   ADD after this line
># _atomSet_absolute_configuration   # defined like current _chemical_absolute_configuration
>
>
>
>#6L582># The CIF dictionary already contains instructions for drawing a 
>2-D molecular
> ># diagram in the group of chemical_conn categories.  Although the
> ># chemical_conn categories also describe the topology of a molecule they are
>
>   Do I understand correctly that you are thinking of deprecating 
> chemical_conn items?
>
>
>
>#6L627># I have added an item _tecton_topology_atom_chirality which is not 
>needed in
># this example, but is needed in chiral structures to identify any atom that
># serves as a chiral center.  Chirality is not captured by the topology, but
># it is, like topology, a feature of the structure that can only be changed by
># breaking and making bonds.  It is included here because it is more closely
># related to the topology than to the geometry which can be changed without
># breaking any bonds.  I will defer to others what values should be associated
># with this item - presumably some letter like R or S.
>
>   I don't at all like this discussion about breaking bonds etc. The real 
> reason for me that one needs _chirality here is for the case where there 
> is only one 'conformer' and you don't want to give any geometry 
> information. Thus it turns out to be convenient to give it here. If you 
> have several 'conformers' you give the chirality information in 
> _tecton_atom_chirality. The possible values are given above.
>

It sounds here as if there is a need to map atoms from one domain to
another - e.g. topology (2D) to geometry (3D). This is generally not
well supported in traditional chemistry programs, which tend to concentrate
on either the topology or the 3D structure. CML supports both (and
fractional coordinates as well). Thus if you have one instance of a molecule
it can be represented as "8D" - conn+cartesian+fract - without loss of
information. You can even add hydrogen atoms for which the 2D information
but not the 3D is known.

The problem comes when you have more than one instance of a molecule - 
disorder, conformations, dynamics snapshots, etc. for which the 2D 
structure is the same and for which the 3D structure changes. I don't think 
anyone has yet created a happy representation that does not use implicit 
semantics (e.g. atom order assumes identity). We have been wrestling with 
this in CML and have a prototype design where 2D information is described 
once and then instances of the different 3D sets are described with just 
the relevant changed information. Everything is linked through persistent 
atom_ids. Something like:

_atom_site_label
_atom_site_occupancy
O1 1.0
N1 1.0

and then (say) disordered groups
_atom_site_label_ref
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1 0.1 0.2 0.3
N1 0.2 0.3 0.4

_atom_site_label_ref
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
O1 0.12 0.21 0.35
N1 0.23 0.31 0.43

This is invalid CIF, as names cannot be repeated, but it shows two groups
which reference a "parent" group holding the unchanging information (here
the occupancy). Linking is (here) through the label. The two groups have
different fractional coordinates, so it would be possible to instantiate
both of them. For chemical concepts, the charge, hydrogen count, etc. would
be inherited by the "children".
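
The parent/child idea can be sketched in a few lines of Python. This is only an illustration of the linking, not the CML prototype itself; the dictionary layout and the function name instantiate are assumptions, while the labels, occupancies and coordinates are the ones used in the example.

```python
# Unchanging ("parent") information, keyed by the persistent atom label.
parent = {
    "O1": {"occupancy": 1.0},
    "N1": {"occupancy": 1.0},
}

# Each child group repeats only what changes: the fractional coordinates.
children = [
    {"O1": (0.1, 0.2, 0.3), "N1": (0.2, 0.3, 0.4)},
    {"O1": (0.12, 0.21, 0.35), "N1": (0.23, 0.31, 0.43)},
]


def instantiate(parent, child):
    """Build one full set of atoms by merging a child group into its parent."""
    atoms = {}
    for label, fract in child.items():
        if label not in parent:
            # The link is explicit, so a broken reference is detectable.
            raise KeyError(f"child atom {label} has no parent entry")
        atoms[label] = {**parent[label], "fract": fract}
    return atoms


instances = [instantiate(parent, child) for child in children]
```

Because the link goes through the label rather than through atom order, a child group may list its atoms in any order, or list only a subset of them.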


>#6L714>loop_
>_tecton_topology_bond_id
>_tecton_topology_bond_atom1_id     # Child of _tecton_topology_atom_id
>_tecton_topology_bond_atom2_id     # Child of _tecton_topology_atom_id
>_tecton_topology_bond_type
>1     T.C1   T.C2   arom      # TNT benzene ring
>2     T.C2   T.C3   arom
>3     T.C3   T.C4   arom
>4     T.C4   T.C5   arom
>5     T.C5   T.C6   arom
>6     T.C6   T.C1   arom
>7     T.C3   T.H3   sing
>8     T.C5   T.H5   sing
>  etc, etc
>
>  should be:
>
>loop_
>_atomSet_topology_bond_atom1_id     # Child of _atomSet_topology_atom_id
>_atomSet_topology_bond_atom2_id     # Child of _atomSet_topology_atom_id
>_atomSet_topology_bond_type
>T.C1   T.C2   arom      # TNT benzene ring
>T.C2   T.C3   arom
>T.C3   T.C4   arom
>T.C4   T.C5   arom
>T.C5   T.C6   arom
>T.C6   T.C1   arom
>T.C3   T.H3   sing
>T.C5   T.H5   sing
>  etc, etc
>   as the _tecton_topology_bond_id serves no purpose.
>
>
>
>#6L834>loop_
>_tecton_conformer_id            # List-reference
>_tecton_conformer_tecton_id     # Child of _tecton_topology_id
>_tecton_conformer_point_group   # Schoenflies point group symbol of conformer
>_tecton_conformer_chirality     # We need to define allowed symbols
>_tecton_conformer_details
>
>   should be
>
>loop_
>_atomSet_conformer_id            # List-reference
>_atomSet_conformer_tecton_id     # Child of _tecton_topology_id
>_atomSet_conformer_point_group   # Schoenflies point group symbol of conformer
>_atomSet_conformer_chirality     # We need to define allowed symbols
>_atomSet_conformer_absolute_configuration   # values as per _chemical_absolute_configuration
>_atomSet_conformer_Zprime       #
>_atomSet_conformer_occupation   # occupation number of this conformer in the crystal
>_atomSet_conformer_details
>
>  With several 'conformers' you need these additional values to be given here.
>  I do not agree at all with David's proposal of putting copies of the 
> occupation number with the individual atom information of the conformer. 
> The occupation applies to the whole conformer and must go here. Putting 
> occupation values on the individual atoms leaves the gate wide open for 
> those who might be tempted to 'doctor' their results to get around an 
> error message from a checking programme.

FWIW the disorder reported in the CIFs I have seen recently seems to be
well organised. Each disordered group should have a constant occupancy for
all its atoms - the question is whether it should be normalised onto the
disorder group/conformer or whether it should be spelt out for each atom
(denormalised). Much comes down to how the software is written. If there is
general agreement that substructures must be supported then the normalised
approach is acceptable - if not, it is possible that the occupancy could get
"lost" - i.e. assumed to be 1.0.

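As a rough illustration of the two layouts, the Python sketch below converts between them. The function names and dictionary layout are assumptions; the atom labels and the conformer labels aa/bb are borrowed from the TNT example. The point is that either conversion can refuse to guess, so that a missing occupancy is reported instead of being assumed to be 1.0.

```python
def denormalise(group_occupancy, atom_groups):
    """Copy each group's occupancy onto every atom of that group."""
    missing = set(atom_groups.values()) - set(group_occupancy)
    if missing:
        raise ValueError(f"no occupancy given for groups: {sorted(missing)}")
    return {atom: group_occupancy[group] for atom, group in atom_groups.items()}


def normalise(atom_occupancy, atom_groups):
    """Collapse per-atom occupancies onto groups, insisting that the value
    is constant within each group (exact comparison, for simplicity)."""
    group_occupancy = {}
    for atom, group in atom_groups.items():
        occ = atom_occupancy.get(atom)
        if occ is None:
            raise ValueError(f"occupancy missing for atom {atom}")
        if group_occupancy.setdefault(group, occ) != occ:
            raise ValueError(f"inconsistent occupancy within group {group}")
    return group_occupancy


atom_groups = {
    "N2a": "aa", "O21a": "aa", "O22a": "aa",
    "N2b": "bb", "O21b": "bb", "O22b": "bb",
}
per_atom = denormalise({"aa": 0.5, "bb": 0.5}, atom_groups)
per_group = normalise(per_atom, atom_groups)
```

The normalise step is also where a "doctored" per-atom value would show up, since it breaks the constant-within-group check.
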


>#6L893>loop_
>_tecton_geom_atom_id    # List-reference, child of _tecton_topology_atom_id
>_tecton_geom_atom_conformer_label  # Child of _tecton_conformer_equiv_label
>_tecton_geom_atom_coord_x          # Coordinates of atom in Angstrom
>_tecton_geom_atom_coord_y          #
>_tecton_geom_atom_coord_z          #
>_tecton_geom_atom_details
>
>  should be
>
>_atomSet_geom_atom_id    # List-reference, child of _tecton_topology_atom_id
>_atomSet_geom_atom_conformer_label  # Child of _tecton_conformer_equiv_label
>_atomSet_geom_atom_coord_x          # Coordinates of atom in Angstrom
>_atomSet_geom_atom_coord_y          #
>_atomSet_geom_atom_coord_z          #
>_atomSet_geom_atom_chirality        #  chirality of chiral centre on atom as per CIP
>_atomSet_geom_atom_details

Forgive me if I have already discussed CIP. CIP is primarily for humans to 
communicate stereochemistry to other humans who - for some reason - cannot 
see a structural diagram. It is also a necessary part of the formal name as 
used by patent lawyers. It is very difficult for a machine to understand 
and there are chiral compounds for which CIP is too complicated (i.e. the 
algorithm doesn't terminate). Atom parity is designed for machines to 
understand chirality and works by labelling the 4 (or 3) ligands of a 
chiral atom. As long as the labels are explicit the algorithm is relatively 
simple (CML uses the one proposed for MIF).
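
To show why explicit labels make this simple for a machine, here is a minimal Python sketch. It is not the CML or MIF definition: the sign convention, the function names and the use of alphabetical label order are all assumptions made for illustration, and only the four-ligand case is handled. The parity is taken as the sign of the signed volume spanned by the four ligand positions, taken in label order.

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of the tetrahedron a, b, c, d."""
    u = [a[i] - d[i] for i in range(3)]
    v = [b[i] - d[i] for i in range(3)]
    w = [c[i] - d[i] for i in range(3)]
    # Scalar triple product u . (v x w), expanded along the first row.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def atom_parity(ligands):
    """Return +1 or -1 for a chiral arrangement, 0 if the ligands are coplanar.

    `ligands` maps four explicit ligand labels to (x, y, z) coordinates.
    """
    if len(ligands) != 4:
        raise ValueError("this sketch handles exactly four ligands")
    ordered = [ligands[label] for label in sorted(ligands)]
    volume = signed_volume(*ordered)
    return (volume > 0) - (volume < 0)


centre = {"a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1), "d": (0, 0, 0)}
mirror = {label: (x, y, -z) for label, (x, y, z) in centre.items()}
```

Reflecting the coordinates, or swapping the labels on two ligands, flips the sign. No ranking of substituents is needed, which is the part of CIP that is hard to automate.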

Pure conformers do not normally differ in atom chirality. This normally 
only happens with restricted rotation as in biphenyls. If there are 
conformers of differing atomic parities then they should really be 
enantiomers or diastereomers.
(I note this is addressed below)



>#6L919>loop_
>_tecton_geom_dist_id              # List-reference
>_tecton_geom_dist_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_dist_atom1_id        # Child of _tecton_topology_atom_id
>_tecton_geom_dist_atom2_id        # Child of _tecton_topology_atom_id
>_tecton_geom_dist_distance        # Distance atom1-atom2 in Angstroms
>1  all   T.C7   T.C1    1.54                # TNT methyl group
>2  all   T.C7   T.H71   1.05
>3  all   T.C7   T.H72   1.05
>4  all   T.C7   T.H73   1.05
>5  all   T.N4   T.C4    1.43                # TNT N4 nitro group
>6  all   T.N4   T.O41   1.18
>7  all   T.N4   T.O42   1.18
>8  all   T.N2   T.C2    1.43                # TNT N2 nitro group
>9  all   T.N2   T.O21   1.18
>10 all   T.N2   T.O22   1.18
>11 all   T.N6   T.C6    1.43                # TNT N6 nitro group
>12 all   T.N6   T.O61   1.18
>13 all   T.N6   T.O62   1.18
>
>  should be as follows since _dist_id serves no purpose.
>
>loop_
>_atomSet_geom_dist_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_dist_atom1_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_dist_atom2_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_dist_distance        # Distance atom1-atom2 in Angstroms
>all   T.C7   T.C1    1.54                # TNT methyl group
>all   T.C7   T.H71   1.05
>all   T.C7   T.H72   1.05
>all   T.C7   T.H73   1.05
>all   T.N4   T.C4    1.43                # TNT N4 nitro group
>all   T.N4   T.O41   1.18
>all   T.N4   T.O42   1.18
>all   T.N2   T.C2    1.43                # TNT N2 nitro group
>all   T.N2   T.O21   1.18
>all   T.N2   T.O22   1.18
>all   T.N6   T.C6    1.43                # TNT N6 nitro group
>all   T.N6   T.O61   1.18
>all   T.N6   T.O62   1.18
>
>
>
>#6L942>loop_
>_tecton_geom_angle_id              # List-reference
>_tecton_geom_angle_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_angle_atom1_id        # Child of _tecton_topology_atom_id
>_tecton_geom_angle_atom2_id        # Child of _tecton_topology_atom_id
>_tecton_geom_angle_atom3_id        # Child of _tecton_topology_atom_id
>_tecton_geom_angle_angle           # Angle in degrees
>1  all    T.C1   T.C7   T.H71  109     # TNT Methyl group
>2  all    T.C1   T.C7   T.H72  109
>3  all    T.C1   T.C7   T.H73  109
>4  all    T.H71  T.C7   T.H72  109
>5  all    T.H72  T.C7   T.H73  109
>6  all    T.H73  T.C7   T.H71  109
>7  all    T.O41  T.N4   T.C4   117     # TNT N4 nitro group
>8  all    T.O42  T.N4   T.C4   117
>9  all    T.O41  T.N4   T.O42  126
>10 all    T.O21  T.N2   T.C2   117     # TNT N2 nitro group
>11 all    T.O22  T.N2   T.C2   117
>12 all    T.O21  T.N2   T.O22  126
>13 all    T.O61  T.N6   T.C6   117     # TNT N6 nitro group
>14 all    T.O62  T.N6   T.C6   117
>15 all    T.O61  T.N6   T.O62  126
>
>   should be as _angle_id serves no purpose
>
>loop_
>_atomSet_geom_angle_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_angle_atom1_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_atom2_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_atom3_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_angle_angle           # Angle in degrees
>all    T.C1   T.C7   T.H71  109     # TNT Methyl group
>all    T.C1   T.C7   T.H72  109
>all    T.C1   T.C7   T.H73  109
>all    T.H71  T.C7   T.H72  109
>all    T.H72  T.C7   T.H73  109
>all    T.H73  T.C7   T.H71  109
>all    T.O41  T.N4   T.C4   117     # TNT N4 nitro group
>all    T.O42  T.N4   T.C4   117
>all    T.O41  T.N4   T.O42  126
>all    T.O21  T.N2   T.C2   117     # TNT N2 nitro group
>all    T.O22  T.N2   T.C2   117
>all    T.O21  T.N2   T.O22  126
>all    T.O61  T.N6   T.C6   117     # TNT N6 nitro group
>all    T.O62  T.N6   T.C6   117
>all    T.O61  T.N6   T.O62  126
>
>
>
>#6L975>loop_
>_tecton_geom_torsion_id              # List-reference
>_tecton_geom_torsion_conformer_label # Child of _tecton_geom_equiv_label
>_tecton_geom_torsion_atom1_id        # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom2_id        # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom3_id        # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_atom4_id        # Child of _tecton_topology_atom_id
>_tecton_geom_torsion_angle           # Torsion angle in degrees
>1 all  T.C3   T.C4   T.N4   T.O41   90
>2 aa   T.C1   T.C2   T.N2   T.O21   10.5
>3 aa   T.C1   T.C6   T.N6   T.O61   10.5
>4 bb   T.C1   T.C2   T.N2   T.O21  -10.5
>5 bb   T.C1   T.C6   T.N6   T.O61  -10.5
>6 ab   T.C1   T.C2   T.N2   T.O21   10.5
>7 ab   T.C1   T.C6   T.N6   T.O61  -10.5
>8 ba   T.C1   T.C2   T.N2   T.O21  -10.5
>9 ba   T.C1   T.C6   T.N6   T.O61   10.5
>
>should be since _torsion_id serves no purpose
>
>loop_
>_atomSet_geom_torsion_conformer_label # Child of _tecton_geom_equiv_label
>_atomSet_geom_torsion_atom1_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom2_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom3_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_atom4_id        # Child of _tecton_topology_atom_id
>_atomSet_geom_torsion_angle           # Torsion angle in degrees
>all  T.C3   T.C4   T.N4   T.O41   90
>aa   T.C1   T.C2   T.N2   T.O21   10.5
>aa   T.C1   T.C6   T.N6   T.O61   10.5
>bb   T.C1   T.C2   T.N2   T.O21  -10.5
>bb   T.C1   T.C6   T.N6   T.O61  -10.5
>ab   T.C1   T.C2   T.N2   T.O21   10.5
>ab   T.C1   T.C6   T.N6   T.O61  -10.5
>ba   T.C1   T.C2   T.N2   T.O21  -10.5
>ba   T.C1   T.C6   T.N6   T.O61   10.5
>
>
>
>#6L1045>loop_
>_map_tecton_atom_map_id        # List reference
>_map_tecton_atom_atom1_id      # Child of _tecton_topology_atom_id
>_map_tecton_atom_atom2_id      # Child of _tecton_topology_atom_id
>1   B.C1    T.C1    # mapping 1,2,4,6 benzene moiety onto TNT
>2   B.C2    T.C2
>3   B.C3    T.C3
>etc etc
>
>should be since _map_id serves no purpose
>
>loop_
>_map_atomSet_atom_atom1_id      # Child of _tecton_topology_atom_id
>_map_atomSet_atom_atom2_id      # Child of _tecton_topology_atom_id
>B.C1    T.C1    # mapping 1,2,4,6 benzene moiety onto TNT
>B.C2    T.C2
>B.C3    T.C3
>etc etc
>
>
>
>#6L1085># The occupation number indicates how much of each conformer (or 
>isomer) is
># present. The occupation numbers of the atoms in the crystal are defined in
># the atom_site loop and must not be less than the sum of the corresponding
># occupation numbers of the conformers.
>
>   should be
>
># The occupation number indicates how much of each conformer (or isomer) is
># present. The occupation numbers of the atoms in the crystal are defined in
># the atom_site loop and must equal, to within 2 or 3 standard
># uncertainties, the sum of the corresponding occupation numbers of the
># conformers.
>
This seems practicable. I have found that there is good agreement between 
the sum of occupancies and overall formulae.
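
A check of this kind could be sketched as below. The function name, its arguments and the numbers in the usage lines are hypothetical. Combining the standard uncertainties in quadrature assumes they are independent, which will not be strictly true when the conformer occupancies were refined under a constraint.

```python
import math


def occupancies_agree(site_occ, site_su, conformer_occs, conformer_sus, k=3.0):
    """True if the atom_site occupancy equals the sum of the conformer
    occupancies to within k combined standard uncertainties."""
    total = sum(conformer_occs)
    # Independent uncertainties are assumed, so they add in quadrature.
    combined_su = math.sqrt(site_su ** 2 + sum(su ** 2 for su in conformer_sus))
    return abs(site_occ - total) <= k * combined_su


# Hypothetical values: a fully occupied site shared by two conformers.
ok = occupancies_agree(1.00, 0.01, [0.52, 0.47], [0.01, 0.01])
bad = occupancies_agree(1.00, 0.01, [0.50, 0.30], [0.01, 0.01])
```

The multiplier k is left as a parameter because the proposed wording allows either 2 or 3 standard uncertainties.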


>#6L1118>loop_
>_map_tecton2crystal_atom_id           # List-reference
>_map_tecton2crystal_atom_atom_id      # Child of _tecton_topology_atom_id
>_map_tecton2crystal_atom_conformer_label
>                                    # Child of _tecton_conformer_equiv_label
>_map_tecton2crystal_atom_occup_number # Occupation number of tecton atom
>_map_tecton2crystal_atom_atom_site_label # child of _atom_site_label
>_map_tecton2crystal_atom_symop_id     # child of _space_group_symop_id
>1  T.C1  all 1   C1   1
>2  T.C2  all 1   C2   1
>3  T.C3  all 1   C3   1
>4  T.C4  all 1   C4   1
>5  T.C5  all 1   C3   2
>6  T.C6  all 1   C2   2
>7  T.H3  all 1   H3   1
>8  T.H5  all 1   H3   2
>9  T.C7  all 1   C7   1
>10 T.H71 all 1   H71  1
>11 T.H72 all 1   H72  1
>12 T.H73 all 1   H71  2
>13 T.N4  all 1   N4   1
>14 T.O41 all 1   O41  1
>15 T.O42 all 1   O42  1
># SIDE CHAINS
>16 T.N2  aa 0.5  N2a  1
>17 T.O21 aa 0.5  O21a 1
>18 T.O22 aa 0.5  O22a 1
>19 T.N6  aa 0.5  N2a  2
>20 T.O61 aa 0.5  O21a 2
>21 T.O62 aa 0.5  O22a 2
>22 T.N2  bb 0.5  N2b  1
>23 T.O21 bb 0.5  O21b 1
>24 T.O22 bb 0.5  O22b 1
>25 T.N6  bb 0.5  N2b  2
>26 T.O61 bb 0.5  O21b 2
>27 T.O62 bb 0.5  O22b 2
>
>   should be  because (a) _atom_id serves no purpose and (b) the 
> occupations do not belong here
>
>loop_
>_map_atomSet2crystal_atom_atom_id      # Child of _atomSet_topology_atom_id
>_map_atomSet2crystal_atom_conformer_label
>                                    # Child of _atomSet_conformer_equiv_label
>_map_atomSet2crystal_atom_atom_site_label # child of _atom_site_label
>_map_atomSet2crystal_atom_symop_id     # child of _space_group_symop_id
>T.C1  all C1   symop1
>T.C2  all C2   symop1
>T.C3  all C3   symop1
>T.C4  all C4   symop1
>T.C5  all C3   symop2
>T.C6  all C2   symop2
>T.H3  all H3   symop1
>T.H5  all H3   symop2
>T.C7  all C7   symop1
>T.H71 all H71  symop1
>T.H72 all H72  symop1
>T.H73 all H71  symop2
>T.N4  all N4   symop1
>T.O41 all O41  symop1
>T.O42 all O42  symop1
># SIDE CHAINS
>T.N2  aa  N2a  symop1
>T.O21 aa  O21a symop1
>T.O22 aa  O22a symop1
>T.N6  aa  N2a  symop2
>T.O61 aa  O21a symop2
>T.O62 aa  O22a symop2
>T.N2  bb  N2b  symop1
>T.O21 bb  O21b symop1
>T.O22 bb  O22b symop1
>T.N6  bb  N2b  symop2
>T.O61 bb  O21b symop2
>T.O62 bb  O22b symop2
>
>
>
>#6L1163>                        4.2 SECOND SAMPLE CIF
>   4.2 should be 3.2
>
>
>
>#6L1240>loop_
>          _space_group_symop_id
>          _space_group_symop_operation_xyz
>1         ' X, Y, Z'
>2         '-X, Y,-Z+1/2'
>3         '-X,-Y,-Z'
>4         ' X,-Y, Z+1/2'
>5         ' X+1/2, Y+1/2, Z'
>6         '-X+1/2, Y+1/2,-Z+1/2'
>7         '-X+1/2,-Y+1/2,-Z'
>8         ' X+1/2,-Y+1/2, Z+1/2'
>
>   would be nicer as
>
>loop_
>          _space_group_symop_id
>          _space_group_symop_operation_xyz
>symop1         ' X, Y, Z'
>symop2         '-X, Y,-Z+1/2'
>symop3         '-X,-Y,-Z'
>symop4         ' X,-Y, Z+1/2'
>symop5         ' X+1/2, Y+1/2, Z'
>symop6         '-X+1/2, Y+1/2,-Z+1/2'
>symop7         '-X+1/2,-Y+1/2,-Z'
>symop8         ' X+1/2,-Y+1/2, Z+1/2'

I agree. We have relied heavily on the identification of symops in building 
chemistry from CIFs.
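
For illustration only, this is roughly how named symops can be used when generating atoms. It is a sketch in Python, not the code we actually use; the function names are assumptions, the parser only covers simple forms such as those in the list above, and the coordinates in the usage lines are invented.

```python
import re
from fractions import Fraction


def parse_symop(xyz):
    """Parse e.g. '-X, Y,-Z+1/2' into three rows (cx, cy, cz, shift)."""
    rows = []
    for part in xyz.lower().replace(" ", "").split(","):
        tokens = re.findall(r"[+-]?[^+-]+", part)
        if not tokens:
            raise ValueError(f"empty component in {xyz!r}")
        coeffs = {"x": 0, "y": 0, "z": 0}
        shift = Fraction(0)
        for token in tokens:
            sign = -1 if token.startswith("-") else 1
            body = token.lstrip("+-")
            if body in coeffs:
                coeffs[body] += sign
            else:
                shift += sign * Fraction(body)   # e.g. "1/2"
        rows.append((coeffs["x"], coeffs["y"], coeffs["z"], shift))
    if len(rows) != 3:
        raise ValueError(f"expected three components in {xyz!r}")
    return rows


def apply_symop(rows, fract, translation=(0, 0, 0)):
    """Apply a parsed symop plus a lattice translation to fractional coordinates."""
    x, y, z = fract
    return tuple(cx * x + cy * y + cz * z + shift + t
                 for (cx, cy, cz, shift), t in zip(rows, translation))


# Symops addressed by id, as in the loop above.
symops = {
    "symop1": parse_symop(" X, Y, Z"),
    "symop2": parse_symop("-X, Y,-Z+1/2"),
    "symop3": parse_symop("-X,-Y,-Z"),
}

site = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))   # invented coordinates
image = apply_symop(symops["symop2"], site)
```

Fractions are used so that shifts such as 1/2 stay exact. A mapping row such as "F4 F1 symop3 0 0 0" then becomes a lookup by id followed by one call to apply_symop.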



>#6L1291>loop_
>_tecton_topology_id          # List reference
>_tecton_topology_formula
>_tecton_topology_special_details
>1 'Ca Cr F5' 'The formula unit'
>
>   would be nicer as:
>
>loop_
>_atomSet_topology_id          # List reference
>_atomSet_topology_formula
>_atomSet_topology_special_details
>atomSet1 'Ca Cr F5' 'The formula unit'
>
>
>
>#6L1309># _tecton_topology_atom_label is included for the benefit of the 
>user.  It has
># no parent or child and is not required for CIF management.  The CIF
># identifies the atom by _tecton_topology_atom_id.
>
>   I don't see what possible benefit this _atom_label is for the user. In 
> fact I think things are clearer if you leave it out.
>
>
>
>#6L1318>loop_
>_tecton_topology_atom_id            # List-reference
>_tecton_topology_atom_tecton_id     # Child of _tecton_topology_id
>_tecton_topology_atom_label
>_tecton_topology_atom_type_symbol   # Child of _atom_type_symbol
>_tecton_topology_atom_valence
>_tecton_topology_atom_coord_number  # Number of bonds formed by this atom
>_tecton_topology_atom_details
>Ca 1 Ca1 Ca  2 7  ?
>Cr 1 Cr1 Cr  3 6  ?
>F1 1 F1  F  -1 3  ?
>F2 1 F2  F  -1 2  ?
>F3 1 F3  F  -1 3  ?
>F4 1 F4  F  -1 3  ' Related to F1 by crystallographic symmetry'
>F5 1 F5  F  -1 2  ' Related to F2 by crystallographic symmetry'
>
>looks nicer as
>
>loop_
>_atomSet_topology_atom_id            # List-reference
>_atomSet_topology_atom_tecton_id     # Child of _tecton_topology_id
>_atomSet_topology_atom_type_symbol   # Child of _atom_type_symbol
>_atomSet_topology_atom_valence
>_atomSet_topology_atom_coord_number  # Number of bonds formed by this atom
>_atomSet_topology_atom_details
>Ca atomSet1 Ca  2 7  ?
>Cr atomSet1 Cr  3 6  ?
>F1 atomSet1 F  -1 3  ?
>F2 atomSet1 F  -1 2  ?
>F3 atomSet1 F  -1 3  ?
>F4 atomSet1 F  -1 3  ' Related to F1 by crystallographic symmetry'
>F5 atomSet1 F  -1 2  ' Related to F2 by crystallographic symmetry'
>
>
>
>#6L1393>## the finite bond graph, i.e. that atoms in the tecton from which the
>   should be
>## the finite bond graph, i.e. those atoms in the tecton from which the
>
>
>
>#6L1403>loop_
>_tecton_geom_dist_id            # List-reference
>_tecton_geom_dist_atom1_id      # Child of _tecton_topology_atom_id
>_tecton_geom_dist_atom2_id      # Child of _tecton_topology_atom_id
>_tecton_geom_dist_distance      # Ideal bond distance in Angstroms
>_tecton_geom_dist_valence           # Same as _tecton_topology_bond_valence
>_tecton_geom_dist_details
>A  Cr F1 1.93  0.48  'Bond distances calculated from bond valences'
>B  Cr F4 1.93  0.48  'Bond distances calculated from bond valences'
>C  Cr F2 1.84  0.61  'Bond distances calculated from bond valences'
>D  Cr F5 1.84  0.61  'Bond distances calculated from bond valences'
>E  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
>F  Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
>G  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
>H  Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
>I  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
>J  Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
>K  Ca F2 2.19  0.39  'Bond distances calculated from bond valences'
>L  Ca F5 2.19  0.39  'Bond distances calculated from bond valences'
>M  Ca F3 2.48  0.18  'Bond distances calculated from bond valences'
>
>  since _dist_id serves no purpose should be
>
>loop_
>_atomSet_geom_dist_atom1_id      # Child of _atomSet_topology_atom_id
>_atomSet_geom_dist_atom2_id      # Child of _atomSet_topology_atom_id
>_atomSet_geom_dist_distance      # Ideal bond distance in Angstroms
>_atomSet_geom_dist_valence           # Same as _atomSet_topology_bond_valence
>_atomSet_geom_dist_details
>Cr F1 1.93  0.48  'Bond distances calculated from bond valences'
>Cr F4 1.93  0.48  'Bond distances calculated from bond valences'
>Cr F2 1.84  0.61  'Bond distances calculated from bond valences'
>Cr F5 1.84  0.61  'Bond distances calculated from bond valences'
>Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
>Cr F3 1.99  0.41  'Bond distances calculated from bond valences'
>Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
>Ca F1 2.34  0.26  'Bond distances calculated from bond valences'
>Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
>Ca F4 2.34  0.26  'Bond distances calculated from bond valences'
>Ca F2 2.19  0.39  'Bond distances calculated from bond valences'
>Ca F5 2.19  0.39  'Bond distances calculated from bond valences'
>Ca F3 2.48  0.18  'Bond distances calculated from bond valences'
>
>
>#6L1434># Note that atoms F4 and F5 in the molecular unit map onto
>   should be
># Note that atoms F4 and F5 in the atomSet map onto
>
>
>
>#6L1441>loop_
>_map_tecton2crystal_atom_id              # List reference
>_map_tecton2crystal_atom_atom_id         # Child of _tecton_topology_atom_id
>_map_tecton2crystal_atom_atom_site_label # Child of _atom_site_label
>_map_tecton2crystal_atom_symop_id        # Child of _space_group_symop_id
>_map_tecton2crystal_atom_trans_x
>_map_tecton2crystal_atom_trans_y
>_map_tecton2crystal_atom_trans_z
>1   Ca  Ca1 1 0 0 0
>2   Cr  Cr1 1 0 0 0
>3   F1  F1  1 0 0 0
>4   F2  F2  1 0 0 0
>5   F3  F3  1 0 0 0
>6   F4  F1  3 0 0 0
>7   F5  F2  3 0 0 0
>
>  would be nicer as:
>
>loop_
>_map_atomSet2crystal_atom_atom_id         # Child of _atomSet_topology_atom_id
>_map_atomSet2crystal_atom_atom_site_label # Child of _atom_site_label
>_map_atomSet2crystal_atom_symop_id        # Child of _space_group_symop_id
>_map_atomSet2crystal_atom_trans_x
>_map_atomSet2crystal_atom_trans_y
>_map_atomSet2crystal_atom_trans_z
>Ca  Ca1 symop1 0 0 0
>Cr  Cr1 symop1 0 0 0
>F1  F1  symop1 0 0 0
>F2  F2  symop1 0 0 0
>F3  F3  symop1 0 0 0
>F4  F1  symop3 0 0 0
>F5  F2  symop3 0 0 0
>
>
>
>#6L1477>loop_
>_map_tecton2crystal_bond_id                # List reference
>_map_tecton2crystal_bond_bond_id           # Child of _tecton_topology_bond_id
>_map_tecton2crystal_bond_atom_site_label_1 # Child of _atom_site_label
>_map_tecton2crystal_bond_symop_1           # Child of _space_group_symop_id
>_map_tecton2crystal_bond_trans_x_1
>_map_tecton2crystal_bond_trans_y_1
>_map_tecton2crystal_bond_trans_z_1
>_map_tecton2crystal_bond_atom_site_label_2 # Child of _atom_site_label
>_map_tecton2crystal_bond_symop_2           # Child of _space_group_symop_id
>_map_tecton2crystal_bond_trans_x_2
>_map_tecton2crystal_bond_trans_y_2
>_map_tecton2crystal_bond_trans_z_2
>_map_tecton2crystal_bond_dist               # Observed distance (optional)
>_map_tecton2crystal_bond_details
>1  Cr.F1       Cr1 1 0 0 0  F1 1 0 0 0    1.918   ?
>2  Cr.F4       Cr1 1 0 0 0  F4 1 0 0 0    1.918   ?
>3  Cr.F2       Cr1 1 0 0 0  F2 1 0 0 0    1.848   ?
>4  Cr.F5       Cr1 1 0 0 0  F5 1 0 0 0    1.848   ?
>5  Cr.F3.1     Cr1 1 0 0 0  F3 1 0 0 0    1.940   ?
>6  Cr.F3.2     Cr1 1 0 0 0  F3 3 0 0 0    1.940   link
>
>7  Ca.F1.1     Ca1 1 0 0 0  F1 5 0 0 0    2.391   link
>8  Ca.F1.2     Ca1 1 0 0 0  F1 6 0 0 0    2.292   link
>9  Ca.F4.1     Ca1 1 0 0 0  F4 5 0 -1 0   2.391   link
>10 Ca.F4.2     Ca1 1 0 0 0  F4 6 0 -1 0   2.292   link
>11 Ca.F2       Ca1 1 0 0 0  F5 1 0 0 0    2.215   ?
>12 Ca.F5       Ca1 1 0 0 0  F2 4 1 0 0    2.215   link
>13 Ca.F3       Ca1 1 0 0 0  F3 5 0 0 0    2.494   link
>
>   would be nicer as:
>
>loop_
>_map_atomSet2crystal_bond_bond_id           # Child of _tecton_topology_bond_id
>_map_atomSet2crystal_bond_atom_site_label_1 # Child of _atom_site_label
>_map_atomSet2crystal_bond_symop_1           # Child of _space_group_symop_id
>_map_atomSet2crystal_bond_trans_x_1
>_map_atomSet2crystal_bond_trans_y_1
>_map_atomSet2crystal_bond_trans_z_1
>_map_atomSet2crystal_bond_atom_site_label_2 # Child of _atom_site_label
>_map_atomSet2crystal_bond_symop_2           # Child of _space_group_symop_id
>_map_atomSet2crystal_bond_trans_x_2
>_map_atomSet2crystal_bond_trans_y_2
>_map_atomSet2crystal_bond_trans_z_2
>_map_atomSet2crystal_bond_dist               # Observed distance (optional)
>_map_atomSet2crystal_bond_details
>Cr.F1       Cr1 symop1 0 0 0  F1 symop1 0 0 0    1.918   ?
>Cr.F4       Cr1 symop1 0 0 0  F4 symop1 0 0 0    1.918   ?
>Cr.F2       Cr1 symop1 0 0 0  F2 symop1 0 0 0    1.848   ?
>Cr.F5       Cr1 symop1 0 0 0  F5 symop1 0 0 0    1.848   ?
>Cr.F3.1     Cr1 symop1 0 0 0  F3 symop1 0 0 0    1.940   ?
>Cr.F3.2     Cr1 symop1 0 0 0  F3 symop3 0 0 0    1.940   link
>Ca.F1.1     Ca1 symop1 0 0 0  F1 symop5 0 0 0    2.391   link
>Ca.F1.2     Ca1 symop1 0 0 0  F1 symop6 0 0 0    2.292   link
>Ca.F4.1     Ca1 symop1 0 0 0  F4 symop5 0 -1 0   2.391   link
>Ca.F4.2     Ca1 symop1 0 0 0  F4 symop6 0 -1 0   2.292   link
>Ca.F2       Ca1 symop1 0 0 0  F5 symop1 0 0 0    2.215   ?
>Ca.F5       Ca1 symop1 0 0 0  F2 symop4 1 0 0    2.215   link
>Ca.F3       Ca1 symop1 0 0 0  F3 symop5 0 0 0    2.494   link
>
>
>
>#6L1545>              5. SAMPLE CIFS WITH COMMENTS REMOVED
>   Parts of section 5 are already out of date with respect to the content 
> of section 3
>
>
>
>
>PMR>One of the most important contributions would be to require that EVERY 
>atom is reported.
>
>    Yes I agree with that.
>
>PMR>Are conformers only relevant for disordered structures or might a 
>species such as TNT have one NO2 tecton with three conformations (I would 
>argue against that)
>
>    In my view the word 'conformer' is badly chosen. One topology might 
> well correspond to the two opposite enantiomers and perhaps several 
> diastereoisomers.

Agreed fully.




>PMR>It would be useful to see an example of a simple structure without 
>problems, and perhaps one without disorder but either symmetry or multiple 
>molecules. I think the present example is trying to tackle too many 
>problems at once
>
>    I agree that the final document should contain more examples, including 
> simpler ones. I'm prepared to provide some concerned with chiral molecules. 
> For the moment, though, I think it would not be helpful to overload an 
> already long text with more examples.
>
>PMR> There are many groups that are not isomorphic to a point group. They 
>include permutation groups and products. I spent some time many years ago 
>looking at whether such groups could usefully be represented geometrically.
>
>    It sounds as though we should drop the automorphism group.
>
>
>
>PMR>> # 3) only one molecule can be described
>
>PMR Response
>-------------------
>CML can store multi-molecules - e.g. hexane+urea. The problem seems to 
>come from conformers
>
>    There are problems with racemates as well. Of course the racemate is 
> not really 'one molecule' although it is often incorrectly treated as 
> though it were. (i.e. despite the fact that every molecule in a racemate 
> is chiral, most chemists think of the racemate as being achiral!)

Racemates are more difficult to describe than appears at first sight. IUPAC 
has wrestled with this. At least in crystallography the results are usually 
well defined!



>PMR>CML is only just starting to tackle the problem of describing 
>molecules as assemblies of fragments. Do your fragments have unfilled 
>valences, dummy atoms, etc.?
>
>    I've come to the opinion that describing the molecules in terms of 
> fragments i.e. TNT formed of substituted benzene, nitro group, etc is the 
> part of the spec which has the least potential practical application. It 
> looks like a chemical decomposition of the molecule. I wondered whether 
> it would not be better to shelve it at least for the time being. I can't 
> think of what practical application I would use it for.

Agreed. The fragment approach is similar to the chemist's Markush approach. 
This describes a molecule as (say) R1-CO-NH-R2 where R1 and R2 have a list 
of values. Even if each list is only of length 1 it can be quite difficult to 
work out what the molecule is. There is pressure from the patent offices 
to drop Markush structures in favour of explicit enumeration. I certainly 
think that an explicit formula for the molecule should always be present.
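
As a rough illustration of what explicit enumeration means, here is a minimal Python sketch. The substituent lists and the function name `enumerate_markush` are hypothetical, and the formulas are plain strings rather than real chemical structures.

```python
from itertools import product

# Hypothetical substituent lists, chosen only for illustration.
R1_VALUES = ["CH3", "C2H5"]
R2_VALUES = ["H", "CH3", "C6H5"]


def enumerate_markush(template, **substituents):
    """Expand a Markush-style template into every explicit formula.

    `template` uses str.format placeholders such as {R1}; each keyword
    argument supplies the list of allowed values for one placeholder.
    """
    names = list(substituents)
    results = []
    # The Cartesian product gives one tuple per combination of values.
    for combo in product(*(substituents[name] for name in names)):
        results.append(template.format(**dict(zip(names, combo))))
    return results


explicit = enumerate_markush("{R1}-CO-NH-{R2}", R1=R1_VALUES, R2=R2_VALUES)
for formula in explicit:
    print(formula)
```

With two values for R1 and three for R2 this yields six explicit formulas. The count is the product of the list lengths, which is why a compact Markush description can hide a large number of distinct molecules.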

More in next mail.


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069

_______________________________________________
coreCIFchem mailing list
coreCIFchem@iucr.org
http://scripts.iucr.org/mailman/listinfo/corecifchem
