(40) F(000); neutron diffraction; Uequiv; symmetry-generated sites; MIF

  • To: COMCIFS@iucr.ac.uk
  • Subject: (40) F(000); neutron diffraction; Uequiv; symmetry-generated sites; MIF
  • From: bm
  • Date: Mon, 20 May 1996 16:04:58 +0100

Dear Colleagues

I apologise yet again for the long delay between mailings. I hope to be
more active in the next few weeks, as we prepare to have a number of 
dictionaries formally approved before or at the Seattle Congress.

First I welcome Mark Spackman (University of New England, Armidale,
Australia) as a new member of COMCIFS and chair of the working group on
an electron density CIF dictionary.

Ongoing discussions
===================
Some remarks from Brian Toby:

D37.1 Lengths of data names
---------------------------
B> (D37.1) Yes we should relax the length limits (on lines & data items) in CIF,
B> but we have to see that we don't break CIF in the process -- there will
B> certainly be software that will fail when these long-discussed limits are
B> removed. CIF is a standard in current usage and COMCIFS needs to consider the
B> implications of changes on current usage of the standard. Allowing
B> creation of core-compliant CIFs that are not readable with existing
B> software will impede acceptance of CIF. The best way to make this
B> transition is to bundle it into the DDL2 migration.  Many other aspects
B> of CIF change with that transition, so it is a good point to relax
B> old restrictions.

D39.1 Introductory data blocks
------------------------------
B> (39.1) I do not have any problems with the use of
B> 
B> data_include_dependent_dictionaries
B>   #\include  http://www.iucr.ac.uk/cif/ddl_core.dic
B> within a dictionary, but I do have a problem with the use of URLs as a
B> file-to-file pointer mechanism for CIFs. My guess is that few organizations
B> are prepared to make their file servers into online archival sites. The
B> IUCr can be trusted to keep a file called
B> "http://www.iucr.ac.uk/cif/ddl_core.dic" available for the foreseeable
B> future despite changes in computer systems, etc. How many data providers
B> will be able to make this promise? We must remember that CIF is both an
B> interchange and an archival standard.
B> 
B> I would prefer to see
B>   #\include  http://www.iucr.ac.uk/cif/ddl_core_1.4.dic
B> (note the addition of a DDL version number). Otherwise, an old dictionary
B> file will "break" when the DDL is updated from 1.4 to 2.x somewhere in
B> the future.  It is not necessary to put version numbers on all included
B> files -- but again we need to consider these files as archival standards
B> and realize that anywhere we don't include a version, we restrict
B> future change.

This is another old chestnut which we have never resolved entirely
satisfactorily. But I think it's correct to have file version numbers if
the files are referenced by URL for just the reasons Brian advances.

D38.1 A standard for image data
-------------------------------
B> (38.1) What is the level of contact in the imageNCIF effort with other
B> communities? There is an effort in the neutron community to develop
B> standardized binary formats and HDF will be used. (Contacts: Ray Osborn,
B> rosborn@anl.gov or Przemek Klosowski, przemek.klosowski@nist.gov)
B> 
B> Option 4: encapsulating CIF files inside HDF (see the previous discussion of
B> file-to-file pointers) makes the most sense to me.

I hope to report on new developments with the image project in the near
future.

New discussions
===============

D40.1 F(000) with non-integer electrons
---------------------------------------
The original definition of F(000) as given in the core dictionary for
_exptl_crystal_F_000 was "The number of electrons in the crystal unit
cell F(000)." This was subsequently modified to "The effective number of
electrons in the crystal unit cell contributing to F(000). It may contain
dispersion contributions." David Watkin has commented (of the first
definition) that it "tells nothing": he prefers a definition as "the sum
of the real parts of the scattering factors at theta = 0". Clearly the
intention of the modification was to permit non-integer values; should
the definition include David's formulation, or is another data name
required?
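
To make the difference between the two readings of F(000) concrete, here
is a minimal Python sketch. It is only an illustration: the cell contents
are hypothetical, the function names are invented for this note, and the
dispersion terms f' are placeholder numbers rather than tabulated values.
It relies only on the fact that the normal scattering factor of a neutral
atom at theta = 0 equals its atomic number Z.

```python
# Atomic numbers: at theta = 0 the normal scattering factor f0 equals Z.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def f000_electron_count(cell_contents):
    """Original definition: total number of electrons in the unit cell."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in cell_contents.items())


def f000_real_part(cell_contents, f_prime):
    """Watkin's formulation: sum of the real parts (f0 + f') at theta = 0."""
    return sum((ATOMIC_NUMBER[el] + f_prime.get(el, 0.0)) * n
               for el, n in cell_contents.items())


# Hypothetical cell contents: 24 C and 48 H atoms (assumption).
cell = {"C": 24, "H": 48}
# Placeholder dispersion corrections f' (illustrative values only).
f_prime = {"C": 0.003, "H": 0.0}

print(f000_electron_count(cell))      # integer electron count
print(f000_real_part(cell, f_prime))  # generally non-integer
```

The first function can only ever return an integer; the second shows why
a definition that admits dispersion contributions needs a non-integer
value.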

D40.2 F(000) in barns
---------------------
An inquiry from an author of a neutron diffraction paper asks
"How can I indicate that the units which are used for F000 are
(barns)^(1/2) and not electrons?" He also asks "How can I indicate
that (delta-rho)max and (delta-rho)min are arbitrary units
(neutron diffraction) and not e.A^-3?", and "How can I indicate that
the absorption coefficient is the coefficient for the neutrons and not
for the X-rays? and with no absorption correction, must I indicate Tmin
and Tmax as 1.0 and 1.0?" Suggestions on the best treatment of these
problems will be welcome.

D40.3 Another Uequiv
--------------------
David Watkin also requests the facility to specify an equivalent U value
as  
     "Uequiv = (Ui*Uj*Ul)**(1/3), where Ux are principal components of 
                orthogonalised U[ij] - You see the value over the
                arithmetic mean if, for example, Ul= -.0001. It's more
                sensitive to crazy adps."
This is therefore a different formulation of Uequiv. David calculates
this value within his CRYSTALS program, and wishes to output it to his
CIFs, even if he must employ a private dataname to do so. My question is:
does the alternative formulation have sufficiently wide application to
be included in the Core under a new data name?
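
As a rough illustration of why the geometric form is "more sensitive",
the following Python sketch compares the two means for three principal
components, one of them slightly negative. The input values and function
names are hypothetical, and the sign-preserving cube root is an assumption
of this sketch (the quoted formula does not say how a negative product
should be handled); it is not taken from CRYSTALS.

```python
import math


def u_equiv_arithmetic(u1, u2, u3):
    """Arithmetic mean of the three principal components."""
    return (u1 + u2 + u3) / 3.0


def u_equiv_geometric(u1, u2, u3):
    """Cube root of the product of the principal components.

    A real, sign-preserving cube root is used so that a negative
    product gives a negative result instead of a complex number.
    """
    product = u1 * u2 * u3
    return math.copysign(abs(product) ** (1.0 / 3.0), product)


# Hypothetical principal components; the third is slightly negative.
principal = (0.05, 0.04, -0.0001)
print(u_equiv_arithmetic(*principal))  # stays positive and looks plausible
print(u_equiv_geometric(*principal))   # goes negative, flagging the problem
```

With these inputs the arithmetic mean hides the non-positive-definite
tensor, while the geometric form changes sign.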

D40.4 Symmetry-generated atoms in a site list
---------------------------------------------
I have had an inquiry from the Cambridge Crystallographic Data Centre
regarding inclusion of atoms in the atom_site list that are derived from
symmetry operations on atoms in the asymmetric unit. Their desire is to
write an atom_site list that can be used by non-crystallographic modelling
software, and thus might include atoms that would be generated from
symmetry operations applied to members of the asymmetric unit. The
proposal is to use a couple of data names local to the CCDC, and thus have
entries such as 

     loop_
         _atom_site_label
         _atom_site_fract_x
         _atom_site_fract_y
         _atom_site_fract_z
         _ccdc_atom_site_symmetry
         _ccdc_atom_site_base

   N1    -.1234(5)   0.4567(7)   -.6789(1)       1_555       .
   N1D   0.1234(5)   0.4567(7)   0.6789(1)       7_456       N1

(where the '.' for _ccdc_atom_site_base means that the atom is generated
from itself; alternatively N1 could be used in both rows. The data item is
needed as a pointer to the 'parent' atom, i.e. the one in the asymmetric
unit.)
There is a much more sophisticated way of describing symmetry generation
of portions of a complete structure in mmCIF, but CCDC consider that too
complex for the small-molecule modelling applications they have in mind.
Two questions:

(1) Does this approach break crystallographic CIF software? It runs
counter to the idea that you describe the crystal structure by specifying
an asymmetric unit in the highest possible symmetry group, and populate 
the lattice by applying all applicable symmetry operations. On the other
hand (perhaps if you assign occupancies of 0 to the symmetry-generated
sites) it should be possible to end up with a consistent description of
all occupied sites.

(2) Should it be restricted to CCDC with their private datanames, or
should the approach be tacitly encouraged through adoption of similar data
names in the core?
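
For readers weighing question (1), here is a minimal Python sketch of how
software might read the two proposed items. It assumes that
_ccdc_atom_site_symmetry follows the same n_klm form as the existing
site-symmetry codes in the core (operator number n, lattice translations
given by each digit minus 5); that is an assumption about the CCDC
proposal, and the function names are invented for this note.

```python
def parse_symmetry_code(code):
    """Split 'n_klm' into (operator number, (tx, ty, tz)).

    Assumes n indexes the symmetry operator and the digits k, l, m
    encode lattice translations as digit - 5. A bare 'n' is read
    as 'n_555', i.e. no translation.
    """
    op, _, shifts = code.partition("_")
    if not shifts:
        shifts = "555"
    if len(shifts) != 3 or not shifts.isdigit() or not op.isdigit():
        raise ValueError(f"unrecognised symmetry code: {code!r}")
    return int(op), tuple(int(d) - 5 for d in shifts)


def parent_label(label, base):
    """Resolve the base pointer; '.' means the atom is its own parent."""
    return label if base == "." else base


# Rows as (label, symmetry code, base), taken from the example above.
rows = [("N1", "1_555", "."), ("N1D", "7_456", "N1")]
for label, code, base in rows:
    print(label, parse_symmetry_code(code), parent_label(label, base))
```

Software that wants only the asymmetric unit could keep the rows whose
parent label equals their own label and ignore the rest.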

D40.5  CIF/MIF application
--------------------------
CCDC also wish to extend their usage of CIF output to include chemical
connectivity information with the MIF conventions, rather than by using
the _chemical_connectivity data names. In a MIF (molecular information
file), a simple saturated hydrocarbon can be described like this:

      data_cyclohexane

      _molecule_name_common        cyclohexane

      loop_
          _atom_id              # identifier
          _atom_type            # chemical type of atom
          _atom_attach_h        # number of attached hydrogens

               1        C          2
               2        C          2
               3        C          2
               4        C          2
               5        C          2
               6        C          2

CCDC wish to associate this description with a 3-dimensional
crystallographic coordinate set, linking the atom with site label C1 to
the atom in the above list with id "1", and so on - the usual sort of
cross-table pointer. Consequently they want to introduce a new dataname
"_atom_site_atom_id" to achieve this:

  loop_
      _atom_site_label
      _atom_site_fract_x
      _atom_site_fract_y
      _atom_site_fract_z
      _atom_site_atom_id
             C1   .1234(5)  .3456(7)   .5678(9)    1
             C2   .2341(5)  .4563(7)   .6785(9)    2

Formally this should cause no problems. However, we haven't properly
addressed the protocol for merging data names from different dictionaries:
should we have some agreed protocol for merging categories from different
applications, or can we get by with introducing datanames such as this
"_atom_site_atom_id" on an ad hoc basis? 

Again: should we adopt _atom_site_atom_id as a Core definition, or should
we leave it for CCDC to implement locally as _ccdc_atom_site_atom_id?
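
The cross-table pointer itself is simple to follow in software. The
Python sketch below uses the rows from the two examples above, held in
plain dictionaries and tuples; the function name and the in-memory layout
are assumptions made for illustration, not part of any CIF or MIF tool.

```python
# Rows from the MIF loop: _atom_id -> (_atom_type, _atom_attach_h)
mif_atoms = {
    "1": ("C", 2),
    "2": ("C", 2),
}

# Rows from the atom_site loop: (label, x, y, z, _atom_site_atom_id)
atom_sites = [
    ("C1", ".1234(5)", ".3456(7)", ".5678(9)", "1"),
    ("C2", ".2341(5)", ".4563(7)", ".6785(9)", "2"),
]


def link_sites_to_atoms(sites, atoms):
    """Return {site label: (atom type, attached H)} via the atom id pointer."""
    linked = {}
    for label, _x, _y, _z, atom_id in sites:
        if atom_id not in atoms:
            raise KeyError(f"{label} points at unknown _atom_id {atom_id!r}")
        linked[label] = atoms[atom_id]
    return linked


print(link_sites_to_atoms(atom_sites, mif_atoms))
```

The lookup works the same way whichever name the pointer is given; what
changes is whether generic software can know to follow it.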

=====

That's it for now - short and sweet, for a change. I'll be away from Chester
22-31 May inclusive, but hope to get back to COMCIFS work on my return.
Incidentally, the latest working version of the Core is in the comcifs
ftp area as the file "cifdic.c96". I need to add some examples, and agree
with Paula on some minor formatting details, but I expect to circulate
this for approval in the near future. If you want to look at it in the
meantime, ftp to agate.iucr.ac.uk as user comcifs, password wheatear.

Best wishes
Brian