Discussion List Archives

(78) Core extensions; electron density CIF; absolute structure

  • To: COMCIFS@iucr.org
  • Subject: (78) Core extensions; electron density CIF; absolute structure
  • From: bm
  • Date: Tue, 9 Dec 1997 09:50:38 GMT
Dear Colleagues

Existing discussion threads
===========================
D76.1 Core extensions
---------------------
Helen Berman telephoned to say that she was taken aback by the raft of
apparently new data names listed in circular 76. These were originally
posted in circular 54 (following Syd Hall's wish to introduce them into
the Acta C Notes for Authors at the beginning of this year), and their
formal approval was moved to the back burner while we concentrated on the
pdCIF and mmCIF dictionaries. A couple of points arise from this
conversation:

(1) the records of previous discussions within COMCIFS should be available
for perusal (they are, at the URL http://www.iucr.org/cif/comcifs/minutes/,
which I periodically re-post), but it would be helpful if they were
indexed or somehow better structured (see D78.4 below);

(2) the core dictionary once again is beginning to diverge from the image of
the core in DDL2 formalism within the mmCIF dictionary. For now, this can be
managed manually (since many of the "new" datanames are effectively
re-namings of existing entries, these are rather easily handled within DDL2
by the alias mechanism). However, in the longer term, it would be better to
cleave off the core from the body of the mmCIF dictionary, and have it
maintained in both DDL1 and DDL2 formalism by the same COMCIFS subcommittee.
(This is one of the objectives of the dictionary management working group,
who will soon be in a position to present their report to COMCIFS.)


On a separate topic, this mailing (like the few preceding ones) contains
technical debate about items for inclusion in the core dictionary itself. It
is our intent to establish a working party (or subcommittee) with
responsibility for the core (as likewise for the other dictionaries), but I
hope that you will bear with the continuation of the existing modus operandi
until that has been accomplished. I hope that we can tie up the loose ends
in the present discussions so as to bring the dictionary back into line with
the practice of Acta C, and leave the introduction of substantial new topics
to the anticipated core subcommittee.

D77.1 Multipole population coefficients
---------------------------------------
In response to my request for a summary of the electron density CIF work to
date, Mark Spackman sent the following note:


M> Like Peter, I too *do* read the postings, but am only prepared to comment
M> where I think my contribution may be worthwhile - and meaningful!  In this
M> case my response is needed, but it may be a lengthy one.
M> 
M> Progress on electron density CIF terms has been negligible - and there are
M> several reasons for this.
M> 
M> I was drafted onto COMCIFS (Jan 1996) after Dirk Feil (then Chair of the
M> IUCr Commission on Charge, Spin & Momentum Densities - CCSMD) was
M> approached by COMCIFS regarding extension of the CIF dictionary to include
M> electron density and possibly spin density items.  Dirk mailed the CCSMD
M> members, and I think I was the only one who responded - and indicated that
M> it was a matter that needed urgent attention, but I wasn't convinced that
M> archival of results of an electron density analysis was a desirable thing.
M> 
M> Anyway, I've been here in the background all this time, trying to get the
M> community to pay some attention to the matter.  I flagged it at the CCSMD
M> Sagamore Meeting in Brest in 1994, and Bev Robertson asked me to organise a
M> workshop at the next Sagamore Meeting held in Canada this year.   For a
M> number of reasons I couldn't do that, and after some email discussion, Ted
M> Maslen agreed to organise such a session/workshop.  Of course, after Ted's
M> untimely death, nothing was organised for the meeting, and precious little
M> got discussed on the matter either.
M> 
M> Now, I must confess to a few biases:
M> 
M> 1.	The issue I've always felt most strongly about is the appropriate
M> archival of DATA!  To me, the most useful information for charge density
M> purposes is the experimental observations and the associated conditions of
M> the experiment.  The present dictionary takes care of all of that I think
M> (although it is clear to me that too few of the charge density community
M> actually deposit their data in CIF format anyway).
M> 
M> 2.	I'm still not convinced that the results of a charge density
M> analysis of x-ray data should be archived - anywhere - excepting
M> publication of appropriate material in journals.  And it has been amazingly
M> difficult to get a sensible debate going on this issue.  At the last IUCr
M> Congress I stuck my neck out and spoke against the publication of tables of
M> multipole coefficients  (which is what Paul Mallinson is discussing in his
M> email).  I got NO feedback, least of all from the XD community who were
M> well represented.
M> 
M> These comments of course don't address the full issue: that is, what charge
M> density material _should_ be included in CIF format, and _then_ how best to
M> do that.  And this touches on the question: For what purposes do we
M> presently include results in CIF format?  I'm not at all sure if there's a
M> concise and all-encompassing answer to that one, but here are my comments
M> pertaining to charge density analysis:
M> 
M> *	Charge density analysis (ie multipole refinement) is an offshoot of
M> a normal least-squares refinement of x-ray data, and as such should use as
M> much of the current CIF dictionary as possible.
M> 
M> *	There are several different types of multipole refinement
M> (Hirshfeld, Hansen-Coppens, Stewart, Craven) and all make different
M> assumptions.  They have some things in common, but not many.  They use
M> different types and numbers of radial and angular functions.  They use
M> different coordinate systems, and some use a different coordinate system on
M> _each_ atom.  They all use different implicit normalizations on the
M> coefficients which are published.  (The latter two points render most
M> present tables of published multipole coefficients almost useless).
M> 
M> *	Do _not_ expect any standardization in the near future.  In my
M> mind, standardization of multipole methods is akin to standardization of
M> basis sets in quantum chemistry: impossible and highly undesirable!
M> 
M> *	There are a number of outcomes of a multipole refinement:
M> 	*  multipole coefficients;
M> 	*  radial exponents and kappas;
M> 	*  electron density maps of various kinds;
M> 	*  critical points of the electron density;
M> 	*  gradient and laplacian of the electron density at a number of
M> points;
M> 	*  electrostatic potential maps;
M> 	*  electric field maps;
M> 	*  electric field gradients at nuclear (and other) sites;
M> 	*  electric moments of molecules;
M> 	*  energies of interaction between molecules.
M> 
M> *	For "maps" above (usually contour maps) you could also substitute
M> 3-D colour graphics (isosurfaces etc) these days.
M> 
M> These are the main reasons for my self-confessed biases above.
M> 
M> I have enormous sympathy for the call from Paul Mallinson, but the issue is
M> far from simply one of including multipole coefficients into a CIF file.  I
M> would greatly appreciate other comments on this matter.  I would also
M> happily relinquish my membership of COMCIFS in favour of a more appropriate
M> person to get this matter going  - but I'd also like to be involved in the
M> discussion!

And a comment from David Brown: 

D> 	This request shows just how much more crystallography there is
D> still waiting to be defined in cif dictionaries.  I hope we can raise some
D> interest from the electron density group to propose a way of dealing with
D> this request.  Another area where we still lack definitions is magnetic
D> structures (these require the definition of magnetic space groups and
D> dipole vectors for each atom).  There also seems to be an opportunity for
D> the writers of structure refinement packages to allow raw intensities to
D> be read and written in cif so that different refinement packages can be
D> used. Not all refinement packages are equally good at dealing with all the
D> possible problems in a structure, and users are increasingly maintaining a
D> suite of refinement packages so that they can select the program best
D> suited to the problem, but there is no universal language for transferring
D> the information between packages. With few additions, cif would perform
D> this function well.  I am surprised that no one seems to have thought of
D> this application.
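
To make the suggestion concrete, the following is a minimal sketch in Python of writing a reflection list as a CIF loop and reading it back. The data names are taken from the core dictionary's _refln_ category; the block name, the sample values and both helper functions are hypothetical, and the reader handles only the simple whitespace-separated loop that the writer produces, not general CIF syntax.

```python
# Core-dictionary data names for a minimal intensity list.
REFLN_NAMES = [
    "_refln_index_h",
    "_refln_index_k",
    "_refln_index_l",
    "_refln_F_squared_meas",
    "_refln_F_squared_sigma",
]


def write_refln_loop(block_name, rows):
    """Return CIF text holding one loop of (h, k, l, F2, sigma) rows."""
    lines = [f"data_{block_name}", "loop_"]
    lines += REFLN_NAMES
    for h, k, l, f2, sigma in rows:
        lines.append(f"{h:d} {k:d} {l:d} {f2:.2f} {sigma:.2f}")
    return "\n".join(lines) + "\n"


def read_refln_loop(text):
    """Read back the rows written by write_refln_loop (simple loops only)."""
    rows = []
    in_loop = False
    for line in text.splitlines():
        token = line.strip()
        if token == "loop_":
            in_loop = True
            continue
        # Skip everything before the loop, blank lines and data names.
        if not in_loop or not token or token.startswith("_"):
            continue
        h, k, l, f2, sigma = token.split()
        rows.append((int(h), int(k), int(l), float(f2), float(sigma)))
    return rows
```

A refinement package that could both emit and accept such a loop would need no private intensity format for this part of the exchange.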

I think it's fairly clear that in most cases of extending the area of
discourse of CIF, the issue is rarely one of including a few extra items
into a CIF file. Even where the desiderata seem very simple, it is usually
seen that many other considerations are relevant to meeting what seem at
first straightforward objectives. I was struck at the imgCIF workshop by the
extent to which the "simple" requirement of defining a protocol for
describing two-dimensional raster images rapidly blossomed (if that is the
word) into a very detailed framework for describing multiple images within
multiple datasets from multiple detectors from multiple experiments. Yet the
labour involved in establishing this framework looks as though it will lead
to a comprehensive description of multiple detectors, an area in which the
original core CIF dictionary was weak. So the CIF effort is leading towards
an ever more complete description of crystallography, and we do need input
from other specialised communities.

D77.2 Absolute structure (and related data items)
-------------------------------------------------
D> 	Howard's comments show us how important it is to get not only the
D> definitions right but also the names.  At some point we will need to do
D> some further housekeeping in revising names that are at best misleading
D> and at worst incorrect.
D> 
D> 	The answer to Howard's problem with the enumeration range is for
D> the writers of software to recognise that a number given as 1.3(2) does
D> lie in the enumeration range 0 to 1 within the experimental uncertainty.
D> If this interpretation is allowed, there is no need to change the
D> enumeration range to something that we are not yet able to handle
D> properly.  This convention could then be applied to any experimentally
D> determined number that was given with its uncertainty.  An author who
D> failed to report the uncertainty would then risk having the cif rejected
D> (and quite rightly too).
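
As an illustration of how software might apply such a convention, here is a minimal sketch in Python. The function names and the tolerance k (the number of standard uncertainties allowed beyond the range limits) are assumptions of this sketch and are not defined anywhere in CIF; whether a value such as 1.3(2) passes depends on the k chosen. The parser handles only plain decimal numbers, not exponent notation.

```python
import re

# A number with an optional standard uncertainty in parentheses,
# e.g. "1.3(2)", "0.05(12)" or "-0.02".
_NUMB = re.compile(r"^([+-]?\d*\.?\d+)(?:\((\d+)\))?$")


def parse_numb(text):
    """Return (value, su) for a string such as '1.3(2)'; su is None if absent."""
    match = _NUMB.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised number: {text!r}")
    mantissa, su_digits = match.groups()
    value = float(mantissa)
    if su_digits is None:
        return value, None
    # The su is expressed in units of the last quoted decimal place.
    decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    return value, int(su_digits) * 10.0 ** (-decimals)


def in_range_within_su(text, low, high, k=3.0):
    """True if the value lies in [low, high] after allowing k su's of slack.

    A value reported without an su is tested strictly against the range.
    The default k = 3 is an assumption of this sketch.
    """
    value, su = parse_numb(text)
    slack = 0.0 if su is None else k * su
    return (low - slack) <= value <= (high + slack)
```

With these assumptions, in_range_within_su("1.3(2)", 0.0, 1.0) accepts the value at the default k = 3 but rejects it at k = 1, and a bare "1.3" with no uncertainty is always rejected, in line with the suggestion that an unreported uncertainty should risk rejection.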

I asked Howard for some more suggestions on improving the reporting of
absolute structure, and he has come up with some modifications to
_reflns_number_total, _reflns_number_Friedel and _reflns_number_gt, as well
as new data names for absolute configuration and specific optical rotation.

H> (4) Suggested improvement for core extensions/amendments for:
H>  
H>  _reflns_number_total 
H>  _reflns_number_Friedel
H>  _reflns_number_gt (for which I have no specific proposal)
H> 
H> Explanation:
H> 
H>   Using the example of a _refln_ list containing four reflections h,k,l;
H> -h,-k,-l; -h,k,-l; and h,-k,l in space group P2 where:
H>    h,k,l is equivalent to -h,k,-l, and  
H>   -h,-k,-l is equivalent to h,-k,l 
H>   under the operations of the crystal point group.
H>  
H>   We always use "Friedel pairs", or "Friedel opposites" to apply only to
H> the pairs {h,k,l and -h,-k,-l} and {-h,k,-l and h,-k,l} whereas we use
H> the term "Bijvoet pairs" to apply to {h,k,l and h,-k,l}, {-h,-k,-l and
H> -h,k,-l}. This leaves a little doubt about what is meant by "Friedel
H> equivalent" reflections in the proposed definition. But taking it to
H> mean reflections that are equivalent under the Laue symmetry of the
H> crystal, the four reflections in my list are Friedel equivalent and the
H> appropriate value of _reflns_number_Friedel is 4 (four). Likewise 
H> _reflns_number_total takes a value of 4 for my list. 
H> 
H>   On the other hand I have the impression that it is intended that 
H> _reflns_number_total applies to "unique" reflections in some way that is
H> not defined anywhere in the dictionary. For example Acta
H> Crystallographica C takes the value of _reflns_number_total and
H> qualifies it by "independent reflections". So I suppose the idea is that
H> the reflections should have been averaged or grouped into sets of
H> reflections symmetry equivalent under the crystal point group. In this
H> way I come to 2 for the desired value of _reflns_number_total and 1 for
H> _reflns_number_Friedel for my list. If these are not the desired values
H> then I would appreciate knowing what was intended.
H> 
H>  Further in the circulated definition of _reflns_number_Friedel the
H> words "although related by symmetry" are most confusing. There are
H> several symmetries coming into play:
H>  (1) the crystal point group (crystal class)
H>  (2) the Laue symmetry (crystal class + a centre)
H>  (3) the symmetry of the Fourier transform of a real positive-definite
H> function. I wonder which symmetry is implied here.
H> 
H> So I propose the following _definitions for
H> 
H> (i)  data_reflns_number_total 
H>         with the regret that the data name does not clearly indicate
H> that a crystal-class unique set is intended:
H> 
H> data_reflns_number_total
H>     _name                      '_reflns_number_total'
H>     _category                    reflns
H>     _type                        numb
H>     _enumeration_range           0:
H>     _definition
H> ;              '_reflns_number_total' counts the number of sets of
H> reflections in the _refln_ list where each set contains reflections that
H> are equivalent under the point-group symmetry of the crystal (crystal
H> class), the sets being necessarily disjoint.
H> ;
H>    
H> 
H> (ii) data_reflns_number_Friedel 
H>         with the regret that the data name does not clearly indicate
H> that a Laue-symmetry unique set is intended:
H> 
H> data_reflns_number_Friedel
H>     _name                      '_reflns_number_Friedel'
H>     _category                    reflns
H>     _type                        numb
H>     _enumeration_range           0:
H>     _definition
H> ;              '_reflns_number_Friedel' counts the number of sets of
H> reflections in the _refln_ list where each set contains reflections that
H> are equivalent under the Laue symmetry of the crystal, the sets being
H> necessarily disjoint.
H> ;
H> 
H> (iii) NB data_reflns_number_gt needs modifying as well.
H> 
H>  The above definitions would be most useful for understanding whether
H> adequate data had in fact been measured to determine the absolute
H> structure of a non-centrosymmetric structure. The two values should take
H> identical values for a centrosymmetric structure.
H> 
H>  In the print version of Acta Crystallographica C the values of 
H> _reflns_number_total  and _reflns_number_Friedel could be qualified by
H> "crystal-class independent reflections" and "Laue-symmetry independent
H> reflections" respectively.  
H>
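
The counting in the four-reflection example above can be sketched in Python as follows. This is only an illustration of the proposed definitions: the symmetry operations are written out by hand for point group 2 and Laue group 2/m (unique axis b), the indices (1, 2, 3) stand in for a general h,k,l, and the function name is hypothetical.

```python
def count_sets(reflections, operations):
    """Count disjoint sets of reflections equivalent under the operations.

    `operations` must list every element of the symmetry group, each as a
    function mapping an (h, k, l) tuple to its equivalent.
    """
    seen = set()
    n_sets = 0
    for hkl in reflections:
        if hkl in seen:
            continue
        n_sets += 1
        # Mark the whole orbit of this reflection as accounted for.
        seen.update(op(hkl) for op in operations)
    return n_sets


# Point group 2, unique axis b: identity and the twofold rotation.
POINT_GROUP_2 = [
    lambda r: (r[0], r[1], r[2]),
    lambda r: (-r[0], r[1], -r[2]),
]

# Laue group 2/m: the point group plus inversion and the mirror.
LAUE_2_M = POINT_GROUP_2 + [
    lambda r: (-r[0], -r[1], -r[2]),
    lambda r: (r[0], -r[1], r[2]),
]

reflections = [(1, 2, 3), (-1, -2, -3), (-1, 2, -3), (1, -2, 3)]
n_total = count_sets(reflections, POINT_GROUP_2)
n_friedel = count_sets(reflections, LAUE_2_M)
```

For this list the sketch gives 2 sets under the crystal class and 1 under the Laue symmetry, the values quoted in the explanation above.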
H>
H> (5) Proposed new data names _chemical_absolute_configuration and
H> _chemical_optical_rotation.
H> 
H> data_chemical_absolute_configuration
H>     _name                      '_chemical_absolute_configuration'
H>     _category                    chemical
H>     _type                        char
H>     loop_ _enumeration
H>           _enumeration_detail
H>           rm     'absolute configuration established by the structure
H>                  determination of a compound containing a chiral reference
H>                  molecule of known absolute configuration.'
H>           ad     'absolute configuration established by anomalous
H>                  dispersion effects in diffraction measurements on the
H>                  crystal.'
H>           rmad   'absolute configuration established by the structure
H>                  determination of a compound containing a chiral reference
H>                  molecule of known absolute configuration and confirmed by
H>                  anomalous dispersion effects in diffraction measurements
H>                  on the crystal.'
H>           .      'inapplicable'
H>     _definition
H> ;              In 'Enantiomers, Racemates, and Resolutions' by
H> Jean Jacques, Andre Collet and Samuel H. Wilen: John Wiley & Sons, New
H> York, it is stated that "the absolute configuration of a chiral
H> substance is known when an enantiomeric structure can be assigned to an
H> optically active sample of a given sign." It is thus always recommended
H> to report the optical activity in solution of the molecule(s) using
H> _chemical_optical_rotation.
H> 
H>                Sufficient conditions for the assignment of
H> _chemical_absolute_configuration are as follows in which Set_E is
H> defined to be the set of non-centrosymmetric crystal classes { 1, 2, 3,
H> 4, 6, 222, 32, 422, 622, 23, 432}, x(u) is the value of the Flack(1983)
H> parameter as given by  _refine_ls_abs_structure_Flack, NEAR = 1.6 and
H> FAR = 5.0. 
H> If the crystal class is NOT in Set_E then
H> _chemical_absolute_configuration is 'inapplicable' and must take the
H> value '.' if present. 
H> For 'rm' to be valid the crystal class must be in Set_E and the source
H> of the chiral reference substance of known absolute configuration must
H> be reported.
H> For 'ad' to be valid the crystal class must be in Set_E and |x/u| < NEAR
H> and |(1-x)/u| > FAR.
H> For 'rmad' to be valid the conditions of both 'rm' and 'ad' must be
H> fulfilled.
H> ;
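
The conditions in this proposed definition can be expressed as a short Python sketch. Set_E, NEAR and FAR are copied from the text above; the function names and the reference_source argument (standing for the reported source of the chiral reference substance) are hypothetical and introduced only for illustration.

```python
# Set_E, NEAR and FAR as given in the proposed definition.
SET_E = {"1", "2", "3", "4", "6", "222", "32", "422", "622", "23", "432"}
NEAR = 1.6
FAR = 5.0


def ad_is_valid(crystal_class, x, u):
    """Test the stated 'ad' conditions for Flack parameter x with su u."""
    if crystal_class not in SET_E:
        return False
    if u <= 0:
        raise ValueError("the standard uncertainty u must be positive")
    return abs(x / u) < NEAR and abs((1.0 - x) / u) > FAR


def valid_codes(crystal_class, x, u, reference_source=None):
    """List the enumeration codes whose stated conditions are fulfilled.

    reference_source=None means that no chiral reference substance of
    known absolute configuration has been reported.
    """
    if crystal_class not in SET_E:
        return ["."]
    rm_ok = reference_source is not None
    ad_ok = ad_is_valid(crystal_class, x, u)
    codes = []
    if rm_ok:
        codes.append("rm")
    if ad_ok:
        codes.append("ad")
    if rm_ok and ad_ok:
        codes.append("rmad")
    return codes
```

For example, x(u) = 0.05(8) in crystal class 2 satisfies the 'ad' conditions (|x/u| is about 0.6 and |(1-x)/u| about 11.9), whereas 0.3(2) does not, since |(1-x)/u| = 3.5 is below FAR.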
H> 
H> data_chemical_optical_rotation
H>     _name                      '_chemical_optical_rotation'
H>     _category                    chemical
H>     _type                        char
H>     _example                   '[\a]^25^~D~ = +108 (c = 3.42, CHCl~3~)'
H>     _definition
H> ;              The optical rotation in solution of the compound is
H> specified in the following format:
H>                '[\a]^TEMP^~WAVE~ = SORT (c = CONC, SOLV)' 
H>                where:
H>                  TEMP is the temperature of the measurement in degrees
H> Celsius,
H>                  WAVE is an indication of the wavelength of the light
H> used for the measurement,
H>                  CONC is the concentration of the solution given as the
H> mass of the substance in g in 100 ml of solution,
H>                  SORT is the signed value (preceded by a + or a - sign)
H> of 100.\a/(l.c), where \a is the signed optical rotation in degrees
H> measured in a cell of length l in dm and c is the value of CONC in g, and
H>                  SOLV is the chemical formula of the solvent.
H> ;
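
To show how the proposed _chemical_optical_rotation format might be produced and checked by software, here is a minimal Python sketch. The function names and the regular expression are assumptions of this sketch; it accepts only strings laid out exactly as in the definition above, assumes TEMP is a plain number, and rounds SORT to a whole number as in the quoted example.

```python
import re

# '[\a]^TEMP^~WAVE~ = SORT (c = CONC, SOLV)'
_ROTATION = re.compile(
    r"^\[\\a\]\^(?P<temp>[^^]+)\^~(?P<wave>[^~]+)~\s*=\s*"
    r"(?P<sort>[+-]\d+(?:\.\d+)?)\s*"
    r"\(c\s*=\s*(?P<conc>\d+(?:\.\d+)?),\s*(?P<solv>.+)\)$"
)


def specific_rotation(alpha_deg, length_dm, conc_g_per_100ml):
    """SORT = 100 * alpha / (l * c), as given in the definition."""
    return 100.0 * alpha_deg / (length_dm * conc_g_per_100ml)


def format_rotation(temp_c, wave, alpha_deg, length_dm, conc, solvent):
    """Build the character string from a measured rotation alpha_deg."""
    sort = specific_rotation(alpha_deg, length_dm, conc)
    return f"[\\a]^{temp_c:g}^~{wave}~ = {sort:+.0f} (c = {conc:g}, {solvent})"


def parse_rotation(text):
    """Split a string in the proposed format into its components."""
    match = _ROTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not in the expected format: {text!r}")
    return {
        "temp_celsius": float(match.group("temp")),
        "wavelength": match.group("wave"),
        "specific_rotation": float(match.group("sort")),
        "conc_g_per_100ml": float(match.group("conc")),
        "solvent": match.group("solv"),
    }
```

Applied to the example in the definition, parse_rotation(r"[\a]^25^~D~ = +108 (c = 3.42, CHCl~3~)") yields a temperature of 25, wavelength "D", specific rotation +108, concentration 3.42 and solvent "CHCl~3~". A hypothetical measured rotation of 3.6936 degrees in a 1 dm cell at that concentration reproduces the same string through format_rotation.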

I've posted the latest version of the core dictionary (version 2.1beta3),
incorporating these changes and additions, at
ftp://ftp.iucr.org/cifdics/cif_core_2.1beta.dic
and now publicly on the web under the "Beta version" heading at
http://www.iucr.org/cif/cif-core/


D77.3 Management of CIF DDLs
----------------------------
D> 	We can, of course, officially announce that we are taking control
D> over the definition of DDL's 1 and 2 in the same way that we control cif
D> dictionaries, but to do this formally would require that the IUCr obtain
D> ownership.  This may be desirable, but those developing the DDL's may
D> prefer to go a different route.  This is a matter for negotiation.  Can
D> the DDL developers let us know if they are interested in exploring IUCr
D> protection to maintain the integrity of their language?
D> 
D> 	What COMCIFS can do without any further authority is to determine
D> what DDL features are allowed to be used in cif dictionaries.  We already
D> have determined the STAR features that we accept and those that we do not
D> allow, and there is nothing to prevent us from selecting the features of
D> DDL that we are prepared to implement.



New topics
==========

D78.1 CIF dictionary ownership
------------------------------
D>      Now that we have a number of dictionaries in use and many others
D> in preparation, it is becoming important to ensure that everyone is aware
D> that the cif is the property of the IUCr.  Many of the people who are
D> now starting to use cif, or who will be in the future, will be unaware of
D> the history of how cif developed or how it is managed.  I would like to  
D> propose that the ownership of cif be mentioned clearly in any publicity 
D> that is circulated, particularly on the web pages that are becoming the 
D> major route for dissemination.  A mention of COMCIFS as the organisation
D> that manages cif would also be appropriate.
 
Most of the CIF web pages that are mirrored at Chester have very clear 
declarations along these lines, and can be quoted as examples for new web 
authors to view. It may well be worth devising some standard line of text 
(perhaps as a page footer) on the IUCr pages that other people might copy on
their own pages, something like the standard copyright line at the foot of  
all our pages:
       CIF Copyright (c) International Union of Crystallography 


D78.2 imgCIF web pages
----------------------
On the related topic of web pages as foci for project development, Andy
Hammersley has put together some very nice pages on imgCIF and the CBF
project at http://www.esrf.fr/computing/Forum/imgCIF. I hope to be able to
mirror these on the IUCr servers in the near future.


D78.3 CIF and XML
-----------------
You will all have seen Peter Murray-Rust's mailing, which I repeat here for
two reasons, the first of which is that these circulars at present
constitute the complete record of COMCIFS business.

P> 	I *do* read the postings, though I haven't got detailed comments
P> on the latest drafts.  I hope the following isn't too much off-topic....
P> 
P> 	I am getting increasingly asked to give talks on e-publishing and
P> the globalisation of scientific information and I usually manage to
P> highlight the very significant achievements of the IUCr/CIF activity. By
P> comparison with crystallographers, the chemists have an enormous way to go
P> before they can capture data reliably in primary or secondary
P> publications. The physicists are more interested in abstracts than data.
P> 	I also feel strongly that terminology (and/or data dictionaries)
P> are going to be increasingly important in the metadata for the WWW, and
P> that again CIF has made an important contribution here. It is the first
P> example I go to to show how semantics can be linked to syntactic
P> information.
P> 	I have been asked to introduce the next CCP4 meeting (Jan 1998)
P> whose theme is databases and I hope to be able to give demos of how CIFs
P> can be used with the next generation of WWW languages (especially XML).
P> CIF can be converted into XML and this then gives rise to a wide range of
P> tools that can be used to manipulate/render it. I have done this for core
P> CIF and powder, but not yet for mmCIF - if there are some small
P> simple stable examples I would be happy to have a look at whether their 
P> structure can be represented in this system (XML has a powerful system for
P> representing links (==pointers) between data items).
P> 
P> 	P.
P> 
P> If this is of interest to individuals I'd be happy to discuss by e-mail
P> Peter Murray-Rust (PeterMR, http://www.nottingham.ac.uk/~pazpmr)
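
For readers unfamiliar with the idea, the following Python sketch shows one way that simple name-value CIF items could be wrapped in XML. It is not Peter's conversion scheme: the element and attribute names ("datablock", "item", "name") and the function name are invented for illustration, and looped data are not handled.

```python
import xml.etree.ElementTree as ET


def cif_items_to_xml(block_name, items):
    """Serialise a mapping of CIF data names to values as an XML string.

    The element names used here are hypothetical, chosen for this sketch.
    """
    root = ET.Element("datablock", {"name": block_name})
    for name, value in items.items():
        item = ET.SubElement(root, "item", {"name": name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = cif_items_to_xml(
    "demo",
    {"_cell_length_a": "10.234(2)", "_symmetry_cell_setting": "monoclinic"},
)
```

Once the items are in this form, any generic XML tool can be used to select, transform or render them.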

Thanks Peter - it is good to know that this effort can be broadcast as a
(good) example to others. We're also - at Chester - interested in the
development of XML and may well come to you for advice in due course as we
consider the next step in accommodating CIF to the requirements of the
wider e-publishing world.

The second reason that I wanted to include Peter's mailing is to draw
attention to the difference in impact between a direct mailing to the group
and the regular circulars that you receive from me. This leads on to...


D78.4 Reorganisation of the mailing procedures
----------------------------------------------
It can be a substantial job to put together some of the circulars for this
group, and the time taken to do so can sometimes blunt the point of a
discussion. At other times, the volume of material posted in a single
circular can be overwhelming, and important details may get lost from view.
I am therefore suggesting the possibility of moving away from this mechanism
towards a more open discussion list, where messages on single topics are
posted to the list and responded to in individual threads. This is the means
of discourse used in the mmCIF, DDL2 and imgCIF communities, and it seems to
have served those groups well. It has the advantage that rapid discussion
can spring up over a contentious issue. It has the further advantage that,
if the participants exercise discipline in the use of their e-mail Reply
functions and in the subject lines of their messages, self-documenting
archives of discussions can be established, either in the recipients' own
mailboxes or on dedicated archive servers. This could help to address the
problem Helen Berman has drawn attention to of retrieving information from
past postings.

To demonstrate the way in which such a discussion group would work, I have
been testing list software with the other members of the Electronic
Publishing Committee. The result of our trials so far can be seen at
http://www.iucr.org/lists/, which includes links to the mmCIF, DDL2 and
imgCIF archives. The imgCIF materials are here on an experimental basis, and
all these URLs should be regarded for now as privileged, and not for further
propagation. It is intended, however, to offer the functionality of these
lists and archives to IUCr Commissions and Committees in due course.

Please scan these examples, especially the epc-l list. Note that much of the
content is of little general interest, since it represents the early tests
of the system manager and list owner; but nonetheless the way in which the
system functions can be clearly seen.

I should be grateful for your comments and opinions on the following issues:

(1) is it preferable to move from the current system of circulars to such a
    discussion list?

(2) should membership of the list be open to anyone; or to anyone with the
    approval of the list owner; or to a restricted group, broadly equivalent
    to the existing mailing list?

(3) should the record of the discussions be visible to anyone, or only to
    members of the discussion group (so that a password would be needed to
    view the archive)?

(4) would it be useful to have such a service available for each of COMCIFS
    and its component subcommittees (large groups such as mmCIF and imgCIF
    have their own facilities; Chester could provide the facility to smaller
    groups)?

(5) is it useful to have a "digest" option, where subscribers can elect to
    receive not the individual messages as they are posted, but periodic
    longer messages containing the complete record of the discussions 
    during that period? This would allow members with 'Observer' status
    to keep an eye on proceedings on a weekly or fortnightly basis, for
    example.


D78.5 mmCIF Editorial Board
---------------------------
Following the proposal detailed in circular 76, the mmCIF maintenance group
has appointed an editorial board to review and approve new mmCIF data names.
For the interest of those who don't subscribe to the mmciflist mailing list,
I append the announcement:

> Dear Colleagues,
> 
> We are pleased to announce the membership of the mmCIF Editorial Board
> whose role will be to review proposed new data items. The members are:
> 
> Frank Allen, Phil Bourne, Kim Henrick, Andy Howard, Joel Sussman,
> and Dale Tronrud.
> 
> Helen Berman
> Paula Fitzgerald

That's all for now.

Best wishes
Brian