Discussion List Archives

(26) Powder draft; enhancements to core; new dictionaries

  • To: COMCIFS@uk.ac.iucr
  • Subject: (26) Powder draft; enhancements to core; new dictionaries
  • From: bm@uk.ac.iucr (Brian McMahon)
  • Date: Fri, 5 Aug 94 13:04:10 BST
Dear Colleagues

Items up for approval
=====================

A24.1 CIF powder dictionary
---------------------------
The COMCIFS review period for this dictionary has expired, without adverse
comment, and I therefore propose to release Brian's draft to the community at
the beginning of next week. However, it is important that we give our active
consent to the dictionaries we sponsor, and I would therefore urge those of
you who have yet to do so to send me a brief e-mail recording your consent for
this release.

A25.1 Non-use of _local_ prefix for datanames in public dictionaries
--------------------------------------------------------------------
S> D25.1: Yes, I agree.
D> 	I approve of the three items presented for approval (25.1...

A25.2 _diffrn_radiation_xray_symbol
-----------------------------------
S> D25.2: Yes, OK (I am relieved that we have allowed for a H target ;<:o ).
D> 	I approve of the three items presented for approval (... 25.2 

A25.3 _pd_instr_radiation_probe to move to Core
-----------------------------------------------
S> D25.3: No, I do not agree to the addition of _pd_instr_...to the core.
S> 
S> With the addition of _diffrn_radiation_xray_symbol to the core (which 
S> specifies the xray wavelength class), I could be persuaded to change the 
S> enumeration of _diff....type to x-ray, neutron and electron but that's
S> it as far as I am concerned. The other proposal does not make sense.

D> 	25.3: There is overlap between _diffrn_radiation_type and
D> _pd_instr_radiation_probe but the latter is more specific and more useful
D> if, for example, one wanted to search only for neutron diffraction
D> experiments.  This would be difficult to do reliably with _type since the
D> text is free and might, for example, read 'time of flight' where 'neutron'
D> is implied.  Ideally we would drop *_type and encourage people to use the
D> more explicit form.  I would like to see *_probe moved into core. 

Before these comments arrived, Brian had sent me a clarification of his
intentions, which I append below. Sorry not to post this sooner. I am in
favour of this approach, which provides for greater specificity than the
existing loose definition under '_diffrn_radiation_type'. Other comments on
this resubmission?

B>   I am afraid that I did not make my proposal very clearly in 25.3. I
B> propose that _diffrn_radiation_type retain its most common usage which
B> is to indicate the x-ray wavelength(s) used for the diffraction
B> study. I would recommend the examples be  
B>   loop_ _example   CuK\a  CuK\a~1~  white-beam 
B> Perhaps neutron folks might wish to use *_type to indicate the neutron
B> moderator type (as in '100K CH~4~ moderator') though it is not 
B> usually very important, except perhaps in time-of-flight.
B> 
B> On the other hand, _diffrn_radiation_probe (if accepted to the core)
B> would have enumerations of
B>      loop_ _enumeration    x-ray neutron electron gamma
B> And this would indicate the kind of radiation used for the study in a 
B> computer-readable form. This is very important to the user of the
B> structural study.
B> 
B> Thus they would end up with very different information. On the other
B> hand, _diffrn_radiation_type and _diffrn_radiation_xray_target and 
B> _diffrn_radiation_xray_symbol (if accepted) do have exactly the same
B> information in different forms. Eventually as Siegbahn notation fades
B> into obscurity, _diffrn_radiation_type could be retired.
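
The practical difference between a free-text *_type and an enumerated
*_probe can be sketched in a few lines of Python. This is an illustration
only: the enumeration is the one Brian lists above, while the function name
and the dictionary-per-data-block layout are assumptions made for the example.

```python
# Enumeration proposed above for _diffrn_radiation_probe.
PROBE_ENUMERATION = ("x-ray", "neutron", "electron", "gamma")


def select_by_probe(blocks, probe):
    """Return the data blocks whose _diffrn_radiation_probe equals `probe`.

    `blocks` is assumed to be a list of dicts mapping data names to values
    (a hypothetical in-memory form of parsed CIF data blocks).
    """
    if probe not in PROBE_ENUMERATION:
        raise ValueError(f"not an enumerated probe: {probe!r}")
    return [b for b in blocks if b.get("_diffrn_radiation_probe") == probe]


# Two made-up data blocks: the first is the case David describes, where the
# free-text *_type says 'time of flight' and 'neutron' is only implied.
blocks = [
    {"_diffrn_radiation_probe": "neutron",
     "_diffrn_radiation_type": "time of flight"},
    {"_diffrn_radiation_probe": "x-ray",
     "_diffrn_radiation_type": "CuK\\a"},
]
neutron_blocks = select_by_probe(blocks, "neutron")
```

A search on the enumerated item finds the time-of-flight block without any
guesswork about what the free text implies.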


A25.7 Structuring the front-of-dictionary
-----------------------------------------
S> D25.7: Excellent proposal. May I add my vote to the unanimous decision!
D> 	I approve of the three items presented for approval ( ... 25.7)


Other topics under discussion
=============================

D25.4  MIME types
-----------------
S> D25.4: OK, we are definitely on the same wavelength on this one. I agree
S> that application CIF's should have a consistent extension of .cif, but
S> believe strongly that DDL dictionaries should have the extension .dic
S> (interestingly, it was Peter that requested this distinction!). I would
S> go further and ask that validation/enumeration definition files be given
S> the extension .val.

D> 	25.4: I agree that filename.cif would be an inappropriate symbol
D> for MIME.  The only thing that would be necessary would be for mime to
D> recognise a star file since this invokes its own dictionary and with it,
D> in principle, everything it needs to know about the file and the
D> information it contains.  A more general suffix such as .str (for star)
D> would be more appropriate.  The only thing a reading program needs to know
D> is that the file is a star file, since such a file is self-defining and
D> would know how to access the information it needs for its own
D> interpretation.  At this point, in the ultimate version of cif, it would
D> also know how to plot a picture on the screen and how to calculate
D> molecular geometry since the required expressions will (eventually) be
D> coded into the dictionary!  If we settle for anything less in the long
D> term, we lose all the flexibility that the STAR construction offers. 

I think David goes a little too far with his proposal, at least at this early
stage. The idea behind MIME is to classify files which can be handled by
various application programs (or classes of program). Hence, a CIF will
typically contain information on crystal structures that a user wishes to
employ in some way (typically, he might wish to look at ORTEP
representations). So he can use the MIME classification scheme to associate
"run ORTEP" with any file he downloads via WWW that has a ".cif" extension. He
may also be interested in chemistry, and tells his WWW browser to launch a 2-d
chemical scheme plotting package on any ".mif" file he downloads. It's a way
of linking, or binding, applications of choice to a class of information; and
as such, there is merit in differentiating between CIFs, MIFs, dics, nmrIFs,
and whatever other flavour we devise. When David has written his universal
STAR interpreter, we can map all of cif, mif and the rest to use this
wonderful program!
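
The binding of an application to a class of file can be pictured as a simple
lookup on the file extension. The sketch below is a toy model, not how any
particular WWW browser is configured; the table and the function name are
hypothetical, and only the ".cif" and ".mif" pairings come from the paragraph
above.

```python
import os

# Hypothetical extension-to-application table. Only ".cif" -> ORTEP and
# ".mif" -> a 2-d scheme plotter are taken from the discussion above.
HANDLERS = {
    ".cif": "ORTEP",
    ".mif": "2-d chemical scheme plotter",
    ".dic": "dictionary browser",  # invented for illustration
}


def application_for(filename):
    """Return the application bound to the file's extension, or None."""
    extension = os.path.splitext(filename)[1].lower()
    return HANDLERS.get(extension)
```

With separate extensions, "quartz.cif" and "core.dic" can be sent to
different programs; with a single ".str" extension they could not be told
apart at this level.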


D25.6  _type_construct
----------------------
S> (a) Just for the record, my distaste for indented script is because I
S> believe it disguises rather than enhances the interpretation. OK, it's 
S> probably just another example of "different strokes for different folks" 
S> but my choice of layout is:
S> 
S>     _type_construct
S> ;
S>        (_chronology_year)\                 # year must occur
S>     ((-(_chronology_month))?\              # month only if year
S>     ((-(_chronology_day))?\                # day only if month
S>     ((T(_chronology_hour))?\               # hour only if day
S>     ((:(_chronology_minute))?\             # minute only if hour
S>      (:(_chronology_second))?)?\           # second only if minute
S>    [+-](_chronology_timezone))?)?)?)?      # timezone if any time
S> ;
S> 
S> OR, if blanks are permitted adjacent to the names, this is even better:
S> 
S>     _type_construct
S> ;      (_chronology_year    )\             # year must occur
S>     ((-(_chronology_month   ))?\           # month only if year
S>  ... etc

I'm not sure that allowing blanks would be a good idea (explicit blanks would
need to appear within yet more delimiters), but that's a detail to quibble
over when we settle for keeps on the way to express this.
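
For anyone who wants to experiment with the nesting, here is one way to
read the construct as an ordinary Python regular expression. It is a sketch
under assumptions: the field widths (four-digit year, two digits elsewhere)
and the form of the timezone (a sign and two digits) are mine, and the
timezone is treated as optional once an hour is present. The group names
simply echo the _chronology_ components.

```python
import re

# Each component is allowed only if the one before it is present, following
# the comments in Syd's layout. Field widths are assumptions.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})                          # year must occur
    (?:-(?P<month>\d{2})                     # month only if year
      (?:-(?P<day>\d{2})                     # day only if month
        (?:T(?P<hour>\d{2})                  # hour only if day
          (?::(?P<minute>\d{2})              # minute only if hour
            (?::(?P<second>\d{2}))?          # second only if minute
          )?
          (?P<timezone>[+-]\d{2})?           # timezone only if any time
        )?
      )?
    )?
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return a dict of components (None where absent), or None if invalid."""
    match = DATE_RE.fullmatch(text)
    return match.groupdict() if match else None
```

For example, parse_date("1994-08-05") gives year, month and day with the
time fields set to None, while parse_date("1994T13") is rejected because an
hour may not appear without a month and day.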

S> Onto the more serious point you raised about the non-data aspect of
S> these components. Perhaps the use of "null" for this type of category 
S> is one way to handle this...... though perhaps these items are in the
S> same class as *_[] in that it cannot be used as an application data item.
S> 
S> But I must confess to feeling very uneasy about this whole approach.
S> I am not sure exactly why -- probably because I cannot see beyond 
S> the definition stage and one really needs to use these data in combat
S> to see if they will actually work.
S> 
S> As you know my first inclination was to construct _audit_creation_date with 
S> 
S>     _type_construct
S> ;      (_audit_creation_year    )\             # year must occur
S>     ((-(_audit_creation_month   ))?\           # month only if year
S>     ((-(_audit_creation_day     ))?\           # day only if month
S>     ((T(_audit_creation_hour    ))?\           # hour only if day
S>    ...etc
S> ;
S> and then these components are real, definable and usable data items. Of
S> course this means that other *_date definitions in applications such
S> as _pd_ would require the individual definition  of the components. Apart
S> from some added verbosity I do not see a problem with this.
S> 
S> At the moment I see the universal _chrono... components in star_core.dic
S> as being a bit too cute because they require special considerations to make 
S> sure they are used correctly. Such cuteness has in the past come back to 
S> haunt me so perhaps I am more wary than the young and fearless.

I have already sent Syd my views on this, along the following lines.
Certainly it will work. It's a question of whether the added verbosity is
excessive, and the extent to which you wish to atomise (?) datanames. And
now, everywhere you have an _audit_creation_date which an archiving program
might pick up and store, you might have _audit_creation_year etc., all of
which need to be swept up and stored (and translated into _audit_creation_date
or vice versa, according to the program). It's been pointed out in
the past that _citation_date_second is not likely to be a very useful data
name, but a _type_construct for _citation_date that allows for the
incorporation of a _second value does allow you to handle such cussed
(but legitimate) cases if they arise.

Think of symops. Now we have
    loop_ _name                '_geom_bond_site_symmetry_1'
                               '_geom_bond_site_symmetry_2'
    _category                    geom_bond
    _type                        char
    _list                        yes
    _list_reference            '_geom_bond_atom_site_label_'
    loop_ _example    . 4 7_645 
    _definition
;              The symmetry code of each atom site as the symmetry equivalent
               position number 'n' and the cell translation number 'mmm'.
               These numbers are combined to form the code 'n mmm' or ...etc
;

Here, I would propose to insert something like
   _type_construct
   (_symmetry_equiv_pos_as_xyz)(_(_sym_transl_x)(_sym_transl_y)(_sym_transl_z))?

and now we need to define 3 new generic data names (_sym_transl_) which are
not actually usable. But the definition applies to both _symmetry_1 and _2.
In [Syd's] scheme, you need to split up the definition (or loop the
_type_construct). Then you need to define at least
_geom_bond_site_symmetry_1_translation_x (suitably truncated to 32
characters), _y and _z, and _geo...try_2_translation_x etc. Not to mention
the _geom_torsion_site_symmetry_ things.
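
To make the decomposition concrete, here is a small Python sketch that
splits a symmetry code such as '7_645' into its position number and three
translation components. The definition quoted above is truncated, so the
treatment of the digits is an assumption: I take each digit to be offset by
5, so that 555 means no translation. The function name is hypothetical.

```python
import re

# Position number 'n', optionally followed by '_' and three single digits.
SYMCODE_RE = re.compile(r"(\d+)(?:_(\d)(\d)(\d))?")


def parse_site_symmetry(code):
    """Split a code like '7_645' into (n, (tx, ty, tz)).

    Returns None for '.', the null value shown in the example list.
    Assumes each translation digit is offset by 5 (555 = no translation).
    """
    if code == ".":
        return None
    match = SYMCODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a symmetry code: {code!r}")
    n = int(match.group(1))
    if match.group(2) is None:
        return n, (0, 0, 0)
    return n, tuple(int(digit) - 5 for digit in match.group(2, 3, 4))
```

Under that assumption the three examples in the definition come out as
None, (4, (0, 0, 0)) and (7, (1, -1, 0)). The same function serves
_symmetry_1 and _symmetry_2, which is the economy argued for above.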

Has anyone else strong substantive views on this?

New topics
==========

D26.1 Revision of definition for _diffrn_standards_decay_%
----------------------------------------------------------
This should be read in conjunction with D23.2.

S> I have had requests to provide better definitions for two existing
S> core data items.
S> 
S> (1) data_diffrn_standards_decay_%
S>     _name                      '_diffrn_standards_decay_%'
S>     _category                    diffrn
S>     _type                        numb
S>     _enumeration_range           0.0:
S>     _definition
S> ;          The percentage variation of the mean intensity for all standard
S>            reflections.
S> ;
S> 
S> It has been suggested that this does not adequately define what decay means.
S> In submissions to Acta decay is either interpreted as the reduction in 
S> intensity from the start to the finish of the measurements, or as the 
S> variation of the standard intensities during the time of measurement (i.e.
S> the maximum dispersion of the standard intensities). The question has also
S> been asked: What if the std int increases; should this value be negative?
S> This is prohibited by the enumeration range but it still gets submitted.
S> 
S> I propose the following change in description
S> 
S>     _definition
S> ;              The percentage decrease in the mean intensity for standard
S>                reflections which indicates an overall decay in crystal
S>                quality.
S> ;

Is there agreement on this proposed rewording? And is there any favour
for the idea presented by David in D23.2 to introduce
_diffrn_standards_correction_max and *_min?
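
The two readings of "decay" that Syd describes give different numbers for
the same measurements, as the following Python sketch shows. Neither formula
is laid down in the dictionary; normalising by the first measurement is my
assumption, and the function names and intensities are invented for the
example.

```python
def decay_start_to_finish(mean_intensities):
    """Percentage decrease of the mean standard intensity, first to last.

    Normalising by the first value is an assumption, not a dictionary rule.
    A negative result means the intensity increased.
    """
    first, last = mean_intensities[0], mean_intensities[-1]
    return 100.0 * (first - last) / first


def decay_max_dispersion(mean_intensities):
    """Percentage spread (max - min) of the mean standard intensities."""
    spread = max(mean_intensities) - min(mean_intensities)
    return 100.0 * spread / mean_intensities[0]


# Invented mean intensities of the standards at three times during a run.
measurements = [1000, 1020, 950]
start_to_finish = decay_start_to_finish(measurements)
dispersion = decay_max_dispersion(measurements)
```

Here the start-to-finish reading gives 5% and the dispersion reading 7%. A
run in which the standards rise, say [1000, 1030], gives -3% on the first
reading, which falls outside the enumeration range of 0.0: as it stands.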

D26.2 Revision of definition for _diffrn_reflns_number
------------------------------------------------------
S> (2) data_diffrn_reflns_number
S>     _name                      '_diffrn_reflns_number'
S>     _category                    diffrn_reflns
S>     _type                        numb
S>     _enumeration_range           0:
S>     _definition
S> ;              The total number of measured diffraction data.
S> ;
S>         
S> I have been asked by a co-editor to clearly state that this number should
S> NOT include the measurement of systematically absent reflections as this
S> distorts the intent of this data item - which is to give some measure of
S> the redundancy of intensities measured when compared to _reflns_number_total
S> and _reflns_number_observed. I am not too keen on hard and fast rules about
S> this but I do see the problem - especially when it comes to some auto control
S> software that measures all data points independent of the primitivity of
S> the cell. So I tentatively propose the following:
S> 
S>    _definition
S> ;              The total number of measured diffraction intensities, 
S>                excluding reflections that are classed as systematically
S>                absent due to the non-primitivity of the crystal unit cell.
S> ;

Yes? No?
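
For software that measures every lattice point regardless of centring, the
proposed count could be obtained along these lines. This is a sketch only:
it applies the standard reflection conditions for A, B, C, I and F
centring, leaves out R and any glide or screw absences, and the function
names are hypothetical.

```python
def is_lattice_absent(hkl, centring):
    """True if reflection hkl is absent because of the lattice centring."""
    h, k, l = hkl
    if centring == "P":
        return False
    if centring == "A":
        return (k + l) % 2 != 0
    if centring == "B":
        return (h + l) % 2 != 0
    if centring == "C":
        return (h + k) % 2 != 0
    if centring == "I":
        return (h + k + l) % 2 != 0
    if centring == "F":
        # Present only when h, k, l are all even or all odd.
        return not (h % 2 == k % 2 == l % 2)
    raise ValueError(f"centring not handled in this sketch: {centring!r}")


def count_measured(reflections, centring):
    """Count measured reflections, excluding lattice-centring absences."""
    return sum(1 for hkl in reflections if not is_lattice_absent(hkl, centring))
```

For the four reflections (1,0,0), (1,1,0), (2,0,0) and (1,1,1), the count
is 4 for a primitive cell and 2 for an I-centred one.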


D26.3 New entry for neutron scattering lengths
----------------------------------------------
B>   I propose adding an entry for neutron scattering lengths (cf. Int.
B> Tables B p. 383) to _atom_type_[]
B> 
B> _atom_type_scat_length_neutron
B> 
B> "The bound coherent scattering length for the atom type at the isotopic
B> composition used for the diffraction experiment (in fm)"

Does anyone disagree?


D26.4 Should concatenation of enumerated codes be allowed?
----------------------------------------------------------
B>    In going through the current core dictionary, I noticed that for
B> _atom_site_refinement_flags, the enumeration has a list of single
B> letter codes, but the documentation says that these codes may be
B> concatenated. Is this a correct usage? I think it should not be.
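
The difference between the two readings is easy to see in code. In the
Python sketch below, a strict check accepts only a single enumerated code,
while the concatenated reading accepts any string made up of such codes. The
function names are hypothetical, and the set of letters is an illustrative
stand-in that should be checked against the dictionary before use.

```python
# Illustrative single-letter codes; not copied from the dictionary.
EXAMPLE_CODES = frozenset("SGRDTUP")


def is_enumerated(value, codes=EXAMPLE_CODES):
    """Strict reading: the value must be exactly one enumerated code."""
    return value in codes


def split_flags(value, codes=EXAMPLE_CODES):
    """Concatenated reading: split the value into single-letter codes.

    Raises ValueError if any character is not an enumerated code.
    """
    unknown = [c for c in value if c not in codes]
    if unknown:
        raise ValueError(f"unknown flag(s) {unknown!r} in {value!r}")
    return list(value)
```

A value such as 'SD' fails the strict check but splits cleanly into
['S', 'D'], so a validator that knows only the enumeration list would
reject data that the documentation says is legal.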


D26.5  Modulated Structures Database (for information)
------------------------------------------------------
You might like to know that David has been approached by Dr Gotzon Madariaga
(wmpmameg@es.ehu.lg) with a proposal to construct a database of modulated
structures in CIF format. We shall in due course liaise with him on ensuring
that the extension dictionary constructed by his working party conforms with
existing dictionaries (volunteers welcome!).


D26.6  Symmetry dictionary
--------------------------
Those of us who were in Atlanta may have seen Donald Ward's tables of
Patterson peak positions for the ACA monograph series, which he is preparing
in the style of International Tables. Donald has asked the Union whether there
is any preferred format for supplying the data in his tables in
machine-readable form, and that naturally set me to thinking about the
possibility of devising a CIF dictionary of datanames for general symmetry
descriptions. Such a project would of course broaden to the scope of existing
tabulations in the International Tables series. Syd and I talked about this
at some length, and this seems to be a plausible approach. Here is the
approach Syd sketched out:

S> I would be interested to see how this data should be encapsulated in a CIF.
S> Presumably each space group setting would be a data block -- and separation
S> of the symmetry sites (equiv posns) and Harker peaks [by the way, why aren't
S> they called this?] will need two cross referenced loop structures, perhaps 
S> three... Here's an example for P21/c....
S> 
S> data_patt_P_1_21/c_1
S>    _symmetry_space_group_name_H-M          P_1_21/c_1
S>    _symmetry_space_group_name_Hall         -P_2yc
S>    _symmetry_Int_Tables_number             14
S> 
S> loop_
S>    _symmetry_equiv_pos_Wyckoff
S>    _symmetry_equiv_pos_as_xyz
S>       e x,y,z       e -x,-y,-z     e -x,1/2+y,1/2-z     e x,1/2-y,1/2+z
S>       d 1/2,0,1/2   d 1/2,1/2,0
S>       c 0,0,1/2     c 0,1/2,0
S>       b 1/2,0,0     b 1/2,1/2,1/2
S>       a 0,0,0       a 0,1/2,1/2
S> 
S> loop_
S>    _symmetry_Wyckoff_group_notation
S>    _symmetry_Wyckoff_group_count
S>    _symmetry_Wyckoff_group_point
S>       a 2 -1    b 2 -1    c 2 -1    d 2 -1    e 4 1
S> 
S> loop_
S>    _symmetry_patt_harker_Wyckoff
S>    _symmetry_patt_harker_count   
S>    _symmetry_patt_harker_xyz
S>       e 8 0,0,0    e 4 2x,2y,2z   ..................... etc. 
S> 
S> and so on. By the way I have not worked out the Harker peaks for
S> P21/c but you get the picture of one approach.
S> 
S> It will take a lot more CAREFUL definitions but I think it is feasible
S> if you want to tabulate this info (computing it would be easier and
S> more concise...but that's another matter).
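
The cross-referencing between Syd's loops can be checked mechanically. The
Python sketch below reads the values of the first loop into records and
counts the positions listed for each Wyckoff letter. It is a minimal
illustration, not a CIF parser: it assumes whitespace-separated values with
no quoted strings, and the function name is hypothetical.

```python
from collections import Counter


def read_loop(names, text):
    """Group whitespace-separated loop values into one dict per packet."""
    tokens = text.split()
    width = len(names)
    if len(tokens) % width != 0:
        raise ValueError("number of values is not a multiple of the names")
    return [dict(zip(names, tokens[i:i + width]))
            for i in range(0, len(tokens), width)]


# Values copied from the first loop of the P21/c example above.
positions = read_loop(
    ["_symmetry_equiv_pos_Wyckoff", "_symmetry_equiv_pos_as_xyz"],
    """
    e x,y,z       e -x,-y,-z     e -x,1/2+y,1/2-z     e x,1/2-y,1/2+z
    d 1/2,0,1/2   d 1/2,1/2,0
    c 0,0,1/2     c 0,1/2,0
    b 1/2,0,0     b 1/2,1/2,1/2
    a 0,0,0       a 0,1/2,1/2
    """,
)
multiplicity = Counter(p["_symmetry_equiv_pos_Wyckoff"] for p in positions)
```

The counts (4 for e, 2 each for a to d) agree with the values given under
_symmetry_Wyckoff_group_count in the second loop.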

David is also well disposed towards the underlying principle, but more
cautious about using Donald's work as a springboard. I include below his
detailed comments on this proposal. Some of you will already be aware that
the Executive Committee is keen to have machine-readable versions of
International Tables (as are many of the people on the IT Commission), and
this is one of the concerns of the Electronic Publishing Committee. Your
views on this matter are welcomed.

D>      Ward's proposal is most interesting, as is the discussion that
D> followed.  I have a number of pertinent comments on the proposal.
D>      I agree that International Tables is part of crystallography
D> and ripe for bringing into the electronic age.
D>      I agree that CIF is the obvious route to go.
D>      BUT if we are going to prepare something for Don to use, we
D> should start at the beginning and make sure that we do all of space
D> groups and symmetry correctly.
D> 
D>      The mechanism by which we do this is far from clear and we
D> should tread this one VERY carefully.  I have some small experience
D> in this field, since one of my postdocs wrote a program to generate
D> the tables of Vol A and we use this, rather than table look-up,
D> routinely in our calculations here. 
D> 
D>      There are at least two different dichotomies that need to be
D> considered.  
D>      The first might be described as the Germanic/Anglo-saxon
D> dichotomy.  When people like me and Donald Ward approach space
D> groups we do so in a strictly pragmatic way - we want to use them
D> to help us interpret crystal structures.  When people like Theo
D> Hahn and Hans Burzlaff approach space groups they do so in a
D> strictly theoretical way - they want to exploit the rigour of the
D> theory, whether or not this may have direct practical use.  Clearly
D> both viewpoints are valid and important, but if we are going to
D> start defining terms, it is absolutely essential that we have the
D> theoreticians on board.  People like me, and I suspect Don, if left
D> to ourselves, would make a real mess of the job.
D>      The second is the computer/human dichotomy.  International
D> Tables was developed largely without the aid of computers and its
D> conventions were developed to assist crystallographers and not to
D> make life easier for computer programmers.  When we developed our
D> program for analysing crystal structures, we decided to use the
D> space group symbol to generate the symmetry operators
D> algorithmically.  We soon discovered that if we used the Hermann-
D> Mauguin symbol, there were ambiguities between certain settings,
D> e.g. the H-M symbol does not indicate the choice of origin.  As a
D> result, we use the Hall symbol.  However, there are serious
D> problems with the algorithmic approach, particularly when it comes
D> to the human interface.  Syd's symbols are not nearly as easy for
D> people to interpret as the H-M symbols, and it is impossible to
D> calculate the Wyckoff symbols algorithmically, because any sorting
D> algorithm will order the special positions in a different way when
D> the setting is changed.  I mention this detail to illustrate just
D> one of the problems involved in the use of algorithmic, as opposed
D> to table look-up, approaches.  A more trivial, but just as
D> pertinent, problem is the traditional approach of giving symmetry
D> operators in the form x,y,z.  This is great for people to read but
D> if, as we seem to agree, cifs are for computers, not people, the
D> logical representation is in the form of the 4x3 matrix (1 0 0 0 0
D> 1 0 0 0 0 1 0).
D> 
D>      Even though the conventions of International Tables have been
D> essentially unchanged for the last 60 years, the field of symmetry
D> is now rapidly developing.  Some of the more esoteric aspects of
D> space group theory (e.g. space groups in more than 3 dimensions)
D> are now becoming routinely useful.  The people working on the
D> dictionary of aperiodic (modulated) structures will need much more
D> elaborate cif representations than are currently available.  
D> 
D>      These are not trivial questions and I suspect that the
D> creation of a symmetry dictionary will generate as many problems
D> and a great deal more heat than has been generated by the
D> macromolecular dictionary.  We need to be very careful in the
D> working group that we establish to prepare this dictionary. 
D> Unfortunately comcifs is heavily weighted to the Anglo-saxon world,
D> which makes communication easier, but tends to narrow our
D> viewpoint.  We need to be sure to draw on the expertise in space
D> group theory that is available in Germany.  For a start, I propose
D> that we open discussions with the Commission on International
D> Tables.  The working group should be small but should have people
D> on it who are experienced in space group theory and its modern
D> extensions as well as those who are familiar with computers and
D> star files.  There is some urgency about this because we are
D> already using symmetry fields in cifs that are not ideal.  The
D> sooner we get the big guns onto this problem, the sooner we will
D> have a representation of symmetry that will be robust to future
D> developments.
D> 
D>      This is an important matter, and we should consult with
D> comcifs before making the next move.  When we get their comments I
D> would propose to contact Theo Hahn to seek the views of his
D> commission.
D> 
D>      In the meantime, we should thank Don Ward for raising the
D> topic and warn him that he will have to wait a few years before we
D> will have a cif dictionary that would be suitable for his book!
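
On David's point about matrix representations: converting between the
x,y,z form and a matrix of twelve numbers is straightforward, as the Python
sketch below suggests. It is an illustration under assumptions: each row
holds the three rotation coefficients followed by the translation, terms
are limited to signed fractions and the letters x, y, z, and the function
name is hypothetical.

```python
import re
from fractions import Fraction

# One term: optional sign, optional number (integer or p/q), optional axis.
TERM_RE = re.compile(r"([+-]?)(\d+(?:/\d+)?)?([xyz]?)")


def xyz_to_matrix(operator):
    """Convert e.g. '-x,1/2+y,1/2-z' to three rows of four Fractions.

    Each row is [coefficient of x, of y, of z, translation].
    """
    rows = []
    for component in operator.replace(" ", "").lower().split(","):
        row = [Fraction(0)] * 4
        for sign, number, axis in TERM_RE.findall(component):
            if not number and not axis:
                continue  # empty match between terms
            value = Fraction(number) if number else Fraction(1)
            if sign == "-":
                value = -value
            column = "xyz".index(axis) if axis else 3
            row[column] += value
        rows.append(row)
    if len(rows) != 3:
        raise ValueError(f"expected three components: {operator!r}")
    return rows
```

With this layout 'x,y,z' gives the rows (1 0 0 0), (0 1 0 0), (0 0 1 0),
matching the twelve numbers David quotes, and '-x,1/2+y,1/2-z' gives
(-1 0 0 0), (0 1 0 1/2), (0 0 -1 1/2).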


That's all for this time!

Regards
Brian