Discussion List Archives

(29) Symmetry, R factors again, SIF, final DDL version 1.4

  • To: COMCIFS@uk.ac.iucr
  • Subject: (29) Symmetry, R factors again, SIF, final DDL version 1.4
  • From: Brian McMahon <bm>
  • Date: Tue, 24 Jan 95 16:40:11 GMT
Dear Colleagues

Once more I must apologise for a lengthy break in services. Maintaining
COMCIFS communications on a more regular basis has been one of my New Year
resolutions, so let us see what happens!

The Executive Committee has approved the appointment of Paul Edgington as
COMCIFS member in place of Frank Allen. Welcome, Paul. Paul is a programmer
at the Cambridge Crystallographic Data Centre, and is well placed to liaise
with Frank in an effort to maintain compatibility between CIF and MIF
developments. Paul has been active in the development of CCDC software for
importing CIFs, and is the author of the program we have long used in Chester
to check CIF data names and values against the original Core dictionary.
His e-mail address is pre10@chemcrys.cam.ac.uk.

Frank will continue to receive copies of our discussions, at least for the
time being, so that we may yet benefit further from his wisdom in
small(ish)-molecule and chemical applications.

In my next circular, I shall introduce (very belatedly) the new DDL worked
out by John Westbrook with Syd Hall and Nick Spadaccini following last
October's meeting in Brussels, and we can begin to discuss what implications
this has for the future development of the dictionaries.

Continuing discussions
======================

D26.6 Symmetry dictionary
-------------------------

David has reminded me that your comments were invited on commissioning a
dictionary for the representation of symmetry. Some background to this was
outlined in circular 26, and a sketch of one approach from Syd was included
then. David is proposing to contact Theo Hahn (as Chairman of the
International Tables Commission) to discuss how to proceed, and Donald Ward,
whose work on tables of Patterson peak positions sparked off this thread, has
also indicated that he would be interested in working on this project. Have
we any volunteers to assist?

D28.2 R factors
---------------

D> _pd_proc_ls_prof_  I have only a couple of small points:
D> *_wR_factor:
D> I assume that this item is the one on which refinement is based.  This should
D> be stated explicitly in the definition.  *_R_factor is not the basis of
D> refinement, but is presumably a commonly quoted number and useful for 
D> comparisons.  See my comments below.
D> *_wR_expected: 
D> In my printed copy this definition is ungrammatical.  This may be because 
D> something has got lost in transmission.  Please check the definition.
D> 
D> _proc_ls_I_R_factor:  I favour putting this in core with the proviso that we
D> make the definitions clear that *_wR_factor is the basis for refinements
D> (on F, F**2 or I as the case may be) and that *_I_R_factor is only 
D> along for the ride.  From Brian T's correspondence, I am surprised 
D> that he has not included _proc_ls_F**2_R_factor as a term, 
D> since this seems to be what Larson wants.  
D> I would be in favour of including this term also in core with the
D> same caveat in the definitions.  That would give us one R factor (wR) that
D> is the basis of refinement (when the refinement is based on individual
D> reflections - and only to be used in this case) and then a series of R
D> factors based on F, F**2 and I that can be used for whatever purpose the
D> author thinks important.  We thus distinguish clearly between the R factor
D> that is the basis of refinement, and other kinds of R factors that might 
D> be deemed of interest.  I do not think this distinction is as explicit
D> as it should be in the current core.
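David's distinction between the weighted R factor that is the basis of refinement and the other R factors (on F, F**2 or I) can be made concrete with the usual textbook formulas. The Python sketch below is only an illustration. The plain-list inputs and caller-supplied weights are assumptions, not anything defined in a CIF dictionary. Note that the same unweighted formula serves for F, F**2 or I; only the quantity passed in changes.

```python
"""Sketch: unweighted R factors (on F, F^2 or I) versus the weighted wR.

Assumptions: standard textbook definitions, plain lists as input,
weights supplied by the caller. Not taken from any CIF dictionary.
"""
import math


def r_factor(obs, calc):
    """Unweighted R = sum|obs - calc| / sum|obs|.

    Pass |F| for R on F, F^2 for R on F^2, or net intensities for R on I.
    """
    num = sum(abs(o - c) for o, c in zip(obs, calc))
    den = sum(abs(o) for o in obs)
    return num / den


def wr_factor(obs, calc, weights):
    """Weighted wR = sqrt(sum w(obs - calc)^2 / sum w obs^2).

    The numerator is the weighted sum of squares that least squares
    minimizes, which is why wR is the natural 'basis of refinement' figure.
    """
    num = sum(w * (o - c) ** 2 for o, c, w in zip(obs, calc, weights))
    den = sum(w * o ** 2 for o, w in zip(obs, weights))
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Hypothetical |F| values, purely for illustration.
    f_obs = [10.0, 20.0, 30.0]
    f_calc = [11.0, 19.0, 30.0]
    f2_obs = [f * f for f in f_obs]
    f2_calc = [f * f for f in f_calc]
    print(f"R(F)    = {r_factor(f_obs, f_calc):.4f}")
    print(f"R(F^2)  = {r_factor(f2_obs, f2_calc):.4f}")
    print(f"wR(F^2) = {wr_factor(f2_obs, f2_calc, [1.0] * 3):.4f}")
```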

D28.3 wR(obs)
-------------

I owe George an apology for my slanderous suggestion that SHELXL was not
employing the CIF data names for R factors correctly!

G> I had not realized that there was a misunderstanding here; it might have
G> been better if I had been consulted directly at an earlier stage.
G> 
G> In the example in question, there were 3704 unique reflections measured.
G> For the REFINEMENT, 3 reflections with very negative F-squared values
G> were ignored, so the refinement was based on 3701 unique data.  Because
G> of the limitations of the CIF definitions I have to explain this in
G> _refine_special_details.  However _wR_all (and indeed all quantities
G> ending in _all) were calculated using all 3704 because the CIF definitions
G> clearly require this.  It is however extremely useful to have an R value
G> based only on 'observed' data for comparison with structures published in
G> Acta before the glorious CIF revolution, so all the R-values etc. ending in
G> _obs are calculated using only reflections which have I>2sigma(I) (which
G> automatically excludes also the 3 with unreasonably negative F-squared), 
G> i.e. 3385 reflections (the value given for _reflns_number_observed).
G> 
G> The 3 aberrant reflections are a problem for CIF.  This explains Mario's
G> excellent suggestion for an extra CIF item to specify the number actually
G> used in the refinement (here 3701) or the number ignored (here 3).  The 
G> reason I had to put this number in a text section is that I was unable to
G> persuade Syd to adapt the CIFDIC.C91 definitions to be more appropriate to
G> the way SHELXL-93 performs refinements.  This is not to be understood as
G> an unfriendly comment; as you know I can be just as obstinate as Syd !
G> A further consequence of the messy compromise between the XTAL and SHELX
G> philosophies is the use of F-squared for both _wR_ terms (which I would
G> have liked to define as _wR2_) and F for _R_ terms (which I would have
G> liked to have called _R1_).  This was because of the necessity of giving
G> the quantity minimized (well, almost) as well as a number (R1_obs) to compare
G> with the large majority of previously reported structures.  I note in
G> passing that ciftex still seems to have problems in working this out despite
G> checking _audit_creation_method.  It is a pity that the current CIF is so
G> restricted in this area, and I would welcome the introduction of further
G> terms which might be very useful for neutrons and/or powders; I'm afraid
G> it is too late to redefine the terms we already have.
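George's account distinguishes three sets of reflections:

- all measured reflections, used for the _all quantities;
- those actually used in the refinement;
- the 'observed' subset with I > 2sigma(I), used for the _obs quantities.

The Python sketch below uses hypothetical synthetic data to show how one set of measurements yields different counts and R values depending on which set is chosen. The rejection rule for 'very negative' F-squared values (here F2_obs < -3 sigma) is an assumption made for illustration. The circular does not state the criterion SHELXL-93 actually applies.

```python
import math

# Each reflection is (F2_obs, sigma_F2, F2_calc). Values are synthetic.
REJECT_BELOW = -3.0  # hypothetical rule: reject if F2_obs < -3 sigma


def split_sets(refls):
    """Return (all, refined, observed) lists of reflections."""
    refined = [r for r in refls if r[0] >= REJECT_BELOW * r[1]]
    # 'Observed' uses F2 > 2 sigma(F2) as the stand-in for I > 2sigma(I);
    # this automatically excludes the strongly negative, rejected ones.
    observed = [r for r in refls if r[0] > 2.0 * r[1]]
    return refls, refined, observed


def r1(refls):
    """Conventional R on F; negative F2_obs is clipped to zero before sqrt."""
    f_obs = [math.sqrt(max(fo, 0.0)) for fo, _, _ in refls]
    f_calc = [math.sqrt(max(fc, 0.0)) for _, _, fc in refls]
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


if __name__ == "__main__":
    data = [
        (100.0, 5.0, 110.0),   # strong: refined and 'observed'
        (4.0, 3.0, 2.0),       # weak: refined but not 'observed'
        (-20.0, 4.0, 1.0),     # very negative: ignored in refinement
        (400.0, 10.0, 380.0),  # strong: refined and 'observed'
    ]
    all_, refined, observed = split_sets(data)
    print(len(all_), len(refined), len(observed))  # 4 3 2
    print(f"R1_all = {r1(all_):.4f}, R1_obs = {r1(observed):.4f}")
```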

David's comment (below) arises in part from my disinformation, but I
reprint it for completeness, and to emphasise the general concern that we
should resolve the potential ambiguities which may appear to the user of the
CIF dictionary who is not fully alert.

D> 	I could heartily wish that 'unobserved' reflections had disappeared
D> with films.  There is no need for, or significance in, arbitrarily
D> designating some reflections as observed and others as unobserved.
D> 	In the spirit of the previous definitions there should be only one
D> *_wR_factor, and that is the one that includes all the reflections used in
D> the refinement weighted as they were in the refinement.  If 3 reflections
D> were omitted because they were considered to be unreliable, they should be
D> deleted from the dataset of reflections used in the refinement, and no one
D> need know anything about them (maybe some comment under special details might
D> be appropriate, but if their observed structure factors are not reliable
D> we do not need to know what they are).  If it is necessary to make a distinction
D> between R factors with 'observed' and those with 'all' reflections, this
D> should be confined to *_R_factor, *_F**2_R_factor or *_I_R_factor according
D> to the author's whim.
D> 	I suppose therefore that I am arguing that SHELX is giving the wrong
D> value for the wR factor (though in practice this is not a serious matter). 
D> Could we propose a field *_wR_factor (with no '_something_else') and try to 
D> retire the existing definitions as meaningless?  George is well ahead of the
D> field here, and in the days when refinements typically omitted 'unobserved'
D> reflections his interpretation would be the correct way to go.  But he is
D> forcing everyone into new and better ways.  We could
D> keep the '_obs' and '_all' definitions for archival purposes and to be
D> generated as long as existing software was in use, but encourage all new 
D> software to omit the final segment of the name.

I would support the proposal to introduce _refine_ls_wR_factor as the weighted
R factor (on F, F^2^ or I according to _refine_ls_structure_factor_coef)
calculated in the refinement, using all the reflections employed in the
refinement. Is the number of such reflections the one given by
_refine_ls_number_reflns ("number of reflections contributing to least-squares
derivatives"), or is that data name already used for something subtly different?
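Under this proposal there is a single weighted R factor, computed over exactly the reflections that entered the least-squares sum and reported together with their count. The Python sketch below shows that idea and formats the results as CIF name-value lines. Several things in it are assumptions:

- _refine_ls_wR_factor is only a proposed name, not yet part of the core.
- Refinement is taken to be on F-squared (structure-factor coefficient Fsqd), with weights supplied by the caller.
- Writing the count to _refine_ls_number_reflns presumes the answer to the question just raised.

```python
"""Sketch: one wR over the refinement set, formatted as CIF items.

Assumptions: _refine_ls_wR_factor is a *proposed* name; refinement on F^2
with caller-supplied weights; the count goes to _refine_ls_number_reflns.
"""
import math


def wr_on_refined(refined):
    """refined: (F2_obs, F2_calc, weight) for each reflection in the LS sum."""
    num = sum(w * (fo - fc) ** 2 for fo, fc, w in refined)
    den = sum(w * fo ** 2 for fo, _, w in refined)
    return math.sqrt(num / den)


def cif_lines(refined):
    """Format the proposed items as CIF name/value lines."""
    return [
        "_refine_ls_structure_factor_coef  Fsqd",
        f"_refine_ls_wR_factor              {wr_on_refined(refined):.4f}",
        f"_refine_ls_number_reflns          {len(refined)}",
    ]


if __name__ == "__main__":
    # Hypothetical reflections that were actually used in the refinement.
    used = [(100.0, 110.0, 1.0), (400.0, 380.0, 1.0)]
    print("\n".join(cif_lines(used)))
```

Tying the count to the same list that feeds the sum means the reported number cannot drift from the reflections actually used.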

New matters
===========
D29.1 Structure Information File (SIF)
--------------------------------------
For the benefit of those who have not seen any of this correspondence, I
reproduce below some extracts from various communications of last October
discussing the adoption of the phrase 'SIF' (for 'structure information
file') to include exchange file formats based on the CIF model that might be
adopted in related disciplines (such as the NMR initiative). I can make
available more of this to anyone who wishes it. The suggestion was aired by
Keith Watenpaugh, and attracted a significant amount of comment (much of it
from COMCIFS members and consultants). Here is an extract from an e-mail from
Keith, which summarises some of the responses he obtained, as well as his own
views:

KW> Following are some of the comments that have come through concerning my
KW> suggestion of a minimal change from "CIF" to "SIF".
KW> ----------------
KW> Comment 'I tried to fight the battle against CIF a long time ago (but for
KW> different reasons) and was told that the name was already long carved in
KW> stone.  In fact the NMRif will probably not be part of CIF and will be
KW> administered differently - though there will probably be an attempt to 
KW> maintain compatibility.  There is a MIF (molecular information file) that
KW> is chemical and has nothing to do with CIF (different basic rules).
KW> I wish you well on SIF but I am not holding my breath!'
KW> -----------------
KW> Comment 'Hi Keith: I agree about the name change..'
KW> ------------------
KW> Comment 'We have gone way too far in this to consider a name change.
KW> The "C" in CIF is appropriate -- for crystallographic.'
KW> 
KW> My guess is that the problem is that the NMR folks (etc.) are being encouraged to
KW> write their data in CIF files. This is not correct. In fact they are being
KW> asked to create a dictionary that is compatible with CIF and write STAR
KW> files that are compatible with CIF. If the format is called nmrIF and is
KW> merely compatible with CIF, why should anyone be uncomfortable? If there
KW> are any shared dictionary definitions, put them in a dictionary and call
KW> that SIF. 
KW> 
KW> Remember that if it ain't using the core dictionary, it ain't CIF. If it
KW> is using a significant portion of the core, it is almost certainly
KW> crystallographic and not merely structural. 
KW> ---------------------
KW> Comment 'I like the idea of a SIF. I would like a coreSIF with names
KW> and definitions for data that are common to a variety of structural 
KW> techniques.  These would include a description of the molecular entity
KW> studied, sample content, sample conditions (pH, temperature, etc.),
KW> citation data, etc.  Dictionary developers would have a substantial
KW> core to begin construction of technique-specific data tags.  I think this
KW> would improve the consistency between dictionaries and reduce the
KW> proliferation of data tags with identical definitions or worse identical
KW> data tags with different definitions.'
KW> ----------------------
KW> Comment 'I see no problems with creating SIF as long as:
KW>   1) Entries in the CIF core are not changed, if they are selected 
KW>         for inclusion into the SIF dictionary.
KW>   2) Additional entries added to the SIF dictionary may not conflict 
KW>         with existing CIF entries.
KW>   3) Addition of new SIF entries requires approval of a new
KW>         multi-disciplinary group as well as COMCIFS.
KW> 
KW> If this is done then SIF becomes the "structural but non-crystallographic
KW> core" of the CIF core, plus those many materials properties that
KW> crystallographers usually don't concern themselves about (folks, I have it
KW> on good authority that there are other states of matter than solid). SIF
KW> can define entries that do not exist in CIF and will not need to carry all
KW> of the crystallographic "baggage" that is not of interest to
KW> non-crystallographers. Thus, SIF and CIF will be able to coexist without
KW> conflicts. Most importantly of all, CIF can remain CIF. 
KW> ---------------------
KW> Dear All,
KW> Now my two-bits worth:
KW> 
KW> It seems that ALL the people that are involved in biological (mm) CIF
KW> that responded, would like such a change. This is mainly because they are
KW> involved in working with structural data from crystallography,
KW> n-Dimensional NMR, and a variety of modeling communities. Programs such
KW> as X_PLOR are used by both communities. The Protein Data Bank (PDB/MSD)
KW> will archive and distribute data in the mmC(S)IF format, whether the
KW> data comes from dynamic modeling, NMR or crystallography. Maybe powder
KW> diffraction databases do not have to deal with these different communities 
KW> but the biological macromolecule people do and cannot ignore them. We
KW> have fought a long battle against the statement "The community will not
KW> accept a change in the PDB format." It was far more entrenched than the
KW> CIF is! We don't even have a standard for most of the CIF definitions
KW> and the DDL is changing as we speak. The change from CIF to SIF is
KW> really minimal. There really needs to be additional enhancement to the
KW> coreC(S)IF that incorporates definitions from the mmCIF and that are
KW> generally applicable to molecular structure information, both
KW> experimental and in describing it structurally. The worst thing that
KW> could happen would be to have a SIF and CIF coexist and have separate 
KW> committees developing cores. I could go on, but I'll stop for now. I
KW> am both surprised and happy to see so much response. 

I haven't attributed the 'Comments' included in Keith's letter (where I know
the identities), because I think it sufficient that we see and recognise that
a fair spread of opinions exists. David responded to Keith at some length, as
COMCIFS Chairman but without implying the full endorsement of the Committee.
I reproduce his response below, and must say that I am in full agreement. I
do not expect any strong disagreement with this approach, and I have included
this item largely for the record. However, as always, further thoughts are
welcome.

D> As I understand your suggestion you would like to replace CIF by
D> SIF and expand its scope so as to include other forms of structure
D> determination besides crystallography.  However, some of your more
D> enthusiastic supporters seem to have a different interpretation of your
D> suggestion, namely to establish a SIF as a complement to CIF. I do not
D> agree with the first of these suggestions for reasons that I will give,
D> but the second has some merit and is worth considering.
D> 
D>         The CIF standard was established by the IUCr (and is copyright by
D> them) very specifically for providing a format for the interchange of
D> crystallographic information.  While some of this certainly deals with
D> structure, there is much that does not.  Almost all of the powder work
D> concerns itself with x-ray diffraction and crystallography (unit cells, 
D> space groups) and not structure.  Structure is only one aspect of the
D> information given in the core CIF dictionary.  Unit cells, space groups,
D> symmetry operators, structure factors are all non-structural elements, and
D> even the structural information such as atomic coordinates is very firmly
D> anchored in crystallography.  The coordinates can only be interpreted in
D> terms of the unit cell and space group.  Thus the information that lies
D> at the heart of a CIF is primarily crystallographic and not structural.
D> 
D>         Secondly, the CIF standard is the property of the IUCr, and it is
D> appropriate that it remains primarily crystallographic in its orientation.
D> The NMRIF will necessarily have some area of overlap with CIF in those
D> items that describe structure, as would many other kinds of IFs that one
D> might care to invent.  But NMRIF will be oriented to the technique of 
D> NMR and must be administered by people in that field.  Further, the way
D> that the NMRIF will describe structure (in terms of interatomic interactions)
D> is essentially different from the way that crystallographers describe
D> structure, so that these two different approaches to the description of
D> structure are both needed but are not directly compatible.
D> 
D>         Clearly, however, there are some aspects of all IFs that describe
D> molecular structure that are common - things like the description of
D> interatomic geometries and atomic properties.  Perhaps one can also include
D> a number of chemical items related to the characterisation of the material
D> whose structure is being described.  So it is indeed highly desirable for
D> these items to be defined in compatible ways in all the different IFs and
D> therefore there is plenty of scope for coordination of the work of comcifs
D> and comnmrifs and any other such organisation.  But there are some real
D> difficulties.  CIFs use the STAR structure, but have adopted additional
D> restrictions (e.g. only one level of loop) that other IFs may not be willing
D> to adopt, thereby preventing the free mixing of different dictionaries in
D> creating a file or, at the very least, preventing software designed for
D> reading CIFs from reading other files where these restrictions do not apply.
D> 
D>         It does not seem too practical to expand CIF to include all structural
D> information, even under a different name.  It is essential to have experts
D> in the field responsible for the file definition, and we do not have experts
D> in all structural fields available.  Further, we are having quite enough
D> difficulty in defining a standard that will apply to all fields of 
D> crystallography without trying to take in other fields of which some of
D> us have no experience at all.  The best that we can hope for is to have some
D> form of coordination between the different IFs that deal with crystal and
D> molecular structure (or perhaps just molecular structure).  I do not have
D> a clear sense of how this can be done, but as a first step we have invited
D> Eldon to act as a consultant to comcifs, so that he knows what we are doing
D> and can keep us informed of what he is doing.  In the longer term, some form
D> of comsifs drawn from members of committees like comcifs might be useful.
D> It would then be responsible for ensuring that any data items relating
D> to structure per se would be the same in all files, though I do not envy it
D> its job.  At the moment it is too early to establish such an organisation,
D> nor is it clear what its terms of reference should be or whether there
D> would be a virtue in defining a coreSIF that would be common to the various
D> IFs.  At the moment, since CIF is the only one that exists, there is nothing
D> to coordinate!  
D> 
D>         We should certainly keep an open mind on how we should organise a
D> growing interest in STAR files for the description of molecular structure,
D> and it is worth discussing the matter, but it is too early to start taking
D> definite actions.
D> 
D>         I can understand your interest in having a uniform structure for
D> describing molecular structure, but we cannot divorce the description
D> from the technique used to provide the description.  A mere change of name
D> of CIF to SIF is clearly not appropriate for many crystallographic 
D> applications, and applications in fields other than crystallography need 
D> to include much more information on technique than just structure. 

I wish only to reiterate that the CIF descriptions that we are developing
provide support for crystallography, and may well overlap into other fields
of science. We need to ensure that the core definitions, at least, allow
maximum overlap, and this may have implications for the formalism we adopt to
define them.


D29.2 DDL version 1.4 in press
------------------------------
Syd has at last submitted the DDL description paper to the Journal of Chemical
Information and Computer Sciences, and so we may take the DDL in that paper
as the canonical version for the formalism we have been discussing intensely
over the last couple of years. I shall append this version to the current
circular. Most of the contents will be familiar; the major difference between
this version and the version 1.3 which many of you will have studied earlier
is that the "_include_file" provision has been backed out. This will have
implications for the way in which the hierarchy of dictionaries is
constructed. I understand that there is still some intention to have the
facility to include files, but by way of a preprocessor directive to a
so-called 'semantically-void object loader', rather than through a term
defined within the DDL. I'd be grateful if Syd would report on any
developments of this idea that he may know about.
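The appended DDL 1.4 defines two attributes that a validating program would apply to data values:

- _enumeration_range takes the form min:max, and an omitted max permits any value greater than or equal to min.
- _dictionary_update is constructed as year-month-day.

The Python sketch below shows one way such checks might be coded. Some of its choices are assumptions and not rules stated in the DDL:

- Bounds are compared as numbers when they parse as numbers; otherwise they are treated as single characters compared in code-point order.
- The date components are assumed to be a 4-digit year and 2-digit month and day. The _chronology_* constructs are not reproduced in this extract.

```python
import datetime
import re


def _as_number(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def in_range(value, enum_range):
    """True if value lies within an _enumeration_range such as '-4:10' or '0:'."""
    lo, _, hi = enum_range.partition(":")
    num_lo = _as_number(lo)
    if num_lo is not None:
        num_val = _as_number(value)
        if num_val is None or num_val < num_lo:
            return False
        if hi == "":
            return True  # open-ended range: any value >= min
        num_hi = _as_number(hi)
        return num_hi is not None and num_val <= num_hi
    # Character ranges such as 'a:z' or 'B:R' (assumed: single characters,
    # compared by code point, so 'B:R' does not admit 'Z').
    if len(value) != 1:
        return False
    return lo <= value and (hi == "" or value <= hi)


def valid_update(text):
    """Check a _dictionary_update value, assuming a YYYY-MM-DD layout."""
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if not m:
        return False
    try:
        datetime.date(*map(int, m.groups()))  # rejects e.g. month 13
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for val, rng in [("5", "-4:10"), ("11", "-4:10"), ("c", "a:z"), ("Z", "B:R")]:
        print(val, rng, in_range(val, rng))
    print(valid_update("1994-11-16"))
```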


Best wishes, and greetings for the New Year!
Brian

##############################################################################
#                                                                            #
#                             DDL CORE DICTIONARY                            #
#                                                                            #
##############################################################################


data_on_this_dictionary

    _dictionary_name            ddl_core.dic
    _dictionary_version         1.4 
    _dictionary_update          1994-11-16
    _dictionary_history
;
  1991-03-08 "Implementing SMD in STAR: Dictionary Definition Language"  
                 A F P Cook, ORAC Ltd., 8 March 1991. AFPC
  1991-06-25  Adjustments and refinement for CIF applications. SRH
  1991-09-02  Further refinements prior to "cifdic.c91". SRH
  1993-05-10  Additions arising from discussions with Phil Bourne, 
                 Tony Cook, Brian McMahon. SRH
  1993-05-11  Further adjustments and Cyclops tests. SRH
  1993-05-14  Proposed additional changes. PEB
  1993-05-17  Further adjustments. SRH
  1993-06-01  Refinements and additions. SRH
  1993-07-19  Some tidying up. SRH
  1993-08-10  Final checks before Beijing. SRH
  1993-12-12  Following the Cambridge meeting with FHA and AFPC. SRH
  1993-12-16  Following discussions with Brian McMahon in Chester. SRH
  1993-12-17  Further adjustments. SRH
  1994-02-18  Add _include_file provisions. SRH
  1994-08-08  Install _type_construct definitions and apply. SRH
  1994-08-24  Adjustments following Brian McMahon's comments. SRH
  1994-11-16  Changes following Brussels workshop. SRH
;


global_            

    _list                        no
    _list_mandatory              no
    _list_level                  1
    _type_conditions             none
    _type_construct              .+




data_category
    _definition
;              Character string which identifies the natural grouping of data
               items to which the specified data item belongs. If the data
               item belongs in a looped list then it must be grouped only with
               items from the same category, but there may be more than one
               looped list of the same category provided that each loop has its
               own independent reference item (see _list_reference).
;
    _name                      '_category'
    _category                    category
    _type                        char


data_definition   
    _definition                 
;              The text description of the defined item.
;
    _name                      '_definition'
    _category                    definition
    _type                        char


data_dictionary_history
    _definition
;              A chronological record of the changes to the dictionary file
               containing the definition. Normally this item is stored in the
               separate data block labelled data_on_this_dictionary.
;
    _name                      '_dictionary_history'
    _category                    dictionary
    _type                        char


data_dictionary_name          
    _definition
;              The name string which identifies the generic identity of the
               dictionary. The standard construction for these names is
                       <application code>_<dictionary version>.dic
               Normally this item is stored in the separate data block
               labelled data_on_this_dictionary.
;
    _name                      '_dictionary_name'
    _category                    dictionary
    _type                        char
    loop_ _example               ddl_core.dic  cif_mm_core.dic


data_dictionary_update        
    _definition
;              The date that the dictionary was last updated.
               Normally this item is stored in the separate data block
               labelled data_on_this_dictionary.
;
    _name                      '_dictionary_update'
    _category                    dictionary
    _type                        char
    _type_construct             
                   (_chronology_year)-(_chronology_month)-(_chronology_day) 


data_dictionary_version       
    _definition
;              The dictionary version number. Version numbers cannot decrease
               with updates. Normally this item is stored in the separate data
               block labelled data_on_this_dictionary.
;
    _name                      '_dictionary_version'    
    _category                    dictionary
    _type                        numb


data_enumeration          
    _definition
;              Permitted value(s) for the defined item.
;
    _name                      '_enumeration'
    _category                    enumeration
    _type                        char
    _list                        both
    _list_mandatory              yes


data_enumeration_default  
    _definition
;              The default value for the defined item if it is not specified
               explicitly. If a data value is not declared the default is 
               assumed to be the "most-likely" or "natural" value.
;
    _name                      '_enumeration_default'
    _category                    enumeration_default
    _type                        char


data_enumeration_detail   
    _definition
;              A description of the permitted value(s) for the defined item, as
               identified by _enumeration.
;
    _name                      '_enumeration_detail'
    _category                    enumeration
    _type                        char
    _list                        both
    _list_reference            '_enumeration'


data_enumeration_range    
    _definition
;              The range of values permitted for a defined item. This can 
               apply to 'numb' or 'char' items which have a preordained 
               sequence (e.g. numbers or alphabetic characters).
               If 'max' is omitted then the item can have any permitted 
               value greater than or equal to 'min'.
;
    _name                      '_enumeration_range'  
    _category                    enumeration_range
    _type                        char
    _type_construct            (_sequence_minimum):((_sequence_maximum)?)
    loop_ _example                -4:10   a:z    B:R   0:


data_example
    _definition
;              An example value of the defined item.
;
    _name                      '_example'
    _category                    example
    _type                        char
    _list                        both
    _list_mandatory              yes


data_example_detail
    _definition
;              A description of an example value for the defined item.
;
    _name                      '_example_detail'       
    _category                    example
    _type                        char
    _list                        both
    _list_reference            '_example'       


data_list 
    _definition
;              Signals if the defined item is declared in a looped list.
;
    _name                      '_list'          
    _category                    list
    _type                        char
    loop_ _enumeration
          _enumeration_detail   yes   'can only be declared in a looped list'
                                no    'cannot be declared in a looped list'
                                both  'declaration in a looped list optional'
    _enumeration_default        no


data_list_level          
    _definition
;              Specifies the level of the loop structure in which a defined
               item, with the attribute _list 'yes' or 'both', must be declared.
;
    _name                      '_list_level'
    _category                    list
    _type                        numb
    _enumeration_range           1:
    _enumeration_default         1 


data_list_link_child
    _definition
;              Identifies data item(s) by name which must have a value which
               matches that of the defined item. These items are referred to
               as "child" references because they depend on the existence 
               of the defined item.
;
    _name                      '_list_link_child'
    _category                    list_link_child
    _type                        char
    _list                        both


data_list_link_parent
    _definition
;              Identifies a data item by name which must have a value which
               matches that of the defined item, and which must be present in
               the same data block as the defined item. This provides for a 
               reference to the "parent" data item.
;
    _name                      '_list_link_parent'
    _category                    list_link_parent
    _type                        char
    _list                        both


data_list_mandatory    
    _definition
;               Signals if the defined item must be present in the loop 
                structure containing other items of the designated _category. 
                This property is transferable to another data item which is
                identified by _related_item and has _related_function set as
                'alternate'.
;                
    _name                      '_list_mandatory'            
    _category                    list
    _type                        char
    loop_ _enumeration
          _enumeration_detail  
                           yes  'required item in this category of looped list'
                           no   'optional item in this category of looped list'
    _enumeration_default         no


data_list_reference 
    _definition
;              Identifies the data item, or items, which must be present
               (collectively) in a looped list with the defined data item 
               in order that the loop structure is valid. The data item(s)
               identified by _list_reference provide a unique access code 
               to each loop packet. Note that this property may be
               transferred to another item with '_related_function alternate'.
;            
    _name                      '_list_reference'          
    _category                    list_reference
    _type                        char
    _list                        both


data_list_uniqueness
    _definition
;              Identifies data items which, collectively, must have unique
               values for the loop structure of the designated _category items
               to be deemed valid. This attribute is specified in the
               definition of a data item with _list_mandatory set to 'yes'.
;
    _name                      '_list_uniqueness'
    _category                    list_uniqueness
    _type                        char
    _list                        both
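
Where a validation report should list every offending packet rather than stop at the first, the _list_uniqueness test reduces to counting value combinations. A sketch assuming the same hypothetical dict-of-columns loop model; `duplicate_combinations` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def duplicate_combinations(loop: Dict[str, List[str]],
                           names: List[str]) -> List[Tuple[str, ...]]:
    """Return every value combination of `names` that occurs more than once.

    An empty list means the _list_uniqueness condition is satisfied.
    """
    # zip(*columns) yields one tuple of values per loop packet.
    counts = Counter(zip(*(loop[name] for name in names)))
    return [combo for combo, n in counts.items() if n > 1]
```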


data_name   
    _definition
;              The data name(s) of the defined item(s). If data items are 
               closely related, or represent an irreducible set, their names 
               may be declared as a looped sequence in the same definition.
;
    _name                      '_name'
    _category                    name
    _type                        char
    _list                        both
    loop_ _example             '_atom_site_label'
                               '_atom_attach_all   _atom_attach_ring'
                               '_index_h   _index_k   _index_l'
                               '_matrix_11 _matrix_12 _matrix_21 _matrix_22'


data_related_item
    _definition
;              Identifies data item(s) which have a classified relationship
               to the defined data item. The nature of this relationship is 
               specified by _related_function.
;
    _name                      '_related_item'
    _category                    related
    _type                        char
    _list                        both
    _list_mandatory              yes


data_related_function
    _definition
;              Specifies the relationship between the defined item and the
               item specified by _related_item. The following classifications
               are recognised.

               'alternate' signals that the item referred to in _related_item
               has attributes that permit it to be used as an alternative to
               the defined item for validation purposes.

               'convention' signals that the item referred to in _related_item
               is equivalent to the defined item except for a predefined
               convention which requires a different _enumeration set.

               'conversion' signals that the item referred to in _related_item
               is equivalent to the defined item except that different scaling
               or conversion factors are applied.

               'replace' signals that the item referred to in _related_item
               may be used identically to replace the defined item.
;
    _name                      '_related_function'
    _category                    related
    _type                        char
    _list                        yes 
    _list_reference            '_related_item'   
    loop_ _enumeration
          _enumeration_detail
                         alternate  'used alternatively for validation tests'
                         convention 'equivalent except for defined convention'
                         conversion 'equivalent except for conversion factor'
                         replace    'new definition replaces the current one'


data_type   
    _definition
;              The type specification of the defined item.

               Type 'numb' identifies items which must have values that are
               identifiable numbers. The acceptable syntax for these numbers
               is application dependent, but the formats illustrated by the
               following representations of the same value are considered
               interchangeable.
               42   42.000  0.42E2  .42E+2  4.2E1  420000D-4  0.0000042D+07

               Type 'char' identifies items which need not be interpretable
               numbers. The specification of these items must comply with the
               STAR syntax specification of either a 'contiguous single line
               string' bounded by blanks or blank-quotes, or a 'text string'
               bounded by semicolons appearing as the first character of a line.

               Type 'null' identifies items which appear in the dictionary
               for data definition and descriptive purposes. These items
               serve no function outside of the dictionary files.
;
    _name                      '_type'
    _category                    type
    _type                        char
    loop_ _enumeration
          _enumeration_detail   numb  'numerically interpretable string'
                                char  'character or text string'
                                null  'for dictionary purposes only'
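
A reader for 'numb' values has to accept all of the interchangeable forms listed above, including the Fortran-style D exponent. The sketch below assumes a pattern of optional sign, digits with an optional decimal point, and an optional E or D exponent; the exact syntax is, as the definition says, application dependent, so `NUMB` and `parse_numb` are hypothetical.

```python
import re

# Assumed 'numb' syntax: optional sign, digits with an optional decimal
# point, then an optional E or D exponent.
NUMB = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[EeDd][+-]?\d+)?")


def parse_numb(text: str) -> float:
    """Convert a 'numb' string such as '420000D-4' to a float."""
    if NUMB.fullmatch(text) is None:
        raise ValueError(f"not a 'numb' value: {text!r}")
    # Python's float() rejects a D exponent, so rewrite it as E.
    return float(text.replace("D", "E").replace("d", "e"))
```

Each of the seven example strings in the definition converts to 42.0.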


data_type_conditions   
    _definition
;              Codes defining conditions on the _type specification.

               'esd' permits a number string to contain an appended standard
               deviation number enclosed within parentheses. E.g. 4.37(5)

               'seq' permits data to be declared as a sequence of values
               separated by a comma <,> or a colon <:>.
                  * The sequence v1,v2,v3,... signals that v1, v2, v3, etc.
                    are alternative values.
                  * The sequence v1:v2 signals that v1 and v2 are the boundary
                    values of a continuous range of values satisfying the
                    requirements of _enumeration for the defined item.
               Combinations of alternative and range sequences are permitted.
;
    _name                      '_type_conditions'
    _category                    type_conditions
    _type                        char
    _list                        both
    loop_ _enumeration
          _enumeration_detail 
                            
                       none   'no extra conditions apply to the defined _type'
                       esd    'numbers *may* have esd's appended within ()'
                       seq    'data may be declared as a permitted sequence'
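
The 'esd' and 'seq' conditions can be read as sketched below. Assumptions: the esd applies to the last quoted digit, so 4.37(5) means 4.37 with an esd of 0.05; exponent forms with an esd are not handled; and range bounds are returned as strings for the caller to test against _enumeration. The names `parse_esd` and `parse_seq` are illustrative.

```python
import re
from typing import List, Optional, Tuple, Union

# Assumed pattern: a plain decimal number followed by an esd in parentheses.
# Exponent forms such as 4.37E1(5) are deliberately not handled here.
ESD = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+))\((\d+)\)")


def parse_esd(text: str) -> Tuple[float, Optional[float]]:
    """Split '4.37(5)' into (4.37, 0.05); a bare number has esd None."""
    match = ESD.fullmatch(text)
    if match is None:
        return float(text), None
    number, digits = match.groups()
    # The esd applies to the last digits quoted, so scale it by the
    # number of decimal places in the value.
    decimals = len(number.split(".")[1]) if "." in number else 0
    return float(number), int(digits) / 10 ** decimals


Item = Union[str, Tuple[str, str]]


def parse_seq(text: str) -> List[Item]:
    """Split a 'seq' value into alternatives ('v1,v2') and ranges ('v1:v2')."""
    items: List[Item] = []
    for part in text.split(","):
        if ":" in part:
            low, high = part.split(":", 1)
            items.append((low, high))   # boundary values of a range
        else:
            items.append(part)          # a single alternative value
    return items
```

A combined sequence such as 1:3,5,7:9 becomes [('1', '3'), '5', ('7', '9')].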


data_type_construct    
    _definition
;              String of characters specifying the construction of the data
               value for the defined data item. The construction is composed
               of two entities:
                  (1) data names
                  (2) construction characters
               The rules of construction conform to the regular expression
               (REGEX) specifications detailed in the IEEE document P1003.2
               Draft 11.2 Sept 1991 (ftp file '/doc/POSIX/1003.2/p121-140').
;
    _name                      '_type_construct'   
    _category                    type_construct
    _type                        char
    _example                   (_year)-(_month)-(_day)
    _example_detail            'a typical construction for _date'
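
One plausible reading of _type_construct treats each data name as a placeholder for that item's own pattern and keeps the remaining characters as regular-expression syntax. The sketch below uses Python's `re` module, which is close to but not identical to the POSIX REGEX rules cited. The patterns given for _year, _month and _day are hypothetical, and so is the name `expand_construct`.

```python
import re
from typing import Dict


def expand_construct(construct: str, patterns: Dict[str, str]) -> "re.Pattern[str]":
    """Compile a _type_construct string into a regular expression.

    Each data name in `construct` (a token starting with an underscore) is
    replaced by the pattern for that item; every other character is kept as
    a construction character.
    """
    def substitute(match: "re.Match[str]") -> str:
        return patterns[match.group(0)]

    return re.compile(re.sub(r"_[A-Za-z0-9_]+", substitute, construct))


# Hypothetical patterns for the items named in the _date example.
DATE_PARTS = {"_year": "[0-9]{4}", "_month": "[0-9]{2}", "_day": "[0-9]{2}"}
date = expand_construct("(_year)-(_month)-(_day)", DATE_PARTS)
print(date.fullmatch("1995-01-24").groups())   # ('1995', '01', '24')
```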


data_units_extension
    _definition
;              A code, starting with an underscore, that may appear as an
               extension to the name of the defined data item. This code 
               signals a change from the default units assumed for a number
               value. Information on the new units is given by
               _units_description, and the conversion factor by _units_conversion.
;
    _name                      '_units_extension'
    _category                    units
    _type                        char
    _list                        yes
    _list_mandatory              yes


data_units_description
    _definition
;              A description of the numerical units applicable to the defined
               item when the data name has an extension code specified by 
               _units_extension.
;
    _name                      '_units_description'
    _category                    units
    _type                        char
    _list                        yes
    _list_reference            '_units_extension'


data_units_conversion
    _definition
;              The arithmetic expression for converting the defined item with
               an extension code specified by _units_extension into the default
               units.
               The format of the expression is <operator><number>.
                   Permissible <operator> codes are:
                         *  multiply
                         /  divide
                         +  add
                         -  subtract
               To convert a declared value into the default units:
               [value in default units] = [entered value]<operator><number>
;
    _name                      '_units_conversion'
    _category                    units
    _type                        char
    _type_construct            [\*\/\+\-]<number>   ###<<<<< check this
    _list                        yes
    _list_reference            '_units_extension'
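
Taken together, _units_extension and _units_conversion suggest the following conversion step. This is a sketch only: the '_pm' extension with conversion '/100' is a hypothetical example, `to_default_units` is an illustrative name, and when several codes could match, the first one listed wins.

```python
import operator
from typing import Dict, Tuple

# The four <operator> codes permitted by _units_conversion.
OPERATORS = {"*": operator.mul, "/": operator.truediv,
             "+": operator.add, "-": operator.sub}


def to_default_units(name: str, value: float,
                     conversions: Dict[str, str]) -> Tuple[str, float]:
    """Strip a units extension from `name` and convert `value`.

    `conversions` maps each _units_extension code to its _units_conversion
    expression, e.g. {'_pm': '/100'} (a hypothetical entry).
    """
    for extension, expression in conversions.items():
        if name.endswith(extension):
            # [value in default units] = [entered value]<operator><number>
            op, number = expression[0], float(expression[1:])
            return name[: -len(extension)], OPERATORS[op](value, number)
    return name, value  # no extension: already in the default units
```

A value of 1054 entered under a name ending in '_pm' would then be reported as 10.54 under the unextended name.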
 
#-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof-eof