(45) Further discussions on the submitted Core

To: [email protected]
Subject: (45) Further discussions on the submitted Core
From: bm
Date: Fri, 19 Jul 1996 12:01:20 +0100
Dear Colleagues

Warm thanks to Gotzon Madariaga, who has supplied a substantial list of
comments that I address below. I have not yet had time to implement the
changes required (but have tried to indicate what action I shall take);
but I hope to have a revised draft available for inspection in the near
future. In the notes below, "D>" is David Brown, "B>" Brian Toby, "G>"
Gotzon, and "S>" Syd.

The specific points on which I would appreciate further
advice/discussion/instruction are 44.3, 44.6, 44.13, 45.3, 45.4, 45.7, 45.8,
45.12, 45.13, 45.14, 45.16, 45.22, 45.23 and 45.28 (but this doesn't mean
you're not required to read the rest!).


D44.1 _atom_site_aniso_B_*
--------------------------
D>      If we decide that these items should be included in the core,
D> we should add the following to the definitions:
D> 'The IUCr Commission on Nomenclature recommends against the use of
D> B for reporting atomic displacement parameters.  U, being directly
D> proportional to B, is preferred.'

Yes, I'm happy to add this wording to the definitions. Is there consensus
on this?

G> _atom_site_aniso_B_*, _atom_site_aniso_U_*, _atom_site_B_iso_or_equiv,
G> _atom_site_U_iso_or_equiv. If all of them have to be preserved,
G> '_related_function constant' must be changed, according to the DDL, to
G> '_related_function conversion'.

Quite right - my apologies. 
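
For reference, the conversion implied between the B and U items is the standard relation B = 8*pi^2*U. A minimal sketch in Python follows; the function names are illustrative only and are not taken from any dictionary or existing software.

```python
import math

# B = 8 * pi^2 * U relates the two forms of the displacement parameter
# (both expressed in square angstroms).
B_PER_U = 8.0 * math.pi ** 2


def u_to_b(u):
    """Convert a U value (A^2) to the equivalent B value (A^2)."""
    return B_PER_U * u


def b_to_u(b):
    """Convert a B value (A^2) to the equivalent U value (A^2)."""
    return b / B_PER_U
```

Because the factor is a constant, the same conversion applies element by element to the anisotropic tensors.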


D44.2 _atom_site_disorder_*
---------------------------
D>      I have had another look at the definitions and still cannot
D> make sense out of *_assembly.  Brian M's example clarifies the
D> intent, but I am not sure that the example is the way I would have
D> interpreted the definition.  What about the following definitions:
D> 
D> *_assembly
D> A code which identifies a cluster of atoms which show long range
D> positional disorder but which are locally ordered.  Within each
D> such cluster of atoms, _atom_site_disorder_group is used to
D> identify the sites that are simultaneously occupied.  This field is
D> only needed if there is more than one cluster of disordered atoms
D> showing independent local order.
D> 
D> *_group
D> A code that identifies a group of positionally disordered atom
D> sites that are locally simultaneously occupied.  Atoms that are
D> positionally disordered over two or more sites (e.g. the H atoms of
D> a methyl group that exists in two orientations) can be assigned to
D> two or more groups.  Sites belonging to the same group are
D> simultaneously occupied, but those belonging to different groups
D> are not.  The code "-1" is used to identify sites that occur only
D> once in the unit cell, i.e. operations of the space group symmetry,
D> other than pure translation, should not be applied to this site.
D> 
D> I am not sure if these definitions are correct.  I have some
D> questions about the use of -1 as I am not sure of the context in
D> which it might be used.  Firstly, am I right in assuming that all
D> pure translational symmetry elements (including lattice centering
D> translations) should be applied to these sites?  Note that my
D> definition is not only ambiguous but inconsistent.  What about glide
D> planes and screw axes which are used to transform one molecule to
D> another?  It seems strange that molecules in the same (primitive?)
D> cell should have these sites individually identified, but not those
D> in the next cell.  Secondly, is it not possible that there may be
D> sites in more than one group that need to be so identified?  Should
D> we not just use the '-' sign as an indicator that space group
D> symmetry is to be switched off?  I assume that John Davies knows
D> what is happening here.  I would feel a lot happier if he were
D> given the opportunity to comment on these questions. (I could
D> contact him directly if you like).

I've written to John, asking for his opinions on this, and will report
on his comments.

G> _atom_site_disorder_*. I had great difficulty understanding the
G> meaning of _atom_site_disorder_* before reading the example you included in 
G> D44.2. Perhaps including it as an example in the dictionary would be a good
G> idea. In _atom_site_disorder_group the code '-1' has a very specific
G> meaning that is only stressed in the definition. I would prefer it to be
G> in an _enumeration list, but I can see the difficulty, since
G> _atom_site_disorder_group is a very open code. I really have no alternative
G> to offer.

Yes, an example would be useful in the dictionary when we've ironed out
the problems with this. I like David's idea of using the "-" prefix as a
general indicator that symmetry should not be applied to the disordered
site.

D44.3 _atom_site_scat_versus_stol_list
--------------------------------------
G> I do not like very much the definition as it is. Perhaps it could be
G> complemented with other data items. For instance
G> 
G> loop_ 
G> _atom_type_symbol 
G> _atom_type_scat_stol_min 
G> _atom_type_scat_stol_max
G> _atom_type_scat_stol_step
G> 
G> Se 0. 0.8 0.03 ...
G> 
G> Then the meaning of _atom_site_scat_versus_stol_list would be clearer.
G> However, I foresee problems, because the above solution does not allow
G> scattering factor lists that are not regularly spaced (are there any?). Anyway
G> I do not agree with David's proposal. Interpretable texts, given they
G> cannot be strictly fixed, go in the opposite direction of
G> 'intelligent machines'.

D44.5 _audit_contact_author_fax etc.
------------------------------------
D>      We seem to have got ourselves into a bind here by not getting
D> our first definition right.  I still maintain that we should adopt
D> a single convention or none at all.  If two conventions are to be
D> adopted, then the convention should be readily recognisable on
D> parsing the string.  The use of parentheses in two different senses
D> is rather unfortunate.  
D> 
D>      Clearly the World Directory convention is better thought out
D> and I would suggest we go for that one.  However, since a different
D> convention has been used in _publ_contact_author_fax, we need to be
D> able to accept this as well.  I would prefer a definition such as:
D> 'The recommended style includes the international dialing prefix,
D> the area code in parentheses followed by the local number with no
D> spaces.'
D> This does not make the style mandatory, but is likely to result in
D> its widespread adoption.  Under _publ_contact_author_fax we should
D> give the same definition followed by: 'The earlier convention of
D> including the international dialing prefix in parentheses is no
D> longer recommended'.
D> Clearly we cannot undo what has been done.  This definition would
D> start us off on a new road, while leaving a trail for those who may
D> have to deal with old cifs.

I agree to the suggested rewordings.
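
To illustrate David's point that the convention should be recognisable on parsing the string, here is a minimal Python sketch that tells the two styles apart. The patterns are my own assumptions (digits only, an international prefix of one to three digits) and the sample numbers in the note below are invented; none of this is a proposed dictionary constraint.

```python
import re

# Recommended style: international prefix, area code in parentheses,
# local number, with no spaces anywhere.
RECOMMENDED = re.compile(r"\d{1,3}\(\d+\)\d+")

# Earlier style: the international prefix itself is in parentheses,
# so the string begins with "(".
EARLIER = re.compile(r"\(\d{1,3}\)")


def fax_style(value):
    """Classify a fax/phone string as 'recommended', 'earlier' or 'other'."""
    if RECOMMENDED.fullmatch(value):
        return "recommended"
    if EARLIER.match(value):
        return "earlier"
    return "other"
```

With these assumptions, a made-up number such as 12(34)9477334 is classed as recommended and (12)34 947 7334 as the earlier style, so old files remain identifiable.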

D44.6 _cell_* and _cell_measurement_*
--------------------------------------
D>      Comcifs has never formally approved the detailed assignment
D> of categories, since categories are not mentioned in the Hall,
D> Allen and Brown paper which is the only approved cif dictionary. 
D> It is therefore not too late to change the categories.  I would
D> recommend that we combine the cell and cell_measurement categories. 
D> Only failing this should we define the pointers which only lead to
D> additional and unnecessary complications.

Anyone unhappy with this? The cell_measurement_refln category is distinct, and 
should have a pointer to individual cells in the new cell category, if
there are indeed going to be cases of multiple cell parameters.


D44.7 _citation_journal_coden_astm   *_cas
------------------------------------------
D>      Since we have already used ASTM in _database_* we should stick
D> to this name.  Drop *_cas.  We are going to have other datanames
D> with obsolete formulations before we are finished (e.g. *_esd).

OK.

D44.11, D44.13 _diffrn_*
------------------------
G> I agree with David's comments about _diffrn_radiation_. I have found some 
G> of them redundant and very confusing in their present form. 

B> In 44.13 which _diffrn_radiation_ items are we proposing to remove? We have
B> been bouncing some of these items back and forth between the powder and core
B> dictionary so many times that I don't remember what is where. I beg that we
B> not remove any of the following:
B> 
B> _diffrn_radiation_probe
B> _diffrn_radiation_xray_symbol
B> _diffrn_radiation_xray_target
B> _diffrn_radiation_wavelength_id
B> _diffrn_radiation_wavelength_wt
B> 
B> If removed, it would only delay the pd dictionary as we make yet another
B> revision. Further, these are really core not pd concepts, which is why they
B> were moved to the core!

Sorry, I think I was being too casual in saying yes to a removal of "all"
the new items. The ones Brian lists above (except for *_wavelength_id and
*_wavelength_wt, which are already in the published dictionary) were
separately considered and approved some time ago, and I think the
rationale for retaining them remains. I propose we keep this batch in.

B> For the sake of being pedantic, a monochromator is used before the specimen
B> and an analyzer (analyser for some) is used between the specimen and the
B> detector.  Despite this, I differentiated using pre_spec and post_spec
B> for _pd_ entries.
B> The same ambiguity as David pointed out for the monochromator exists for the
B> filter as well. My solution is to have a new category: _diffrn_detection_ to
B> replace _diffrn_radiation_detector_. With this, one could include items for
B>  x-ray optics that appear after the specimen (filter, monochromators, etc) in
B> that section.

This comes into the category of proposals that require rigorous working
through, and consequently should be considered for the next revision.

B> Having made that comment, I would like to see the following appear somewhere:
B> _diffrn_radiation_detector
B> _diffrn_radiation_detector_dtime
B> _diffrn_radiation_polarisn_ratio
B> Again if not in the core, in the pd dict.

OK, these are existing core definitions, and thus sacrosanct.

B> I like, but am less concerned with
B> _diffrn_radiation_detector_details
B> _diffrn_radiation_source_details
B> _diffrn_radiation_source_type
B> 
B> On the other hand, I don't like _diffrn_radiation_source_target. If it
B> stays, I would prefer _diffrn_radiation_source_size as a new name.
B> 
B> It is OK with me to drop _power_ but if we are going to record generator
B> settings, we should have either voltage and current or voltage and power. 50
B> kV, 35 mA is of more value than 1.75kW as the X-ray spectrum changes with the
B> voltage. There are some in the powder community who really care about
B> recording this information. I consider it a bit more important than the
B> phase of the moon, but not too much.

D>      I can sympathise with Brian T's concerns about dropping the
D> new items in this part of the dictionary.  This is only a temporary
D> dropping.  As you point out, in the future we will be able to add
D> new data items on an incremental basis.  But we do need to think
D> through what we are doing rather carefully.  For example can anyone
D> tell me what is the difference between
D> _diffrn_radiation_detector_type and
D> _diffrn_measurement_device_type?  From the examples it would seem
D> that the former refer to one dimensional, the latter to two
D> dimensional detectors.  If this is really our intent (which I do
D> not think it is) then we should make the distinction clear in the
D> name.  We have to be concerned not only with the radiation
D> (electron, x-rays, neutrons) and the radiation source (reactor,
D> synchrotron, sealed x-ray tube) but also its particular make or
D> name (Philips, NSLS) and characteristics (neutron flux, beam
D> current for synchrotrons, voltage and power for x-ray tubes etc),
D> monochromators, filters and collimation (both before and after the
D> sample) before we even get to describing the diffraction geometry,
D> and the detector (film, scintillation counter, area detector
D> (multiwire or CCD)) and all this before we decide on the wavelength
D> (for monochromatic experiments, characteristic or monochromated or
D> both) or other forms of wavelength detection (time of flight,
D> calculation from Laue pattern etc.)  We are going at this like the
D> blind men and the elephant.  We need to broaden our vision to
D> include the current explosion of different techniques and, if
D> possible, leave enough flexibility to include techniques not yet
D> invented.

This is potentially an entirely fruitful line of discussion, but I think
we must return to the immediate need for a consensus on the definitions to
include in this release. Hence I see the current status of the
diffrn_measurement and *_radiation categories as follows ("." means
original Core definition, "+" means new and already approved entry, "-"
means candidate entry that is to be dropped for now, though without
prejudice to later deliberations).

+    	_diffrn_measurement_[]
(-    	_diffrn_measurement_details)
.    	_diffrn_measurement_device
(-    	_diffrn_measurement_device_details)
(-    	_diffrn_measurement_device_specific)
(-    	_diffrn_measurement_device_type)
.    	_diffrn_measurement_method
+    	_diffrn_radiation_[]
(-    	_diffrn_radiation_collimation)
.    	_diffrn_radiation_detector
(-    	_diffrn_radiation_detector_details)
.    	_diffrn_radiation_detector_dtime
(-    	_diffrn_radiation_detector_specific)
(-    	_diffrn_radiation_detector_type)
.    	_diffrn_radiation_filter_edge
.    	_diffrn_radiation_inhomogeneity
.    	_diffrn_radiation_monochromator
.    	_diffrn_radiation_polarisn_norm
.    	_diffrn_radiation_polarisn_ratio
+    	_diffrn_radiation_probe
.    	_diffrn_radiation_source
(-    	_diffrn_radiation_source_details)
(-    	_diffrn_radiation_source_power)
(-    	_diffrn_radiation_source_specific)
(-    	_diffrn_radiation_source_target)
(-    	_diffrn_radiation_source_type)
.    	_diffrn_radiation_type
.    	_diffrn_radiation_wavelength
.    	_diffrn_radiation_wavelength_id
.    	_diffrn_radiation_wavelength_wt
+    	_diffrn_radiation_xray_symbol
+    	_diffrn_radiation_xray_target

D> D44.17 _journal_[]
D> ------------------
D>      I recommend that we do not supply definitions for this category
D> at the present time.  If other journals need them, they can ask for
D> them.  There is no pressure, and the definitions should be easy to
D> add when they are required.

OK. 


D> D44.20 _publ_manuscript_incl
D> ----------------------------
D>      I would recommend adding the following to the definition of
D> these three data items:
D> 'Although these fields are primarily intended to identify cif data
D> items that the author wishes to include in a published paper, they
D> can also be used to identify data names created so that non-cif
D> items can be included in the publication.'

Agreed.


New threads
===========

D45.1  _type_construct in current version
-----------------------------------------
G> I have read completely the new draft of the Core dictionary and I have some
G> suggestions or comments. I share with David his overall philosophy concerning
G> the 'spirit' of CIF's in the sense that they should be as
G> machine-interpretable as possible. Hence I have two general criticisms.
G> The first one is that _type_construct is never used and, I think, it could
G> be very useful for making initial syntax checking or restricting
G> ambiguities (for example, all the items related to FAX number, telephone
G> number, e-mail address, partial order of chemical symbols in
G> _chemical_formula according the rule 5 of _chemical_formula_[], etc...).

I'm very keen that we should use _type_construct (at least in due course)
- I think I had a hand in its adoption. But I'm not willing to introduce
_type_constructs into this version, because I haven't any software that I
can use to check them out. I'm not sure whether the equivalent
construction has been implemented yet in the DDL2 software tools - last I
heard, some time ago, was that it hadn't. I'm happy to begin composing
_type_construct expressions for subsequent testing the day after this
version is accepted!


Likewise,

G> _atom_site_Wyckoff_symbol. This is a clear example where _type_construct
G> would be useful. Wyckoff letters range from a-z and only in one case
G> (Pmmm) \a (alpha) is needed. Another possibility is to include
G> _enumeration_range but I do not know how to handle \a in this case.

Certainly the current definition of _enumeration_range doesn't permit
this; but it could be done by giving _enumeration_range a _type_conditions
value of 'seq' in the DDL dictionary. The use of _type_construct seems
cleaner. Any views on this, Syd?
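
To show the kind of check a _type_construct for the Wyckoff symbol would express, here is a minimal sketch written as a Python regular expression. It is not DDL _type_construct syntax, and the function name is hypothetical; it only encodes Gotzon's description (one letter a-z, or the two-character CIF code \a for alpha).

```python
import re

# One lower-case letter a-z, or the CIF text code "\a" (alpha),
# which is needed for a single space group (Pmmm).
WYCKOFF = re.compile(r"[a-z]|\\a")


def is_wyckoff_symbol(value):
    """Return True if value is an acceptable Wyckoff letter."""
    return WYCKOFF.fullmatch(value) is not None
```

A range-based test would need a special case for \a, whereas a pattern like this covers both forms in one expression.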

D45.2 Version of DDL for use in Core
------------------------------------
G> The second one concerns the future DDL version.
G> Will the 2.x be finally assumed for every dictionary?. Should we start
G> rewriting the Core and the extensions under this new (and at least for me)
G> very complicated DDL? If the answer (better, the present feelings of the most
G> involved members of the Commission) is yes, should we start now the migration
G> even if the new Core has to be (again) delayed?

I doubt that this will prove a popular suggestion, but I lay it on the
table for consideration at this stage. Again, my own feelings are that a
migration towards DDL2 is best driven by working applications. The mmCIF
software should be developed to a point where the benefits of DDL2
formalism are extremely clear before we consider extending it as a
requirement for existing dictionaries. Incidentally, a DDL2 version of the
Core already exists, embedded within the mmCIF dictionary, so the
mechanics of making the transition at the dictionary level are not
difficult.

G> In a more restricted context I will now list some remarks about the modified
G> Core.

D45.3 _list yes/no/both/maybe?
------------------------------
G> There is something along the whole dictionary that perhaps is not very
G> compliant with the DDL. The entry '_list yes' signals (according to the DDL)
G> that the defined item 'can only be declared in a looped list'. However in the
G> paper where the official Core was published '_list yes' specifies that the
G> 'data item may be included in a repeated list'. In this last case I
G> understand that '_list yes' applies to items that usually will appear in
G> a loop. I think that this interpretation has been used in the new Core but
G> it is closer to the DDL definition '_list both'. Therefore many of the
G> _atom_site_* items should be corrected or be always looped even for
G> monoatomic structures. It is more clear in the definition of
G> _symmetry_equiv_pos_as_xyz where it is stated that 'Except
G> for the space group P1, this data item will be repeated in a loop'.

The constraints on looped data are tighter under the published DDL1.4
which we're using for the current Core than they were under the
unpublished DDL for the original dictionary. So '_list  yes' should be
understood in the sense that you emphasise ("can ONLY be declared in a
looped list"). Consequently, _atom_site_* items should indeed be always
looped, even for monatomic structures. It's a consequence of the way we've
defined categories that all members of the same category should have the
same value for _list. It may be that there are categories where we have
assigned "_list  yes" that should better be "_list  both" - I'll scan the
dictionary again for this, and would welcome any suggestions of where we
might have got it wrong.

Your example of _symmetry_equiv_pos_as_xyz is one such case, though it's
an unfortunate one. I would have preferred us to insist on "_list  yes"
and hence require a loop header structure even for P1, but that would
break existing files, and so we should change it to "_list  both".
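
The three _list values discussed here amount to a simple rule that a checking program could apply. A minimal Python sketch follows; the function name and arguments are hypothetical and do not come from any existing validator.

```python
def list_usage_ok(list_attr, in_loop):
    """Check one occurrence of a data item against its _list attribute.

    list_attr -- the dictionary's _list value: 'yes', 'no' or 'both'
    in_loop   -- True if the item occurs inside a loop_ structure
    """
    if list_attr == "yes":
        # May ONLY be declared in a looped list.
        return in_loop
    if list_attr == "no":
        # May never be looped.
        return not in_loop
    if list_attr == "both":
        # Either usage is acceptable.
        return True
    raise ValueError("unknown _list value: %r" % (list_attr,))
```

Under this rule an unlooped _symmetry_equiv_pos_as_xyz in a P1 file fails with '_list yes' but passes with '_list both', which is why the change is needed to keep existing files valid.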

D45.4 _atom_site_B_equiv_geom
-----------------------------
G> _atom_site_U_equiv_geom. I know that probably David will blame this
G> proposition but if _atom_site_B_* items are retained, should not
G> _atom_site_B_equiv_geom be included for completeness?.

I guess so (gritted teeth all round?). I'll add it unless instructed
otherwise.

D45.5 Identical definitions
---------------------------
G> _atom_sites_Cartn_tran_matrix_ and _atom_sites_Cartn_tran_vector_ have
G> identical definitions. Likewise for _atom_sites_fract_tran_matrix_ and 
G> _atom_sites_fract_tran_vector_.

I'll fix this.

D45.6 Fragmentation of address fields
-------------------------------------
G> To make uniform addresses _audit_author_address should be replaced by several
G> data items like: 
G> 
G> _audit_author_address_dept 
G> _audit_author_address_inst
G> _audit_author_address_street
G> _audit_author_address_postcode 
G> . 
G> . 
G> . 
G> 
G> Perhaps it would be useful for databases, etc.

An interesting point, this one. For a mailing list within a single country,
this is fine. For addresses from different countries and different
cultures, you find (well, Acta does) an astonishing variety of references
to postboxes, bags, buildings, apartments, islands, and entities in
foreign languages that we don't at all understand. One would need 20 or 30
different datanames, plus one for *_address_other_bits. There is a
trade-off between an effective and an ultimate level of data differentiation,
and for most of our current practical purposes, we find an all-in address
field suffices. A separate *_postcode might be useful, though. If we were
starting afresh, I might be persuaded to go into somewhat more detail; as
we already have many files containing an all-in address, I propose we
leave matters as they stand for now.


D45.7 _audit_link_block_code
----------------------------
G> Linked blocks are being used in several dictionary extensions and I will
G> propose not to match this _audit_link_block_code with the arbitrary
G> (and impossible to restrict) string of a data block declaration. Up
G> to now in the extensions the idea introduced by Brian Toby of 'univocally'
G> defined _block_id has been used. This data item (as
G> any other) can be strongly constrained and guarantees the independence of a
G> given data block in the sense that probably there is no other with the same
G> name. Perhaps the goal would be a block structure as follows
G> 
G> data_block1_of_structured_CIF 
G> _block_id		KSE 
G> loop_
G> _audit_link_block_code 
G> _audit_link_block_description 
G> .		'publication details' 
G> KSE_COM	'experimental data common to ref./mod. structures' 
G> KSE_REF	'reference structure' 
G> KSE_MOD	'modulated structure'
G> 
G> # data_block2_of_structured_CIF 
G> _block_id		KSE_COM 
G> loop_
G> _audit_link_block_code 
G> _audit_link_block_description 
G> .		'experimental data common to ref./mod. structures' 
G> KSE		'publication details' 
G> KSE_REF	'reference structure' 
G> KSE_MOD	'modulated structure'
G> 
G> #
G> 
G> Then each data block could be independently treated but each of them would
G> include the identifiers of the blocks containing the rest of the information
G> needed to recover the whole (in this case) modulated structure.

I could be persuaded to go along with this, and indeed my first idea was
to introduce a general _block_id (cf. the _block.id dataname in mmCIF). I held
off because I thought different applications might wish to impose their own
rules for constructing datablock identifiers (see the detailed description of
Brian T.'s _pd_block_id, for instance). But now I'm not so sure. What do
others think?
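
As an illustration of what software would have to do with such links, here is a minimal Python sketch that checks whether every _audit_link_block_code in a set of blocks points to a _block_id that is actually present. The data layout (a list of dictionaries) is my own assumption, as is the reading of '.' in Gotzon's example as "this block".

```python
def unresolved_links(blocks):
    """Return {block_id: [link codes with no matching _block_id]}.

    blocks -- list of dicts, each with:
        'block_id' : the block's _block_id value
        'links'    : its _audit_link_block_code values
    A code of '.' is taken to mean the block itself and is skipped.
    """
    known = {block["block_id"] for block in blocks}
    missing = {}
    for block in blocks:
        absent = [code for code in block["links"]
                  if code != "." and code not in known]
        if absent:
            missing[block["block_id"]] = absent
    return missing
```

Applied to only the two blocks written out in the example above (KSE and KSE_COM), the function reports KSE_REF and KSE_MOD as unresolved for both, which is the information a program would need in order to ask for the remaining blocks.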


D45.8 Inheritance across data blocks
------------------------------------
G> There is a subtle (but critical) problem with blocks and DDL concerning
G> _list_link_parent.  According to its definition _list_link_parent
G> 'identifies a data item...  which matches that of the defined item,
G> AND WHICH MUST BE PRESENT IN THE SAME DATA BLOCK AS THE DEFINED ITEM'. How
G> to interconnect lists in different data blocks
G> without introducing new and specific definitions?


D45.9 _chemical_conn_atom_type_symbol
-------------------------------------
G> If the code 'must match an
G> _atom_type_symbol code' a _list_link_parent should be included for this item.
G> However the next part of the sentence 'or be a recognisable element symbol'
G> prevents the link between chemical_conn_atom and atom_type list. I think that
G> every atom_type referenced in the CIF should be declared previously to avoid
G> this kind of ambiguities.

I suppose this illustrates a difference between the crystallographic and
chemical views of the structure. An atom site might not be associated with
a unique chemical element, while a chemical connectivity map requires known
elemental types. I am not persuaded that the existing definitions need to
be changed.


D45.10 _chemical_formula_weight and _chemical_formula_weight_meas
-----------------------------------------------------------------
G> Units should not appear in the definition.

I have tried to state the units in all of the definitions, so that they
can be read from a browser that prints only _definition. On the other
hand, there should also be a _units attribute, and this is missing in
these cases. I'll fix it.

D45.11 Broken families
----------------------
G> If _diffrn_attenuator_code is referenced by the 
G> _diffrn_refln_attenuator_code, then _list_link_child should be included.
G> 
G> _diffrn_refln_attenuator_code should include a _list_link_parent to
G> _diffrn_attenuator_code.
G> 
G> The same for _diffrn_refln_crystal_id and _exptl_crystal_id.

Will fix this.

D45.12 _diffrn_orient_refln_angle_
----------------------------------
G> _diffrn_orient_refln_angle_ should include also *_theta and *_omega.

Does anyone think differently? If not, I'll add these.


D45.13 _diffrn_refln_scan_mode
------------------------------
G> Unfortunately I have recently discovered that data
G> collection using the so-called Q-scans (scans along arbitrary reciprocal
G> directions) is also possible, and these scans should be included in the _enumeration
G> list. On the other hand the problem is a bit more complicated since its
G> inclusion implies new items. For example the scan range is expressed for each
G> h,k,l in terms of steph,stepk,stepl and number of steps. I know that at
G> HASYLAB this kind of data collection is possible. What I do not know is whether
G> this method of expressing the scan range is 'universal'.

I shall add 'q' to the enumeration list. Further information on the mechanics
of Q-scans would clearly be useful. How much detail about the experimental
scanning needs to be recorded? Does anyone know of people with sufficient
expertise to enlarge the definitions to include all current practice?

D45.14 _diffrn_reflns_number
----------------------------
G> Why are the systematic extinctions due to screw axes and
G> glide planes included in this number but not those due to centring
G> translations?. Sorry, perhaps it has already been discussed.

This was the discussion in D26.2:

S> I have been asked by a co-editor to clearly state that this number should
S> NOT include the measurement of systematically absent reflections as this
S> distorts the intent of this data item - which is to give some measure of
S> the redundancy of intensities measured when compared to _reflns_number_total
S> and _reflns_number_observed. I am not too keen on hard and fast rules about
S> this but I do see the problem - especially when it comes to some auto control
S> software that measures all data points independent of the primitivity of
S> the cell. So I tentatively propose the following:
S> 
S>    _definition
S> ;              The total number of measured diffraction intensities, 
S>                excluding reflections that are classed as systematically
S>                absent due to the non-primitivity of the crystal unit cell.
S> ;
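
To make the distinction in this definition concrete: absences due to non-primitivity depend only on the lattice centring type, whereas those due to screw axes and glide planes depend on the space group. A minimal Python sketch of the centring test follows. The function names are hypothetical, only the P, A, B, C, I and F centrings are covered, and rhombohedral centring is deliberately omitted because it depends on the setting.

```python
def is_centring_absent(h, k, l, centring):
    """True if reflection hkl is systematically absent for the centring."""
    if centring == "P":
        return False
    if centring == "A":
        return (k + l) % 2 != 0
    if centring == "B":
        return (h + l) % 2 != 0
    if centring == "C":
        return (h + k) % 2 != 0
    if centring == "I":
        return (h + k + l) % 2 != 0
    if centring == "F":
        # Present only if h, k and l are all even or all odd.
        return not (h % 2 == k % 2 == l % 2)
    raise ValueError("centring type not handled: %r" % (centring,))


def count_measured(reflections, centring):
    """Count measured hkl triples, excluding centring absences."""
    return sum(1 for (h, k, l) in reflections
               if not is_centring_absent(h, k, l, centring))
```

Screw-axis and glide-plane absences are not tested here, so under the proposed wording they would remain in the count.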

D45.15 _diffrn_scale_group_[]
-----------------------------
G> The items under this category do not record 'details
G> about the reflections'. Only the scale factors and their code are given.

Fair enough. How about "Data items in the _diffrn_scale_group_ category
record details of the scaling factors applied to place all intensities in
the reflection lists on a common scale."

D45.16 _diffrn_standard_decay_%
-------------------------------
G> I think that the sentence: 'This value is intended....overall decay
G> in CRYSTAL QUALITY...' is too strong. There are some
G> cases (e.g. in some modulated structures) where this decay for some
G> reflections can point out some structural relaxation without crystallinity
G> (long-range order) losses.

But I suppose it is most often used to monitor the overall crystal
quality. I propose to change the sentence to read "This value usually
affords a measure of the overall decay in crystal quality..." Does the
panel consider it worth also including Gotzon's second sentence for
information purposes?

D45.17 More _units
------------------
G> _exptl_crystal_density_diffrn, _exptl_crystal_density_meas,
G> _exptl_crystal_density_meas_temp. Units should be removed from the
G> definitions and put in the corresponding _units field. Also
G> _exptl_crystal_density_meas, _exptl_crystal_density_meas_temp may include
G> '_type_conditions esd'.

Again, I'll add a _units field, and also _type_conditions  esd. Thanks.


D45.18 _geom_*_site_symmetry
----------------------------
G> Once _symmetry_equiv_pos_id has been introduced a
G> parent/child link between them and _geom_*_site_symmetry should exist. I
G> know your practical reasons for maintaining the actual structure but...

The relation is not strictly a parent/child one, since the *_symmetry
codes contain the value of a _symmetry_equiv_pos_id AND the translational
code set. I know there is still a measure of unhappiness about our
treatment of these items, but the problem with defining individual
*_symmetry_xyz_id, *_symmetry_transl_x, *_symmetry_transl_y and
*_symmetry_transl_z items, say, is that you not only introduce an
additional 60 or so data names to the Core, but you must also still make
some provision for interpreting the n_mmm codes in existing data files.
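
For those less familiar with the existing codes, the following minimal Python sketch shows what interpreting an n_mmm code involves: the part before the underscore is the _symmetry_equiv_pos_id, and each of the three digits after it is a lattice translation offset by 5 (so 555 means no translation). The function name is hypothetical, and treating a bare "n" as "n_555" is my assumption.

```python
import re

# "n_klm": n is the symmetry operation number; k, l, m are single digits
# giving the translations along a, b, c, each offset by 5.
_CODE = re.compile(r"(\d+)(?:_(\d)(\d)(\d))?")


def parse_site_symmetry(code):
    """Split a code such as '4_655' into (symop id, (ta, tb, tc))."""
    match = _CODE.fullmatch(code)
    if match is None:
        raise ValueError("not a recognised symmetry code: %r" % (code,))
    symop_id = int(match.group(1))
    if match.group(2) is None:
        # Bare "n": assumed here to mean no lattice translation.
        return symop_id, (0, 0, 0)
    translation = tuple(int(d) - 5 for d in match.group(2, 3, 4))
    return symop_id, translation
```

For example, '4_655' decodes to operation 4 with a translation of one cell along a. Any scheme with separate data names would still need a routine of this kind to read existing files.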

D45.19 _journal_index_ categories
---------------------------------
G> The category journal_index_author has not been defined.

Nor the categories journal_index_formula_inorganic,
journal_index_formula_organic or journal_index_subject (all separate
categories because each might be separately looped). Maybe a single
journal_index category would be better, with a _journal_index_type_code to
specify the different target index. I'll think a little more about that.

D45.20 _publ_body_
------------------
G> _publ_body_ items are certainly confusing. Are they more general and
G> include _publ_section_?. Are the latter exclusively realised for Acta?.
G> Should _publ_manuscript_text be progressively replaced by _publ_body_?

The original _publ_section_ datanames were designed for Acta C, which has
a rigid structure for each of its papers: Title, Abstract, Comment,
Experimental, References. Other datanames (like _publ_section_discussion)
reflect the older (but still rigid) structure of the earlier Acta C "Full
Article" and "Short Format" paper types. These are obsolete.

The "publ_body" category should allow a less rigid structure, with
multiple text sections and subsections. It is likely that the
_publ_body_element enumeration list will be extended in time to include
items like figure captions and table captions (and perhaps table bodies),
but we prefer to get experience with working through our CIF/SGML
conversion strategy before we implement these.

"_publ_manuscript_text" allows no structure to be imposed on the contents
of the paper, and will not be favoured by Acta.

It's my expectation that we shall give direction through each journal's
Notes for Authors on which of the _publ_ data names are most appropriate for
that journal; and I anticipate a new "Guide to CIF for Authors of Acta B"
in the not-too-remote future.

D45.21 _publ_manuscript_incl_extra_defn
---------------------------------------
G> If accepted, its two possible values yes/no should appear in a
G> _enumeration list.

Correct. Thanks.

D45.22 _refine_ls_goodness_of_fit
---------------------------------
G> I cannot get the meaning of the goodness of fit e.s.d.

Do you mean the fact that _type_conditions is "esd", i.e. that the
goodness of fit value may have an associated e.s.d.? This is a direct
import from the original Core entry with "_esd  yes".

D45.23 _refine_ls_number_reflns
-------------------------------
G> Reflections included in the refinement are not
G> necessarily observed. Some programs allow the inclusion of reflections marked
G> as unobserved following certain criteria (for instance if Ycal>Yobs) and this
G> inclusion is done dynamically during the least-squares process.

I have always had some difficulty in grasping the precise intent of this
data name (see for example COMCIFS 30 and 31). I am open to advice on this.

D45.24 _refln_phase_meas
------------------------
G> If it is a value experimentally determined it should
G> include the corresponding '_type_conditions esd'.

Right. Will add this.

D45.25 _reflns_scale_meas_
--------------------------
G> Are these the usually refined 'scale factors'?. If yes
G> I would add '_type_conditions esd'.

OK.


D45.26 _symmetry_space_group_name_H-M
-------------------------------------
G> This is a very cumbersome field as it allows
G> different expressions for the origin:'P 2/n 2/n 2/n (origin at -1)' or 'P 2/n
G> 2/n 2/n @ -1' or ' origin at -1 because it is more usual and the space group
G> symbol is P 2/n 2/n 2/n'. As the ambiguous Hermann-Mauguin notation is the
G> standard one I think that at least a unified expression should be given for
G> expressing the origin choice. In this case the information could be more
G> easily decoded.

Agreed in principle, though I find it difficult to see how to achieve
this. One way is to recommend use of the phrase used in International
Tables A, but authors may find it (at best) tiresome to consult IT just
for a standard phrase. Is there a more standard way of defining an origin
position that would support the introduction of a new dataname
_symmetry_space_group_name_H-M_origin?

D45.27 Symmetry categories
--------------------------
G> The new category _symmetry_equiv_[] collides with the symmcif project. I
G> prefer a unique symmetry category to avoid excessive parent/child links.

The problem is with the mapping of 'categories' on to relational 'tables',
such that a group of datanames that must appear in a loop has to belong to
a different category than a group of datanames that must NOT appear in a
loop. The DDL2 formalism does allow you to circumvent this by defining
'category groups', so you could have a "symmetry" category group which
includes "symmetry", "symmetry_equiv" and other categories. Under DDL1, I
think we would just have an implicit grouping through categories that
begin with the same name element. (Oh dear; sounds like I'm arguing in
favour of DDL2 here :-).
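
To make the idea of implicit grouping concrete, here is a small Python
sketch. It assumes (and this is only an assumption for illustration) that
the "name element" is the text before the first underscore of the
category name; the function name group_by_leading_element is likewise
hypothetical.

```python
from collections import defaultdict


def group_by_leading_element(categories):
    """Group category names by their first underscore-delimited element."""
    groups = defaultdict(list)
    for name in categories:
        leading = name.split("_", 1)[0]
        groups[leading].append(name)
    return dict(groups)
```

Applied to the categories "symmetry", "symmetry_equiv" and "cell", this
puts the first two together under "symmetry" and leaves "cell" on its own.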

D45.28 _symmetry_equiv_pos_id
-----------------------------
G> _symmetry_equiv_pos_id should be defined directly as the 'sequence number'
G> and not as 'a code identifying each entry... It should be the sequence
G> number...'. The link with _geom_*_symmetry_ appears implicitly in the
G> definition (see point D45.18)... On the other hand this item should be
G> _list_mandatory yes (I know that this probably introduces serious problems
G> of back-compatibility).

Is everyone happy with the redefinition as 'the sequence number'?
I acknowledge the difficulty with the implicit link (see comments to D45.18).
I agree that "_list_mandatory yes" would introduce serious problems of
back-compatibility.

D45.29 su vs esd
----------------
G> Finally, perhaps a short note somewhere indicating that s.u. replaces e.s.d.
G> would remove from some definitions these references to "standard uncertainty
G> (e.s.d.)"

Where do you recommend as "somewhere"?

D45.30 Example for _diffrn_orient_refln_[]
------------------------------------------
G> Here is a possible example, extracted from the output of a CAD4
G> diffractometer, for _diffrn_orient_refln_[].
G>
G> loop_
G> _diffrn_orient_refln_index_h
G> _diffrn_orient_refln_index_k
G> _diffrn_orient_refln_index_l
G> _diffrn_orient_refln_angle_theta
G> _diffrn_orient_refln_angle_phi
G> _diffrn_orient_refln_angle_omega
G> _diffrn_orient_refln_angle_kappa
G> -3   2   3    7.35   44.74   2.62    17.53
G> -4   1   0    9.26   83.27   8.06     5.79
G>  0   0   6    5.85  -43.93 -25.36    86.20
G>  2   1   3    7.36  -57.87   6.26     5.42
G>  0   0  -6    5.85 -161.59  36.96   -86.79
G> -3   1   0    6.74   80.28   5.87     2.60
G>  2   0   3    5.86  -76.86  -0.17    21.34
G>  0   0  12   11.78  -44.02 -19.51    86.41
G>  0   0 -12   11.78 -161.67  42.81   -86.61
G> -5   1   0   11.75   86.24   9.16     7.44
G>  0   4   6   11.82  -19.82  10.45     4.19
G>  5   0   6   14.13  -77.28  10.17    15.34
G>  8   0   0   20.79  -77.08  25.30   -13.96

Excellent. Thank you.


D45.31 Style of referring to category names
-------------------------------------------
Just a small point on style. In the Core dictionary, the category
definitions adopt the following style:

"Data items in the _cell_ category record details about the
crystallographic cell parameters."

whereas in mmCIF this is given as 

"Data items in the CELL category record details about the
crystallographic cell parameters."

Note that in the first example, leading and trailing underscores are used; in
the second, the category reference is in upper case. In both dictionaries the
actual name of the category is "cell". These are just typographic
conventions, and Paula and I have chosen to differ on (partly) aesthetic
grounds. Does this bother anyone?


That's all for today :-)

Regards
Brian