Dear Colleagues

There is a new ftp area reserved for the use of COMCIFS members. To access files in this area, make an ftp connection to agate.iucr.ac.uk (or 192.70.242.60), login as "comcifs" with password "wheatear". The directory now contains copies of the DDL2 and mmCIF dictionaries (see discussion below).

D16.1 esd versus su
-------------------

Some time ago, we left open ("pending further developments") the adoption of the term "statistical uncertainty" in place of "estimated standard deviation". Howard now draws our attention to the fact that there have been further developments.

H> The Executive Committee has approved the Statistical Descriptors II
H> report which is now with the Technical Editor's office in Chester being
H> prepared for publication in Acta.

A copy of this is available from the ftp directory as the file statistical_descriptors_II.

H> As I mentioned earlier two of the
H> recommendations concern COMCIFS and there was some short discussion about
H> them previously.
H> (a) the term 'standard uncertainty' (symbol su) is recommended in place of
H> 'estimated standard deviation' (symbol esd).

The expression 'esd' or 'e.s.d' occurs in the core dictionary in the following lines (these are simply 'grepped' from the dictionary, but the context is usually fairly clear).

    _type_conditions esd (in several places)
    chemical formulae. Parentheses are used only for e.s.d.'s.
    ; Net intensity and e.s.d. calculated from the diffraction counts
    ; The e.s.d. of the individual mean standard scales applied to the
    _refine_ls_shift/esd_max .535
    _refine_ls_shift/esd_mean .044
    876-881. The value must be between 0. and 1. with an e.s.d.
    weight [1/(e.s.d. squared)]. See also _refine_ls_restrained_S_
    weight [1/(e.s.d. squared)] and wr is the restraint weight.
    data_refine_ls_shift/esd_
    loop_ _name '_refine_ls_shift/esd_max'
    '_refine_ls_shift/esd_mean'
    _enumeration_detail sigma "based on measured e.s.d.'s"
    criterion is usually expressed in terms of an e.s.d. threshold.

There seems no problem with changing these references to "s.u." except in the data names _refine_ls_shift/esd_max and _mean. Here, the data names should be retained, but the definition might read

    _definition
;   The largest and the average ratios of the final least-squares
    parameter shift divided by the final standard uncertainty (s.u.,
    formerly described as estimated standard deviation, e.s.d.).
;

- verbose, but unambiguous. Yes?

There is also the option to change the enumeration of _type_conditions to 'su', but this may be difficult, given that the DDL1.4 paper has now gone to press. Syd, are you willing to make this change? If not, an explanation of the historical reasons for the code 'esd' can always be given in any later description of this term.

H> (b) the 2 to 19 rule is recommended for the number of figures used to report
H> su's. [My programming assistant, Howard D. Flack, has modified the Geneva
H> version of the cif output programme of XTAL to take account of this. The
H> Acta Cryst C editor has a copy of this and claims it works
H> properly down under as well.]

My ear has frequently been bent on the subject of the "Rule of 19" (i.e. uncertainty values of less than 2 in the last decimal place should be expanded to another significant figure). It is likely that this ruling will be strictly enforced within IUCr journals. Should this be made a part of the CIF specification - in other words, should the occurrence of a quantity such as .1243(1) render the CIF invalid? If so, several CIFs already in the IUCr would be invalid by this ruling.

D28.2 and D28.3 R factors
--------------------------

D> In items 28.2 and 28.3 I agree with your suggestion for *_wR_factor,
D> making it the definitive value based on the reflections used in the
D> refinement. It would be useful if, in the next circular, you could
D> summarise all the other types of R factor that exist or have been proposed,
D> together with your proposal for the ones that should be included in the
D> core.
D> I think we need these in front of us so we can register our
D> approval or otherwise.

Oh, dear. I find that my understanding of all the finer points of detail on these topics is fragile, but I shall do my best. I give below my summary of the existing definitions, and follow this with the full definitions, and the proposals Brian Toby has made for similar definitions in the powder dictionary (the relevant point here is that powder workers do require an unweighted R factor based on intensities). There are various R factors defined in the macromolecular dictionary, generally for shells of resolution; I would suppose the _all, _obs and (nothing) suffixes can be applied to those in line with our decisions on the core definitions.

So, as I understand it, the existing definitions are:

    '_refine_ls_R_factor_all'
    '_refine_ls_R_factor_obs'

R factors calculated on F (for comparison with older calculations quoting this as the conventional R factor). 'All' means 'calculated using all collected data'; 'observed' means "using all data satisfying the 'observed' criterion", i.e. all data satisfying Fo > n.sigma(Fo), where n is some arbitrary cutoff factor stipulated in _reflns_observed_criterion.

    '_refine_ls_wR_factor_all'
    '_refine_ls_wR_factor_obs'

Weighted R factors calculated on |F|, F^2^ or I~net~, according to which quantity was chosen in the least-squares minimization function. The '_all' and '_obs' suffixes carry the same meaning as before.

There is a problem with calculations within SHELXL93, which omits from the refinement some reflections that are believed to be systematically wrong. The number of reflections used in the refinement is therefore not 'all' (because some reflections were collected, but are now ignored), nor the number 'observed' as per the cutoff criterion (although often the two will coincide). Hence the proposal was to calculate a weighted R factor using just those reflections that are actually used in the least-squares minimisation function.
This quantity would be denoted '_refine_ls_wR_factor' (with no trailing suffix).

There are three points on which my mind remains a little unclear.

(1) This suggestion (for *_wR_factor) is well suited to the way SHELX works. Is it appropriate as a general definition?

(2) Is there any merit in following the same principle for other calculated quantities that have '_all' and '_obs' flavours (in particular, _refine_ls_R_factor_all and _refine_ls_restrained_S_all)?

(3) What is the meaning of _refine_ls_number_reflns? In my previous mailing I enquired whether this was to be understood as the number of reflections used in the refinement (in other words, just the number of data points taken into account in calculating the putative _refine_ls_wR_factor). But I note that it is referred to in the definition of _refine_ls_restrained_S_ data names. Is this usage consistent?

............... The existing core definitions for R, wR and S are ................

data_refine_ls_R_factor_
    loop_ _name                 '_refine_ls_R_factor_all'
                                '_refine_ls_R_factor_obs'
    _type                       numb
    _enumeration_range          0.0:
    _definition
;   Residual factors for all reflection data, and for reflection
    data classified as 'observed' (see _reflns_observed_criterion).

    R = (sum||Fm|-|Fc|| / sum|Fm|);

    Fm and Fc are measured and calculated structure factors. This is
    the conventional R factor. See also _refine_ls_wR_factor_
    definitions.
;

data_refine_ls_wR_factor_
    loop_ _name                 '_refine_ls_wR_factor_all'
                                '_refine_ls_wR_factor_obs'
    _type                       numb
    _enumeration_range          0.0:
    _definition
;   Residual factors for all reflection data, and for reflection
    data classified as 'observed' (see _reflns_observed_criterion).

    wR = [sum(w|Ym-Yc|^2^) / sum(wYm^2^)]^1/2^

    where Ym and Yc are the measured and calculated coefficients
    specified by the _refine_ls_structure_factor_coef; w is the
    least-squares weight.
    See also the _refine_ls_R_factor_ definitions.
;

data_refine_ls_restrained_S_
    loop_ _name                 '_refine_ls_restrained_S_all'
                                '_refine_ls_restrained_S_obs'
    _type                       numb
    _enumeration_range          0.0:
    _definition
;   The least-squares goodness-of-fit parameter S' for all data, and
    for observed data, after the final cycle of least squares. This
    parameter explicitly includes the restraints applied in the
    least-squares process.

    S' = {[sum(w|Ym-Yc|^2^) + sumr(wr|Pc-Pt|^2^)] / (Nref+Nrestr-Nparam)}^1/2^

    where the sum is over the specified reflection data;
    sumr is over the restraint data;
    Nref is the number of reflections used in the refinement
    (see _refine_ls_number_reflns);
    Nparam is the number of refined parameters
    (see _refine_ls_number_parameters);
    Nrestr is the number of restraints
    (see _refine_ls_number_restraints);
    Ym and Yc are the measured and calculated coefficients specified
    in _refine_ls_structure_factor_coef;
    Pc and Pt are the calculated and target restraint values;
    w is the least-squares reflection weight [1/(e.s.d. squared)]
    and wr is the restraint weight.
    See also _refine_ls_goodness_of_fit_ definitions.
;

...............................................................................

............... The proposed additions in the powder dictionary are ...........

data_proc_ls_I_R_factor
    _name                       '_refine_proc_ls_I_R_factor'
    _category                   refine
    _type                       numb
    _enumeration_range          0.0:
    _definition
;   Residual factors for estimated reflection intensities,

    R~I~ = (sum~hkl~ |I~obs~(hkl) - I~calc~(hkl)| / sum I~obs~(hkl))

    where I~obs~(hkl) and I~calc~(hkl) are the squares of the
    observed and calculated structure factors.
    This is often referred to as R~B~ or R~Bragg~ in Rietveld
    refinements.
    See also _pd_proc_ls_prof_ for profile R-factor definitions.
;

data_pd_proc_ls_prof_
    loop_ _name                 '_pd_proc_ls_prof_R_factor'
                                '_pd_proc_ls_prof_wR_factor'
                                '_pd_proc_ls_prof_wR_expected'
    _category                   pd_proc_ls
    _type                       numb
    _definition
;   Rietveld/Profile fit R-factors

    Note that the R-factor computed for Rietveld refinements using
    the extracted reflection intensity values (often called the
    Rietveld or Bragg R-factor) is not properly a profile R-factor.
    This R-factor may be specified using _proc_ls_I_R_factor.

    _pd_proc_ls_prof_R_factor, often called R~p~, is an unweighted
    fitness metric for the agreement between the observed and
    computed diffraction patterns

      R~p~ = sum~i~ | I~obs~(i) - I~calc~(i) | / sum~i~ ( I~obs~(i) )

    _pd_proc_ls_prof_wR_factor, often called R~wp~, is a weighted
    fitness metric for the agreement between the observed and
    computed diffraction patterns

      R~wp~ = SQRT { sum~i~ ( w(i) * [ I~obs~(i) - I~calc~(i) ]^2^ )
                     / sum~i~ ( w(i) * [I~obs~(i)]^2^ ) }

    _pd_proc_ls_prof_wR_expected, sometimes called the theoretical
    R~wp~ or R~e~, is a weighted fitness metric for the statistical
    precision of the dataset. For an idealized fit, where all
    deviations between the observed intensities and those computed
    from the model are due to statistical fluctuations, the observed
    R~wp~ should match the expected R-factor. In reality R~wp~ will
    always be higher than R~e~.

      R~e~ = SQRT { (n - p) / sum~i~ ( w(i) * [I~obs~(i)]^2^ ) }

    Note that in the above equations,

    w(i) is the weight for the ith data point
    (see _pd_proc_ls_weight)

    I~obs~(i) is the observed intensity for the ith data point,
    sometimes referred to as y~i~(obs) or y~oi~.
    (See _pd_meas_count_total, _pd_meas_intensity_total or
    _pd_proc_total).

    I~calc~(i) is the computed intensity for the ith data point
    with background and other corrections applied to match the
    scale of the observed dataset, sometimes referred to as
    y~i~(calc) or y~ci~. (See _pd_calc_intensity_total).

    n is the total number of data points
    (see _pd_proc_number_of_points) less the number of data points
    excluded from the refinement.

    p is the total number of refined parameters.
;

...............................................................................

Howard has made the following remarks, which undoubtedly have some bearing on this discussion, but I am not expert enough to see how exactly they affect our deliberations.

H>D> I assume that this item is the one on which refinement is based.
H>
H> There are all sorts of problems with loose statements like that.
H> (a) The minimized function in least squares does not usually have (constant)
H> terms in the denominator.
H> (b) The LS minimisation function is defined with the scale factor(s) applied
H> to the calculated quantities whereas the R factors in general have the
H> scale factor applied to the observed quantities (which makes the
H> denominator not to be a constant)
H> (c) If you apply restraints, these act on the minimisation function but
H> not on the R Factors that you seem to be talking about.

D25.6 _type_construct
---------------------

D> Perhaps the next circular, which promises to say more about the
D> new DDL, could also let us know how we can get a copy of REGEX since this
D> will be necessary for our further discussions.

I have put a copy of the POSIX document discussing regular expression syntax in the new ftp area.

I feel that our discussions of _type_construct have demonstrated the feasibility of this approach (and the same approach is supported in DDL2), but I am unsure how to proceed at this point. The examples I have seen so far are incomplete, and a thorough approach needs to be taken to ensure self-consistency through all the dependent components. I think I would favour dropping this from the current (that is, forthcoming!) release of the dictionaries, but working on it energetically for future releases.
Is there general agreement on this, or does anyone feel that it is essential to have this feature implemented at this point?

New topics
==========

D30.1 Hydrogen bonds
--------------------

Here's a set of notes I made some time ago (in fact, pre-COMCIFS) that has just come to light again, reminding me of another matter that is overdue for discussion. Please bear with me if I include these old notes verbatim, rather than seek to rephrase them in modern terminology! I would suppose that our preferred route now is option (2) below, but I have discovered that option (1) has been routinely implemented by the Acta staff. Comments welcome. I notice, by the way, that the struct_conn category of the mmCIF details hydrogen bonds and other interactions, but I suggest the _geom_hbond_ approach would be more suitable for small-molecule CIFs.

Authors frequently wish to describe hydrogen-bond geometry, and a typical table in Acta might look like this:

    D---H...A                  D...A      D---H     H...A     D---H...A

    C(6)---H(C6)...O(2)^i^     3.276 (5)  1.00 (4)  2.34 (4)  157 (1)
    C(9)---H(C9)...O(2)^ii^    3.243 (5)  0.90 (4)  2.55 (4)  134 (1)

(1) Certain authors have suggested the following additional data names to be used in such a case; the data naming scheme preserves chemical information (i.e. DH is a bond distance, DA a contact), but the resultant loop contains an inelegant mixture of _bond_, _contact_ and _angle_ identifiers.

    _geom_bond_atom_site_label_D
    _geom_bond_atom_site_label_H
    _geom_bond_distance_DH
    _geom_contact_atom_site_label_A
    _geom_contact_distance_HA
    _geom_contact_distance_DA
    _geom_angle_DHA
    _geom_contact_site_symmetry_A

(2) An alternative is to group all the entities under a new second-level identifier [i.e.
create a new category, geom_hbond], to obtain

    _geom_hbond_atom_site_label_D
    _geom_hbond_atom_site_label_H
    _geom_hbond_atom_site_label_A
    _geom_hbond_distance_DH
    _geom_hbond_distance_HA
    _geom_hbond_distance_DA
    _geom_hbond_angle_DHA
    _geom_hbond_site_symmetry_A

and perhaps also (for completeness)

    _geom_hbond_site_symmetry_D
    _geom_hbond_site_symmetry_H
    _geom_hbond_publ_flag

(3) A third possibility is to embed all the data within the existing geometry loops [e.g. the first example would have components within

    loop_
    _geom_bond_atom_site_label_1
    _geom_bond_atom_site_label_2
    _geom_bond_distance
        C(6)   H(C6)  1.00(4)

and

    loop_
    _geom_contact_atom_site_label_1
    _geom_contact_atom_site_label_2
    _geom_contact_distance
        C(6)   O(2)   3.276(5)
        H(C6)  O(2)   2.34(4)

] but to have a set of identifier 'pointers' in a separate loop

    loop_
    _geom_hbond_donor
    _geom_hbond_hydrogen
    _geom_hbond_acceptor
    _geom_hbond_symmetry_acceptor
        C(6)  H(C6)  O(2)  2
        C(9)  H(C9)  O(2)  2_655

D30.2 The New DDL
-----------------

As I mentioned in passing last October, the macromolecular community decided at the mmCIF workshop in Brussels to develop an enhanced version of the dictionary definition language for use with the mmCIF dictionary. Syd, who with Tony Cook is the author of the original DDL, agreed to this development, and has been involved in the formulation of the new version, which is to be called DDL version 2. John Westbrook (of the Nucleic Acids Data Bank at Rutgers University) has been the main architect of this, and he has been assisted by Syd and by Nick Spadaccini (who is also at the University of Western Australia).

I have added John and Nick to the mailing list for any discussions we may have on the new version. However, while I am sure that any constructive comments on the formalism will be welcomed, I see our role more as assessing the applicability of DDL 2 to the core and other dictionaries.

Because the mmCIF will include the core definitions, it is of course necessary to have the core definitions expressed in the same formalism as the mmCIF dictionary itself, and Paula has done a magnificent job in merging the core dictionary with the mmCIF definitions to produce a compound dictionary using DDL2. The question we need to address is whether we should distribute the revised core dictionary itself in DDL2 formalism, or whether it should go out in DDL1.4 formalism and be maintained in parallel by the mmCIF developers.

To help in deciding this, I have mailed to everyone a (paper) copy of the ciftex representation of Paula's latest revision. This is to demonstrate that the dictionary need not look very different from the published Core, even though the underlying representation is somewhat different; but it will also allow us to concentrate attention on the content of the definitions - the new DDL is rather more verbose than the old one, and definitions can be hard to locate. Also, it will in the long run save paper - the ciftex version is only 102 pages long (!), as opposed to the 300 or so needed for a full ASCII printout.

The potential drawback is that the details of the DDL are masked, perhaps to the extent that the full power of the new formalism is not apparent. I shall therefore append to this message a listing of the new DDL dictionary, and I shall be pleased to e-mail the draft mmCIF file to anyone who wishes to see that in its full (850 kb) glory.

I emphasise again that this exercise is to allow us to consider the effects of the change of formalism, not to give approval to the definitions themselves, work on which is still in progress. Indeed, I have a slightly more recent version which includes major revisions to the _entity_... items (this is available in the ftp directory).
And it is important that we not become hung up on details of the formalism, except where there appear to be real problems - the gestation period for this dictionary is already undesirably long.

Let me make a few general remarks about the philosophy behind DDL2. We have already had extensive discussions on the desirability of providing a self-consistent machine-readable set of data attributes, and over the last year or so the version 1 DDL has grown to include relations between data items. This approach is now taken a stage further.

In the new DDL dictionary, a hierarchy of objects is defined: category_groups (arbitrarily definable groups of categories, so that the geom_bond and geom_angle categories would naturally be collected into the geom category_group); categories (corresponding to the current definition of a category as a collection of data names which may occur in the same looped list, or outside of loops in a related aggregate); subcategories (collections of data items that form a coherent set within a category, e.g. *_h, *_k and *_l items might form a miller_index subcategory); and individual data items. Each of these hierarchical objects may be described by a separate set of DDL definitions (so there is, for example, a _category.description and a _sub_category.description).

The organisation of DDL2 dictionaries is different from DDL1. Each definition is given within a save_ frame, where previously each appeared within its own data block. The save_ frames are permitted STAR syntactic devices for encapsulating blocks of information which may be referenced from other places within the current data block. At this point, however, such references are not used - the save_ frames merely split the dictionary up into logical chunks, as did the previous fragmentation into data blocks. But because each definition within a dictionary is related to the rest of the information in the dictionary, it is best to have a single data block encompassing the whole dictionary.
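
To make the save_ frame layout concrete, here is a minimal Python sketch of how a reader might split one data block into its save_ frames. It is an illustration only and not part of any DDL2 software: the function name split_save_frames and the sample dictionary text are hypothetical, and the scanner assumes that each save_ marker sits alone on its line and that text fields are delimited by a semicolon in column 1.

```python
def split_save_frames(text):
    """Return {frame_code: [lines]} for each save_ frame found in text.

    Simplified sketch: save_ markers are assumed to stand alone on a line,
    and semicolon-delimited text fields are skipped so that their contents
    are never mistaken for frame markers.
    """
    frames = {}
    current = None    # frame code of the frame being read, if any
    in_text = False   # True while inside a ;...; text field
    for line in text.splitlines():
        if line.startswith(";"):
            in_text = not in_text
        elif not in_text:
            token = line.strip()
            if token.lower() == "save_":
                current = None            # bare save_ closes the frame
                continue
            if token.lower().startswith("save_"):
                current = token[len("save_"):]
                frames[current] = []      # save_<code> opens a new frame
                continue
        if current is not None:
            frames[current].append(line)
    return frames


# Hypothetical dictionary fragment, modelled on the examples in this circular.
example = """\
data_example_dictionary

save_atom_site.label
    _item_description.description
;   The _atom_site.label is a unique identifier ...
;
    _item.name          '_atom_site.label'
    _item.category_id   atom_site
save_

save_geom_bond.atom_site_label_1
    _item_description.description
;   See the parent definition in
save_atom_site.label
;
save_
"""

frames = split_save_frames(example)
for code, body in frames.items():
    print(code, len(body))
```

Every frame lives in the one data block, so a program can index all definitions by frame code in a single pass; a real reader would need a proper STAR tokenizer rather than this line-based scan.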

John explains the reason for this reorganisation thus:

JW> The save_ syntax has been used in order to have a more consistent use of
JW> scope between data files and dictionaries. Since we are representing
JW> links between data items we are using save frames so that the referenced
JW> data items are all within the scope of the current dictionary. This is
JW> not the case now where data_ sections are used. Links between data
JW> blocks really violate the STAR scope rule that requires each data block
JW> to have a separate name space.

Another point of difference is that in earlier dictionaries a single data block might contain the description of more than one (more or less related) data name. Hence, in the core we have

data_cell_length_
    loop_ _name                 '_cell_length_a'
                                '_cell_length_b'
                                '_cell_length_c'
    _type                       numb
    _enumeration_range          0.0:
    _esd                        yes
    _esd_default                0.0
    loop_ _units_extension
          _units_description
          _units_conversion     ' '    'Angstroms'   *1.0
                                '_pm'  'picometres'  /100.
                                '_nm'  'nanometres'  *10.
    _definition
;   Unit-cell lengths corresponding to the structure reported. ...
;

In the new formulation, each such definition would have its own save_ frame (i.e. one each for _cell_length_a, _b and _c). However, it IS possible to have more than one definition within a save_ frame, and this occurs when 'parent' and 'children' are defined together (recall that the child relationship provides for pointers between identifiers in different lists - a typical example is a _geom_bond_atom_site_label_1 which must match an _atom_site_label). In the new dictionaries, this would be written as

save_atom_site.label
    _item_description.description
;   The _atom_site.label is a unique identifier ...
;
    loop_
    _item.name
    _item.category_id
    _item.mandatory_code
        '_atom_site.label'              atom_site  yes
        '_geom_bond.atom_site_label_1'  geom_bond  yes
        '_geom_bond.atom_site_label_2'  geom_bond  yes
    loop_
    _item_linked.child_name
    _item_linked.parent_name
        '_geom_bond.atom_site_label_1'  '_atom_site.label'
        '_geom_bond.atom_site_label_2'  '_atom_site.label'
    _item_type.code  char
    loop_
    _item_examples.case  C12  Ca3g28  Fe3+17  H*251  boron2a
save_

I find that this creates some problems in producing the dictionary - the entry for '_geom_bond.atom_site_label_1' must be looked up under '_atom_site.label', for instance. However, Paula has solved this problem by including save_ frames for the _geom_bond... stuff that act as cross-references to the primary definition, and I am satisfied with this. Again I asked John for some clarification of this, and he describes the way in which this arrangement better mirrors the organisation of data tables in a relational description:

JW> Here is the model for what we are doing. Each category definition
JW> defines a table and each item (attribute) defines a column in the table.
JW> The DDL defines that table structure or logical schema on which the
JW> macromolecular dictionary is built. Each instance of a DDL category
JW> in the macromolecular dictionary adds a row to its category's table.
JW> Since this is the logical model, it is no longer possible to simply
JW> inspect the contents of a dictionary definition and expect to find
JW> all of the information about an item.
JW>
JW> As [BM] points out, this departs from the current usage of searching
JW> within each definition for all of the information about an item. I look
JW> at this in the following way. The dictionary lays out the logical
JW> representation for the data. This does not mean that the structure of the
JW> dictionary is the most efficient way of accessing the data.
JW> We are reading the dictionary and building a table structure that we can
JW> search rather than roaming around looking for stuff in the dictionary.

BM> In many ways I appreciate the way this is done in your formulation, but I am
BM> still worried about how one answers the question "what is the meaning of the
BM> data name _geom_angle.atom_site_label_1 that appears in this data file?" One
BM> turns to Paula's dictionary, and locates the data name within a certain save
BM> frame (save_atom_site.label, of course ;->). There is lots of useful
BM> information in that save frame about the data name's attributes, but its
BM> "meaning", in human terms, is located in the description given in
BM> save_geom_...label_1 (which saveframe contains no instance of the data name
BM> itself, except implicitly in the framecode). It's all self-consistent, but
BM> to unwrap all these details needs a set of conventions or rules which are
BM> not yet explicitly set out anywhere.
BM>
BM> I guess I'm arguing that the save frame for this example should contain as a
BM> minimum
BM>     save_geom_angle.atom_site_label_1
BM>         _item.name                '_geom_angle.atom_site_label_1'
BM>         _item_linked.parent_name  '_atom_site.label'
BM>     save_
BM> which is suspiciously like the DDL1 structure, and I can hear you screaming
BM> already. If you won't let me have this, I can still make my dictionary
BM> typesetting program work (the application I'm actually playing with just
BM> now), but it involves a certain amount of special coding - the new DDL is no
BM> more 'self-defining' to me than the old DDL was to you.

JW> This really goes to the issue of how you search for things in the dictionary
JW> that I discussed in the previous section. I agree that it is more difficult
JW> to find things in the new structure. The alternative is an assembly of
JW> complete definitions.
JW> This would be almost impossible to maintain given
JW> the size of the macromolecular dictionary, and it would be even more
JW> difficult to maintain consistency between related items.

One other major change that you may have noticed is the introduction of a dot character into data names to differentiate the category name from the instance within the category. John believes this to be very important for the efficient validation of tabular relationships (in other words it makes it easy to enforce the rule that a loop_ contains only items from the same category). Note that it is not essential to do this - each dictionary definition may explicitly list the category to which the data name belongs, and indeed in Paula's elaboration of the mmCIF dictionary, she has done this. But John prefers that the category should be easily extractable from the data name alone, using the dot (or some other separator character). This will directly contradict one of our decisions (see, for example (19)A10.6) that no character beyond the leading underscore should have a special meaning. It also raises the minor difficulty that all data names adhering to this convention and including a dot will be different from the datanames published in the core dictionary.

To permit compatibility with existing data files, John has introduced an alias mechanism, so that _atom_site_label will be recognised and internally translated to _atom_site.label (and, indeed, to _atom_site.id also in this particular instance), which is an entry in the new dictionary.

We need to give full consideration as to the wisdom or otherwise of this approach. For the most part, the trade-off seems to be between computational efficiency in John's applications, and the multitude of headaches that might result from changing all the existing data names.
However, it's not quite so simple, since some of the aliases in Paula's draft do not map to exact equivalents in the new formulation (see, for instance, _atom_site.fract_x and _atom_site.fract_x_esd versus _atom_site_fract_x).

There is also a proposal to introduce a new type of data structure within the CIF formulation that allows vectors or matrices to be presented as coherent entities (described by suitable dictionary descriptions); but I am having grave difficulty with seeing how this formulation can be compatible with the current restriction on single-level loop structures in CIF, and suggest that for the present we should represent matrices in the traditional way (by listing individual components).

Hence I wish to encourage the discussion along the following strands (which I number separately for ease of reference):

D30.3 Dictionary organisation within DDL2
-----------------------------------------

I move that we accept the save_ frame organisation in DDL2-compliant dictionaries, and that we require each data name to have a matching save_ frame to allow location of the data name.

[I remind you of what this is about with an example. The save_ frame save_atom_site.label contains a loop_ of _item.name's, including '_geom_bond.atom_site_label_1'. Paula has constructed a save_ frame, save_geom_bond.atom_site_label_1 which does NOT contain the _item.name '_geom_bond.atom_site_label_1', but it does have an _item_description.description that points the reader to the appropriate parent definition. I wish this to be adopted as a systematic convention throughout DDL2 CIF dictionaries.]

D30.4 Dot separator in data names
---------------------------------

I put forward the proposal that the dot be permitted in data names to allow explicit reference to the category to which the data name belongs. I put this forward with reservations of my own, and I invite John to elaborate on the merits of this over an explicit _item.category_id listing in every case.
If it is just a matter of computational efficiency, how does it affect real computations that he has been involved in?

D30.5 Matrix/vector structure types
-----------------------------------

I propose that we not implement _item_structure* components of DDL2 (which describe higher-level matrix and vector structures) in the current mmCIF dictionary. If John and Phil wish to challenge this, they are invited to provide examples of how these would work in practice, within the current STAR syntax rules and CIF restricted-STAR syntax conventions.

Best wishes

Brian