(27) _type_construct, enumerations, category overviews...

  • To: COMCIFS@uk.ac.iucr
  • Subject: (27) _type_construct, enumerations, category overviews...
  • From: bm@uk.ac.iucr (Brian McMahon)
  • Date: Fri, 19 Aug 94 15:39:17 BST

Dear Colleagues

A brief update this time, before I set off on holiday (until Sept 7).

Items for approval
==================

A24.1 Draft CIF powder dictionary: This is now out for community review.
---------------------------------  It can be obtained from the IUCr in
            the usual ways (ftp to ftp.iucr.ac.uk, gopher to
            gopher.iucr.ac.uk, and from the WWW page
            http://www.iucr.ac.uk/welcome.html). I've also put a ciftex'd
            version in these places [Brian: it's exactly the same as the
            one I sent you].

A25.1 No use of _local_: It is agreed that dictionaries sanctioned by
-----------------------  COMCIFS will not include datanames beginning
                         with the string _local_.

A25.2 _diffrn_radiation_xray_symbol: It is agreed that this definition
-----------------------------------  will be added to the Core dictionary.

A25.7 Structuring front of dictionary: It is agreed that CIF dictionaries
-------------------------------------  will follow the structure outlined
            in 25.7, viz.

                     data_on_this_dictionary
                         _dictionary_name   xxxx  # _dictionary_ items
                     data_include_dependent_dictionaries
                         _include_file      xxx.dic  # parent dictionaries
                     global_
                         (list of global values for this file)
                     data_real_definitions ...


Continuing discussions
======================

D25.3 _pd_instr_radiation_probe to Core?
----------------------------------------
D> I am unrepentant.

Syd, would you please restate your objection, or change your vote,
according to Brian's clarification in circular 26?

D25.4 MIME types
----------------
D> 	On more mature consideration I agree with your assessment.  
D> Visionary thinking is important, but practical considerations usually
D> win the day.  I agree that *.cif files would have their uses.

D25.6 _type_construct
---------------------
D> 	This is a tricky one.  While I appreciate Syd's desire to avoid
D> creating new structures for the sake of new structures, your example of
D> how such structures could be made to simplify the parsing of
D> geom_*_site_symmetry is convincing.  I find it helpful in thinking my way
D> through cif problems to compare cif to natural language.  There are
D> obvious differences, but both are involved in the communication of ideas
D> and we can learn something from languages that have evolved in a
D> biological way.  In this case the analogy is words composed of syllables. 
D> Syllables cannot (in general) be used on their own, but every word can be
D> parsed into syllables (there are of course other ways of parsing words
D> e.g.  into letters and, for the spoken language, into phones and phonemes). 
D> It is also possible to parse some words into other words (e.g. manhole)
D> but equally well 'man' and 'hole' in this example are also syllables.  The 
D> analogy says that it should be possible to parse data items into sub 
D> items which may or may not be able to stand on their own.  In this case 
D> _sym_trans_x would not be a legal construct on its own.  In the example, 
D> you give _symmetry_equiv_posn_as_xyz as another possible syllable, but 
D> this itself is a parsable construct, parsable into syllables (e.g. 1/2+x) 
D> and letters, since each character in this smaller string has to be 
D> interpreted by the program in order to be used.  It is not possible to 
D> rely on table look up, as the structure of 1/2+x is not uniquely defined, 
D> x+1/2 would be equally acceptable.  On the other hand, we must be careful 
D> about assigning these syllables as _type null.  They are in fact of _type 
D> numb or char, and this information is important.  Perhaps they should 
D> have _category null, meaning that they cannot be used in conjunction with 
D> data items that have non-null categories.  What about _category syllable?

As I mentioned in 25.6, I favour the assignment of _category null, but
see also the discussion D27.1 below for further thoughts on assigning
_category to non-data items.
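David's remark that table look-up cannot cope with '1/2+x' versus 'x+1/2'
can be illustrated with a small parser. What follows is only a sketch in
Python: the function name parse_component and the returned data layout are
invented for this illustration and are not part of any CIF software. It
assumes a component is a signed sum of the letters x, y, z and of integers
or simple fractions.

```python
from fractions import Fraction
import re

# One term: optional sign, then a fraction, an integer, or an axis letter.
TERM = re.compile(r"([+-]?)(\d+/\d+|\d+|[xyz])")

def parse_component(text):
    """Parse one component of a symmetry operator, such as '1/2+x'.

    Returns (coefficients, translation): coefficients maps 'x', 'y', 'z'
    to integer multipliers; translation is the constant term as a Fraction.
    """
    text = text.replace(" ", "").lower()
    if not text:
        raise ValueError("empty component")
    coefficients = {"x": 0, "y": 0, "z": 0}
    translation = Fraction(0)
    position = 0
    while position < len(text):
        match = TERM.match(text, position)
        # Every term after the first must carry an explicit sign.
        if match is None or (position > 0 and not match.group(1)):
            raise ValueError(f"cannot parse {text!r} at offset {position}")
        sign = -1 if match.group(1) == "-" else 1
        token = match.group(2)
        if token in coefficients:
            coefficients[token] += sign
        else:
            translation += sign * Fraction(token)
        position = match.end()
    return coefficients, translation
```

Because each term is interpreted and accumulated in turn, the order in
which the terms are written does not affect the result.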

I have been talking with Syd about this, for he is anxious to have a
reliable formulation of this idea in the DDL which he is now finalising. 
I should mention here that we intend that the 'expanded' form of the
_type_construct should be a series of regular expressions conforming to the
POSIX standard IEEE document P1003.2 Draft 11.2 Sept 1991, pp.121-140. What
this means is that in an example such as

   _type_construct
;
       (_chronology_year    )\             # year must occur
    ((-(_chronology_month   ))?\           # month only if year
    ((-(_chronology_day     ))?\           # day only if month
    ((T(_chronology_hour    ))?\           # hour only if day
    ((:(_chronology_minute  ))?\           # minute only if hour
     (:(_chronology_second  ))?)?\         # second only if minute
   [+-](_chronology_timezone))?)?)?        # timezone if any time
;

an application needs to preprocess it to (a) strip comments, escaped
end-of-lines and blanks, and (b) expand each inner component recursively.
The expansion will finally result in a string, something horrible like

([1-9][0-9][0-9][0-9])((-([1-9]|1[012]))?((-([1-9]|[12][0-9]|3[01]))?((T....

which is a pure regexp that is POSIX conformant and may be matched by 
a standard library implementation of a regex pattern matcher.
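The two preprocessing steps can be sketched as follows. This is an
illustration only, written in Python, whose re module stands in for a POSIX
matcher (the two should agree when a whole string is matched against a
pattern of this kind). The names strip_layout and expand, the shortened
construct and the component patterns are all assumptions made for the
example. The sketch also assumes that '#' always starts a comment and that
anything beginning with an underscore is a dataname to be expanded.

```python
import re

# Hypothetical component definitions, not taken from any dictionary.
COMPONENTS = {
    "_chronology_year": "[1-9][0-9][0-9][0-9]",
    "_chronology_month": "0?[1-9]|1[012]",
    "_chronology_day": "0?[1-9]|[12][0-9]|3[01]",
}

# A shortened construct: year, then month, then day only if month.
CONSTRUCT = r"""
       (_chronology_year    )\             # year must occur
     (-(_chronology_month   )\             # month only if year
     (-(_chronology_day     ))?)?          # day only if month
"""

def strip_layout(text):
    """Step (a): drop comments, escaped end-of-lines and blanks."""
    pieces = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].rstrip()   # comment runs to end of line
        if line.endswith("\\"):                 # escaped end-of-line
            line = line[:-1]
        pieces.append(line)
    return re.sub(r"\s+", "", "".join(pieces))

def expand(text, components, depth=0):
    """Step (b): replace each dataname by its own expanded definition."""
    if depth > 20:
        raise ValueError("circular _type_construct definition")

    def substitute(match):
        # An unknown dataname raises KeyError.
        return expand(components[match.group(0)], components, depth + 1)

    return re.sub(r"_[A-Za-z0-9_]+", substitute, strip_layout(text))

pattern = expand(CONSTRUCT, COMPONENTS)
print(pattern)
print(bool(re.fullmatch(pattern, "1994-08-19")))
```

The depth limit is a crude guard against a component that refers back to
itself; a real implementation would probably want to report which dataname
caused the loop.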

As to the question of whether the components of this are, in David's metaphor,
'words' or 'syllables', Syd comments

S> Your point about the symmetry is a telling one and I agree that in this case
S> the symop and cell components would need to be generic. BUT these generic values
S> would need to reside in the cif core dictionary because they are application
S> specific! AND this is the point that Nick made -- some will be generic and 
S> some will not. Some will be in the STAR core and some in the application
S> dictionaries. There should not be any rules about what form the components 
S> of a _type_construct specification should have. It will depend on the 
S> application. 

In other words, dictionary (application) developers are at liberty to supply
'syllables' (such as my posited _chronology_year) that may be used within
appropriate _type_construct's; and these syllables may be used or not as most
appropriate to the application. So if it were useful to have the
_type_construct for, say, _audit_creation_date, defined as

   _type_construct
; 
       (_audit_creation_year    )\             # year must occur 
    ((-(_audit_creation_month   ))?\           # month only if year 
    ((-(_audit_creation_day     ))?)?)?        # day only if month 
; 

where each component is separately defined as a valid dataname, this would be
perfectly legitimate - the _chronology_year... 'syllables' don't have to be
used.


D26.1 _diffrn_standards_decay_%
-------------------------------
D> 	The definition of this certainly needs to be changed but see my 
D> critique in D23.2 which shows why Syd's definition is not much better 
D> than the one that is already there.  What, exactly, is meant by 'mean 
D> intensity'?  Yes, in a sense it is obvious, but what precisely is the 
D> number that should be used?  The problems with this definition were why I 
D> favoured approaching the problem a different way.

D26.2 _diffrn_reflns_number
---------------------------
D> 	Syd's redefinition is fine.  I assume that he is not concerned 
D> that the number would include any absences caused by glide planes if 
D> these were measured.  In many cases they will be measured just to confirm 
D> the space group, and this fact will be important.

D26.3 *_scat_length_neutron
---------------------------
D> 	OK

D26.4 concatenated enumeration codes
------------------------------------
D> 	Strictly, where more than one enumeration code applies they 
D> should be looped.  In the example given, this is impossible because they 
D> already occur in a loop.  Is there any difference between ABC and 'A B 
D> C', which would seem to be the only alternative?  This has to be a 
D> parsable datafield and, as long as only single letter codes are used, 
D> there is no ambiguity.

Hmmm... I think there are real problems here if one is to take the role of
enumerated values entirely literally. We (Paula, for instance) have often
argued that the purpose of an enumerated list is to supply a set of codes
representing the ONLY valid values that a data item can take. The
concatenation of single-letter codes is an exception to this - possibly a
legitimate exception, but an exception nonetheless. And the problem is, how
do you inform an automaton to treat certain cases exceptionally? Note, too,
that there are already exceptions of a different type, for instance

    _name                      '_refine_ls_weighting_scheme'
    _category                    refine_ls
    _type                        char
    loop_ _enumeration          
          _enumeration_detail    sigma    "based on measured e.s.d.'s"
                                 unit     'unit or no weights applied'
                                 calc     'calculated weights applied'
    _enumeration_default         sigma   
    _definition
;              The weighting scheme applied in the least-squares process. The
               standard code may be followed by a description of the weight.
;                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm not sure how best to proceed with these cases - outlaw the exceptional
values altogether, or leave the general applications programmer with the
problem of not being able to rely on an enumeration list for validation?
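The difficulty for a validating program can be made concrete with a sketch
in Python. The function check_enumeration and its two switches are
hypothetical, introduced only to show that each kind of exception needs its
own special rule, which the enumeration list cannot itself convey. The
weighting codes are those in the definition quoted above; the single-letter
codes A, B and C in the usage note are David's placeholders.

```python
# Codes copied from the _refine_ls_weighting_scheme definition.
WEIGHTING_CODES = {"sigma", "unit", "calc"}

def check_enumeration(value, codes, allow_concatenation=False,
                      allow_trailing_text=False):
    """Return True if value is acceptable under the chosen reading."""
    if value in codes:
        return True                      # the strict, literal reading
    if allow_concatenation and value and all(len(c) == 1 for c in codes):
        # 'ABC' read as the three single-letter codes A, B and C
        if all(ch in codes for ch in value):
            return True
    if allow_trailing_text:
        # a standard code followed by a free-text description
        first = value.split(None, 1)[:1]
        if first and first[0] in codes:
            return True
    return False
```

With both switches off, check_enumeration("calc w=1/sigma^2",
WEIGHTING_CODES) fails and so does check_enumeration("ABC", {"A", "B",
"C"}); each passes only once the matching switch is turned on, and the
program has no way of learning from the dictionary which switch applies to
which data item.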


Now for an old favourite, revived and given a new discussion thread
identifying code (see, for example, (12)A4.2 for an insertion point into
past discussions).

D27.1 Review of _category assignment for *_[] items
---------------------------------------------------
S> D25.5: I do not think there should be another DDL name for this.

(This arose from Peter's objection that there is only an implicit linking
of, say, data_audit_[] to the category 'audit'. Note, however, that the
agreement called for in (12)A4.2, and further discussed in (17)A4.2, does
formalise this linkage.)

S> But Peter's discussion does raise a small matter about the category of
S> item *_[] used to summarise a category of items. The debate on the name
S> structure of these items took place when I was banished from the Land of
S> Cif and living in the Land of Acta, so I really have no right to resurrect
S> this issue (but I will :-}). Why "dictionary_definition" for _category??
S> I have two reasons for asking this. One, I thought the idea of _category
S> was to allow items to be logically grouped [hence the actual category
S> codes "audit", "chronology", etc. seem to be more logical?]. Two, my 
S> first reaction on encountering dictionary_definition for the first time
S> was that it was something special to do with the DDL _dictionary_* items!
S> I consider this confusion to be unnecessary and somewhat worrying.
S> 
S> I should add the obvious point that since _type is null there is really no
S> need for all *_[] items to be classed in the same category.

Syd and I have been in further discussion about this, and we have agreed to
differ over whether the introductory items should belong to the category they
describe, or to a category of "introductory items". Peter, as you will recall
from D25.5, also supports the idea that these all belong together in their own
category. We worried a bit over the problem that categories are assigned in
such a way that one might find _atom_site_[] popping up in the middle of an
_atom_site_ loop; but in practice this is forbidden by the way one must
interpret the _type 'null' as assigned to these _[] items. Syd went on to say

S> Having said this, I want to make it very clear that I do not consider this
S> to be a crucial issue in the DDL -- I can live with *_[] items all having
S> the same _category, but I want it recorded that I think that it is unduly
S> awkward. I also want to say that if this unnatural grouping is to be 
S> retained, then for goodness sake give it a category name that reflects
S> its purpose -- "dictionary_definition" certainly does not, and to make it
S> worse will lead to confusion with the true _dictionary_* items that do
S> have the _category value "dictionary_..."!! A category name
S> "category_summary", "category_overview",  or simply "summary" or
S> "overview" would be better, but certainly not as good as "atom_site", etc.!

I see the merit in this proposal - what shall we change the name to?
"Overview" seems OK. I would prefer "explication", but accept that few
people are familiar with the term.

=====================
As I mentioned above, I'm off for a fortnight's holiday. From the
relatively light flow of correspondence of late, I guess many of you have
been, too. The alternative - that some of us are finding COMCIFS business
too indigestible - is, I hope, not true; but take heart. I obtained a
copy of the POSIX standard referred to above to check on the regex rules,
and its 993 pages make our standards deliberations appear positively
frothy and zestful!

Best wishes
Brian