Discussion List Archives

(8) restraints, dict files, *_[], comments

Dear Colleagues

There are a few contributions that have been made to existing discussion
threads, which I shall summarise here. In my next circular, I'll post a
review of the Tarrytown CIFtools meeting, which threw up several points
of relevance. In the posting after that, I'll start to develop some of the
themes that came out of that meeting (this will also include some further
points made by Brian Toby afterwards).

D4.1 Restraints
---------------
From George Sheldrick:
G> I would like to support David's suggestion that (at least for the immediate
G> future) restraint information should be provided by program-specific names
G> beginning xtal_ or shelx_ etc.  I am confident that Dale Tronrud (_tnt_) and
G> Axel Brunger (_xplor_) will be able to help, but PROLSQ/PROFFT has probably
G> undergone more mutations in various hands and will require more thought.
G> This delegation of responsibility will certainly expedite the implementation
G> of the mm CIF.

And from Brian Toby:
B> I support the idea of assigning prefixes to programs strongly. It will
B> allow many parameters of interest to be encoded before there is
B> agreement in a field on "the one true way" to best describe and model
B> the phenomenon. For powder work a good example is Rietveld peak shape
B> parameters. Few programs use exactly the same parameterization, but
B> inclusion of the values actually used would be very valuable.
B> 
B> Further, the original CIF paper suggests that a unique code would be
B> of value for preventing conflict between locally-defined and standard
B> data items. I would like to suggest a prefix for these items, _local_.
B> The understanding will be that _local_ entries will never be accepted
B> into the standard CIF dictionary and thus will always be available to
B> individual laboratories without the potential for future conflict.
B> Likewise, the presence of a _local_ entry will signal that a local
B> dictionary will be needed to interpret the local entries.

Obviously, there's no major dissension from the viewpoint that non-standard
data names can have an identifying prefix. There are a number of practical
problems, though. Suppose I write a new high-explosive program called "TNT";
then my _tnt_ data items may conflict with Dale's. Should there be a central
registry of such labels (my heart sinks at the thought of administering
such a beast)? For this reason, the use of "_local_" as a standard flag for
non-standard data names (if you see what I mean!) runs the risk of
misinterpretation between two different "local" implementations - better that
each local group devises its own prefix. But this still doesn't guarantee
uniqueness of labels devised in different locations. One approach to this is
to supply local dictionaries, and a pointer within the CIF to load the
appropriate dictionary file, that will allow unambiguous interpretation of
the non-standard data names. This has disadvantages: it simply moves the
nomenclature problem elsewhere (how do you uniquely identify the appropriate
local dictionary to load?), and it places a burden on local data name
inventors to supply a formal documentation of their inventions (no bad thing,
of course, but will people actually do it?). I suggest we come back to the
mechanics of loading dictionaries a little later. Certainly, to begin with,
there should be no practical difficulty with _shelx_, _xtal_, _tnt_ and 
other such well known programs, as no *gentleman* (or *lady*) would steal
another's prefix.
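
To make the registry question concrete, here is a minimal sketch of the bookkeeping such a registry would need: extract the leading prefix from a data name, and report any prefix claimed by more than one owner. The function names and the (owner, prefix) representation are purely illustrative assumptions; nothing of the kind exists in the CIF standard or in any current software.

```python
# Sketch only: function names and data layout are hypothetical.

def prefix_of(data_name):
    """Return the leading prefix of a data name,
    e.g. '_shelx_' for '_shelx_input_deck'; None if there is no prefix."""
    parts = data_name.split("_")
    # '_shelx_input_deck'.split('_') -> ['', 'shelx', 'input', 'deck']
    if len(parts) < 3 or parts[0] != "" or parts[1] == "":
        return None
    return "_" + parts[1] + "_"


def find_prefix_clashes(claims):
    """claims: iterable of (owner, prefix) pairs.
    Return {prefix: sorted list of owners} for every prefix
    that has been claimed by more than one owner."""
    owners = {}
    for owner, prefix in claims:
        owners.setdefault(prefix.lower(), set()).add(owner)
    return {p: sorted(o) for p, o in owners.items() if len(o) > 1}
```

Feeding it the two rival "TNT" programs alongside SHELX would report only _tnt_ as contested. Note that standard names such as _cell_length_a also yield a "prefix" (_cell_) under this rule, so a real registry would have to reserve the standard category names as well.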

George: would it be possible in principle to devise a set of refinement
flags or descriptors local to SHELX that enable a user to repeat the
refinement? I recall that we discussed this briefly in Goettingen earlier this
year, but I had the impression you thought it unworkable. It is of course
debatable as to whether this is an appropriate thing to do in an archive file
(recall Paula's point that some indication of the refinement techniques is
essential for long-term understanding of the archived data, but that a
program-specific listing may have a finite lifetime). 

There is obviously a trade-off between the very specific (after all, a free
text field headed _shelx_input_deck could presumably drive a complete run
of the program) and a set of descriptors of more general applicability - it
is surely legitimate to query the archive file for the list of bond distances
that were restrained to a certain length. There is a further problem, then,
when _shelx_clever_trick becomes so universally acclaimed that it ought
really to be called _clever_trick because everyone uses it!

D4.2 Introductory sections
--------------------------

This one won't quite lie down and die, though I think we're close...

B> For the "_appendix" sections, I have a strong dislike for _name_[mm]_
B> or  _name_[]_ as this is more confusing than _appendix, which we are
B> trying to improve upon. My modest suggestion is _name_.intro
B> (following the UNIX convention of .anything as hidden). This offers
B> several advantages: (1) it is more obvious as to what _.intro means
B> than _appendix; (2) it is not as ugly as _pd_calc_[pd]_; (3) it will
B> still sort to the top. If it is necessary to have  a mm entry for a
B> category in the main dictionary one could use _.intro_mm (even _.mm is
B> better than _[mm] if characters are that precious).

Those who know me realise that I am an inveterate mugwump (political term:
someone who sits on the fence with mug on one side, wump on the other!),
but here I protest a vigorous preference for the square brackets notation.
Supporters of the dot notation should now rush to Brian's rescue - a straight
fight, and no need for the Chairman to call a vote!

B> We can further prevent confusion by displaying the _name_.intro in a
B> different format for the printed (CIFTeX) version of the dictionaries.
B> My personal favorite would be to have CIFTeX print a dictionary
B> entry:
B> 
B> data_pd_dataset_.intro
B>     _type                        null
B> 
B>     _definition
B> ;    This section is used to assign a unique code to a file or data
B>      block. In this way, the reference made to a set of data made in 
B>      the originating laboratory may be referenced when the file is read
B>      in a second laboratory....
B> ;
B> 
B> as:
B> 
B> --------------- Section Introduction: _pd_dataset_  --------------
B> This section is used to assign a unique code to a file or data
B> block. In this way, the reference made to a set of data made in 
B> the originating laboratory may be referenced when the file is read
B> in a second laboratory....
B> ------------------------------------------------------------------
B> 
B> This would also serve a second purpose. The lines would help the eye
B> separate the different sections of the dictionary and more rapidly
B> find the introductory material. 

I'm quite happy with this idea (whether via square brackets or dots!). The
typeset version of the mm dictionary puts examples in boxes, to set them
apart, and this works well, though Syd would prefer to see the whole entry in
a box. This can be a problem if a very long introductory section would split
across columns. Likewise, the rules above and below the section could become
misleading when the entry spans columns or pages. Printing on a tinted
(shaded) background is also possible, but difficult for CIFTeX to implement.
However, there are various typographic tricks we might employ, and we can
experiment with these when the dictionaries go to press. (I have a picture in
my mind's eye of a gold tooled leather-bound edition, printed on fine parchment
hand-wove paper...)
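
As a rough illustration of the ruled layout in Brian Toby's mock-up, the following sketch formats a category introduction between a titled rule and a plain rule. It is not how CIFTeX works; the function name render_intro and the default width of 66 characters are assumptions chosen to match the mock-up above.

```python
# Sketch only: render_intro is a hypothetical helper, not part of CIFTeX.

def render_intro(category, definition, width=66):
    """Format a category introduction between two rules.

    category   -- the category name, e.g. '_pd_dataset_'
    definition -- the text of the _definition field (without the ';' lines)
    width      -- total width of the rules, in characters
    """
    title = " Section Introduction: " + category + " "
    top = title.center(width, "-")          # pad both sides with dashes
    body = [line.strip() for line in definition.strip().splitlines()]
    bottom = "-" * width
    return "\n".join([top] + body + [bottom])
```

The column- and page-break problem mentioned above is not addressed here: a plain-text rule has no notion of where the typesetter will break the entry.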


A4.3 One/many dictionary files
------------------------------

Brian Toby has sent this follow-up to our discussions on maintaining the CIF
Dictionary as a massive single file, or a series of modules:

B> On one vs. several files: I think it is important to distinguish
B> between the editorial process of developing a dictionary and the usage
B> of the CIF standard as each draft dictionary is "released." I would
B> not like to see the powder dictionary distributed as a single file to
B> users of CIF, since I want to encourage powder diffractionists to use
B> the core definitions as well (and define core entries, where
B> appropriate). I would suggest that for distribution we provide a few
B> files that always have the core dictionary included: mm+core,
B> powder+core, ... and one master dictionary that includes all extension
B> dictionaries for anyone working on general CIF applications. For
B> "editorial" work we should keep separate non-overlapping files as we
B> do now. The "distribution" files should simply be the "editorial"
B> files  concatenated (and I would prefer not sorted).

This point is noted, and will influence the way in which we "distribute" the
dictionaries. Again, we come back to the need for a mechanism for loading
dictionaries into applications software. I note that it is not now legitimate
simply to concatenate dictionary files (why not? because dictionaries 
currently in draft have a STAR global_ statement which extends from the point
of declaration ever onwards). This is another point to which we shall return
in the near future...
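
The concatenation problem can be sketched as a simple pre-flight check. The code below assumes, as described above, that a global_ statement applies from its point of declaration to everything that follows, so it is only harmless in the last file of a concatenation. The scanner is deliberately crude (it does not understand quoted strings), and both function names are hypothetical.

```python
# Sketch only: a crude line scanner, not a STAR parser.

def has_global_block(text):
    """Return True if the text contains a global_ keyword outside
    comments and outside semicolon-delimited text fields."""
    in_text_field = False
    for line in text.splitlines():
        if line.startswith(";"):
            # A ';' in column 1 opens or closes a multi-line text field.
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        # Drop trailing comments (crude: ignores '#' inside quoted strings).
        code = line.split("#", 1)[0]
        if any(tok.lower() == "global_" for tok in code.split()):
            return True
    return False


def safe_to_concatenate(dictionaries):
    """dictionaries: list of (name, text) pairs in concatenation order.
    A global_ in any file but the last would spill into the files after it."""
    return all(not has_global_block(text) for _, text in dictionaries[:-1])
```

On this reading, the order of concatenation matters: a draft dictionary containing global_ could go last, but not first.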


D8.1 Comments
-------------
Here's a new topic from Paula:

P> A local user is working on merging 2 CIFs (say input to a program and output
P> from a program with more stuff).  In the course of doing this, he realized 
P> that CIFs often (and perfectly validly) contain comment lines.  What is a CIF
P> parser/processor supposed to do with these lines?  Pass them on to the
P> progeny file?  Ignore (and thus delete) them?  Something else?
P> 
P> My answer was that anything that really belongs in the file should have a
P> home in a data item not in comment lines, but I would like to see what the
P> others think about this.

I guess this will be of some interest to Brian Toby, who earlier suggested to
me that the category introductory sections of the dictionary should be solely
comment fields! 

Presumably applications fall generally into two classes: those that extract
data from the CIF and that therefore look only at the data names and
associated values (and thus should simply skip comments); and those that
modify the file as a whole - CIF editors, shall we say. Comment lines are
especially useful in templates, such as the form.cif that we distribute, and
the Xtal template that flags data names of importance to Acta. Paul Edgington
at Cambridge has been working on a CIF editor that will load such templates
and allow the user to modify them, and he sent me this reply when I asked him
about his approach to comments:

> The answer is yes I do try and store comments between loading and saving 
> CIFs. Gets a bit hairy inside loops it has to be said as you just have to
> say "Comments are associated with the item before" or such like.
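
Paul's rule of thumb ("comments are associated with the item before") might be sketched as follows. This is not his implementation, which I have not seen; it is a minimal illustration that handles only whole-line comments in a file with one data item per line, and it makes no attempt at loops, text fields or trailing comments. The function names are hypothetical.

```python
# Sketch only: whole-line comments, one data item per line, no loops.

def attach_comments(lines):
    """Return a list of [data_line, comments] pairs.

    Each comment line is attached to the data line before it. Comments
    that precede the first data line are attached to a leading entry
    whose data_line is None."""
    entries = [[None, []]]
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue                      # blank lines are not preserved
        if stripped.startswith("#"):
            entries[-1][1].append(stripped)
        else:
            entries.append([stripped, []])
    return entries


def write_back(entries):
    """Re-emit the data lines with their comments in the original order."""
    out = []
    for data_line, comments in entries:
        if data_line is not None:
            out.append(data_line)
        out.extend(comments)
    return out
```

An editor built this way can reorder or modify data items and still carry each comment along with the item it follows, which is the behaviour Paula's merging user would presumably want.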

Regards
Brian