[Fwd: Re: Question about DDLm]

  • Subject: [Fwd: Re: Question about DDLm]
  • From: David Brown <idbrown@xxxxxxxxxxx>
  • Date: Tue, 22 Apr 2008 11:41:03 -0400
Colleagues,

It has been suggested that a discussion which a group of us have been having privately should be opened to the CIF software developers' list.  I am therefore circulating James Hester's reply to emails that Nick Spadaccini and I have exchanged.  I have added notes to provide some context.

Best wishes

David Brown

-----------------------------------------------------------------------------------
CONTEXT
One feature of the new methods Dictionary Definition Language (DDLm) is that it is designed to create a virtual CIF dictionary at run time by importing the subdictionaries that were used to construct the CIF and which are specified within the CIF itself.  My proposed procedure is copied in the document below, followed by James' comments.  This is preceded by some general comments by James, which in part refer to a recent problem encountered when enCIFer tried to read in a new version of the core CIF dictionary.  The problem arose because the latest version of DDL1 lists 'su' as equivalent to, and a preferred alternative to, 'esd' in the enumeration list of _type_conditions.  However, as James points out, this equivalence is not machine readable.  In the most recent core dictionary, a _type_conditions attribute has been given the value 'su' and, of course, enCIFer is not programmed to recognize this as a legitimate value; even if it did accept it as legitimate, it would not know how to interpret it.  James' email follows:
Hi Nick, David and Syd:

(our discussion may be suitable for cif-developers?)

Before addressing David and Nick's comments, let me make a few assertions:

1. The DDL files contain text descriptions, which guide the programmer
in constructing programs which understand the machine-readable subset
of DDL attributes.  Once the machine-relevant human-readable content
has been programmed, no further programming should be necessary.

2. I distinguish two classes of dictionary use: validation and
generation.  Validation can be done once when a DDL domain dictionary
is published to check consistency with the DDL specifications, and has
historically been optional.  Generative capabilities in DDL1 and DDL2
are restricted to production of default values but are obviously far
richer in DDLm.

As far as enCIFer is concerned, there is no way that even the most
dictionary-conversant program could have survived the change to the
DDL.  The latest DDL1 dictionary added the enumerated value 'su' to
_type_conditions together with a human-readable description that was
machine-significant - that is, without human reprogramming there was
no way that a DDL1-conversant program would understand what 'su' meant
as a _type_conditions value in a definition block.  While a program
that always loaded the DDL dictionary would see that 'su' was one of
the accepted values for _type_conditions, at the time that program was
written, only 'esd' would permit bracketed values to appear after a
data value.
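
The distinction can be sketched in Python.  This is a minimal
illustration, not enCIFer's actual logic: the enumeration lists, the
HANDLERS table and the function names are all hypothetical.  A program
that reads the accepted values from the DDL dictionary will accept
'su', yet it only has behaviour for the values its author programmed.

```python
import re

# Matches a value followed by a bracketed uncertainty, e.g. "1.234(5)".
_BRACKETED = re.compile(r"^([^()]+)\((\d+)\)$")

def parse_with_esd(raw):
    """Split "1.234(5)" into ("1.234", "5"); leave other values alone."""
    m = _BRACKETED.match(raw)
    if m is None:
        return raw, None
    return m.group(1), m.group(2)

# Behaviour written when only 'none' and 'esd' existed (hypothetical table).
HANDLERS = {
    "none": lambda raw: (raw, None),
    "esd": parse_with_esd,
}

def interpret(raw, type_conditions, accepted_values):
    """Validate type_conditions against the DDL list, then apply its handler."""
    if type_conditions not in accepted_values:
        raise ValueError(f"{type_conditions!r} is not an accepted value")
    handler = HANDLERS.get(type_conditions)
    if handler is None:
        # Valid according to the dictionary, but no programmed meaning.
        raise NotImplementedError(f"no handler for {type_conditions!r}")
    return handler(raw)

old_ddl = ["none", "esd"]         # assumed enumeration before the change
new_ddl = ["none", "esd", "su"]   # assumed enumeration after the change
```

With old_ddl, 'su' is rejected outright; with new_ddl it passes
validation but still cannot be interpreted.  Adding
HANDLERS["su"] = parse_with_esd is the human reprogramming step: one
line, but one that cannot be derived from the dictionary alone.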

Now to David's thoughts regarding the correct way to process CIF files
using dictionaries.

When reading in CIF files, the DDL *domain* dictionary is obviously
necessary for both validation and generation.  As for reading in the
DDL dictionary, the DDL1 dictionary generates only
'_enumeration_default' values for '_list', '_list_level',
'_list_mandatory', '_type_conditions' and '_type_construct', of which
only '_list' could sensibly change, and even then it would have
wide-ranging implications and so is unlikely.  In short, there is no
reason to read DDL1 dictionaries for generative purposes.  The DDL2
dictionary doesn't specify enumeration defaults for dictionary
attributes, so has no updateable generative sections and therefore
doesn't have to be read in as part of constructing the domain
dictionary.
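
Generating defaults in the DDL1 sense amounts to filling in any
attribute that a definition block omits.  A minimal sketch, assuming a
definition block is held as a Python dict; the default values shown
are placeholders for illustration and should be checked against the
DDL1 dictionary itself.

```python
# Placeholder defaults for the five DDL1 attributes named above.
# The values are assumptions for illustration, not quoted from DDL1.
DDL1_DEFAULTS = {
    "_list": "no",
    "_list_level": "1",
    "_list_mandatory": "no",
    "_type_conditions": "none",
    "_type_construct": ".*",
}

def apply_defaults(definition, defaults=DDL1_DEFAULTS):
    """Return a copy of a definition block with missing attributes filled in."""
    completed = dict(defaults)
    completed.update(definition)   # explicit values always win over defaults
    return completed

# A hypothetical, abbreviated definition block.
cell_length_a = {
    "_name": "_cell_length_a",
    "_type": "numb",
    "_type_conditions": "esd",
}
completed = apply_defaults(cell_length_a)
```

Because such defaults are few and rarely change, they can be built
into the program, which is the sense in which the DDL1 dictionary need
not be read for generative purposes.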

So we come to DDLm.  The draft DDLm dictionary contains a number of
defaults for domain dictionary attributes, some of which could be
changed more or less easily - I'm thinking _import.if_miss, for
example.  The only method which is defined is how to construct the
_import_list.id attribute from individual components (this is actually
a human-readable description masquerading as a method, for reasons I
can go into if anyone is interested and will take up with Nick and Syd
shortly).  In any case, the DDLm dictionary is indeed necessary to
fully construct the DDL domain dictionary, both to understand
importation and to correctly assign default attribute values.  For
example, if the need arose to add a human-readable 'comment' field
into _import_list.id, the DDLm dictionary file could be changed and a
smart program would still be able to unpack the fields it needed
without producing any error.  Again, as soon as machine-significant
entries are added (as opposed to human-significant only) the program
would need to be human-edited.
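
As a sketch of what such a 'smart program' might look like: if the
order of the components of _import_list.id is taken from the DDLm
dictionary at run time rather than hard-coded, an extra human-readable
field does not disturb the fields the program needs.  The component
names other than if_miss, and all the example values, are hypothetical.

```python
def unpack_import(entry, component_names, needed):
    """Map one import entry onto the component names read from the DDL
    dictionary and return only the components the program uses."""
    if len(entry) != len(component_names):
        raise ValueError("entry does not match the declared components")
    by_name = dict(zip(component_names, entry))
    # Components the program does not know about are simply ignored.
    return {name: by_name[name] for name in needed}

NEEDED = ("block", "file", "if_miss")   # what this program acts on

# Component order as declared by an assumed original DDLm dictionary ...
old_components = ["block", "file", "if_miss"]
old_entry = ["atom_site", "core_struc.dic", "Exit"]

# ... and by an assumed revision that adds a human-readable 'comment'.
new_components = ["block", "comment", "file", "if_miss"]
new_entry = ["atom_site", "atomic sites", "core_struc.dic", "Exit"]

# Both layouts yield the same machine-significant fields.
assert unpack_import(old_entry, old_components, NEEDED) == \
       unpack_import(new_entry, new_components, NEEDED)
```

A new component that changes what the program must do, rather than
what a human reads, would still require editing NEEDED and the code
that acts on it.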

Getting back to the issue at hand, the problem that I think David has
correctly identified is that there is no data item for the CIF data
block to specify to which domain dictionary(s) it conforms.

Now for the steps:

(David) 1. Program loads the latest version of DDLm.

No.  It should load the version to which the domain dictionary
conforms, so it needs to know what the domain dictionary is.

(David) 2. Program loads an audit CIF dictionary, followed by the CIF
itself.  At this stage the CIF is scanned just for the audit items
which include information on the dictionaries used to prepare the CIF.

Implicit in this is that the audit CIF dictionaries/data items are
specified externally to the data block.  The logical place is in the
DDL specification, in order to keep all human-readable
machine-significant information in one place.  This breaks the nice
symmetry of the DDL dictionary being the dictionary for the DDL domain
dictionary only.  I suspect that, in the past, programmers have just
looked at the domain dictionary (which is where these data items were
located) and hard-coded the dataname.

(David) 3. Program constructs a top level (calling) CIF that imports
the necessary dictionaries to create the virtual dictionary.

I would rephrase this as saying that the full DDL domain dictionary is
constructed (no need to reference the CIF data file any further) by
reading the DDL domain dictionary, finding the conformant DDL
dictionary, loading it and applying any generative methods to itself,
then applying any generative methods to the DDL domain dictionary
(i.e. importation).

(David) 4. Program reads the CIF again using the virtual CIF
dictionary, extracting the required crystallographic information.

No need to read again, simply to apply the necessary methods (but
re-reading is one way to do it).
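
Taken together, the replies to steps 1-4 suggest an order of
operations along the following lines.  This is only a sketch: the
dictionary names, the 'ddl_conformance' key, the dict representation
and the helper functions are assumptions made for illustration, and
the 'generative methods' are reduced to default filling and
importation.

```python
def build_domain_dictionary(domain_name, load_dictionary,
                            apply_defaults, expand_imports):
    """Build the full domain dictionary without re-reading the CIF data file."""
    # Read the domain dictionary to which the data block conforms.
    domain = load_dictionary(domain_name)
    # Load the DDL version the domain dictionary conforms to, not the latest.
    ddl = load_dictionary(domain["ddl_conformance"])
    # Apply the DDL's generative content to itself ...
    ddl = apply_defaults(ddl, ddl)
    # ... then to the domain dictionary, finishing with importation.
    domain = apply_defaults(domain, ddl)
    return expand_imports(domain, load_dictionary)

# Toy stand-ins so the sketch runs; real code would parse dictionary files.
STORE = {
    "ddl_v1": {"defaults": {"if_miss": "Exit"}},
    "ddl_v2": {"defaults": {"if_miss": "Ignore"}},
    "my_domain.dic": {"ddl_conformance": "ddl_v1",
                      "imports": ["sub.dic"],
                      "items": {"_a": {}}},
    "sub.dic": {"items": {"_b": {}}},
}

def load(name):
    return dict(STORE[name])

def apply_defaults(target, ddl):
    out = dict(target)
    out["effective_defaults"] = dict(ddl.get("defaults", {}))
    return out

def expand_imports(domain, load_dictionary):
    items = dict(domain.get("items", {}))
    for name in domain.get("imports", []):
        items.update(load_dictionary(name).get("items", {}))
    out = dict(domain)
    out["items"] = items
    return out

full = build_domain_dictionary("my_domain.dic", load,
                               apply_defaults, expand_imports)
```

In this toy run the defaults come from ddl_v1, the version the domain
dictionary names, even though a later ddl_v2 is available.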

COMMENTS BY DAVID BROWN
If I understand your suggestion, my scheme would be replaced by the following:
1. Read in DDLm. In fact, since the program is specific to a particular version of DDLm, there is no reason why DDLm should not be incorporated into the program when it is compiled.  Any change in the DDL requires reprogramming anyway.

2. The CIF is read in.  In addition to what is currently in the DDLm, there would also be the definition of an item _audit_dictionary that the program would expect to find in the CIF (rather than the CIF dictionary). This item, stored in the CIF, would be an image of the top domain dictionary complete with the necessary import statements.  The CIF is read, the _audit_dictionary is extracted and stored as the initial state of the domain dictionary.

3. The domain dictionary is expanded using the import statements found in this initial state.

4. The expanded domain dictionary is used to interpret the remaining items in the CIF.

A couple of points here.  The user may have the lower level dictionaries stored locally and may be working off-line.  The program would need to be able to find local copies of the dictionaries.  The user may also wish to import additional dictionaries in order to calculate properties that are not included in the virtual dictionary defined in the CIF.  I assume these features could be added as options in the program.
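
Steps 2 to 4, together with the two options just mentioned, might be sketched as follows.  Everything here is an assumption made for illustration: the CIF and the dictionaries are represented as plain Python dicts, _audit_dictionary is taken to hold a list of import names plus any directly defined items, the file and item names are invented, and local dictionary copies are stored as JSON purely so that the sketch runs.  Step 1 (DDLm compiled into the program) is not modelled.

```python
import json
import tempfile
from pathlib import Path

def find_dictionary(name, local_dir, fetch_remote=None):
    """Prefer a local copy; fall back to a remote fetch only if one is given."""
    path = Path(local_dir) / name
    if path.exists():
        return json.loads(path.read_text())
    if fetch_remote is None:
        raise FileNotFoundError(f"no local copy of {name} and working off-line")
    return fetch_remote(name)

def expand_domain_dictionary(cif, local_dir, extra_imports=(), fetch_remote=None):
    """Steps 2-3: take _audit_dictionary from the CIF and expand its imports."""
    initial = cif["_audit_dictionary"]
    items = dict(initial.get("items", {}))
    for name in list(initial.get("imports", [])) + list(extra_imports):
        items.update(find_dictionary(name, local_dir, fetch_remote)["items"])
    return items

def interpret(cif, domain_items):
    """Step 4: keep only the data items the expanded dictionary defines."""
    return {k: v for k, v in cif.items() if k in domain_items}

with tempfile.TemporaryDirectory() as local_dir:
    # Hypothetical local dictionary copies.
    Path(local_dir, "core.dic").write_text(
        json.dumps({"items": {"_cell_length_a": {}}}))
    Path(local_dir, "extra.dic").write_text(
        json.dumps({"items": {"_bond_valence": {}}}))
    cif = {
        "_audit_dictionary": {"imports": ["core.dic"]},
        "_cell_length_a": "5.43(1)",
        "_bond_valence": "0.25",
    }
    # Dictionaries named in the CIF only.
    basic = interpret(cif, expand_domain_dictionary(cif, local_dir))
    # The user additionally imports a dictionary of their own choosing.
    extended = interpret(
        cif, expand_domain_dictionary(cif, local_dir, extra_imports=["extra.dic"]))
```

Here extra_imports plays the role of the optional user-supplied dictionaries, and leaving fetch_remote unset corresponds to working off-line.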

David Brown
