[Fwd: Re: Question about DDLm]
- Subject: [Fwd: Re: Question about DDLm]
- From: David Brown <idbrown@xxxxxxxxxxx>
- Date: Tue, 22 Apr 2008 11:41:03 -0400
It has been suggested that a discussion a group of us have been having privately should be opened to the CIF software developers' list. I am therefore circulating James Hester's reply to emails that Nick Spadaccini and I have exchanged. I have added notes to provide some context.
One feature of the new methods Dictionary Definition Language (DDLm) is that it is designed to create a virtual CIF dictionary at run time by importing the subdictionaries that were used to construct the CIF and which are specified within the CIF itself. My proposed procedure is copied in the document below, followed by James' comments. This is preceded by some general comments by James, which in part refer to a recent problem encountered when enCIFer tried to read in a new version of the core CIF dictionary. The problem arose because the latest version of DDL1 lists 'su' as equivalent to, and a preferred alternative to, 'esd' in the enumeration list of _type_construct. However, as James points out, this equivalence is not machine readable. In the most recent core dictionary, a _type_construct has been given the value of 'su' and, of course, enCIFer is not programmed to recognize this as a legitimate value; even if it accepted it as a legitimate value, it would not know how to interpret it. James' email follows:
Hi Nick, David and Syd: (our discussion may be suitable for cif-developers?)
Before addressing David and Nick's comments, let me make a few assertions:
1. The DDL files contain text descriptions, which guide the programmer in constructing programs which understand the machine-readable subset of DDL attributes. Once the machine-relevant human-readable content has been programmed, no further programming should be necessary.
2. I distinguish two classes of dictionary use: validation and generation. Validation can be done once when a DDL domain dictionary is published to check consistency with the DDL specifications, and has historically been optional. Generative capabilities in DDL1 and DDL2 are restricted to production of default values but are obviously far richer in DDLm.
As far as enCIFer is concerned, there is no way that even the most dictionary-conversant program could have survived the change to the DDL. The latest DDL1 dictionary added the enumerated value 'su' to _type_conditions together with a human-readable description that was machine-significant - that is, without human reprogramming there was no way that a DDL1-conversant program would understand what 'su' meant as a _type_conditions value in a definition block. While a program that always loaded the DDL dictionary would see that 'su' was one of the accepted values for _type_conditions, at the time that program was written, only 'esd' would permit bracketed values to appear after a data value.
Now to David's thoughts regarding the correct way to process CIF files using dictionaries. When reading in CIF files, the DDL *domain* dictionary is obviously necessary for both validation and generation. As for reading in the DDL dictionary, the DDL1 dictionary generates only '_enumeration_default' values for '_list', '_list_level', '_list_mandatory', '_type_conditions' and '_type_construct', of which only '_list' could sensibly change, and even then it would have wide-ranging implications and so is unlikely.
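The two points above (enumeration defaults, and the 'su' value that a program can see but not interpret) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the default values in `DDL1_DEFAULTS` are placeholders rather than values copied from any DDL1 release, the numeric pattern is a simplified stand-in for the CIF number syntax, and the function names `complete_definition` and `parse_value` are hypothetical.

```python
import re

# Placeholder DDL1-style enumeration defaults (assumed for illustration,
# not copied from any particular release of the DDL1 dictionary).
DDL1_DEFAULTS = {
    "_list": "no",
    "_list_level": "1",
    "_list_mandatory": "no",
    "_type_conditions": "none",
}

# Values of _type_conditions that this program has been *programmed* to treat
# as permitting a bracketed value after the number. Seeing 'su' in the DDL
# enumeration list does not add it to this set; a human has to do that.
BRACKET_CONDITIONS = {"esd"}

# Simplified number pattern: optional sign, digits, optional exponent,
# then an optional bracketed integer such as "(2)".
_NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?$"
)


def complete_definition(definition, defaults=DDL1_DEFAULTS):
    """Return a copy of a definition with missing attributes defaulted."""
    completed = dict(defaults)
    completed.update(definition)
    return completed


def parse_value(text, definition, bracket_conditions=BRACKET_CONDITIONS):
    """Parse a numeric data value, allowing a bracketed value only when the
    definition's _type_conditions is one this program understands."""
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    number, bracketed = match.groups()
    condition = definition["_type_conditions"]
    if bracketed is not None and condition not in bracket_conditions:
        raise ValueError(
            f"bracketed value not understood for _type_conditions {condition!r}"
        )
    return float(number), bracketed
```

With a definition whose `_type_conditions` is 'esd', `parse_value("5.431(2)", ...)` succeeds. With 'su' it raises an error until someone edits `BRACKET_CONDITIONS`, which is the human reprogramming step described above.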
In short, there is no reason to read DDL1 dictionaries for generative purposes. The DDL2 dictionary doesn't specify enumeration defaults for dictionary attributes, so has no updateable generative sections and therefore doesn't have to be read in as part of constructing the domain dictionary.
So we come to DDLm. The draft DDLm dictionary contains a number of defaults for domain dictionary attributes, some of which could be changed more or less easily - I'm thinking _import.if_miss, for example. The only method which is defined is how to construct the _import_list.id attribute from individual components (this is actually a human-readable description masquerading as a method, for reasons I can go into if anyone is interested and will take up with Nick and Syd shortly). In any case, the DDLm dictionary is indeed necessary to fully construct the DDL domain dictionary, both to understand importation and to correctly assign default attribute values. For example, if the need arose to add a human-readable 'comment' field into _import_list.id, the DDLm dictionary file could be changed and a smart program would still be able to unpack the fields it needed without producing any error. Again, as soon as machine-significant entries are added (as opposed to human-significant only) the program would need to be human-edited.
Getting back to the issue at hand, the problem that I think David has correctly identified is that there is no data item for the CIF data block to specify to which domain dictionary(s) it conforms. Now for the steps:
(David) 1. Program loads the latest version of DDLm.
No. It should load the version to which the domain dictionary conforms, so it needs to know what the domain dictionary is.
(David) 2. Program loads an audit CIF dictionary, followed by the CIF itself. At this stage the CIF is scanned just for the audit items which include information on the dictionaries used to prepare the CIF.
Implicit in this is that the audit CIF dictionaries/data items are specified externally to the data block. The logical place is in the DDL specification, in order to keep all human-readable machine-significant information in one place. This breaks the nice symmetry of the DDL dictionary being the dictionary for the DDL domain dictionary only. I suspect that, in the past, programmers have just looked at the domain dictionary (which is where these data items were located) and hard-coded the dataname.
(David) 3. Program constructs a top level (calling) CIF that imports the necessary dictionaries to create the virtual dictionary.
I would rephrase this as saying that the full DDL domain dictionary is constructed (no need to reference the CIF data file any further) by reading the DDL domain dictionary, finding the conformant DDL dictionary, loading it and applying any generative methods to itself, then applying any generative methods to the DDL domain dictionary (i.e. importation).
(David) 4. Program reads the CIF again using the virtual CIF dictionary, extracting the required crystallographic information.
No need to read again; simply apply the necessary methods (but re-reading is one way to do it).
COMMENTS BY DAVID BROWN
If I understand your suggestion, my scheme would be replaced by the following:
1. Read in DDLm. In fact, since the program is specific for a particular version of the DDLm there is no reason why DDLm should not be incorporated into the program when it is compiled. Any change in the DDL requires reprogramming anyway.
2. The CIF is read in. In addition to what is currently in the DDLm, there would also be the definition of an item _audit_dictionary that the program would expect to find in the CIF (rather than the CIF dictionary). This item, stored in the CIF, would be an image of the top domain dictionary complete with the necessary import statements. The CIF is read, the _audit_dictionary is extracted and stored as the initial state of the domain dictionary.
3. The domain dictionary is expanded using the import statements found in this initial state.
4. The expanded domain dictionary is used to interpret the remaining items in the CIF.
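Steps 2 to 4 might be sketched as follows in Python. This is only an illustration under stated assumptions: the CIF and the dictionaries are modelled as plain Python dicts, the keys "imports" and "definitions" are hypothetical stand-ins for the real DDLm import attributes, `load` is whatever function the program uses to fetch a dictionary by name, and letting the importing dictionary's own definitions override imported ones is a design choice of this sketch, not something the DDLm draft is claimed to require.

```python
def expand_dictionary(initial, load, _seen=None):
    """Step 3: merge the definitions of every imported dictionary,
    following nested imports and skipping ones already visited."""
    seen = set() if _seen is None else _seen
    definitions = {}
    for name in initial.get("imports", []):
        if name in seen:
            continue  # guards against circular or repeated imports
        seen.add(name)
        definitions.update(expand_dictionary(load(name), load, seen))
    # Assumption: the importing dictionary's own definitions take precedence.
    definitions.update(initial.get("definitions", {}))
    return definitions


def interpret_cif(cif, load):
    """Steps 2 and 4: extract _audit_dictionary as the initial state of the
    domain dictionary, expand it, then sort the remaining items into those
    the virtual dictionary defines and those it does not."""
    initial = cif["_audit_dictionary"]
    virtual = expand_dictionary(initial, load)
    known, unknown = {}, []
    for name, value in cif.items():
        if name == "_audit_dictionary":
            continue
        if name in virtual:
            known[name] = value
        else:
            unknown.append(name)
    return known, unknown
```

A caller would pass a `load` function that returns the parsed dictionary for a given name, for example a lookup into an in-memory table of already-parsed dictionaries.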
A couple of points here. The user may have the lower level dictionaries stored locally and may be working off-line. The program would need to be able to find local copies of the dictionaries. The user may also wish to import additional dictionaries in order to calculate properties that are not included in the virtual dictionary defined in the CIF. I assume these features could be added as options in the program.
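As a rough Python sketch of those two options, assuming that local copies are ordinary files found by searching a user-supplied list of directories, and that extra dictionaries are simply appended to the ones named in the CIF. The function names `find_local_dictionary` and `dictionaries_to_load` are hypothetical.

```python
from pathlib import Path


def find_local_dictionary(filename, search_dirs):
    """Return the first local copy of a dictionary file found in the
    given directories, searched in order."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no local copy of {filename!r} found")


def dictionaries_to_load(declared, extra=()):
    """Combine the dictionaries named in the CIF with any additional ones
    requested by the user, keeping order and dropping duplicates."""
    ordered = []
    for name in list(declared) + list(extra):
        if name not in ordered:
            ordered.append(name)
    return ordered
```

Searching directories in order lets a user place a private copy of a dictionary ahead of a shared one when working off-line.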
I. David Brown, Professor Emeritus, Brockhouse Institute for Materials Research, McMaster University, King St. W, Hamilton, Ontario L8S 4M1, Canada. Tel: +905 525 9140 x 24710; Fax: +905 521 2773