Discussion List Archives


the dictionary merging protocol

  • Subject: the dictionary merging protocol
  • From: Doug du Boulay <ddb@xxxxxxxxxxxxxxxxxxxxxxx>
  • Date: Mon, 15 Jul 2002 11:25:34 +0100 (BST)

I hope it is okay to make a few comments here about the  dictionary 
overlay protocol as documented here: 

           http://www.iucr.org/iucr-top/lists/cif-developers/msg00044.html

I hope we can draw a distinction between "valid" and "conformant"
with respect to the encouraged CIF data_block tags:
         _audit_conform_dict_name
         _audit_conform_dict_version
         _audit_conform_dict_location

The following extract from the above essentially defines valid:

<required. These issues are addressed in reverse order below. Note that an
<application seeking to validate a data file should not consider the file
<invalid if a data name is found that has no definition in the dictionaries
<referenced. The CIF standard permits the incorporation of local and
<standard names in any data file. Nevertheless, it is recommended as good
<practice that all data names in a CIF should be able to be validated against
<dictionary files, including locally constructed dictionaries.


My understanding/definition of conformant is 100% or nothing. The slightest 
discrepancy at all means it is no longer conformant.
With this definition the _audit tags above seem mislabeled, but I 
will continue here assuming the intended meaning is "valid".
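
A minimal Python sketch of the distinction, for illustration only. It assumes
a dictionary can be modelled as a plain mapping from data name to a checking
function; that model and the names is_valid, is_conformant, is_number and
toy_dic are hypothetical and do not come from any CIF software.

```python
from typing import Callable, Dict

# Hypothetical model: a dictionary maps each data name it defines
# to a predicate that checks a value against that definition.
Dictionary = Dict[str, Callable[[str], bool]]


def is_valid(block: Dict[str, str], dic: Dictionary) -> bool:
    """Every item that HAS a definition must satisfy it; unknown names are ignored."""
    return all(dic[name](value) for name, value in block.items() if name in dic)


def is_conformant(block: Dict[str, str], dic: Dictionary) -> bool:
    """100% or nothing: every item must be defined and must satisfy its definition."""
    return all(name in dic and dic[name](value) for name, value in block.items())


def is_number(value: str) -> bool:
    """Toy type check standing in for a real dictionary definition."""
    try:
        float(value)
        return True
    except ValueError:
        return False


toy_dic: Dictionary = {"_cell_length_a": is_number}
block = {"_cell_length_a": "5.43", "_my_local_item": "anything"}

print(is_valid(block, toy_dic))       # True: the one known item checks out
print(is_conformant(block, toy_dic))  # False: _my_local_item is undefined
print(is_valid(block, {}))            # True: nothing can fail against an empty dictionary
```

Under this reading, "valid" only ever constrains the items a dictionary
happens to know about, whereas "conformant" also rejects anything it does not.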



What is the advantage of a CIF data_block specifying that it is 
valid against dictionaries x.dic, y.dic & z.dic when it could also 
be full of unknown/unrecognised data items to which it is not conformant?
It would also be "valid" against a null dictionary.
On the other hand, I am guessing that we could introduce a new, undefined 
data item as a new member of a non-listable category
and then loop_ over it alone, thereby not only being non-conformant 
but also destroying the previously accepted validity.
Is that possible?



From the point of view of CIF validation, the proposed dictionary merging 
protocol looks functional enough. But the protocol itself seems to be 
a set of externally based informal rules designed to be hard coded 
into validation software. The commands for specifying how to create/
assemble a dictionary to which a given CIF data block may or may not 
be conformant (even though it may be valid) are actually embedded in 
the CIF, or passed to the validation software as arguments.  

There is no support therein for fine-grained control over how 
individual data items and/or category classes may be totally replaced 
by, or appended to from, the separate disparate dictionaries. 
It is an all-or-nothing approach. 

The currently envisaged dictionary construction mechanism does not
yet permit specification of such PREPEND, APPEND or REPLACE modification
attributes in the CIF data_block itself, so there is no way to retain
this information across dictionary reconstruction invocations.


The dictionary constructed in this manner does not even exist as
a referenceable object (it is purely virtual). This could make it 
rather more difficult to group together data sets conforming to the
same (i.e. which?) dictionary. 


The recent discussion of the CIF specification indicates that in CIF 1.1
dictionary-style save_ frames will be permitted in purely data CIFs,
opening up the possibility of combined dictionaries and data. 
I am not sure if this is the direction things are intended to go,
but it seems to me to be far too flexible for something that is 
supposed to be a purely data archival format. It also seems counterproductive
to the overall scheme of standardization because basically any 
CIF can create any dictionary it likes and say "hey, I am valid against this"
(even if it doesn't conform).




What I would prefer to see in a CIF data_block is a single reference to 
a real dictionary to which it is totally 100% conformant.
I think this might force the creators and distributors of 
CIFs to be more responsible in either rigorously adopting existing 
dictionaries, or making their internal dictionaries available to
the community if they want their CIFs widely accepted.
Therein lies my previous concern about:

> > And can a CIF (file or data_block?) really be totally conformant with
> > more than one dictionary, i.e. why the need for item 27 loop_?  Would it
> > not be


The benefit is that it would enforce an explicit hierarchical dictionary 
dependence, through inter-dictionary reference pointers, rather than an 
inferred one based on an ad hoc protocol buried in an 
implementation layer. 
In the final example of the last appendix of the dictionary overlay protocol 
referenced above there is an already recognized scope for error.

With the data_block contents:
     loop_
         _audit_conform_dict_name
         _audit_conform_dict_version
         _audit_conform_dict_location
           a.dic  2.1   .
           b.dic  1.0   .
           c.dic  1.0   /usr/local/dics/my_local_dictionary

if you then run the hypothetical command  line

dictcheck -mode OVERLAY  test.cif

you can end up with the situation where b.dic can overlay a.dic
but c.dic cannot overlay b.dic (very last part of the example).

On the other hand, c.dic could overlay a.dic and b.dic could then
overlay c.dic without any problems.

So there is, as was noted in the appendix, a potential ordering problem. 
Problems of this type should be alleviated by an explicit
hierarchical dependence.
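
As a rough illustration of what an explicit hierarchical dependence buys,
the sketch below derives the merge order from per-dictionary pointers rather
than from the row order of the loop_ in the data block. The pointers
themselves are invented here to match the workable order described above;
nothing in the protocol document defines them, and merge_order is a
hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pointers: each dictionary names the dictionaries it overlays.
# These dependencies are assumed for illustration (c.dic over a.dic,
# then b.dic over c.dic); they are not taken from the protocol.
overlays = {
    "a.dic": set(),
    "c.dic": {"a.dic"},
    "b.dic": {"c.dic"},
}


def merge_order(pointers):
    """Return the dictionaries with every base ahead of whatever overlays it."""
    return list(TopologicalSorter(pointers).static_order())


print(merge_order(overlays))  # ['a.dic', 'c.dic', 'b.dic']
```

The result is the same however the rows happen to be listed in the CIF, and
a circular set of pointers is reported as an error (graphlib.CycleError)
instead of silently producing a different dictionary.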



Despite all that, my real reason for preferring CIFs to be 100% conformant 
to one single specified dictionary was based on inherent laziness coupled with  
considerations of dictionary-based data structures, as distinct from 
generic CIF-based data structures.


With a typical CIF parser you can readily create a generic CIF data structure,
but if you wish to create a dictionary-based data structure, 
the generic CIF data structure is now forced upon you as an 
essential precursor, because you have to pick the CIF apart to find the 
dictionary conformance/creation tags in order to identify the relevant 
dictionary data model needed. 
The intermediate step could be skipped if the conformant dictionary 
was specified right up front, XML style. 
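
To show what "picking the CIF apart" involves, here is a deliberately naive
Python sketch that scans a whole data block just to recover the
_audit_conform_dict_* loop. It is not a CIF parser: it splits on whitespace
and ignores quoting, comments, text fields and save_ frames. The function
names conform_dicts and _is_value are hypothetical, and the sample block
reuses the loop quoted earlier in this message.

```python
from typing import Dict, List

PREFIX = "_audit_conform_dict_"


def _is_value(token: str) -> bool:
    """True unless the token is a tag or a (naively detected) reserved word."""
    low = token.lower()
    return not (token.startswith("_") or low == "loop_" or low.startswith("data_"))


def conform_dicts(cif_text: str) -> List[Dict[str, str]]:
    """Return the rows of the first loop_ made up only of _audit_conform_dict_* tags."""
    tokens = cif_text.split()
    for i, token in enumerate(tokens):
        if token.lower() != "loop_":
            continue
        # Gather the tag names that head this loop.
        j = i + 1
        tags = []
        while j < len(tokens) and tokens[j].startswith("_"):
            tags.append(tokens[j])
            j += 1
        if not tags or not all(t.startswith(PREFIX) for t in tags):
            continue
        # Gather values until the next tag or reserved word.
        values = []
        while j < len(tokens) and _is_value(tokens[j]):
            values.append(tokens[j])
            j += 1
        n = len(tags)
        keys = [t[len(PREFIX):] for t in tags]
        # Chunk the flat value list into rows; an incomplete final row is dropped.
        return [dict(zip(keys, values[k:k + n]))
                for k in range(0, len(values) - n + 1, n)]
    return []


example = """
data_test
_cell_length_a 5.43
loop_
    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location
      a.dic  2.1   .
      b.dic  1.0   .
      c.dic  1.0   /usr/local/dics/my_local_dictionary
"""

print([row["name"] for row in conform_dicts(example)])  # ['a.dic', 'b.dic', 'c.dic']
```

Even this toy version has to read past unrelated items before it knows which
dictionaries apply, which is the precursor step described above.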


The next issue was that it would be a damn sight more convenient to be
able to use a precompiled representation of a dictionary for generating
a dictionary-based data structure than it would be to have to go 
off and build a new one for every new data_block you encounter.
The former would rely more heavily on dictionary caching.


The third issue concerns valid, as distinct from conformant,
data files. It would be a lot simpler to build a dictionary-based data 
structure knowing that the data to be stuck in it are 100% conformant,
rather than having to cater for spurious nonconformant garbage at every level.

Given that the self-proclaimed purpose of the dictionary merging 
protocol was to facilitate the development of dictionary-driven 
applications and thereby, I hope, dictionary-driven data structures,
it would be a shame if it all got started off on an 
inconvenient footing.  But I guess these are just implementation issues.



One final comment about the future CIF global_ data structure envisaged
by COMCIFS.  If every data_block in a CIF conforms to a 
completely different dictionary you could in general wind up with 
incompatible global data and no formal way to specify the inheritance 
that other data_blocks may need. Perhaps building in a hierarchical,
object-oriented style subclassing model from the beginning might help to
clarify such situations in the future.   


I guess really I am just questioning what the considerations were 
when this _audit_conform business was initiated, and whether it could be
clarified before being cast in stone.


Thanks
Doug

