the dictionary merging protocol
- Subject: the dictionary merging protocol
- From: Doug du Boulay <ddb@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 15 Jul 2002 11:25:34 +0100 (BST)
I hope it is okay to make a few comments here about the dictionary
overlay protocol as documented here:
http://www.iucr.org/iucr-top/lists/cif-developers/msg00044.html
I hope we can draw a distinction between "valid" and "conformant"
with respect to the encouraged CIF data_block tags:
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location
The following extract from the above essentially defines valid:
<required. These issues are addressed in reverse order below. Note that an
<application seeking to validate a data file should not consider the file
<invalid if a data name is found that has no definition in the dictionaries
<referenced. The CIF standard permits the incorporation of local and
<standard names in any data file. Nevertheless, it is recommended as good
<practice that all data names in a CIF should be able to be validated against
<dictionary files, including locally constructed dictionaries.
My understanding/definition of conformant is 100% or nothing. The slightest
discrepancy at all means it is no longer conformant.
With this definition the _audit tags above seem mislabeled, but I
will continue here assuming the intended meaning is "valid".
What is the advantage of a CIF data_block specifying that it is
valid against dictionaries x.dic, y.dic & z.dic when it could also
be full of unknown/unrecognised data items to which it is not conformant?
It would also be "valid" against a null dictionary.
On the other hand, I am guessing that we could use a new undefined
data item as a new member of a non-listable category
and then loop_ over it alone, thereby not only being non-conformant
but also destroying the previously accepted validity.
Is that possible?
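
To make the distinction concrete, here is a minimal Python sketch of the
two notions as I have described them. It is only an illustration: the
functions is_valid and is_conformant, the representation of a dictionary
as a mapping from data name to a check function, the "positive number"
check and the local item _my_local_item are all assumptions of mine, not
part of the protocol.

```python
# Hypothetical model: a dictionary maps a data name to a check on its value.

def is_valid(block, dictionary):
    """Names with no definition are tolerated; defined names must pass."""
    return all(dictionary[name](value)
               for name, value in block.items()
               if name in dictionary)

def is_conformant(block, dictionary):
    """100% or nothing: every name must be defined and must pass."""
    return all(name in dictionary and dictionary[name](value)
               for name, value in block.items())

# Assumed check for illustration only: the value must be a positive number.
dictionary = {"_cell_length_a": lambda v: float(v) > 0}

# One defined item and one local item unknown to the dictionary.
block = {"_cell_length_a": "5.43", "_my_local_item": "whatever"}

print(is_valid(block, dictionary))       # unknown item is ignored
print(is_conformant(block, dictionary))  # unknown item breaks conformance
print(is_valid(block, {}))               # the null dictionary case
```

Under this reading the block is valid against the dictionary and against
the null dictionary alike, but conformant to neither.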
From the point of view of CIF validation, the proposed dictionary merging
protocol looks functional enough. But the protocol itself seems to be
a set of externally based informal rules designed to be hard coded
into validation software. The commands for specifying how to create/
assemble a dictionary to which a given CIF data block may or may not
be conformant (even though it may be valid) are actually embedded in
the CIF, or passed to the validation software as arguments.
There is no support therein for fine-grained control over how
individual data items and/or category classes may be totally replaced
by, or appended to from, the separate dictionaries.
It is an all or nothing approach.
The currently envisaged dictionary construction mechanism does not
yet permit specification of such PREPEND, APPEND, REPLACE modification
attributes in the CIF data_block itself, so there is no way to retain
this information across dictionary reconstruction invocations.
The dictionary constructed in this manner does not even exist as
a referenceable object (it is purely virtual). This could make it
rather more difficult to group together data sets conforming to the
same (i.e. which?) dictionary.
The recent discussion of the CIF specification indicates that in CIF1.1
dictionary style save_ frames will be permitted in purely data CIFs
opening up the possibility of combined dictionaries and data.
I am not sure if this is the direction things are intended to go
but it seems to me to be tooooo flexible for something that is
supposed to be a purely data archival format. It also seems counterproductive
to the overall scheme of standardization because basically any
CIF can create any dictionary it likes and say hey I am valid against this,
(even if it doesn't conform).
What I would prefer to see in a CIF data_block is a single reference to
a real dictionary to which it is totally 100% conformant.
I think this might force the creators and distributors of
CIFs to be more responsible in either rigorously adopting existing
dictionaries, or making their internal dictionaries available to
the community if they want their CIFs widely accepted.
Therein lies my previous concern about:
> > And can a CIF (file or data_block?) really be totally conformant with
> > more than one dictionary, i.e. why the need for item 27 loop_? Would it
> > not be
The benefit is that it would enforce an explicit hierarchical dictionary
dependence, through inter-dictionary reference pointers, rather than an
inferred one based on an ad hoc protocol buried in an
implementation layer.
In the final example of the last appendix of the dictionary overlay protocol
referenced above there is already a recognized scope for error.
With the data_block contents:
loop_
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location
a.dic 2.1 .
b.dic 1.0 .
c.dic 1.0 /usr/local/dics/my_local_dictionary
if you then run the hypothetical command line
dictcheck -mode OVERLAY test.cif
you can end up with the situation where b.dic can overlay a.dic
but c.dic cannot overlay b.dic (very last part of the example).
On the other hand,
c.dic could overlay a.dic and b.dic could then overlay c.dic
without any problems.
So there is, as was noted in the appendix, a potential ordering problem.
Problems of this type should be alleviated by an explicit
hierarchical dependence.
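
As a rough sketch of what I mean by an explicit hierarchical dependence,
suppose each dictionary named the dictionaries it extends. The merge order
could then be derived rather than taken from the order of the loop_. The
"extends" table below is hypothetical (no such pointer exists in the
dictionaries today) and merge_order is just an illustrative name; the
sort itself uses Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical inter-dictionary pointers: each dictionary lists the
# dictionaries it extends. This mirrors the workable ordering from the
# example: c.dic overlays a.dic, and b.dic then overlays c.dic.
extends = {
    "a.dic": [],
    "c.dic": ["a.dic"],
    "b.dic": ["c.dic"],
}

def merge_order(extends):
    """Return the dictionaries so that each follows those it extends."""
    try:
        return list(TopologicalSorter(extends).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"circular dictionary dependence: {err.args[1]}")

order = merge_order(extends)
print(order)
```

The order in which the CIF happens to list the dictionaries no longer
matters, and a circular dependence is reported instead of silently
producing a different merged dictionary.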
Despite all that, my real reason for preferring CIFs to be 100% conformant
to one single specified dictionary was based on inherent laziness coupled to
considerations of dictionary based data structures, as distinct from
generic CIF based data structures.
With a typical cif parser you can readily create a generic cif data structure
but if you wish to create a dictionary based data structure,
the generic cif data structure is now forced upon you as an
essential precursor, because you have to pick the cif apart to find the
dictionary conformance/creation tags in order to identify the relevant
dictionary data model needed.
The intermediate step could be ignored if the conformant dictionary
was specified right up front, XML style.
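
To illustrate the intermediate step, here is a minimal Python sketch of
the pre-scan needed just to discover which dictionaries a data_block
names. It is deliberately naive and rests on assumptions of mine: tokens
are split on whitespace only, so quoted strings, text fields and comments
are not handled, and the function name conform_dictionaries is purely
illustrative.

```python
RESERVED_PREFIXES = ("data_", "loop_", "save_")

def conform_dictionaries(cif_text):
    """Return one dict per row of the loop_ holding the conformance tags."""
    tokens = cif_text.split()
    for i, token in enumerate(tokens):
        if token.lower() != "loop_":
            continue
        # Collect the data names that form the loop header.
        j = i + 1
        header = []
        while j < len(tokens) and tokens[j].startswith("_"):
            header.append(tokens[j].lower())
            j += 1
        if "_audit_conform_dict_name" not in header:
            continue
        # Collect values until the next data name or reserved word.
        values = []
        while j < len(tokens):
            tok = tokens[j]
            if tok.startswith("_") or tok.lower().startswith(RESERVED_PREFIXES):
                break
            values.append(tok)
            j += 1
        n = len(header)
        return [dict(zip(header, values[k:k + n]))
                for k in range(0, len(values) - n + 1, n)]
    return []

example = """
data_test
loop_
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location
a.dic 2.1 .
b.dic 1.0 .
c.dic 1.0 /usr/local/dics/my_local_dictionary
"""

rows = conform_dictionaries(example)
for row in rows:
    print(row["_audit_conform_dict_name"], row["_audit_conform_dict_version"])
```

Even this much has to walk the data_block looking for the loop_, and only
after that can the dictionary data model be chosen and the data read
again into it.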
The next issue was that it would be a damn sight more convenient to be
able to use a precompiled representation of a dictionary for generating
a dictionary based data structure than it would be to have to go
off and build a new one for every new data_block you encounter.
The former would rely more heavily on dictionary caching.
The third issue concerns valid, as distinct from conformant
data files. It would be a lot simpler to build a dictionary based data
structure knowing that the data to be stuck in it are 100% conformant,
rather than having to cater for spurious nonconformant garbage at every level.
Given that the self proclaimed purpose of the dictionary merging
protocol was to facilitate the development of dictionary-driven
applications and thereby, I hope, dictionary-driven data structures,
it would be a shame if it all got started off on an
inconvenient footing. But I guess these are just implementation issues.
One final comment about the COMCIFS envisaged future CIF global_ data
structure. If every data_block in a CIF conforms to a
completely different dictionary you could in general wind up with
incompatible global data and no formal way to specify the inheritance
that other data_blocks may need. Perhaps building in a hierarchical
object oriented style subclassing model from the beginning might help to
clarify such situations in the future.
I guess really I am just questioning what the considerations were
when this _audit_conform business was initiated, and if it could be
clarified before being cast in stone.
Thanks
Doug