Re: the dictionary merging protocol

  • Subject: Re: the dictionary merging protocol
  • From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
  • Date: Mon, 15 Jul 2002 12:43:03 +0100 (BST)
In his message, Doug says:

>
> Despite all that my real reason for preferring CIFs to be 100% conformant
> to one single specified dictionary was based on inherent laziness coupled to
> considerations of dictionary based data structures, as distinct from
> generic CIF based data structures.
>

This goal _cannot_ be achieved without invalidating some existing CIFs.
When an author creates a core CIF for publication, it is valid for him
to include items in the PUBL_MANUSCRIPT_INCL category, to specify tags
being used in that particular CIF which are not being drawn from the
standard publication request lists or, more importantly, even from the
core dictionary.

As it says in the category description:

    Data items in the PUBL_MANUSCRIPT_INCL category allow
    the authors of a manuscript submitted for publication to list
    data names that should be added to the standard request list
    employed by journal printing software. Although these fields are
    primarily intended to identify CIF data items that the author
    wishes to include in a published paper, they can also be used
    to identify data names created so that non-CIF items can be
    included in the publication. Note that *_item names MUST be
    enclosed in single quotes.

One might attempt to solve the problem by insisting that all authors
making use of this category provide a complete new dictionary merging
the core CIF dictionary with their proposed additions, but, in
order to ensure consistency of the use of tags from the "real"
core CIF dictionary, one would then have to somehow perform a
difference operation between these dictionaries, producing what
is effectively a mini-dictionary for the proposed layered extensions,
bringing us right back to the starting point of having to be able to
handle layered dictionaries.
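
As a rough illustration of the difference operation described above, here is
a minimal Python sketch. It assumes, purely for illustration, that each
dictionary has already been parsed into a plain mapping from data name to a
mapping of attributes; the function name dictionary_difference and the tag
_local_sample_colour are hypothetical and are not taken from any CIF
dictionary or tool.

```python
def dictionary_difference(core, merged):
    """Return the definitions in `merged` that are absent from `core`
    or that differ from the definition `core` gives for the same name."""
    extension = {}
    for name, definition in merged.items():
        if name not in core or core[name] != definition:
            extension[name] = definition
    return extension


# Hypothetical, heavily simplified stand-in for the core dictionary.
core = {
    "_cell_length_a": {"type": "numb", "category": "cell"},
    "_publ_manuscript_incl_extra_item": {
        "type": "char",
        "category": "publ_manuscript_incl",
    },
}

# The author's "complete new dictionary": the core plus one local addition.
merged = dict(core)
merged["_local_sample_colour"] = {"type": "char", "category": "local"}

# What falls out is only the author's addition, i.e. a mini-dictionary.
extension = dictionary_difference(core, merged)
print(extension)
```

The result is exactly the small extension layer, which is why the
single-dictionary approach ends up requiring layered dictionaries anyway.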

If you wish to have a fully functional validating parser, you need to
either:

  1.  Allow in the parser for the possibility of multiple layered
dictionaries; or

  2.  Provide an external filter/merge program to merge multiple
layered dictionaries into an internal temporary dictionary for
your single-dictionary validating parser to use.

Either way, we need to accept or work out some agreeable variation on
Brian's proposed layering process.
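
Option 2 could be sketched along the following lines. This is only a
simplified illustration in Python, not the protocol itself: it assumes
dictionaries parsed into plain mappings from data name to attributes, and the
two modes shown (REPLACE swaps in the whole later definition, OVERLAY merges
attributes with the later layer winning) are simplified assumptions about the
layering semantics. The function name merge_dictionaries and the tag
_local_sample_colour are hypothetical.

```python
def merge_dictionaries(layers, mode="OVERLAY"):
    """Merge an ordered list of dictionaries into one temporary dictionary.

    layers: list of {data_name: {attribute: value}}, earliest layer first.
    mode:   "REPLACE" - a later definition replaces the earlier one entirely.
            "OVERLAY" - later attributes are added to the earlier definition,
                        overriding attributes with the same name.
    """
    if mode not in ("REPLACE", "OVERLAY"):
        raise ValueError(f"unknown merge mode: {mode!r}")

    merged = {}
    for layer in layers:
        for name, definition in layer.items():
            if mode == "REPLACE" or name not in merged:
                merged[name] = dict(definition)
            else:  # OVERLAY onto an existing definition
                merged[name] = {**merged[name], **definition}
    return merged


# Hypothetical, heavily simplified dictionaries.
core = {"_cell_length_a": {"type": "numb", "units": "angstroms"}}
local = {
    "_cell_length_a": {"units": "picometres"},
    "_local_sample_colour": {"type": "char"},
}

overlaid = merge_dictionaries([core, local], mode="OVERLAY")
replaced = merge_dictionaries([core, local], mode="REPLACE")
print(overlaid["_cell_length_a"])
print(replaced["_cell_length_a"])
```

The merged result is an ordinary single dictionary, so an existing
single-dictionary validating parser could consume it unchanged. Note that the
outcome depends on the order of the layers.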

Regards,
  Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 020
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Mon, 15 Jul 2002, Doug du Boulay wrote:

>
> I hope it is okay to make a few comments here about the  dictionary
> overlay protocol as documented here:
>
>            http://www.iucr.org/iucr-top/lists/cif-developers/msg00044.html
>
> I hope we can draw a distinction between "valid" and "conformant"
> with respect to the encouraged CIF data_block tags:
>          _audit_conform_dict_name
>          _audit_conform_dict_version
>          _audit_conform_dict_location
>
> The following extract from the above essentially defines valid:
>
> <required. These issues are addressed in reverse order below. Note that an
> <application seeking to validate a data file should not consider the file
> <invalid if a data name is found that has no definition in the dictionaries
> <referenced. The CIF standard permits the incorporation of local and
> <standard names in any data file. Nevertheless, it is recommended as good
> <practice that all data names in a CIF should be able to be validated against
> <dictionary files, including locally constructed dictionaries.
>
>
> My understanding/definition of conformant is 100% or nothing. The slightest
> discrepancy at all means it is no longer conformant.
> With this definition the _audit tags above seem mislabeled, but I
> will continue here assuming the intended meaning is "valid".
>
>
>
> What is the advantage of a CIF data_block specifying that it is
> valid against dictionaries x.dic, y.dic & z.dic when it could also
> be full of unknown/unrecognised data items to which it is not conformant?
> It would also be "valid" against a null dictionary.
> On the other hand, I am guessing, that we could use a new undefined
> data item as a new member of a non listable category
> and then loop_ over it alone thereby not only being non conformant
> but also destroying the previously accepted validity.
> Is that possible?
>
>
>
> From the point of view of CIF validation, the proposed dictionary merging
> protocol looks functional enough. But the protocol itself seems to be
> a set of externally based informal rules designed to be hard coded
> into validation software. The commands for specifying how to create/
> assemble a dictionary to which a given CIF data block may or may not
> be conformant (even though it may be valid) are actually embedded in
> the CIF, or passed to the validation software as arguments.
>
> There is no support therein for fine grained control over how
> individual data items and or category classes may be totally replaced,
> by or appended to from the separate disparate dictionaries.
> It is an all or nothing approach.
>
> The currently envisaged dictionary construction mechanism does not
> yet permit specification of such PREPEND, APPEND REPLACE modification
> attributes in the CIF data_block itself, so there is no way to retain
> this information across dictionary reconstruction invocations.
>
>
> The dictionary constructed in this manner does not even exist as
> a referenceable object (it is purely virtual). This could make it
> rather more difficult to group together data sets conforming to the
> same (i.e. which?) dictionary.
>
>
> The recent discussion of the CIF specification indicates that in CIF1.1
> dictionary style save_ frames will be permitted in purely data CIFs
> opening up the possibility of combined dictionaries and data.
> I am not sure if this is the direction things are intended to go
> but it seems to me to be tooooo flexible for something that is
> supposed to be a purely data archival format. It also seems counterproductive
> to the overall scheme of standardization because basically any
> CIF can create any dictionary it likes and say hey I am valid against this,
> (even if it doesn't conform).
>
>
>
>
> What I would prefer to see in a CIF data_block is a single reference to
> a real dictionary to which it is totally 100% conformant.
> I think this might force the creators and distributors of
> CIFs to be more responsible in either rigorously adopting existing
> dictionaries, or making their internal dictionaries available to
> the community if they want their CIFs widely accepted.
> Therein lies my previous concern about:
>
> > > And can a CIF (file or data_block?) really be totally conformant with
> > > more than one dictionary, i.e. why the need for item 27 loop_?  Would it
> > > not be
>
>
> The benefit is that it would enforce an explicit hierarchical dictionary
> dependence, through inter-dictionary reference pointers, rather than an
> inferred one based on an ad hoc protocol buried in an
> implementation layer.
> In the final example of the last appendix of the dictionary overlay protocol
> referenced above there is an already recognized scope for error.
>
> with the data_block contents:
>      loop_
>          _audit_conform_dict_name
>          _audit_conform_dict_version
>          _audit_conform_dict_location
>            a.dic  2.1   .
>            b.dic  1.0   .
>            c.dic  1.0   /usr/local/dics/my_local_dictionary
>
> if you then run the hypothetical command  line
>
> dictcheck -mode OVERLAY  test.cif
>
> you can end up with the situation where b.dic can overlay a.dic
> but c.dic cannot overlay b.dic (very last part of the example )
>
> On the other hand
> c.dic could overlay a.dic and b.dic can then overlay c.dic.
> without any problems.
>
> So there is, as was noted in the appendix, a potential ordering problem.
> Problems of this type should be alleviated by an explicit
> hierarchical dependence.
>
>
>
> Despite all that my real reason for preferring CIFs to be 100% conformant
> to one single specified dictionary was based on inherent laziness coupled to
> considerations of dictionary based data structures, as distinct from
> generic CIF based data structures.
>
>
> With a typical cif parser you can readily create a generic cif data structure
> but if you wish to create a dictionary based data structure,
> the generic cif data structure is now forced upon you as an
> essential precursor,  because you have to pick the cif apart to find the
> dictionary conformance/creation tags in order to identify the relevant
> dictionary data model needed.
> The intermediate step could be ignored if the conformant dictionary
> was specified right up front, XML style.
>
>
> The next issue was that it would be a damn sight more convenient to be
> able to use a precompiled representation of a dictionary for generating
> a dictionary based data structure than it would be to have to go
> off and build a new one for every new data_block you encounter.
> This former would rely more heavily on dictionary caching.
>
>
> The third issue concerns valid, as distinct from conformant
> data files. It would be a lot simpler to build a dictionary based data
> structure knowing that the data to be stuck in it are 100% conformant,
> rather than having to cater for spurious nonconformant garbage at every level.
>
> Given that the self proclaimed purpose of the dictionary merging
> protocol was to facilitate the development of dictionary-driven
> applications and thereby, I hope, dictionary driven data structures,
> it would be a shame if it all got started off on an
> inconvenient footing.  But I guess these are just implementation issues.
>
>
>
> One final comment about the COMCIFS envisaged future CIF global_ data
> structure.  If every data_block in a CIF conforms to a
> completely different dictionary you could in general wind up with
> incompatible global data and no formal way to specify the inheritance
> that other data_blocks may need. Perhaps building in a hierarchical
> object oriented style subclassing model from the beginning might help to
> clarify such situations in the future.
>
>
> I guess really I am just questioning what the considerations were
> when this _audit_conform business was initiated, and if it could be
> clarified before being cast in stone.
>
>
> Thanks
> Doug
>

