Discussion List Archives


Re: CIF Infoset

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIFStandard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Re: CIF Infoset
  • From: ddb@owari.msl.titech.ac.jp
  • Date: Sat, 4 Sep 2004 18:43:00 +0900 (JST)
Hi

> Here are a few more comments from IDB:
> >So how do you intend to get around this namespace issue? No CIFs that I
> >have encountered have ever declared their conformance to any dictionary.
> >Even if they did, there is something called the dictionary stacking
> >protocol
> >which allows those definitions to be overridden without declaring a
> >namespace.
> >On top of that there is the boundless capacity for making up your own
> >data names on the fly for which there may never be any dictionary
> >definition
> >at all. How can you reliably assign anything but a generic namespace to an
> >infoset? It's all just ad hoc guesswork.
>
> The core dictionary defines three items which can be looped:
>     _audit_conform_dict_name
>     _audit_conform_dict_version
>     _audit_conform_dict_location        # Contains the URL where the
> dictionary can be found
> As far as I know these have not been widely used - Acta Cryst. should
> start insisting that these be included in submitted papers.  There is no
> need to give the dictionary version in anything as ephemeral as a comment.


That sounds like a positive step, but would that go in every data_block or 
is it a global_ thing?

You may need to add something like _audit_conform_dict_stacking_order
to ensure that a looped list in which symmetry overrides core doesn't get
confused with one in which core overrides symmetry, for example (assuming
loop order is not significant), if that is possible.

The problem I see is that the effort invested in implementing it for all 
newly created and submitted CIFs is wasted because it is an 
incomplete solution and no current software uses it or needs it.

You still have to deal with existing archives of CIFs which don't state
their conformance, and even for CIFs that do, users are free to
conjure up any ad hoc data names they like and use them in any context.

So, to try and resolve the namespace of each name, you would need to
(1) check the _audit_conform list of dictionaries in reverse order
(2) check against the list of registered prefixes for accidental matches
(3) check all versions of all publicly accessible dictionaries
(4) then give up.

Not an efficient process even when there is a match, and no guarantee that
it is the correct match if names are reused in different
contexts in different dictionaries. Two simple things would fix that:
associating a distinguishable prefix on each name with the _audit_conform
entries, and banning ad hoc data names.

Anything else and you will always be just guessing.
I don't really know what you are hoping to achieve.
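
A minimal Python sketch of lookup steps (1) to (4) above. The dictionary
contents, the prefix table and the function name resolve_namespace are
illustrative assumptions, not real registry data or any existing CIF
library's API, and dictionary versions are ignored for brevity.

```python
# Hypothetical, heavily truncated dictionary contents (name -> defined tags).
DICTIONARIES = {
    "cif_core.dic": {"_cell_length_a", "_symmetry_equiv_pos_as_xyz"},
    "cif_sym.dic": {"_space_group_symop_id"},
    "cif_pd.dic": {"_pd_meas_number_of_points"},
}

# Hypothetical registered prefix table (prefix -> owner).
REGISTERED_PREFIXES = {"_xyz_": "example_owner"}


def resolve_namespace(name, conform_list, dictionaries, prefixes):
    """Return (source, how) for a data name, or None when every step fails."""
    # (1) Declared dictionaries, last declared first.
    for dic in reversed(conform_list):
        if name in dictionaries.get(dic, set()):
            return dic, "declared"
    # (2) Registered prefixes; a hit here may still be accidental.
    for prefix, owner in prefixes.items():
        if name.startswith(prefix):
            return owner, "prefix"
    # (3) Every other known dictionary; any hit is only a guess.
    for dic in sorted(dictionaries):
        if dic not in conform_list and name in dictionaries[dic]:
            return dic, "guess"
    # (4) Give up.
    return None


declared = ["cif_core.dic"]
for tag in ("_cell_length_a", "_pd_meas_number_of_points", "_my_adhoc_name"):
    print(tag, resolve_namespace(tag, declared, DICTIONARIES, REGISTERED_PREFIXES))
```

Only the "declared" result rests on anything the CIF itself states; the
"prefix" and "guess" results are exactly the guesswork described above.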

>
> ># start Validation Reply Form
> >_vrf_DIFF020_114
> >;PROBLEM: _diffrn_standards_interval_count and
> >RESPONSE: ... We have used an image-plate system
> >;
> >
> >If intelligent software was ever intended to deal with such _vrf_s, why
> >embed the only pointer to their purpose in supposedly non parsable data
> >names rather than  in looped, discrete sets of tags such as
> >
> >loop_
> >    _vrf_suite _vrf_subroutine _vrf_error_code _vrf_authors_response
>
> This would tidy things up, but the parser must be able to handle ad hoc
> data names without choking.


If it's important enough to create a name for it, then isn't it important
enough to define its purpose somewhere? Ad hoc data names seem to provide
nothing useful besides a legitimate excuse for laziness in the
specification. There's no incentive to organize things tidily.
Maybe they were important originally when COMCIFS were exploring 
the field, before dictionaries were introduced, but is it still important 
to be able to make up arbitrary stuff and stick it in a CIF without 
definition?
Who is doing this and how are they using it?
Do they really intend to save it for posterity?



> >>>>Q Is the order of "rows" in a loop_ unimportant?
> >>>
> >>>Yes (in CIF).
> >>
> >>That is very useful (and non-obvious from the spec). It then makes it
> >>possible to confirm the identity of two sets of coordinates, symmetry
> >>operations, etc.
> >>
> >>It is also debatable.
> >>The very recent introduction of _symmetry_equiv_pos_site_id means that
> >>the data integrity of the majority of prior archived CIFs containing tag
> >>values like:    _geom_bond_site_symmetry_1  "4_564"
> >>would be seriously impaired by a change of order in the
> >>loop_  _symmetry_equiv_pos_as_xyz
>
> This was a serious omission in the first version of CIF (you have to
> remember that this was produced before we even considered writing
> dictionaries in STAR format).  As you point out we have introduced the
> list reference _symmetry_equiv_pos_site_id (which incidentally has now
> been superseded by  _space_group_symop_id taken from the symmetry_cif
> dictionary - a dictionary which takes a more systematic and
> forward-looking approach to symmetry).  Again Acta Cryst. should insist
> on the inclusion of these id's.

Would a statement of conformance to an older dictionary version be 
sufficient grounds to escape these CIF changes (just checking :-)?

But I guess my original concern here was that order independence of loop_
structures based on earlier, and possibly alternative, dictionaries, as
well as ad hoc looped data (maybe that's not important, but you never
know...), is not assured in general, particularly for raw data in whatever
form it takes (NMR? image CIF?).
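
A small Python sketch of the "4_564" concern. It assumes the usual n_klm
reading of the code (operator number n, lattice translations encoded as
digits offset by 5); the four symmetry operators and the function name are
illustrative only.

```python
def parse_site_symmetry(code):
    """Split a code such as '4_564' into (operator number, lattice translation)."""
    op, digits = code.split("_")
    # Each digit is a translation offset by 5, so '564' means (0, +1, -1).
    translation = tuple(int(d) - 5 for d in digits)
    return int(op), translation


# Illustrative loop_ _symmetry_equiv_pos_as_xyz rows with no explicit id.
symops = ["x,y,z", "-x,y+1/2,-z", "-x,-y,-z", "x,-y+1/2,z"]
# The same rows after a reordering that swaps the second and fourth rows.
reordered = [symops[0], symops[3], symops[2], symops[1]]

op, shift = parse_site_symmetry("4_564")

# Positional lookup: the meaning of "4" silently changes with row order.
before = symops[op - 1]
after = reordered[op - 1]

# Lookup through an explicit id column survives any reordering.
rows_with_ids = [(4, symops[3]), (1, symops[0]), (3, symops[2]), (2, symops[1])]
stable = dict(rows_with_ids)[op]

print(op, shift, before, after, stable)
```

Without an id column, the operator number can only be resolved by row
position, so reordering such a loop changes what the archived bond data mean.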


> >I had a hazy recollection that  "this is a string" and   this_is_a_string
> >were equally valid CIF constructs containing identical information
> >content,
> >used for example in space group names. Would they be formally identical in
> >an infoset? Does the white space in all strings have to be normalised (is
> >that the right word?)?
>
> We had a discussion of this point while preparing the symmetry_CIF
> dictionary and came to the decision that these two strings were not
> equivalent, i.e., underscore is not white space.

Bummer. I know one program that needs changes made :-(

But perhaps I could also draw your attention to this:
      http://journals.iucr.org/services/cif/stdcodes.html#Appdx4.3
as evidence that underscores do seem to be an
officially sanctioned form of white space in uchar data types.
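
The two readings can be put side by side in a short Python sketch. Both
comparison functions are hypothetical and only illustrate the two policies;
neither is taken from any CIF specification or library.

```python
def equal_literal(a, b):
    """Policy from the symmetry_CIF discussion: underscore is not white space."""
    return a == b


def equal_underscore_as_space(a, b):
    """Alternative policy: treat underscores as white space, then normalise runs."""
    def normalise(value):
        return " ".join(value.replace("_", " ").split())
    return normalise(a) == normalise(b)


quoted = "this is a string"
unquoted = "this_is_a_string"

print(equal_literal(quoted, unquoted))
print(equal_underscore_as_space(quoted, unquoted))
```

Software written under the second policy would report the two values as
identical, while an infoset built under the first would not.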


And maybe I can raise another issue. In the context of PMR's interest in
data_global, would the following construct be legitimate:

data_global
   _publ_contact_author_name  "Fred"

data_a
   _import_data_from_block      global

# defined in an associated dictionary as:
data_import_data_from_block
    _name                      '_import_data_from_block'
    _category                  obscure_semantics
    _type                      uchar
    _definition
;
 Import all data from the named data_block into the current data_block.
 Watch out for duplicate _data_element_names though!
 Also watch out for circular imports!
;

As far as I am aware there is nothing that restricts such semantics.
Everything seems to be above board in terms of the CIF content.
It's just that a request for _publ_contact_author_name from
within data block data_a seems destined to fail at the software
access stage. Does that mean CIF conformant software can never be
totally CIF conformant?
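
A Python sketch of what software would have to do to honour such a
definition. The _import_data_from_block tag is the made-up example above,
data blocks are modelled as plain dicts, and the rule that a local value
wins over an imported one is an assumption of this sketch.

```python
IMPORT_TAG = "_import_data_from_block"  # hypothetical tag from the example


def lookup(blocks, block_name, tag, _seen=None):
    """Find tag in a block, following the hypothetical import tag if needed."""
    seen = set() if _seen is None else _seen
    if block_name in seen:
        raise ValueError(f"circular import involving data_{block_name}")
    seen.add(block_name)
    block = blocks[block_name]
    if tag in block:
        # Assumption: a locally defined value takes precedence over an import.
        return block[tag]
    target = block.get(IMPORT_TAG)
    if target is None:
        raise KeyError(tag)
    return lookup(blocks, target, tag, seen)


blocks = {
    "global": {"_publ_contact_author_name": "Fred"},
    "a": {IMPORT_TAG: "global"},
}

# A reader unaware of the dictionary semantics sees nothing in data_a.
naive = blocks["a"].get("_publ_contact_author_name")
# A reader that knows the definition follows the import.
aware = lookup(blocks, "a", "_publ_contact_author_name")
print(naive, aware)
```

The naive access is what generic CIF software would do today, which is why
the request from within data_a fails unless the reader also interprets the
dictionary definition.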


Thanks for the response.
Doug


