Discussion List Archives

Re: CIF Infoset

On Aug 17 2004, Herbert J. Bernstein wrote:

> Peter asks some interesting questions.  I do not propose to answer
> them in detail here.  However, I should point out that interpretation
> of a given CIF may require 4 sets of documents:
>   1.  The CIF itself.
>   2.  The dictionary or dictionaries defining the tags
> used in the CIF
>   3.  The relevant DDLs
>   4.  The CIF specification:
>        http://www.iucr.org/iucr-top/cif/spec/version1.1/index.html

I agree with this. A little while ago I was invited to work with Syd and 
Nick and spent 2 pleasant weeks looking at whether this could be managed in 
a self-consistent system. In theory, yes. In practice it was questionable 
whether it was worthwhile and would be used.

It is almost isomorphic with the XML schema hierarchy:


i.e. the DDL is self-validating. The problem was that *any* changes to the 
DDL have repercussions down the line which multiply. In XMLSchema we have

SchemaSchema -validates-> XSDSchema -validates-> instance

The construction of self-consistent schemas in XML has been anything but 
trivial and has caused much argument. It is unlikely that CIF will benefit 
from a rerun.
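
To illustrate how a change at the top of such a hierarchy propagates downward, here is a minimal Python sketch. It is a toy model, not real DDL or XSD machinery: the chain DDL -> DDL (itself) -> dictionary -> CIF, the "allowed"/"required" keys, and the functions `validates` and `chain_ok` are all assumptions made up for the example.

```python
# Toy model only: a "schema" is a dict naming the keys that a document one
# level down may use ("allowed") and must use ("required").
# This is NOT real DDL or XSD machinery; all names here are hypothetical.

def validates(schema: dict, document: dict) -> bool:
    """True if document uses only allowed keys and has every required key."""
    keys = set(document)
    allowed = set(schema.get("allowed", []))
    required = set(schema.get("required", []))
    return keys <= allowed and required <= keys


def chain_ok(chain: list) -> bool:
    """Check that each level validates the level directly below it."""
    return all(validates(upper, lower) for upper, lower in zip(chain, chain[1:]))


# Assumed hierarchy: DDL -> DDL (itself) -> dictionary -> CIF instance.
ddl = {"allowed": ["allowed", "required"], "required": ["allowed"]}
dictionary = {"allowed": ["_cell_length_a", "_cell_length_b"]}
cif = {"_cell_length_a": "5.43"}

print(chain_ok([ddl, ddl, dictionary, cif]))

# A small change at the top: the DDL now also demands a "required" key.
ddl_v2 = {"allowed": ["allowed", "required"], "required": ["allowed", "required"]}
print(validates(ddl_v2, ddl_v2))
print(chain_ok([ddl_v2, ddl_v2, dictionary, cif]))
```

In this toy the original chain is consistent, and the revised DDL still validates itself, but the existing dictionary no longer validates against it, so everything below needs attention even though only the top level changed.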

So I have taken the pragmatic view that we have DDL2 and DDL1 as currently 
accepted and used. As my own interests are currently in DDL1 I have 
restricted my questions and concerns to CIF (i.e. not STAR) and built 
software for this. My architecture should be sufficiently modular 
that if/when CIF extends to fuller STAR it can be enhanced.

> Many of Peter's questions are answered in the specification.

The lexical questions are. I have used the syntax and semantics documents 
as reference. I have assumed these are formal abstractions of the original 
published article(s). If they are not, then it would be useful to abstract 
additional rules - I think that implementers need to know exactly what 
documents apply and what the rules are.

> The infoset concept is useful, but be warned that the appropriate
> handling of information depends on the context within which you are
> working, regardless of whether you are using CIF or using XML or
> the PDB format.  For an application intended to just get at the data,
> comments may be discarded, while for an application intended to reformat
> the presentation of the data, comments are highly significant
> information.  Similarly, the particular form of quoting, the
> distinction between "." and "?", etc. may or may not be
> significant.  If the application in question is, say, a
> refinement program that just needs to read CIFs to extract
> expected crystallographic data, then construction of the "infoset"
> from a CIF is particularly simple.  More demanding applications,
> e.g. in CIF validation and publication suites, may need to deal
> with more subtle data and metadata questions.

I am afraid I disagree! If the interpretation of a CIF depends on what 
program is to be used to process it then it is (IMO) not an abstract 
archive and transfer format.
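
One way to picture an application-independent infoset is a parse that always records comments, quoting, and the "." versus "?" distinction, and leaves it to each consumer to ignore what it does not need. The Python sketch below is only an illustration under heavy assumptions: it handles single-line tag/value pairs and "#" comment lines, ignores data_ headers, loops, and text fields, and the names `Item`, `Infoset`, and `parse` (and the `_example_note` tag) are hypothetical, not part of any CIF library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    tag: str
    value: Optional[str]  # None when the value is "." or "?"
    kind: str             # "value", "inapplicable" (.) or "unknown" (?)
    quoted: bool          # whether the source used quote delimiters


@dataclass
class Infoset:
    items: List[Item] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


def parse(text: str) -> Infoset:
    """Parse a tiny CIF-like subset: '#' comment lines and '_tag value' lines."""
    info = Infoset()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            info.comments.append(line[1:].strip())
            continue
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            continue  # blank lines, data_ headers, loops etc. are out of scope here
        tag, raw = parts[0], parts[1].strip()
        quoted = len(raw) >= 2 and raw[0] in "'\"" and raw[-1] == raw[0]
        if quoted:
            # A quoted "." or "?" is treated as an ordinary string.
            info.items.append(Item(tag, raw[1:-1], "value", True))
        elif raw == ".":
            info.items.append(Item(tag, None, "inapplicable", False))
        elif raw == "?":
            info.items.append(Item(tag, None, "unknown", False))
        else:
            info.items.append(Item(tag, raw, "value", False))
    return info


sample = """# unit cell
_cell_length_a 5.43
_symmetry_space_group_name_H-M 'P 1'
_exptl_crystal_density_meas ?
_refine_ls_extinction_coef .
_example_note '.'
"""

infoset = parse(sample)
# A data-only consumer simply ignores what it does not need:
data_only = {i.tag: i.value for i in infoset.items if i.kind == "value"}
```

Here the parse result is the same for every consumer; a refinement program might use only `data_only`, while a reformatting tool would also read `infoset.comments` and the `quoted` flags.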

Peter M-R

comcifs mailing list
