Discussion List Archives

Re: CIF Infoset

> I am afraid I disagree! If the interpretation of a CIF depends on what
> program is to be used to process it then it is (IMO) not an abstract
> archive and transfer format.

Peter is entitled to his position, but it is at odds with current practice
in handling all programming languages, HTML, XML, PostScript and, of
course, CIF.  Almost all languages handled by computers intentionally
have multiple "views" available from which drastically different
infosets could be constructed.  At a minimum, most languages have
a narrow, data-oriented view intended for consumption by an
application-oriented language processor in which all comments are discarded,
and a broader human-oriented view in which comments must be preserved
and certain equivalences in the languages such as quoting equivalences
must be broken.  HTML is one of the most fruitful examples of this
approach, with an amazing selection of views, including the subtle
and complex interaction of comments with JavaScript and with SSI.
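
As a rough illustration of the two views described above, here is a minimal
Python sketch. It is not a conforming CIF parser: the function names
data_view and human_view and the sample text are hypothetical, and it
assumes one tag/value pair per line with no quoted strings, no semicolon
text fields and no loop_ constructs.

```python
# Minimal sketch: two "views" built from the same CIF-like text.
# Assumptions (not from the CIF specification): one tag/value pair per
# line, '#' always starts a comment, no quoting, no text fields, no loops.

SAMPLE = """\
data_example
# cell parameters measured at 100 K
_cell_length_a 5.431
_cell_length_b 5.431   # same as a
"""


def data_view(text):
    """Narrow, data-oriented view: comments are discarded."""
    items = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop any trailing comment
        if body.startswith("_"):
            tag, value = body.split(None, 1)
            items[tag] = value
    return items


def human_view(text):
    """Broader, human-oriented view: comments are kept as tokens."""
    tokens = []
    for line in text.splitlines():
        body, sep, comment = line.partition("#")
        body = body.strip()
        if body.startswith("_"):
            tag, value = body.split(None, 1)
            tokens.append(("item", tag, value))
        elif body:
            tokens.append(("block", body))
        if sep:  # a '#' was present, so record the comment text
            tokens.append(("comment", comment.strip()))
    return tokens
```

Both functions read the same text, yet only human_view retains the two
comments, so the resulting infosets differ even though the file does not.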

> > Many of Peter's questions are answered in the specification.
>
> The lexical questions are. I have used the syntax and semantics documents
> as reference. I have assumed these are formal abstractions of the original
> published article(s). If they are not, then it would be useful to abstract
> additional rules - I think that implementers need to know exactly what
> documents apply and what the rules are.

The published articles describe an older language specification, and
the recently released on-line specification describes a newer, reasonably
compatible, but revised language specification.  Just as reading
information on HTML 4 is useful in understanding XHTML, reading
information on CIF 1.0 is useful in understanding CIF 1.1, but it
would be a mistake to try to base formal language processing decisions on
the older specification in the areas where the newer specification says
something different.  I would urge all CIF software developers to treat
the new specification as a document that stands on its own legs.

Regards,
  Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Wed, 18 Aug 2004, Dr P. Murray-Rust wrote:

> On Aug 17 2004, Herbert J. Bernstein wrote:
>
> > Peter asks some interesting questions.  I do not propose to answer
> > them in detail here.  However, I should point out that interpretation
> > of a given CIF may require 4 sets of documents:
> >
> >   1.  The CIF itself.
> >   2.  The dictionary or dictionaries defining the tags
> > used in the CIF
> >   3.  The relevant DDLs
> >   4.  The CIF specification:
> >        http://www.iucr.org/iucr-top/cif/spec/version1.1/index.html
>
> I agree with this. A little while ago I was invited to work with Syd and
> Nick and spent 2 pleasant weeks looking at whether this could be managed in
> a self-consistent system. In theory, yes. In practice it was questionable
> whether it was worthwhile and would be used.
>
> It is almost isomorphic with the XML schema hierarchy:
>
> DDL-validates->DDL-validates->dictionary-validates->CIF
>
> i.e. the DDL is self-validating. The problem was that *any* changes to the
> DDL have repercussions down the line which multiply. In XMLSchema we have
>
> SchemaSchema -validates-> XSDSchema -validates-> instance
>
> The construction of self-consistent schemas in XML has been anything but
> trivial and has caused much argument. It is unlikely that CIF will benefit
> from a rerun.
>
> So I have taken the pragmatic view that we have DDL2 and DDL1 as currently
> accepted and used. As my own interests are currently in DDL1 I have
> restricted my questions and concerns to CIF (i.e. not STAR) and built
> software for this. My architecture should be sufficiently modular
> that if/when CIF extends to fuller STAR it can be enhanced.
>
> >
> > Many of Peter's questions are answered in the specification.
>
> The lexical questions are. I have used the syntax and semantics documents
> as reference. I have assumed these are formal abstractions of the original
> published article(s). If they are not, then it would be useful to abstract
> additional rules - I think that implementers need to know exactly what
> documents apply and what the rules are.
>
> >
> > The infoset concept is useful, but be warned that the appropriate
> > handling of information depends on the context within which you are
> > working, regardless of whether you are using CIF or using XML or
> > the PDB format.  For an application intended to just get at the data,
> > comments may be discarded, while for an application intended to reformat
> > the presentation of the data, comments are highly significant
> > information.  Similarly, the particular form of quoting, the
> > distinction between "." and "?", etc. may or may not be
> > significant.  If the application in question is, say, a
> > refinement program that just needs to read CIFs to extract
> > expected crystallographic data, then construction of the "infoset"
> > from a CIF is particularly simple.  More demanding applications,
> > e.g. in CIF validation and publication suites, may need to deal
> > with more subtle data and metadata questions.
> >
> I am afraid I disagree! If the interpretation of a CIF depends on what
> program is to be used to process it then it is (IMO) not an abstract
> archive and transfer format.
>
> Peter M-R
>
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs
>

