Discussion List Archives


Re: CIF Infoset

On Aug 18 2004, Herbert J. Bernstein wrote:

> > I am afraid I disagree! If the interpretation of a CIF depends on what
> > program is to be used to process it then it is (IMO) not an abstract
> > archive and transfer format.
> Peter is entitled to his position, 

...and I'm not trying to thrust it on the community, but rather to see 
whether there is a consensus.

> but it is at odds with current practice
> in handling all programming languages, HTML, 

I don't think this is true, at least for the markup languages:


are both XML and therefore have an infoset. Whether implementers use it is 
their choice, but it is a formal specification. There is an HTML DOM, which 
is very well defined and is, for practical purposes, the infoset of XHTML.

> PostScript and, of
> course, CIF.  Almost all languages handled by computers intentionally
> have multiple "views" available from which drastically different
> infosets could be constructed.  At a minimum, most languages have
> a narrow, data-oriented view intended for consumption by an
> application-oriented language processor in which all comments are discarded,
> and a broader human-oriented view in which comments must be preserved
> and certain equivalences in the languages such as quoting equivalences
> must be broken.  HTML is one of the most fruitful examples of this
> approach, with an amazing selection of views, including the subtle
> and complex interaction of comments with JavaScript and with SSI.
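As a rough illustration of the "multiple views" point above, here is a minimal Python sketch that builds two different views of the same tiny CIF fragment: a data-oriented one that discards comments, and a human-oriented one that keeps them. It is a toy under stated assumptions, not a CIF parser: the names `data_view` and `annotated_view` are hypothetical, and only lines of the form `_name value  # comment` with a single unquoted value are recognised; anything else is silently skipped.

```python
import re

# Hypothetical, deliberately restricted pattern: one data name, one unquoted
# value, and an optional trailing comment. Quoted values, loops and text
# fields are NOT handled by this sketch.
LINE = re.compile(r"^\s*(_\S+)\s+([^\s#]\S*)\s*(?:#\s?(.*))?$")

SAMPLE = """\
_cell_length_a    10.0   # measured at 100 K
_cell_angle_alpha 80.0
"""


def data_view(text):
    """Application-oriented view: name -> value, comments discarded."""
    items = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            items[m.group(1)] = m.group(2)
    return items


def annotated_view(text):
    """Human-oriented view: (name, value, comment) triples, comments kept."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m.group(1), m.group(2), m.group(3)))
    return rows


print(data_view(SAMPLE))
print(annotated_view(SAMPLE))
```

Both views come from the same text; they differ only in whether the comment survives, which is exactly the kind of choice an infoset would have to pin down.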

XHTML DOM and the XHTML infoset are precisely defined. The behavioural 
semantics of HTML are broad and refer to generic concepts such as the "user 
agent". The addition of JavaScript allows a wide range of behaviour. It is 
also true that stylesheets allow XHTML elements to be transformed into 
visually different objects. Thus the elements <strong> and <em> are 
normally rendered as bold and italic respectively. It would be allowable 
(though confusing) to render them the other way round. However, they are 
held internally as strong and em elements in the DOM, and no stylesheet or 
behavioural semantics can change this.
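The point about <strong> and <em> can be checked with a small Python sketch using the standard-library `xml.dom.minidom` parser. The embedded stylesheet is an illustrative assumption that asks for the reverse of the usual rendering, yet the parsed tree still contains `strong` and `em` elements. Note that minidom does not apply CSS at all; it only builds the tree, which is precisely the layer a stylesheet cannot alter.

```python
from xml.dom import minidom

# Illustrative XHTML document whose stylesheet swaps the usual rendering:
# <strong> is asked to look italic and <em> to look bold.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>demo</title>
<style type="text/css">
strong { font-weight: normal; font-style: italic; }
em { font-weight: bold; font-style: normal; }
</style>
</head>
<body><p><strong>important</strong> and <em>emphasised</em></p></body>
</html>"""


def inline_element_names(document):
    """Return the tag names of the element children of the first <p>."""
    paragraph = document.getElementsByTagName("p")[0]
    return [node.tagName for node in paragraph.childNodes
            if node.nodeType == node.ELEMENT_NODE]


doc = minidom.parseString(XHTML)
print(inline_element_names(doc))  # the DOM still holds strong and em
```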

However, CIF is a data interchange specification, and I have assumed that 
crystallographers have an abstract concept of what something is rather than 
of how it is displayed. Thus a unit cell should be independent of the 
software used to display it: the display may differ, but the object 
represented is the same.

In a similar fashion, we all assume that

_cell_length_a 10.0
_cell_angle_alpha 80.0

have prescribed semantics and behaviour. I am not prevented from rendering 
them as a = 10**-7 (in cm) and alpha = 1.40 (in radians), i.e. a change of 
units, or even from relabelling them as b and beta, but it would be foolish 
to interchange their values.
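To make the distinction between what is stored and how it is shown concrete, here is a minimal Python sketch built on a hypothetical `UnitCell` class that keeps the six cell parameters in the units of the CIF core dictionary (ångströms and degrees). Rendering in centimetres and radians, or cyclically relabelling the axes, gives a different presentation of the same cell, and the cell volume is unchanged; interchanging a stored length with a stored angle, by contrast, would describe a different object. The cell values used are example numbers only.

```python
import math
from dataclasses import dataclass

ANGSTROM_IN_CM = 1e-8  # 1 angstrom = 1e-10 m = 1e-8 cm


@dataclass(frozen=True)
class UnitCell:
    """Hypothetical container: lengths in angstroms, angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float

    def volume(self):
        """Cell volume in cubic angstroms (general triclinic formula)."""
        ca, cb, cg = (math.cos(math.radians(x))
                      for x in (self.alpha, self.beta, self.gamma))
        return self.a * self.b * self.c * math.sqrt(
            1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

    def render_cgs(self):
        """One possible display: lengths in cm, angles in radians."""
        return {
            "a": self.a * ANGSTROM_IN_CM,
            "b": self.b * ANGSTROM_IN_CM,
            "c": self.c * ANGSTROM_IN_CM,
            "alpha": math.radians(self.alpha),
            "beta": math.radians(self.beta),
            "gamma": math.radians(self.gamma),
        }

    def relabel_cyclic(self):
        """Rename axes a->b, b->c, c->a (alpha->beta, beta->gamma, gamma->alpha)."""
        return UnitCell(a=self.c, b=self.a, c=self.b,
                        alpha=self.gamma, beta=self.alpha, gamma=self.beta)


cell = UnitCell(10.0, 12.0, 15.0, 80.0, 95.0, 100.0)  # example values only
print(cell.render_cgs()["a"], cell.render_cgs()["alpha"])
print(math.isclose(cell.volume(), cell.relabel_cyclic().volume()))
```

The stored object never changes; only `render_cgs` and `relabel_cyclic` produce alternative presentations of it.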

> > > Many of Peter's questions are answered in the specification.
> >
> > The lexical questions are. I have used the syntax and semantics 
> > documents as reference. I have assumed these are formal abstractions of 
> > the original published article(s). If they are not, then it would be 
> > useful to abstract additional rules - I think that implementers need to 
> > know exactly what documents apply and what the rules are.
> The published articles describe an older language specification, and
> the recently released on-line specification describes a newer, reasonably
> compatible, but revised language specification.  Just as reading
> information on HTML 4 is useful in understanding XHTML, reading
> information on CIF 1.0 is useful in understanding CIF 1.1, but it
> would be a mistake to try to base formal language processing decisions on
> the older specification in the areas where the newer specification says
> something different.  I would urge all CIF software developers to treat
> the new specification as a document that stands on its own legs.

I am working from the page on http://www.iucr.org labelled 
"Version 1.1, working specification (posted 24 February 2003)",

which contains the specification in two files, as follows:

"However, all CIF conformant software must as a minimum be able to locate, 
extract or write data items in strict accordance with the syntax rules that 
form part of this specification."

File syntax
Common semantic features
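The minimum requirement quoted above (being able to locate and extract data items according to the syntax rules) can be illustrated with a deliberately small Python sketch. It assumes a single data block and handles only comments, unquoted values, and single- or double-quoted values, applying the CIF rule that a quote closes a value only when it is followed by whitespace or the end of the line; `loop_` constructs and semicolon-delimited text fields are rejected rather than parsed. The function names `tokenize` and `extract_items` are hypothetical, and this is not a conformant CIF 1.1 reader.

```python
def tokenize(text):
    """Split CIF text into (token, was_quoted) pairs, dropping comments."""
    tokens = []
    for line in text.splitlines():
        if line.startswith(";"):
            raise NotImplementedError("semicolon-delimited text fields are not handled")
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if ch.isspace():
                i += 1
            elif ch == "#":
                break  # a comment runs to the end of the line
            elif ch in "'\"":
                # A quoted value ends at a matching quote followed by whitespace
                # or end of line, so the apostrophe in 'it's' does not end it.
                j = i + 1
                while j < n and not (line[j] == ch and (j + 1 == n or line[j + 1].isspace())):
                    j += 1
                if j == n:
                    raise ValueError(f"unterminated quoted value: {line!r}")
                tokens.append((line[i + 1:j], True))
                i = j + 1
            else:
                j = i
                while j < n and not line[j].isspace():
                    j += 1
                tokens.append((line[i:j], False))
                i = j
    return tokens


def extract_items(text):
    """Return {data name: value} for the single-valued items in one data block."""
    tokens = tokenize(text)
    items = {}
    k = 0
    while k < len(tokens):
        token, quoted = tokens[k]
        lowered = token.lower()
        if quoted:
            raise ValueError(f"value without a data name: {token!r}")
        if lowered.startswith("data_"):
            k += 1  # data block header
        elif lowered == "loop_":
            raise NotImplementedError("loop_ constructs are not handled")
        elif token.startswith("_"):
            missing = k + 1 == len(tokens)
            if missing or (not tokens[k + 1][1] and tokens[k + 1][0].startswith("_")):
                raise ValueError(f"data name {token} has no value")
            # Data names are case-insensitive in CIF, so they are normalised here.
            items[lowered] = tokens[k + 1][0]
            k += 2
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return items


CIF_TEXT = """data_example
_cell_length_a 10.0  # a comment
_Cell_Angle_Alpha 80.0
_chemical_name_common 'it's a test'
"""
print(extract_items(CIF_TEXT))
```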

I assume this relates to CIF 1.1, and I have attempted to interpret the 
specification only from these documents (i.e. deliberately not referring 
back to earlier versions).

I am finding this discussion useful - I hope others are.

