Discussion List Archives

Re: Opinions on comments as part of the content

At 20:11 07/03/2007, Joe Krahn wrote:
>peter murray-rust wrote:
>...
> > Our approach allows the prettification and reordering through
> > stylesheets. With CIFDOM it is trivial (using XSLT) to reorder the
> > items and loops in whatever way you wish. It is possible, though more
> > difficult, to reorder columns in tables. There is no obvious way in
> > which rows of tables can be reordered. (but note that they are
> > intrinsically unordered)
>It sounds like your use of a 'style sheet' is along the same lines as my
>concept of adding format hints to the dictionary. Perhaps a style sheet
>is better, because this allows customization for a user's needs, rather
>than the impossible task of universal agreement on the 'best' layout.

In principle it is possible to attach one or more stylesheets to 
dictionaries (and we do this in one of our CML projects). But the 
main value of XSLT stylesheets is the abundance of software to 
process them. Writing your own system means you are the only one who 
can implement it.


>Given that CIF is not XML, I think it would be better to format the
>style information as CIF.

CIF is isomorphic to XML (our CIFDOM can be transformed
bidirectionally without loss).

>  It might be easier to define features more
>relevant to CIF (i.e. XML does not have arrays).

XML languages can have arrays. XHTML is a good example, and CML has an
<array> element.

>  It seems like it should
>be easy to define a sort order for rows and columns if desired. Would
>this fit into your style-sheet design?

It is easy to come up with your own convention (e.g. the column names 
should be lexically ordered). The hard part is getting anyone else interested.
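
As an illustration of how little code such a private convention needs, here
is a minimal Python sketch that reorders the columns of one loop_ so that
the data names are in lexical order. It assumes the loop has already been
parsed into a list of names and a list of rows; the function name
reorder_columns and that in-memory layout are hypothetical, not part of
CIFDOM or any existing CIF library. Names are compared case-insensitively
because CIF data names are case-insensitive.

```python
def reorder_columns(names, rows):
    """Return (names, rows) with the columns sorted lexically by data name."""
    # Work out the new column order once, then apply it to every row.
    order = sorted(range(len(names)), key=lambda i: names[i].lower())
    sorted_names = [names[i] for i in order]
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_names, sorted_rows


if __name__ == "__main__":
    names = ["_foo", "_bar"]
    rows = [["a", "."], ["b", "c"]]
    print(reorder_columns(names, rows))
```

Row order is left untouched here, since rows are intrinsically unordered.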

>...
> >> Do the standards not state that names must be unique?
> >
> > Indeed it does. This does not stop many authors duplicating names
>Well, they are wrong.

Yes, but it happens. If you tell an author 100 times that their CIF (which
reads into some limited software) is invalid, you don't always get thanked.

>  However, you did bring up a good point. Is this
>valid if they are actually parts of the same data, but split up for some
>reason.
>
>...
> >
> > NO! global_ is part of STAR but not CIF. That is part of the problem.
> > I don't know who invented data_global but it wasn't an agreed
> > heuristic. My own belief is that in a  file such as
> >
> > data_global
> >    content_g
> > data_1
> >    content_1
> > data_2
> >    content_2
> >
> > the heuristics are:
> > * this is semantically equivalent to two separate CIFs:
> >
> > data_1
> >    content_g
> >    content_1
> >
> > and
> >
> > data_2
> >    content_g
> >    content_2
> >
> > * This requires that no items in data_global have the same names as
> > any in data_1 or data_2. This is nowhere defined and should be
> > * that the two CIFs have no other semantic relation other than any
> > that can be deduced from the common items in data_global
>I know that global_ is not part of CIF, but neither is this hack for
>using data_global. CIF says global_ is reserved for possible future use.
>Obviously people want a global_, so let's include it.

data_global IS legal CIF but with implicit semantics (or none).
global_ is illegal CIF.
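
The data_global heuristic quoted above can be sketched in a few lines of
Python. This is only one possible reading of it, not an agreed rule: it
assumes each data block has already been parsed into a dictionary of item
names and values, and the function name expand_global and the block key
"global" are hypothetical. A name shared between data_global and another
block is treated as an error, since nothing defines what it should mean.

```python
def expand_global(blocks, global_name="global"):
    """Copy the items of data_global into every other data block.

    blocks maps a block name to a dict of {item name: value}.
    Returns a new dict without the global block.
    """
    shared = blocks.get(global_name, {})
    expanded = {}
    for name, items in blocks.items():
        if name == global_name:
            continue
        # The heuristic requires that no item appears in both places.
        clashes = set(shared) & set(items)
        if clashes:
            raise ValueError(
                f"data_{name} redefines global items: {sorted(clashes)}"
            )
        expanded[name] = {**shared, **items}
    return expanded


if __name__ == "__main__":
    blocks = {
        "global": {"_content_g": "g"},
        "1": {"_content_1": "x"},
        "2": {"_content_2": "y"},
    }
    print(expand_global(blocks))
```

The result is two independent blocks, each carrying its own copy of the
global items, with no other relation implied between them.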

>...
> > My own heuristics are:
> > _foo '?'
> > carries no useful information other than the author hasn't bothered
> > to remove it from the file
> > _foo '.'
> > is highly dangerous as the dictionary can contain default values
> > which most users have no idea of. Thus the default extinction
> > correction is (or certainly was)  'Zachariasen' and algorithmically
> > linking '.' to this value is certain to give misleading info.
> >
> > loop_
> > _foo _bar
> > a .
> > b c
> >
> > has a null value for one cell - this is required to make a
> > rectangular table
> >
> > loop_
> > _foo _bar
> > a .
> > b .
> >
> > should be equivalent to
> > loop_
> > _foo
> > a
> > b
> >
> > and this construct should be avoided
> >
> > loop_
> > _foo _bar
> > a ?
> > b ?
> >
> > is almost certainly an unedited template and should be replaced by:
> >
> > loop_
> > _foo
> > a
> > b
> >
> > and finally
> > loop_
> > _foo _bar
> > a ?
> > b c
> >
> > is indistinguishable from
> >
> > loop_
> > _foo _bar
> > a .
> > b c
> >
> > All these issues come into very sharp focus when processing CIFs - it
> > is not trivial to manage '.' in a column of otherwise real numbers.
> >
> > P.
>I take a similar approach. They both represent missing values, but
>missing for different reasons. If one really wants a default value in
>the dictionary, it should be "if not otherwise specified" and not "if
>the value is '.'". In that case, both still mean missing, just different
>reasons.
>
>Does ANYBODY really think it is practical to have two types of undefined
>values?
>
>Of course, CIF is just a text archive. There is nothing preventing the
>use of a string in the middle of an array of real numbers.

If the CIF name occurs in a loop_ and is defined in a dictionary as
NUMB, then all values must be valid real numbers. If defined as CHAR,
it can be any sequence of legal characters (there may be length restrictions).
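
A rough Python sketch of such a check on one NUMB column follows. Several
things in it are assumptions rather than anything the dictionaries
prescribe: the regular expression is a simplified approximation of a CIF
number (with an optional standard uncertainty in parentheses, as in
1.234(5)), the placeholders '.' and '?' are mapped to None rather than to a
dictionary default, and the name check_numb_column is hypothetical.

```python
import re

# Simplified pattern for a CIF-style number, e.g. 12, -0.5, 1.2e3, 1.234(5).
# An approximation for illustration, not the formal CIF grammar.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")
PLACEHOLDERS = {".", "?"}


def check_numb_column(values):
    """Return (parsed, bad) for the string values of one loop_ column.

    parsed holds a float per value, or None for a placeholder or bad value;
    bad lists the values that are neither numbers nor placeholders.
    """
    parsed, bad = [], []
    for value in values:
        if value in PLACEHOLDERS:
            parsed.append(None)
        elif NUMBER.fullmatch(value):
            # Drop the standard uncertainty before converting.
            parsed.append(float(value.split("(")[0]))
        else:
            parsed.append(None)
            bad.append(value)
    return parsed, bad


if __name__ == "__main__":
    print(check_numb_column(["1.5", ".", "2.10(3)", "abc", "?"]))
```

Keeping the placeholders as None, instead of silently substituting a
default, leaves the decision about '.' to whoever consumes the column.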

>Some rules
>about numeric arrays would be helpful for practical use of CIF.

P.


Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069 


