
Re: [ddlm-group] LOOP versus LIST

The real issue here is the data model that we are supporting -- databases
or something with tighter control over internals.  Especially after the 
discussions at the ESRF HDF5 hyperspectral data workshop last month, I am
increasingly convinced that it is a serious mistake to move away from
the database model.  While tighter control over internals is tempting, in
the end, the further we move into multithreaded, multiprocessor, multiwriter
applications, the greater the performance penalty we will pay for
over-specifying the internal representation of a CIF, especially in ways
that deviate from the relational model.

Users gain nothing from a mandatory distinction between single-row
loops and the same tags given with individual values.  I propose
that CIF2 adopt the DDL2 mmCIF approach of treating them as equivalent.
Joe is right that having the distinction in the DDL then forces all 
parsers to refer to the dictionary to be able to make this pointless
distinction.
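
To make the proposed equivalence concrete, here is a minimal Python sketch.
It is an illustration only, not code from any existing CIF parser: the helper
names rows_from_pairs and rows_from_loop are hypothetical, and it assumes the
lexer has already split the input into tags and values.  Both forms are stored
as a table of rows, so a single-row loop and the same unlooped tags compare
equal without consulting a dictionary.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def rows_from_pairs(pairs: Dict[str, str]) -> List[Row]:
    """Unlooped tag/value pairs become a table with exactly one row."""
    return [dict(pairs)]


def rows_from_loop(tags: Sequence[str], values: Sequence[str]) -> List[Row]:
    """A loop_ becomes one row per complete set of values."""
    if not tags or len(values) % len(tags) != 0:
        raise ValueError("loop values must fill whole rows")
    n = len(tags)
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


# The two forms from the quoted example below (hypothetical tag names).
unlooped = rows_from_pairs(
    {"_xxx.aaa": "data1", "_xxx.bbb": "data2", "_xxx.ccc": "data3"}
)
looped = rows_from_loop(
    ["_xxx.aaa", "_xxx.bbb", "_xxx.ccc"], ["data1", "data2", "data3"]
)

print(unlooped == looped)  # True
```

Because each row is a mapping from tag to value, column order plays no part
in the comparison, which matches the relational view of a category.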

-- Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 3 Feb 2010, James Hester wrote:

> I generally agree with Joe's analysis. A scalar should never be assumed
> equivalent to a single-element list, and lists and loops are fundamentally
> different in that lists have a specific order, unlike loops which assign no
> significance to column or row ordering.
> 
> Some comments:
> 
> On Tue, Jan 12, 2010 at 4:37 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>       It makes sense to distinguish a scalar from a loop of size=1 because it
>       is easier in a programming context for loop items to always be stored in
>       an array. However, handling data in a programming context requires a
>       DDL, which can define how the data is stored.
> 
> 
> I don't understand the logic behind that last sentence.  The syntax specification
> itself provides enough information to convert an input file into an abstract
> datastructure, with no recourse to a DDL.  As you note below, the DDL cannot (in
> a reasonable world) override the logical structure derived from the syntax
> specification, so it cannot dictate how the datafile is stored. 
> 
> I believe that these issues are properly dealt with under the heading of
> 'infoset' (a term borrowed from XML) and can be developed in tandem with the
> DDL.  The DDL is restricted to manipulations that are consistent with the
> infoset.
>
>       If the distinction between
>       a scalar and single-row loop is not made at the CIF syntax level, the
>       DDL should not be able to dictate the use of loops. For example, CIF
>       defines ordering of items as not significant, and DDL cannot override
>       this. It makes sense for a DDL to suggest a preferred ordering, but it
>       is only a suggestion, unless CIF format rules change.
>
>       If there is a desire for DDL to mandate loop and non-loop items, then
>       CIF2 should make an explicit distinction.
>
>       To avoid similar conflicts with list items, CIF2 should state that loop
>       and list items are not interchangeable, so that the following two pairs
>       are not equivalent:
> 
>
>       example1:
>
>       loop_
>       item.name
>       data1
>       data2
>       data3
>
>       item.name [data1 data2 data3]
> 
>
>       example2:
>
>       item.name [data1]
>
>       item.name data1
> 
>
>       Of course, these are implementation details that can be worked out after
>       the lexing syntax is finalized.
> 
> 
> I agree with the statement that these pairs are not equivalent.
>
>       Joe
> 
> Herbert J. Bernstein wrote:
> > A list with lists nested to arbitrary depth can be a single data
> > value either in a loop or just for a single tag.
> >
> > DDL2 makes no distinction between a one-element loop and the same unlooped
> > tag with the same value.  DDL1 (see _list) and DDLm (see
> > _definition.class) try to make a distinction between things that are
> > and are not permitted to be looped.
> >
> > I do not understand why it is desirable to make such a distinction for
> > a single-row table.  Following the DDL2 approach of allowing it
> > to be handled as either
> >
> >    _xxx.aaa data1
> >    _xxx.bbb data2
> >    _xxx.ccc data3
> >
> > or
> >
> > loop_
> >    _xxx.aaa
> >    _xxx.bbb
> >    _xxx.ccc
> >
> >    data1  data2  data3
> >
> > seems harmless to me, but DDL1 and DDLm make the distinction and a proper
> > parser should note violations of what was specified for the category.
> >
> > An index key is not a name, but a string, so I think it reasonable to
> > accept the empty string as a table index value.
> >
> > Case sensitivity is an interesting question.  I would prefer case sensitive
> > table indices, but I suppose that matter should be discussed.
> >
> >
> >
> > At 12:41 PM -0500 1/5/10, Joe Krahn wrote:
> >> I assume that a list of items defined via a loop is distinct from a list
> >> of items defined by a list. Is that correct?
> >>
> >> Likewise, is a list of one item distinct from a scalar value?
> >>
> >> Currently, CIF files don't differentiate between a one-element loop and
> >> a scalar. For example, RCSB components.cif does not use loops for atom
> >> data when there is only one atom. Is this stated anywhere?
> >>
> >> Also, is an empty string a valid TABLE index? Other CIF names require at
> >> least one character, but my understanding is that a TABLE index is any
> >> valid string, which includes an empty string. Strings are also
> >> case-sensitive, so I assume that TABLE indices are also case-sensitive.
> >>
> >> Thanks,
> >> Joe Krahn
> >> _______________________________________________
> >> ddlm-group mailing list
> >> ddlm-group@iucr.org
> >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >
> >
> 
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> 
>
