

Re: CIF formal specification

For DDL2 and all of the mmCIF applications:

+ row order is not guaranteed

+ column order in category sections is not guaranteed

+ category order within datablocks is not guaranteed
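Under these three rules, two loops can look different on the page and still carry the same data. The following is a minimal Python sketch of an order-insensitive comparison. It assumes each loop has already been parsed into a list of tags and a list of rows of string values, and it does not imply any particular CIF parser.

```python
from collections import Counter
from typing import Sequence


def loops_equivalent(tags_a: Sequence[str], rows_a: Sequence[Sequence[str]],
                     tags_b: Sequence[str], rows_b: Sequence[Sequence[str]]) -> bool:
    """Compare two loop_ tables, ignoring column order and row order."""
    # CIF data names are case-insensitive, so normalise them first.
    norm_a = [t.lower() for t in tags_a]
    norm_b = [t.lower() for t in tags_b]
    # Both loops must have the same columns, each appearing exactly once.
    if sorted(norm_a) != sorted(norm_b) or len(set(norm_a)) != len(norm_a):
        return False
    # Rearrange every row of table b into table a's column order.
    index_b = {t: i for i, t in enumerate(norm_b)}
    perm = [index_b[t] for t in norm_a]
    aligned_b = [tuple(row[i] for i in perm) for row in rows_b]
    # Rows are compared as a multiset: order is ignored, duplicates still count.
    return Counter(tuple(r) for r in rows_a) == Counter(aligned_b)


a_tags = ["_atom_site.id", "_atom_site.type_symbol"]
a_rows = [["1", "C"], ["2", "O"]]
b_tags = ["_ATOM_SITE.type_symbol", "_atom_site.id"]
b_rows = [["O", "2"], ["C", "1"]]
print(loops_equivalent(a_tags, a_rows, b_tags, b_rows))  # True
```

Because rows are counted rather than collected into a set, a loop containing a duplicated row is not treated as equivalent to one without it.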

The major organizational issue for DDL2/mmCIF applications is the
problem introduced when a category is repeated within a
single datablock.  This can introduce a variety of ambiguous
merge/update/overwrite situations that are best avoided.
I believe that it would be best to forbid this situation
in the syntax description.
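A parser could at least flag a repeated category instead of silently merging it. The sketch below is hypothetical and rests on two assumptions. First, tags follow the DDL2 convention _category.item. Second, the data block has already been split into sections, where each section is either one loop_ header or one run of consecutive single-valued items. A category is reported when its tags occur in more than one contiguous run.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def category_of(tag: str) -> str:
    """Return the category of a DDL2-style tag, e.g. '_atom_site.id' -> 'atom_site'."""
    # Assumption: tags follow the _category.item convention; case is ignored.
    name = tag.lower()
    if name.startswith("_"):
        name = name[1:]
    return name.split(".", 1)[0]


def repeated_categories(sections: Sequence[Sequence[str]]) -> List[str]:
    """List categories whose tags occur in more than one contiguous run.

    Each section holds the tags of one loop_ or of one run of consecutive
    single-valued items, in the order they appear in the data block.
    """
    runs: Dict[str, int] = defaultdict(int)
    for tags in sections:
        previous = None  # a new section always starts a new run
        for tag in tags:
            category = category_of(tag)
            if category != previous:
                runs[category] += 1
                previous = category
    return sorted(cat for cat, count in runs.items() if count > 1)


sections = [
    ["_entry.id", "_cell.length_a", "_cell.length_b"],   # single-valued items
    ["_atom_site.id", "_atom_site.type_symbol"],         # one loop_
    ["_cell.angle_alpha"],                               # _cell appears again
]
print(repeated_categories(sections))  # ['cell']
```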



Herbert J. Bernstein wrote:
> The "in general" on the row ordering means "in general".  A careful
> reading of the semantics of most CIF loops shows that they follow
> SQL-like rules on row ordering -- the rows can be presented
> in any order without changing the meaning of the table.  However,
> that is not an explicit rule of CIF syntax (as opposed to the
> semantics constraining the way it is used).   When possible,
> I think it would be desirable to provide columns which allow
> any and all order dependencies to be resolved from the content
> of the rows, rather than from context (e.g. atom serial numbers
> in atom lists), but I, for one, would be opposed to making
> that a syntactic requirement, especially in view of the many
> existing CIFs that do not comply.
> The table merge/split rules are a difference in semantics between
> DDL1 and DDL2.   Again, in most cases, SQL-like rules are followed,
> so this is an infrequent problem.  I think the current proposed
> wording is a fair representation of facts on the ground.
> I understand the desire to have a parser that will be aware of
> all the equivalences and symmetry violations, but as with any
> powerful and evolving language (including XML, for that matter),
> some things need to be left to the applications.
> Regards,
>   Herbert
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
> On Mon, 7 Mar 2005, Peter Murray-Rust wrote:
>>At 20:50 03/03/2005 -0500, Herbert J. Bernstein wrote:
>>>  The proposed new wording is not accurate.  There is significance to
>>>the ordering of data names, but certain reorderings do not change
>>>the meaning of the CIF. I would suggest the following combined rewrite
>>>of 7:
>>The following is very helpful. In essence it formalises the strategy that I
>>have employed in CIFDOM - the contents of a CIF may be re-ordered in
>>various ways without affecting any meaning. Of course this may surprise,
>>and even upset, some humans, and it may be important to provide tools that
>>can reassure them, e.g. by displaying their tables in a favorite internal order.
>>>7. A given data name (tag) (see 2.4 and 2.7) may appear no more than
>>>   once in a given data block or save frame.  A tag may be followed
>>>   by a single value, or a list of one or more tags may be marked by
>>>   the preceding reserved case-insensitive word loop_ as the headings
>>>   of the columns of a table of values.  White space is used to
>>>   separate a data block or save frame header from the contents of
>>>   the data block or save frame, and to separate tags, values and
>>>   the reserved word loop_.  Data items (tags along with their
>>>   associated values) that are not presented in a table of values
>>>   may be relocated along with their values within the same data
>>>   block or save frame without changing the meaning of the data block
>>>   or save frame.  Complete tables of values (the table column headings
>>>   along with all columns of data) may be relocated within the same
>>>   data block or save frame without changing the meaning of the data
>>>   block or save frame.  Within a table of values, each tag may be
>>>   relocated along with its associated column of values within the
>>>   same table of values without changing the meaning of the table of
>>>   values.  In general each row of a table of values may also be
>>>   relocated within the same table of values without changing the
>>>   meaning of the table of values.
>>I am not sure what "in general" means. It suggests that there could be some
>>implied semantics (e.g. who is first author, or that the symmetry operations
>>are in a known order, which is indeed the case). I would like to replace
>>all such implied semantics with explicit tags (although there are clearly
>>some current instances where this is a problem).
>>> Combining tables of values
>>>   or breaking up tables of values would change the meanings, and
>>This is certainly true
>>>   is likely to violate the rules for constructing such tables
>>>   of values.
>>I can see that this might violate some higher level semantics (e.g.
>>references to components of tables) but I don't see that it violates
>>anything in CIF or DDL1.
>>>I apologize for the complexity of this, but it is actually harder to
>>>specify the meaning of an unordered set than it is to specify the
>>>meaning of an ordered tuple, since the former requires specification
>>>of equivalence classes, while the latter does not.
>>I agree that something of this formality is what is required.
>>Peter Murray-Rust
>>Unilever Centre for Molecular Informatics
>>Chemistry Department, Cambridge University
>>Lensfield Road, CAMBRIDGE, CB2 1EW, UK
>>Tel: +44-1223-763069
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs
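The thread suggests resolving order dependencies from row content rather than from row position. Murray-Rust's first-author example is a case in point. A short sketch makes the idea concrete. The tag _citation_author.ordinal is used here only for illustration. The function assumes an integer-valued column that records the intended order.

```python
from typing import Dict, List, Sequence


def order_by_key(tags: Sequence[str], rows: Sequence[Sequence[str]],
                 key_tag: str) -> List[Dict[str, str]]:
    """Return loop rows as dicts, ordered by an explicit integer column.

    Because the order is read from row content, any permutation of the
    rows in the file gives the same result.
    """
    lowered = [t.lower() for t in tags]  # CIF tags are case-insensitive
    key = key_tag.lower()
    if key not in lowered:
        raise ValueError(f"no column {key_tag!r} in this loop")
    records = [dict(zip(lowered, row)) for row in rows]
    # Sort numerically: a plain string sort would put "10" before "2".
    return sorted(records, key=lambda r: int(r[key]))


tags = ["_citation_author.name", "_citation_author.ordinal"]
rows = [["Smith, J.", "2"], ["Brown, K.", "10"], ["Jones, A.", "1"]]
for record in order_by_key(tags, rows, "_citation_author.ordinal"):
    print(record["_citation_author.name"])
# Jones, A.
# Smith, J.
# Brown, K.
```

With an explicit ordering column, a loop can be reordered freely by a tool such as a DOM-style reader without losing the author order.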

   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320
