Discussion List Archives

Re: CIF Infoset

At 2:26 PM +0100 8/19/04, Dr P. Murray-Rust wrote:
>On Aug 19 2004, Herbert J. Bernstein wrote:
>
...

>The difficulty is not preserving the data type, but the semantics of 
>downstream decisions. If one author writes _my_phone "123-45678" 
>they are announcing this is not a number while if another writes 
>_my_phone 123-45678 they are announcing it is a number. The 
>discussion so far seems to suggest that these statements overrule 
>the datatypes specified in the dictionary entries. There is a 
>particular problem in loop_s, where it is then possible to have 
>different data types within a column:
>
>loop_ _atom_site_occupancy
>1.0
>0.3
>"not refined"
>"0.3"
>"."
>
>which makes the implementation very difficult. I believe that a 
>programmer should be able to look up the data type in the dictionary 
>entry and write a routine that relies on a value being of the 
>correct data type and throws an exception if not.
>

If there is a dictionary, so the type is known, there are no downstream
decisions to be made.  If the data type is numeric, the non-numeric
strings are an error.  If the data type is a character type, all the
data values are valid.  If there is no dictionary, then the parser designer
has to make some context-sensitive typing decisions.  The choice in
CIFtbx is to infer the typing from the first instance of the data.  Other
choices could be made, including postponing the typing decision until
an entire column is read, but whatever the decision, once it is made,
the right thing to do is to report to the user conflicts between the
type of the data and the type chosen for the tag.  It is a bit like
the problem of working with an XML dataset without the DTD.  You have
to guess a bit on what is legal where, and sometimes you guess wrong.
It is best to have the dictionaries in CIF just as it is best to have
DTDs or schemas in XML.
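
As a rough illustration of the policy described above, here is a minimal
Python sketch. It is not CIFtbx code, and the names classify and
check_column are hypothetical. It assumes that a quoted token is always
treated as character data, that an unquoted "." or "?" is a null that carries
no type, and that a simplified regular expression is good enough to recognise
numbers. If a dictionary type is supplied it is used as given; otherwise the
type is inferred from the first non-null value, and later values that conflict
with a numeric type are reported rather than silently accepted.

```python
import re

# Simplified pattern for a CIF-style number, optionally followed by a
# standard uncertainty in parentheses, e.g. 1.0, -3, 0.3(2), 1.5e-3.
# Assumption: this is an approximation, not the full CIF grammar.
_NUMBER = re.compile(r'^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$')


def classify(raw):
    """Return 'null', 'char', or 'numb' for one raw token."""
    if raw in ('.', '?'):
        return 'null'          # unquoted null markers carry no type
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in '\'"':
        return 'char'          # assumption: quoting always means character data
    return 'numb' if _NUMBER.match(raw) else 'char'


def check_column(raw_values, dictionary_type=None):
    """Return (chosen_type, conflicts) for one looped column.

    dictionary_type is 'numb', 'char', or None when no dictionary is
    available. conflicts is a list of (index, raw_value) pairs.
    """
    chosen = dictionary_type
    conflicts = []
    for index, raw in enumerate(raw_values):
        kind = classify(raw)
        if kind == 'null':
            continue
        if chosen is None:
            chosen = kind      # no dictionary: infer from the first instance
            continue
        # Any value is acceptable as character data; only a numeric
        # column can be contradicted by a later value.
        if chosen == 'numb' and kind != 'numb':
            conflicts.append((index, raw))
    return chosen, conflicts


# The _atom_site_occupancy column quoted earlier in this message.
column = ['1.0', '0.3', '"not refined"', '"0.3"', '"."']
chosen_type, problems = check_column(column)
for index, raw in problems:
    print(f'row {index}: {raw} conflicts with inferred type {chosen_type}')
```

With no dictionary, the first value fixes the column as numeric and the three
quoted values are reported as conflicts. Passing dictionary_type='char'
accepts every value. A parser that postpones the decision until the whole
column has been read would replace the "first instance" branch with a pass
over all the values before choosing.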

   -- Herbert
-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

