Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIFStandard (COMCIFS)" <[email protected]>
- Subject: Re: CIF Infoset
- From: "Dr P. Murray-Rust" <[email protected]>
- Date: Thu, 19 Aug 2004 14:14:49 -0000
- In-Reply-To: <f06100502bd4a5da7564a@[192.168.2.100]>
- References: <Pine.LNX.4.44.0408181322570.18193-100000@mostaccioli.csse.uwa.edu.au><f06100502bd4a5da7564a@[192.168.2.100]>
On Aug 19 2004, Herbert J. Bernstein wrote:
> At 2:26 PM +0100 8/19/04, Dr P. Murray-Rust wrote:
> >On Aug 19 2004, Herbert J. Bernstein wrote:
> >
> ...
>
> >The difficulty is not preserving the data type, but the semantics of
> >downstream decisions. If one author writes _my_phone "123-45678"
> >they are announcing this is not a number while if another writes
> >_my_phone 123-45678 they are announcing it is a number. The
> >discussion so far seems to suggest that these statements overrule
> >the datatypes specified in the dictionary entries. There is a
> >particular problem in loop_s, where it is then possible to have
> >different data types within a column:
> >
> >loop_ _atom_site_occupancy
> >1.0
> >0.3
> >"not refined"
> >"0.3"
> >"."
> >
> >which makes the implementation very difficult. I believe that a
> >programmer should be able to look up the data type in the dictionary
> >entry and write a routine that relies on a value being of the
> >correct data type and throws an exception if not.
> >
>
> If there is a dictionary, so the type is known, there are no downstream
> decisions to be made. If the data type is numeric, the non-numeric
> strings are an error.
Good. This makes things much easier.
> If the data type is a character type, all the data
> values are valid.
Again no problem.
> If there is no dictionary, then the parser designer has
> to make some context-sensitive typing decisions. The choice in CIFtbx is
> to infer the typing from the first instance of the data. Other choices
> could be made, including postponing the typing decision until an entire
> column is read, but whatever the decision, once it is made, the right
> thing to do is to report to the user conflicts between the type of the
> data and the type chosen for the tag.
I understand the logic of this. It is probably manageable if there are only
char and numb - but becomes impossible if there are many. I am happy to go
along with any interpretation as long as it's general across the community.
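As an illustration of the first-instance approach described above, here is a minimal Python sketch. It is not CIFtbx code: the names NUMB, author_type and type_from_first are hypothetical, a value is modelled as a (text, was_quoted) pair, and the number pattern is a simplified assumption rather than the full CIF grammar. Treating a quoted value as a conflict in a numb column is also an assumption.

```python
import re

# Simplified pattern for a CIF-style number, optionally followed by a
# standard uncertainty in parentheses, e.g. 1.0, -3, 2.5e-3, 0.30(2).
# Assumption: this does not cover the full CIF number grammar.
NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")


def author_type(value):
    """Return 'char' or 'numb' for one (text, was_quoted) pair."""
    text, quoted = value
    if quoted:
        return "char"  # quoting announces a character string
    return "numb" if NUMB.match(text) else "char"


def type_from_first(column):
    """Choose the column type from the first value and list conflicts.

    Returns (chosen_type, indices_of_conflicting_values). Nothing is
    raised; conflicts are only reported back to the caller.
    """
    chosen = author_type(column[0])
    if chosen == "char":
        return chosen, []  # every value is acceptable as a string
    conflicts = [i for i, v in enumerate(column) if author_type(v) != "numb"]
    return chosen, conflicts


# The _atom_site_occupancy loop quoted earlier in this message.
occupancy = [("1.0", False), ("0.3", False), ("not refined", True),
             ("0.3", True), (".", True)]
chosen, conflicts = type_from_first(occupancy)
print(chosen, conflicts)
```

With the occupancy loop, the first value fixes the column as numb and the three quoted entries are reported as conflicts. An unquoted 123-45678 does not match the pattern, so under this sketch it would be inferred as char.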
I understand your proposal as:
Author:
- if it's quoted, it's a char. (Note that some strings have to be quoted,
  but they can only be chars anyway.)
- if it's not quoted, no datatype is stated.
Reader:
- if there is a dictionary, the type is defined by that:
  - if dictType is char, no problem
  - if dictType = numb and authorType is char, then error
  - if dictType = numb and authorType is not stated, try to decode as numb
    - if impossible, throw an error
- if there is no dictType:
  - if an item, try to decode as numb; if successful treat as numb, else char
  - if in a loop_, use this logic to decide the data type of the first value
    - if all types are numb, decide the column is a numb
    - if any types cannot be decoded as numb, make all of them chars
  - never throw any dataType errors
I can live with this (as I expect that many authors will make up their own
data types without dictionaries). However I think this (and other recent
discussions) needs formalising in the spec. It is unlikely that implementers
will work this out consistently!
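To make the reader rules summarised above concrete, here is a rough Python sketch of one way they could be coded. It is only an illustration under stated assumptions: resolve_column and NUMB are hypothetical names, a value is a (text, was_quoted) pair, a single item is treated as a one-value column, the number pattern is simplified, and CIF's unquoted "." and "?" placeholders are not handled.

```python
import re

# Simplified CIF-style number with optional standard uncertainty,
# e.g. 1.0, -3, 2.5e-3, 0.30(2). Assumption: not the full grammar.
NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")


def decodes_as_numb(value):
    """True if an unquoted value matches the simplified number pattern."""
    text, quoted = value
    return (not quoted) and NUMB.match(text) is not None


def resolve_column(column, dict_type=None):
    """Return 'char' or 'numb' for a list of (text, was_quoted) pairs.

    dict_type is 'char', 'numb', or None when no dictionary is available.
    A ValueError is raised only when a dictionary says numb and a value
    is quoted or cannot be decoded as a number.
    """
    if dict_type == "char":
        return "char"  # every value is a valid string
    if dict_type == "numb":
        for index, value in enumerate(column):
            if not decodes_as_numb(value):
                raise ValueError(f"value {index} ({value[0]!r}) is not numb")
        return "numb"
    # No dictionary: numb only if every value decodes, and never an error.
    if all(decodes_as_numb(v) for v in column):
        return "numb"
    return "char"


occupancy = [("1.0", False), ("0.3", False), ("not refined", True),
             ("0.3", True), (".", True)]
print(resolve_column(occupancy))          # no dictionary
print(resolve_column(occupancy, "char"))  # dictionary says char
```

Calling resolve_column(occupancy, "numb") raises ValueError at the first quoted value, which is the "throw an exception" behaviour argued for earlier in the thread.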
P.
> It is a bit like the problem of
> working with an XML dataset without the DTD. You have to guess a bit on
> what is legal where, and sometimes you guess wrong.
Yes, but XML only has one dataType (string) if a DTD is not provided.
> It is best to have
> the dictionaries in CIF just as it is best to have DTDs or schema in XML.
I agree. I think it's almost essential.
P.
> -- Herbert
>