This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Re: ITEM_TYPE_LIST (was RE: TER and MODEL)

John Westbrook (jwest@ndbdev.rutgers.edu)
Thu, 28 Sep 1995 07:46:22 -0400


Hi..

In the following message the issue of the special role of the '.' and '?'
characters is raised.

------------------------------------------------ Peter's message on mmciflist.
On Sep 27,  8:58am, Peter Keller wrote:
> Subject: ITEM_TYPE_LIST (was RE: TER and MODEL)
>
> > >is that there are no coordinates to go with the TER.  To be consistent
> > >with the rest of the examples in cifdic.m95, we should fill the
> > >missing coordinate columns with a period, but that means we have to
> > >ensure that "." is a valid "float" _item_type.code.  Fortunately,
> > >"float" does not yet seem to be formally defined in dd1 2.1.0, so I
> > >would suggest choosing a syntax for float which treats "." as an
> > >acceptable "float".  Then TER would be OK.
>
> The contents of the ITEM_TYPE_LIST category in mmCIF is still a major 'low
> level' problem (by that, I mean something which is independent of any
> considerations of proteins, crystallography, etc). In its current form, it
> makes robust lexical analysis of macromolecular CIF's impossible. I think
> (hope) that Paula, John and the others are including this in issues for
> discussion round about now.
>
> The fact that float is not defined in the DDL, only means that the DDL
> items themselves don't have a special float type. Float _is_ defined in
> the dictionary itself, and hence is defined for items used in CIF's. The
> definition is around line 27100, and is:
>
>                float     numb
>               '-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)'
> ;              int item types are the subset of numbers that are the floating
>                numbers.
> ;
>
> [ By the way, this is another typo: that should read:
>
> ;              float item types.......
>
> ]
>
> As I'm sure you can see, this construction doesn't allow '.' Neither does
> it allow '?'. My working method so far has been to take it as read that
> these two characters are 'universals', and should be checked for before
> any attempt to interpret _item_type.code for a particular item.
>
> There are other potential problems with this, but I'll put them on hold
> for the time being, because I know that COMCIFS are talking about a whole
> range of issues around now. We'll see what comes out of these discussions.
>
> [snip]
>
> > 	the PDB format.  Quite a number of things in the PDB format
> > 	are done the wrong way.
>
> I agree with Dale on this general point. It shouldn't be necessary to
> perpetuate PDB-format kludges, no matter how ingenious.
>
> Cheers,
> Peter.
>
>-- End of excerpt from Peter Keller


I think Peter's point is important because it seems to be a source of confusion
for those who are looking closely at the contents of the mmCIF dictionary.
In selecting the regular expressions for the various numerical data types in
the DDL and dictionary, no attempt has been made to allow for either the
character for "not appropriate" (.) or the character for "missing data" (?).
These are defined as special characters by the STAR grammar (like data_,
loop_, save_, ...) and need to be handled in an appropriate manner by the
parsing software.  Similarly, we do not include the quotation marks that
delimit character strings in the character type regular expressions unless
these characters could reasonably be part of the character data (e.g. text).

At any rate, this was the intention and these regular expressions probably need to
be scrutinized carefully.
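
As a rough illustration of the handling described above, here is a minimal
Python sketch in which the parsing software tests for the two special
characters before it applies the regular expression for an item's type.  The
float pattern is the one quoted in Peter's message; the function name, the
constant names and the returned labels are hypothetical and are not part of
the DDL or the dictionary.  The sketch also assumes that the token has
already been read as an unquoted value.

```python
import re

# Float pattern exactly as quoted from the dictionary in the message above.
FLOAT_RE = re.compile(r'-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)')

# Hypothetical names for the two special values.
NOT_APPROPRIATE = '.'
MISSING = '?'


def classify_value(token, type_re=FLOAT_RE):
    """Classify an unquoted token against an item type's regular expression.

    The special values are checked first, so the type expression itself
    never has to accept '.' or '?'.
    """
    if token == NOT_APPROPRIATE:
        return 'not_appropriate'
    if token == MISSING:
        return 'missing'
    # Only ordinary values are tested against the type expression.
    return 'valid' if type_re.fullmatch(token) else 'invalid'


if __name__ == '__main__':
    for token in ['.', '?', '1.5', '-3', 'abc']:
        print(repr(token), classify_value(token))
```

With this ordering, a TER record whose coordinate columns hold a period would
be reported as "not appropriate" rather than as an invalid float, and the
float expression would not need to change.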


John

-- 
****************************************************************************
*  John Westbrook                       Ph:  (908) 445-5156                *
*  Department of Chemistry              Fax: (908) 445-5958                *
*  Rutgers University                                                      *
*  PO Box 939                        e-mail: jwest@rutchem.rutgers.edu     *
*  Piscataway, NJ 08855-0939                                               *
****************************************************************************