This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Re: multiple NMR structures

Lynn Teneyck (teneyckl@SDSC.EDU)
Fri, 20 Oct 95 23:10:20 -0700


Frances Bernstein has raised the question of how to treat multiple
models in mmCIF.  This is also related to Dale Tronrud's recent posting
about the split between diffraction data blocks and model data blocks.
Herb, Frances, and Eldon have all raised important points.

Eldon Ulrich points out that the problem is NOT just one of NMR
datasets; crystallographic problems can raise very similar issues.
Eldon says,

>My apologies.  The _atom_site.id is just a counter for the rows.  From a quick
>but not thorough look, it appears that a _atom_site_model.id by itself will do
>the trick.  I am curious how PDB or mmCIF formats will handle multiple x-ray
>structures where the data is collected as a function of time?

Ouch.  You mean we can't just push this one off onto the
spectroscopists?

There is a difference here, in that each model in this type of X-ray
experiment has an associated set of experimental data.  In the NMR case,
there is one set of data which is interpreted using multiple models.
This distinction doesn't get the crystallographers off the hook,
however; we have the choice of either treating each structure in a
time-resolved series as a separate entry, or providing a model number to
distinguish them within a single entry.

The bottom line here is that, as Dale so correctly pointed out, the true
mapping of data and models is many <-> many and the present structure does
not accommodate this.
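
To make the many <-> many point concrete, here is a minimal sketch of a
link table between datasets and models.  Every name in it (the dataset
and model labels, `links`, `models_for`, `datasets_for`) is hypothetical
and is not part of mmCIF; the jointly refined model is included only to
show the "many datasets, one model" direction.

```python
# Hypothetical link table: each pair says "this dataset is explained by this model".
# None of these labels come from the mmCIF dictionary; they are illustrative only.
links = {
    # NMR-style case: one dataset interpreted with several models.
    ("nmr_data", "nmr_model_1"),
    ("nmr_data", "nmr_model_2"),
    # Time-resolved X-ray case: each model has its own dataset.
    ("xray_t0", "xray_model_t0"),
    ("xray_t1", "xray_model_t1"),
    # Illustrative only: one model tied to more than one dataset.
    ("xray_t0", "xray_model_joint"),
    ("xray_t1", "xray_model_joint"),
}


def models_for(dataset):
    """Return the sorted list of models linked to a dataset."""
    return sorted(m for d, m in links if d == dataset)


def datasets_for(model):
    """Return the sorted list of datasets linked to a model."""
    return sorted(d for d, m in links if m == model)
```

A single "model belongs to this dataset" pointer could not hold both
the NMR rows and the joint-model rows above; a separate link table can.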

Herb Bernstein notes that

>The question of unique atom numbers is not just a PDB question, but
>a serious mmCIF and DDL question.  At present _atom_site.id is the
>key for the atom_site category and "must uniquely identify a
>record in the ATOM_SITE list"  Certainly you may define a new and
>different key instead, but you would be making a major change
>in mmCIF in so doing.  You don't re-use an atom number when you have
>alternate positions for an atom.  It would seem similar in concept
>not to re-use them for multiple models.

Actually there IS an important conceptual difference here.  The
alternate positions sometimes assigned by crystallographers are part of
a single model.  The implied statement is "This set of atomic
coordinates, including the alternate conformations, is necessary to
explain the experimental observations."

The NMR datasets have a different view of the world.  In that case, the
implied statement is "This model explains the experimental observations.
So does that one."  Each model is, in a sense that differs from the
crystallographic case of alternate conformations, an explanation of the
data in its own right.  I therefore argue that re-using atom numbers
across models is not similar in concept to re-using them for alternate
positions within a single model.

Frances Bernstein correctly notes that the current definition in the
dictionary precludes re-using atom site id's unless the models are
separated.  Specifically,

>According to the mmCIF dictionary,
>
>;      The value of _atom_site.id must uniquely identify a record in the
>       ATOM_SITE list.
>
>Thus it would not be permitted to use the same _atom_site.id for the 'same'
>atom each time it occurs in different models.
>
>For the 'same' atom in different models, all the identification fields
>(atom name, residue name, chain, etc.) are identical; the only difference is
>in the atom number and, of course, the coordinates.
>
>One could, of course, make a separate entry out of each model but that
>would become quite unwieldy with the large number (>>50) of models that
>are deposited for some structures.

Including a model number in the identification does not strike me as
unwieldy; the natural data structure to describe this is an array of
models, and it doesn't particularly matter whether it is one, two, or a
hundred models.  Mapping multiple models into relational tables may
produce problems.  They do not strike me as particularly difficult,
but I haven't done any serious work in that
area.  John Westbrook is much more qualified than I to comment on
whether or not this produces serious problems for database developers.

At present there seem to be three alternatives.

1.  Add a "model number" or "experiment number" to permit arrays of
    models or data.

2.  Simply concatenate the models, relying on a unique atom number to
    keep you out of trouble.

3.  Create a separate mmCIF for each model, relying on a mechanism
    outside of mmCIF to keep the models and data related.

Of these alternatives it seems to me that number 1 is the only one which
clearly preserves the logical structure of the data, but I am open to
persuasion.
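
As a rough illustration of alternative 1, the sketch below treats the
pair (model number, atom id) as the key for an atom site record, so the
'same' atom can keep the same id in every model.  The record layout and
the names `AtomSite`, `model_num`, and `is_unique_key` are assumptions
made for this example only; they are not taken from the mmCIF
dictionary, and the coordinates are made up.

```python
from collections import namedtuple

# Hypothetical record layout; field names are illustrative, not mmCIF items.
AtomSite = namedtuple("AtomSite", "model_num atom_id atom_name x y z")

# Two models of the same two atoms, re-using atom ids 1 and 2 in each model.
rows = [
    AtomSite(1, 1, "N", 0.0, 0.0, 0.0),
    AtomSite(1, 2, "CA", 1.5, 0.0, 0.0),
    AtomSite(2, 1, "N", 0.1, 0.0, 0.0),
    AtomSite(2, 2, "CA", 1.4, 0.1, 0.0),
]


def is_unique_key(records, key):
    """Return True if key(record) is distinct for every record."""
    keys = [key(r) for r in records]
    return len(keys) == len(set(keys))


# The atom id alone no longer identifies a record once ids are re-used ...
id_only_ok = is_unique_key(rows, lambda r: r.atom_id)
# ... but the (model number, atom id) pair does.
composite_ok = is_unique_key(rows, lambda r: (r.model_num, r.atom_id))

# The "array of models" view: group the records by model number.
models = {}
for r in rows:
    models.setdefault(r.model_num, []).append(r)
```

With the present definition, where the atom id alone must be unique,
these four records would not be permitted; with the composite key the
number of models makes no difference to the structure.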

Lynn Ten Eyck