Frances Bernstein has raised the question of how to treat multiple models in mmCIF. This is also related to Dale Tronrud's recent posting about the split between diffraction data blocks and model data blocks. Herb, Frances, and Eldon have all raised important points.

Eldon Ulrich points out that the problem is NOT just one of NMR datasets; crystallographic problems can raise very similar issues. Eldon says,

>My apologies. The _atom_site.id is just a counter for the rows. From a quick
>but not thorough look, it appears that a _atom_site_model.id by itself will do
>the trick. I am curious how PDB or mmCIF formats will handle multiple x-ray
>structures where the data is collected as a function of time?

Ouch. You mean we can't just push this one off onto the spectroscopists?

There is a difference here, in that each model in this type of X-ray experiment has its own associated set of experimental data. In the NMR case, there is one set of data which is interpreted using multiple models. This distinction doesn't get the crystallographers off the hook, however; we have the choice of either treating each structure in a time-resolved series as a separate entry, or providing a model number to distinguish them within a single entry.

The bottom line here is that, as Dale so correctly pointed out, the true mapping of data and models is many <-> many and the present structure does not accommodate this.

Herb Bernstein notes that

>The question of unique atom numbers is not just a PDB question, but
>a serious mmCIF and DDL question. At present _atom_site.id is the
>key for the atom_site category and "must uniquely identify a
>record in the ATOM_SITE list" Certainly you may define a new and
>different key instead, but you would be making a major change
>in mmCIF in so doing. You don't re-use an atom number when you have
>alternate positions for an atom. It would seem similar in concept
>not to re-use them for multiple models.

Actually there IS an important conceptual difference here.
The alternate positions sometimes assigned by crystallographers are part of a single model. The implied statement is "This set of atomic coordinates, including the alternate conformations, is necessary to explain the experimental observations." The NMR datasets have a different view of the world. In that case, the implied statement is "This model explains the experimental observations. So does that one." Each model is, by itself, a complete explanation of the data, which is not true of a crystallographic alternate conformation. Therefore I argue that re-using atom numbers across multiple models is not similar in concept to re-using them for alternate positions.

Frances Bernstein correctly notes that the current definition in the dictionary precludes re-using atom site ids unless the models are separated. Specifically,

>According to the mmCIF dictionary,
>
>; The value of _atom_site.id must uniquely identify a record in the
>  ATOM_SITE list.
>
>Thus it would not be permitted to use the same _atom_site.id for the 'same'
>atom each time it occurs in different models.
>
>For the 'same' atom in different models, all the identification fields
>(atom name, residue name, chain, etc.) are identical; the only difference is
>in the atom number and, of course, the coordinates.
>
>One could, of course, make a separate entry out of each model but that
>would become quite unwieldy with the large number (>>50) of models that
>are deposited for some structures.

Including a model number in the identification does not strike me as unwieldy; the natural data structure to describe this is an array of models, and it doesn't particularly matter whether it holds one, two, or a hundred models. This may produce problems when trying to map multiple models into relational tables. Those problems do not strike me as particularly difficult, but I haven't done any serious work in that area. John Westbrook is much more qualified than I to comment on whether or not this produces serious problems for database developers.
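
To make the keying question concrete, here is a minimal sketch in Python. It is not mmCIF or DDL syntax, and the row layout, the sample coordinates, and the helper names (index_rows, atom_id_only, model_and_atom_id) are all hypothetical, invented only for illustration. It shows that re-used atom ids collide when the atom id alone is the key, but stay unique once a model number is made part of the key, and that the rows then regroup naturally into an array of models.

```python
from collections import defaultdict

# Hypothetical rows: (model_id, atom_id, atom_name, x, y, z).
# The same atom ids (1, 2) are deliberately re-used in both models.
rows = [
    (1, 1, "N", 0.00, 0.00, 0.00),
    (1, 2, "CA", 1.46, 0.00, 0.00),
    (2, 1, "N", 0.10, 0.05, 0.00),
    (2, 2, "CA", 1.50, 0.10, 0.02),
]


def index_rows(rows, key):
    """Build a dict keyed by key(row); raise ValueError if a key repeats."""
    table = {}
    for row in rows:
        k = key(row)
        if k in table:
            raise ValueError(f"duplicate key {k!r}")
        table[k] = row
    return table


def atom_id_only(row):
    """Key in the spirit of the current rule: the atom id alone."""
    return row[1]


def model_and_atom_id(row):
    """Assumed alternative: a composite key of (model number, atom id)."""
    return (row[0], row[1])


# With the composite key, every row is uniquely identified.
by_model_atom = index_rows(rows, model_and_atom_id)

# With the atom id alone, the re-used ids collide.
try:
    index_rows(rows, atom_id_only)
    reuse_allowed_under_atom_id_key = True
except ValueError:
    reuse_allowed_under_atom_id_key = False

# Regroup into an "array of models": model_id -> {atom_id: (x, y, z)}.
models = defaultdict(dict)
for (model_id, atom_id), row in by_model_atom.items():
    models[model_id][atom_id] = row[3:]
```

Nothing in the sketch depends on the number of models, which is the sense in which one, two, or a hundred models look the same. Whether a real relational mapping is this painless is exactly the part I cannot vouch for.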

At present there seem to be three alternatives.

1. Add a "model number" or "experiment number" to permit arrays of models or data.
2. Simply concatenate the models, relying on a unique atom number to keep you out of trouble.
3. Create a separate mmCIF for each model, relying on a mechanism outside of mmCIF to keep the models and data related.

Of these alternatives it seems to me that number 1 is the only one which clearly preserves the logical structure of the data, but I am open to persuasion.

Lynn Ten Eyck