This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Re: struct_conn

herbert_bernstein (yaya@aip.org)
Thu, 21 Mar 96 08:08:06 EST


Now to the substance of the status of _atom_site.label_seq_id and 
_atom_site.auth_seq_id.  Should one, the other, or both be mandatory
in a "proper" mmCIF data set?  If we look for guidance to the
output from an NDB database search as of yesterday, we find that
the interpretation there seems to be that it is sufficient to
produce _atom_site.auth_seq_id (which is not, in the present
dictionary, mandatory) and to omit _atom_site.label_seq_id
(which is, in the present dictionary, mandatory).  I think that
the resulting mmCIF dataset is quite reasonable.  For these
entries all you would get by including both labels would be
a duplicated field, and the _atom_site.auth_seq_id is certainly
the one most people would want if they are to have only one.
As long as there is some simple mechanical way to derive the
necessary information, even when these fields are not duplicates,
it would seem most useful to have the _atom_site.auth_seq_id
in the atom list and "demote" _atom_site.label_seq_id to
be implicit (i.e. you have to be able to derive it, but you
need not keep presenting it) rather than mandatory, whence my
suggestion that each of the ...auth... and ...label... tokens
be made implicit, rather than mandatory, so that a rational
choice could be made, as it has been made by NDB, to present
whichever one makes sense for the dataset involved.

That being said, it would be very helpful for database work to
include a nice clean concordance of _atom_site.auth_seq_id
and _atom_site.label_seq_id in a sensible place.  The most
efficient place would seem to be in the ENTITY_POLY_SEQ
category, where, when there are non-obvious mappings, a line
per matchup could be provided.  Indeed, doing it that way,
and by extending the key to include the appropriate implicit
token for author nomenclature, we can present as many alternative
author labels for a residue as one may wish, with no confusion,
and no clutter in the atom list.
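
A rough sketch of such a concordance, in Python, may make the idea
concrete.  Everything here is an assumption for illustration: the row
layout, the scheme names "author" and "alternate", and the residue
numbers are invented, and none of it is in the present dictionary.
Only the non-obvious matchups are listed, and any residue not listed
falls back to the default assumption of equality.

```python
from collections import defaultdict

# Hypothetical concordance rows, one line per matchup:
# (entity_id, label_seq_id, author_scheme, auth_seq_id).
# All values are invented for illustration.
CONCORDANCE_ROWS = [
    ("1", 1, "author", "16"),
    ("1", 2, "author", "17"),
    ("1", 2, "alternate", "17A"),
]


def build_concordance(rows):
    """Index the rows by (entity_id, label_seq_id)."""
    table = defaultdict(dict)
    for entity_id, label_seq_id, scheme, auth_seq_id in rows:
        table[(entity_id, label_seq_id)][scheme] = auth_seq_id
    return table


def author_labels(table, entity_id, label_seq_id):
    """Return every author label recorded for a residue.

    A residue with no row in the concordance is assumed to carry
    the same number under both schemes.
    """
    listed = table.get((entity_id, label_seq_id))
    if listed:
        return dict(listed)
    return {"author": str(label_seq_id)}
```

Because the author scheme is part of the key, a residue can carry as
many alternative author labels as one wishes without adding a column
to the atom list.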

Am I violating my pleas for stability?  If this is done carefully,
I think not.  If we are not to invalidate existing mmCIF data
sets, the important rules would be:
  1.  No new "mandatory" tokens be added;
  2.  New "implicit" or non-mandatory tokens may be added;
  3.  An existing mandatory token may be demoted to implicit; and
  4.  If any token is renamed, an alias for the old name be
introduced.
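
Rules 1 and 4 are the ones a dictionary change can break, and they can
be checked mechanically.  The following Python sketch assumes a much
simplified picture of a dictionary: a mapping from token name to a
status string ("mandatory", "implicit" or "optional"), plus a mapping
from old names to their new aliases.  That representation, and the
function name, are assumptions made for the example, not anything
defined by the dictionary itself.

```python
def stability_violations(old, new, aliases):
    """List the ways a revised dictionary breaks rules 1 and 4.

    old, new: dict mapping token name -> status string.
    aliases:  dict mapping an old token name -> its new name.
    Rules 2 and 3 only permit changes, so they need no check.
    """
    problems = []
    # Rule 1: no token absent from the old dictionary may be mandatory.
    for name, status in new.items():
        if name not in old and status == "mandatory":
            problems.append(f"rule 1: new mandatory token {name}")
    # Rule 4: an old token may vanish only if an alias points to a
    # token that exists in the new dictionary.
    for name in old:
        if name not in new and aliases.get(name) not in new:
            problems.append(f"rule 4: {name} removed without alias")
    return problems
```

Under this sketch, demoting _atom_site.label_seq_id from mandatory to
implicit produces no violations, which is the point of rule 3.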

It is a little trickier promoting an existing non-mandatory token
to implicit.  That can cause trouble in general, but I think
in the case of these auth tokens, this can be done without much
trouble, as long as the rule is that either an auth or a label
token must be given in each appropriate context and that, in the
end, for database purposes, it must be possible to simply and
mechanically derive the label token (with the default assumption
of equality).
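
The derivation rule can be sketched in a few lines of Python.  The
atom record is assumed to be a plain mapping with optional
"label_seq_id" and "auth_seq_id" entries, and the auth-to-label
lookup is assumed to come from whatever concordance the data set
supplies; both are illustrative choices, not a prescribed interface.

```python
def derive_label_seq_id(atom, auth_to_label=None):
    """Return the label sequence id for one atom record.

    atom:          dict that may hold "label_seq_id" and/or "auth_seq_id".
    auth_to_label: optional dict listing only the non-obvious mappings.
    """
    auth_to_label = auth_to_label or {}
    label = atom.get("label_seq_id")
    if label is not None:
        # The label token was given explicitly, so nothing to derive.
        return label
    auth = atom.get("auth_seq_id")
    if auth is None:
        # Neither token present: the record breaks the proposed rule.
        raise ValueError("either auth_seq_id or label_seq_id must be given")
    # Default assumption of equality when no mapping is listed.
    return auth_to_label.get(auth, auth)
```

For example, derive_label_seq_id({"auth_seq_id": "16"}, {"16": "1"})
gives "1", while the same record with an empty mapping gives "16".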