
Re: request for additional token

Paula Fitzgerald (paula_fitzgerald@Merck.Com)
Wed, 1 Nov 95 16:48:23 EST


******************************
** READ AND COMMENT, PLEASE **
******************************

Hello folks -

OK, all of the little stuff in the current round has been dealt with.  Now
for some of the meatier issues.

At the end of this message, I summarize the whole thread on PDB Z values
versus the contents of _cell.formula_units_Z.  I know that including it all
here makes this message unduly long, but this is how I am trying to do
my own bookkeeping in managing this process.

That having been said, I would like to quote from a couple of these messages.
First Fran, in her original posting:

> I do not think there is a need to have a discussion about the meaning
> or the usefulness of this parameter; it has been present in PDB entries
> as long as the present format has existed.  What would be appreciated
> would be the addition of a field in the mmCIF dictionary that would
> hold this quantity and would facilitate the translation of PDB to mmCIF
> and vice versa.

Then Lynn, in a response, speaking of the definition of _cell.formula_units_Z:

>                            I don't
> think we should revise the definitions in a way
> that will make mmCIF less compatible with CIF.

These two messages summarize the conclusion of the committee on this issue -
that we really ought not to mess with the core definition in something as
fundamental as this, and that we are happy to add a data item to give Fran
the convertibility that she needs.

BUT...there are some points that were raised in this dialog that need a
reply.  At one point, Herb said:

> I don't think that one would actually be revising the core CIF
> definition in any way, just allowing the mmCIF definition to
> cover a case which is exactly parallel to the small molecule
> definition, by taking the entity definitions as the functional
> equivalents of formulae.  The macromolecular case is fuzzier
> than the small molecule case, but the PDB Z happens to match
> the wording of the _cell.formula_units_Z if one adds entity
> definitions as formulae.

The key point for me here is in the last sentence "..if one adds entity
definitions as formulae."  This may already be clear to everyone, but just
to be sure I want to stress that the ENTITY category only tells us that the
asymmetric unit contains apples and oranges and pears, or grapes and apples
and bananas, or whatever.  The relevant category for the discussion of Z is
STRUCT_ASYM, the contents of which tell us that the asymmetric unit contains
2 apples and 3 oranges and 6 pears, or 24 grapes and 1 apple and 2 bananas.

That is the point I would like to stress as we add Fran's new data item,
and hence I have written the definition as follows:

save__cell.Z_PDB
    _item_description.description
;              The number of the polymeric chains in a unit cell.  In the case
               of heteropolymers, Z is the number of occurrences of the most
               populous chain.

               This data item is provided for compatibility with the original
               Protein Data Bank format, and only for that purpose.  This is
               not a very satisfactory definition, as the multiplicity of
               macromolecular structures can be different for the different
               components of the cell.  A more useful measure of a "macro-
               molecular Z" could be obtained by counting the number of times
               each molecular entity appears in the STRUCT_ASYM list and
               multiplying by the number of equivalent positions in the unit
               cell.  There would be one such number for each type of 
               molecular entity.
;
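
The "more useful measure" described in the definition above can be sketched
in a few lines of Python.  This is only an illustration, not part of the
dictionary: the function name per_entity_z, the (asym_id, entity_id) row
layout, and the choice of 4 equivalent positions are all assumptions made
for the example.  The fruit counts are the ones used earlier in this message.

```python
from collections import Counter


def per_entity_z(struct_asym, n_equiv_positions):
    """Return {entity_id: copies per unit cell}.

    struct_asym: iterable of (asym_id, entity_id) pairs, one per row of a
        STRUCT_ASYM-like list (hypothetical layout for this sketch).
    n_equiv_positions: number of equivalent positions in the unit cell.
    """
    # Count how often each entity appears in the asymmetric unit ...
    copies_in_asu = Counter(entity_id for _asym_id, entity_id in struct_asym)
    # ... and multiply by the number of equivalent positions.
    return {entity: n * n_equiv_positions for entity, n in copies_in_asu.items()}


# Asymmetric unit with 2 apples, 3 oranges and 6 pears.
rows = [
    (f"{name}{i}", name)
    for name, n in [("apple", 2), ("orange", 3), ("pear", 6)]
    for i in range(n)
]

# Assumed for illustration: 4 equivalent positions in the cell.
print(per_entity_z(rows, 4))  # {'apple': 8, 'orange': 12, 'pear': 24}
```

Note that the result is one number per type of molecular entity, which is
exactly why a single Z is not a very satisfactory description.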

Comments?

Paula

- - - - -

Lynn Ten Eyck writes:

There he goes again, spoiling a good discussion with facts . . .

Herbert Bernstein (yaya@aip.org) writes

>Here, for those who don't have it handy, is the 1992 PDB commentary
>on Z, which shows, I think, how close to the existing mmCIF
>Z value definition it is:
>
>"Confusion over the value to use for Z (number of molecules per cell)
>arises because of different conceptions of the meaning of 'molecule'.
>We have adopted the (crystallographic) convention that Z should equal
>the number of times the same polymeric chain is contained in the cell.
>In case of different numbers of chains per cell this will be explained
>in the REMARK section and Z will denote the number of the more
>populous species per cell."

This is moderately reasonable; I think in the case of, say, a heterotrimer
A2B I would prefer Z=1 instead of Z=2, because it would seem to me that
the "molecule" is A2B.  However, I believe these cases are rare and there is
not much point arguing over them further.

>That being said, it would seem the definition of _cell.formula_units_Z
>would work with the addition of the following:
>
>For macromolecular structures, the value of _cell.formula_units_Z
>is the number of occurrences of the entity defined by _entity_poly_seq
>in the cell.  For heterogeneous combinations of polymers for which
>the populations of distinct polymeric entities within a cell differ,
>the value for the most populous ones will be used.

This works, unless there is a holdout for applying a superstructure of NCS
definitions . . .  I think actually the heterogeneous multimeric problems are
covered in the requirement that the biological unit be defined.

- - - - -

Herbert Bernstein writes:

Here, for those who don't have it handy, is the 1992 PDB commentary
on Z, which shows, I think, how close to the existing mmCIF
Z value definition it is:

"Confusion over the value to use for Z (number of molecules per cell)
arises because of different conceptions of the meaning of 'molecule'.
We have adopted the (crystallographic) convention that Z should equal
the number of times the same polymeric chain is contained in the cell.
In case of different numbers of chains per cell this will be explained
in the REMARK section and Z will denote the number of the more
populous species per cell."

That being said, it would seem the definition of _cell.formula_units_Z
would work with the addition of the following:

For macromolecular structures, the value of _cell.formula_units_Z
is the number of occurrences of the entity defined by _entity_poly_seq
in the cell.  For heterogeneous combinations of polymers for which
the populations of distinct polymeric entities within a cell differ,
the value for the most populous ones will be used.

This is not a critical issue, but it would seem helpful to users
of future multidisciplinary data base searches to use the same name
for a common concept where possible.

- - - - -

Herbert Bernstein writes:

I don't think that one would actually be revising the core CIF
definition in any way, just allowing the mmCIF definition to
cover a case which is exactly parallel to the small molecule
definition, by taking the entity definitions as the functional
equivalents of formulae.  The macromolecular case is fuzzier
than the small molecule case, but the PDB Z happens to match
the wording of the _cell.formula_units_Z if one adds entity
definitions as formulae.

- - - - -

Lynn Ten Eyck writes:

Frances has raised the issue of the PDB Z value
and a place to put it in mmCIF.  Personally I
do not see a lot of use for the quantity -- it
doesn't give the crystallographic multiplicity
of asymmetric units, and for heteropolymers
seems to me to be actively misleading.

_cell.formula_units_Z is rooted in small
molecule crystallography, as is essentially all
of the chemical_formula material.  It seems to
me that the ENTITY data items were defined
because macromolecular crystals are not as
well defined chemically as small molecule
crystals, and if we need a placeholder for the
PDB Z value we should put one in.  I don't
think we should revise the definitions in a way
that will make mmCIF less compatible with CIF. 
Would revision of the definition of
_cell.formula_units_Z do this?

- - - - -

Frances Bernstein writes:

This is a follow-on message to the one I sent yesterday about
needing a token for the PDB Z value, based on information provided
by Herbert Bernstein.

_cell.formula_units_Z is defined as:

;              The number of the formula units in the unit cell as specified
               by _chemical_formula.structural, _chemical_formula.moiety or
               _chemical_formula.sum.
;

But when I look at _chemical_formula I find:

;              Data items in the CHEMICAL_FORMULA category would not, in
               general, be used in a macromolecular CIF.  See instead the
               ENTITY data items.

which seems to imply that one should not be using _cell.formula_units_Z
in mmCIF.

Herbert pointed out that if one were to revise the definition of
_cell.formula_units_Z, it could be used to hold the PDB Z value.
Apparently Phil Bourne assumes that _cell.formula_units_Z is meant to
hold the PDB Z value and he uses it for that purpose in pdb2cif.

- - - - -

Frances Bernstein writes:

The PDB has a field on CRYST1 records that is called Z but it
is not the same as the crystallographic Z.  Here is the definition
from the PDB format description:

The Z-value is the number of polymeric chains in a unit cell. In the
case of heteropolymers, Z is the number of occurrences of the most
populous chain.

I do not think there is a need to have a discussion about the meaning
or the usefulness of this parameter; it has been present in PDB entries
as long as the present format has existed.  What would be appreciated
would be the addition of a field in the mmCIF dictionary that would
hold this quantity and would facilitate the translation of PDB to mmCIF
and vice versa.
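
As a rough illustration of the PDB definition quoted above, the value is
simply the largest per-cell count among the polymeric chain types.  The
sketch below is an assumption-laden example, not PDB software: the function
name pdb_z and the dictionary layout are hypothetical, and the second case
assumes a cell holding a single A2B heterotrimer of the kind discussed
elsewhere in this thread.

```python
def pdb_z(polymer_chains_per_cell):
    """Return the PDB-style Z for one unit cell.

    polymer_chains_per_cell: {chain type: number of copies in the unit cell}
        (hypothetical layout for this sketch; polymeric chains only).
    """
    if not polymer_chains_per_cell:
        raise ValueError("no polymeric chains given")
    # For heteropolymers, Z is the count of the most populous chain.
    return max(polymer_chains_per_cell.values())


# Homopolymer case: four copies of a single chain in the cell.
print(pdb_z({"A": 4}))  # 4

# Assumed cell containing one A2B heterotrimer: two A chains, one B chain.
print(pdb_z({"A": 2, "B": 1}))  # 2
```

The second case shows the behaviour questioned in the thread: the PDB
convention reports Z=2 for an A2B assembly even though the cell holds only
one such assembly.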

- - - - -

********************************************************************************
 Dr. Paula M. D. Fitzgerald  ______________ voice and FAX: (908) 594-5510
   Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
     P.O. Box 2000, Ry50-105     ______________ or bean@merck.com           
       Rahway, NJ 07065  USA 
         (for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)  
********************************************************************************