This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Suggestions for the definition of chemical links

Dale Tronrud (DALE@gold.uoregon.edu)
Tue, 5 Sep 1995 15:22:37 -0700 (PDT)


	   I am continuing my effort to understand the expression of standard
	stereochemistry in the mmCIF format.  I have come across what I think
	are a couple of significant problems with the current mmCIF dictionary.

	   As an example consider the insulin tetramer structure.  The
	asymmetric unit contains four A chains and four B chains.  "A" and
	"B" are the names of the entities.  Each of the four A chains
	would be given a unique name, e.g. A1, A2, A3, and A4.  The B chains
	might be named B1, B2, B3, and B4.  These are the names of the
	eight "asym"s.

	   I want to define links between the monomers of various asym's.
	The simplest example is, in the insulin example, the disulfide between
	residues 6 and 11 of asym B1.  I need a record which would contain the
	entity_poly_seq.num's of residues 6 and 11 of entity "B" and the
	chem_link.id of the disulfide bond.  I cannot find such a record in
	the mmCIF dictionary.

	   I would like to propose "entity_link" which would be defined
	something like this

	loop_
		_entity_link.entity_id
		_entity_link.mon_1_id
		_entity_link.mon_2_id
		_entity_link.link_typ
	B	6	11	SS
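
	   As a rough illustration, an entity-level record of this kind could
	be expanded into one link per asym of the named entity.  The sketch
	below is in Python; the table of asym names follows the insulin
	example, and the function and variable names are hypothetical, not
	part of mmCIF or any existing program.

```python
# Hypothetical sketch: expand entity-level links to every asym of that entity.
# The asym names follow the insulin example; nothing here is an mmCIF API.

# asym id -> entity id
ASYM_ENTITY = {
    "A1": "A", "A2": "A", "A3": "A", "A4": "A",
    "B1": "B", "B2": "B", "B3": "B", "B4": "B",
}

# (entity_id, mon_1_id, mon_2_id, link_typ), as in the proposed entity_link loop
ENTITY_LINKS = [("B", 6, 11, "SS")]


def expand_entity_links(entity_links, asym_entity):
    """Return one (asym_id, mon_1_id, mon_2_id, link_typ) row per matching asym."""
    expanded = []
    for entity_id, mon_1, mon_2, link_typ in entity_links:
        # Sort by asym id so the output order is predictable.
        for asym_id, ent in sorted(asym_entity.items()):
            if ent == entity_id:
                expanded.append((asym_id, mon_1, mon_2, link_typ))
    return expanded


if __name__ == "__main__":
    for row in expand_entity_links(ENTITY_LINKS, ASYM_ENTITY):
        print(*row, sep="\t")
```

	   A single entity_link row therefore stands for four disulfides
	here, one in each of B1 through B4.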

	   The insulin example is quite interesting because it also demonstrates
	a second type of linkage -- a link between asym's.  The entity_link
	record would define a link within all molecules of that entity.  The
	loop above defines a disulfide in all the B chains.  There are two
	disulfides between the A and B chains, however, and these links
	cannot be defined with entity_link.  We need in addition an asym_link
	record.  This record would contain both asym ids, both mon_id's,
	and the symmetry operator used to generate the particular symmetry
	image of the second asym.  The inclusion of the symmetry operator
	allows cross linked crystals to be described.  An example of the
	asym_link usage is

	loop_
		_asym_link.asym_1_id
		_asym_link.mon_1_id
		_asym_link.asym_2_id
		_asym_link.mon_2_id
		_asym_link.link_typ
		_asym_link.link_symmetry
	A1  7	B1  7	SS	1_555
	A2  7	B2  7	SS	1_555
	A3  7	B3  7	SS	1_555
	A4  7	B4  7	SS	1_555
	A1  19	B1  20	SS	1_555
	A2  19	B2  20	SS	1_555
	A3  19	B3  20	SS	1_555
	A4  19	B4  20	SS	1_555

	   Unfortunately, this example uses only the identity operator in
	the links.  I know from my experience with TNT, which is the only
	program I know of that can handle links to symmetry images, that
	such links are occurring with increasing frequency.
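
	   For readers unfamiliar with codes such as 1_555, the following
	Python sketch shows one way to decode them.  It assumes the usual
	CIF "n_klm" convention (n is the symmetry operator number; k, l, m
	are single digits giving unit-cell translations offset by 5) and
	assumes that operator 1 is the identity.  The function names are
	hypothetical.

```python
def parse_symmetry_code(code):
    """Split a code such as '1_555' into (operator_number, (ta, tb, tc)).

    Assumption: the 'n_klm' convention, where each of k, l, m is a single
    digit and 5 means "no translation along that cell axis".
    """
    op_text, _, shifts = code.partition("_")
    if not op_text.isdigit() or len(shifts) != 3 or not shifts.isdigit():
        raise ValueError(f"not an n_klm symmetry code: {code!r}")
    translation = tuple(int(digit) - 5 for digit in shifts)
    return int(op_text), translation


def is_identity(code):
    """True if the code names operator 1 with no cell translation.

    Assumption: operator 1 is x,y,z, as is conventional.
    """
    op, translation = parse_symmetry_code(code)
    return op == 1 and translation == (0, 0, 0)


if __name__ == "__main__":
    for code in ("1_555", "1_655", "2_654"):
        print(code, parse_symmetry_code(code), is_identity(code))
```

	   Under these assumptions every row of the asym_link loop above
	is an identity link, while a code such as 2_654 would name operator 2
	followed by a shift of +1 along a and -1 along c.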

	   If entity_link and asym_link are adopted, chem_link must be
	changed.  It appears to me that chem_link is intended to define
	a linkage between two monomer types, not between particular monomers.
	My interpretation of chem_link is that it allows the definition
	that any glycine followed by an isoleucine is linked to it by a
	peptide bond, which is much too broad a statement to be useful.

	   This observation leads to the philosophical discussion I promised
	in my last posting.  The main point which has confused me when
	trying to understand links in mmCIF is that the examples do not
	include any, even though there are clearly a great many links in
	these polymers.

	   It appears that mmCIF is taking the same path as the PDB format
	where the peptide bond is implicitly assumed to exist whenever
	one amino acid follows another in sequence.  In addition, the
	sugar-phosphate backbone is assumed to exist in nucleic acid
	sequences.  Other links are then specified by some other mechanism
	(except in the case of the PDB where, if you have a non-disulfide
	bond, you are out of luck).

	   I don't like this strategy because, since the information about
	the linkages is not stored in the file, it must be built into
	the program.  As I understand it, the goal of the mmCIF format is
	to completely describe the model and the experiment.  It would be
	quite simple to include the peptide links in the entity_link
	table along with the disulfides and the other linkages.  Then
	all kinds of links would be handled in the same fashion, regardless
	of their occurrence in peptides, nucleic acids, carbohydrates, or
	any of the weird stuff that floats in our crystals.

	   If you include the peptide bond in the entity_link table it
	would be useful to be able to specify one of the links of a residue
	as the "primary" link and all the others "secondary".  This
	designation would give meaning to a statement such as "link all
	residues from 1 to 100".  The program would follow the primary links
	to find all the residues in between.  All asym_link's should be
	considered secondary, so no flag is needed for them.
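
	   The walk along primary links can be sketched in a few lines of
	Python.  This is only an illustration under stated assumptions: each
	link is taken to run from its first residue to its second, each
	residue has at most one outgoing primary link, and the function name
	and tuple layout are hypothetical rather than taken from TNT or
	mmCIF.

```python
def residues_in_range(links, start, end):
    """Follow primary links from start to end; return the residues visited.

    links: iterable of (mon_1_id, mon_2_id, link_typ, primary) for one entity.
    Assumption: a primary link is directed from mon_1_id to mon_2_id.
    """
    next_primary = {}
    for mon_1, mon_2, _link_typ, primary in links:
        if not primary:
            continue  # secondary links (e.g. disulfides) are never followed
        if mon_1 in next_primary:
            raise ValueError(f"residue {mon_1} has two outgoing primary links")
        next_primary[mon_1] = mon_2

    path = [start]
    seen = {start}
    current = start
    while current != end:
        if current not in next_primary:
            raise ValueError(f"no primary link leads from {current} to {end}")
        current = next_primary[current]
        if current in seen:
            raise ValueError("primary links form a cycle")
        seen.add(current)
        path.append(current)
    return path


if __name__ == "__main__":
    links = [(n, n + 1, "peptide", True) for n in range(1, 9)]
    links.append((6, 11, "SS", False))
    print(residues_in_range(links, 2, 8))
```

	   Because the disulfide from 6 to 11 is flagged secondary, the walk
	passes through residue 6 and continues to 7 rather than jumping to 11.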

	   Putting all these suggestions together I get the following
	example of the definition of an entity_link table.  This is basically
	a transliteration of the sequence file in TNT which has proved
	to be very flexible in its ability to describe unusual geometry.

	loop_
		_entity_link.entity_id
		_entity_link.mon_1_id
		_entity_link.mon_2_id
		_entity_link.link_typ
		_entity_link.primary
	B	1	2	peptide	yes
	B	2	3	peptide	yes
	B	3	4	peptide	yes
	B	4	5	peptide	yes
	B	5	6	peptide	yes
	B	6	7	peptide	yes
	B	6	11	SS	no
	B	7	8	peptide	yes
	B	8	9	peptide	yes
** Truncated by DET to save space **


							Dale Tronrud