Hello again -

It seemed useful to me to make a distinction between that which I have
done and that which remains to be dealt with, which is why I am keeping
them in separate messages. The discussions in this message tend to be
longer, so what I will do is introduce each new section with my usual
separator of

- - - - -

Frances Bernstein writes:

Under save__atom_site.calc_flag you have

;              A standard code to signal if the site data has been
               determined by diffraction data or calculated from the
               geometry of surrounding sites, or has been assigned
               dummy coordinates. The abbreviation 'c' may be used in
               place of 'calc'.
;
    _item_enumeration.detail
                   d     'determined from diffraction measurements'
                   calc  'calculated from molecular geometry'
                   c     'abbreviation for "calc"'
                   dum   'dummy site with meaningless coordinates'

My personal suggestion would be not to allow any abbreviations here.
Seeing 'c' I probably would think 'calculated' but seeing 'd' one could
easily think 'dummy' instead of 'determined' or 'diffraction'. I would
suggest 'det' (or 'diff' or 'data' or 'meas'), 'calc', and 'dum' as the
codes. My personal preference would be 'meas'. If you really want
one-letter codes then why not use 'm' or 'meas', 'c' or 'calc', and 'd'
or 'dum'?

- -

I think there is some history here, and we have to keep c and calc in
order to be able to read files written under the definitions in the
original CIF core dictionary. But this will take some looking into.

- - - - -

Frances Bernstein writes:

I am trying to understand how mmCIF handles microheterogeneity because
we have already had that in at least one entry. After looking at the
dictionary I have a few questions:

1. The item _entity_poly_seq.hetero is described as

;              A flag to indicate whether or not this monomer in the
               polymer is heterogeneous in sequence. This would be a
               rare phenomenon.

and it is not mandatory. Shouldn't it be mandatory if there is
microheterogeneity?

This leads to a more general issue: I could only find yes or no as
values in the _item.mandatory_code fields throughout the dictionary.
Should there be a way to show that something is mandatory under certain
conditions? (Note also that microheterogeneity does not occur often in
PDB entries but I think "rare" might be too extreme.)

2. I am not completely clear on how you propose to handle
microheterogeneity. When I look at

save_ENTITY_POLY_SEQ
    _category.description
;              Data items in the ENTITY_POLY_SEQ category specify the
               sequence of monomers in a polymer. Allowance is made for
               the possibility of microheterogeneity in a sample by
               allowing a given sequence number to be correlated with
               more than one monomer id - the corresponding ATOM_SITE
               entries should reflect this heterogeneity.

it seems to say that, in the case of microheterogeneity, one should
repeat _entity_poly_seq.num with the same residue number for each
possible residue, as follows:

ENTITY_POLY_SEQ

    loop_
    _entity_poly_seq.entity_id
    _entity_poly_seq.num
    _entity_poly_seq.mon_id
    1 1 ALA
    1 2 GLY
    1 3 SER
    1 3 VAL
    1 4 PRO

in the case of there being SER/VAL microheterogeneity at residue 3.

If this is the representation that is intended, then there appears to
be a conflict with

save__entity_poly_seq.num
    _item_description.description
;              The value of _entity_poly_seq.num must uniquely and
               sequentially identify a record in the ENTITY_POLY_SEQ
               list. Note that this item must be a number, and that the
               sequence numbers must progress in increasing numerical
               order.

which does not allow for a number to be repeated.

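The conflict described above can be made concrete with a minimal Python
sketch. It is only an illustration: the row layout and the function
names are hypothetical, not part of the mmCIF dictionary or of any
existing software, and it assumes that uniqueness of the sequence number
is judged within a single entity.

```python
from collections import defaultdict

# Rows copied from the ENTITY_POLY_SEQ example above, as
# (entity_id, num, mon_id) tuples. This layout is an assumption made
# for illustration only.
ROWS = [
    ("1", 1, "ALA"),
    ("1", 2, "GLY"),
    ("1", 3, "SER"),
    ("1", 3, "VAL"),
    ("1", 4, "PRO"),
]


def find_heterogeneous_positions(rows):
    """Return {(entity_id, num): [mon_id, ...]} for every sequence
    number that is correlated with more than one monomer id."""
    monomers = defaultdict(list)
    for entity_id, num, mon_id in rows:
        if mon_id not in monomers[(entity_id, num)]:
            monomers[(entity_id, num)].append(mon_id)
    return {key: ids for key, ids in monomers.items() if len(ids) > 1}


def num_is_unique(rows):
    """True only if no (entity_id, num) pair is repeated, i.e. the
    strict reading of the _entity_poly_seq.num description holds."""
    keys = [(entity_id, num) for entity_id, num, _ in rows]
    return len(keys) == len(set(keys))


print(find_heterogeneous_positions(ROWS))  # {('1', 3): ['SER', 'VAL']}
print(num_is_unique(ROWS))                 # False
```

The same five rows are reported as heterogeneous at residue 3 by the
first function and rejected by the second, which is exactly the
contradiction between the category description and the item
description.
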
If I understood the intended representation of microheterogeneity in
the entity_poly_seq section, then should the atom_site information
basically be

    loop_
    _atom_site.group_PDB
    _atom_site.type_symbol
    _atom_site.label_atom_id
    _atom_site.label_comp_id
    _atom_site.label_asym_id
    _atom_site.label_seq_id
    _atom_site.label_alt_id
    _atom_site.cartn_x
    _atom_site.cartn_y
    _atom_site.cartn_z
    _atom_site.occupancy
    _atom_site.B_iso_or_equiv
    _atom_site.footnote_id
    _atom_site.entity_id
    _atom_site.entity_seq_num
    _atom_site.id
    ATOM N N  SER A 3 . 23.664 33.855 16.884 1.00 22.08 . 1 3 17
    ATOM C CA SER A 3 . 22.623 34.850 17.093 1.00 23.44 . 1 3 18
    ATOM C C  SER A 3 . 22.657 35.113 18.610 1.00 25.77 . 1 3 19
    ATOM O O  SER A 3 . 23.123 34.250 19.406 1.00 26.28 . 1 3 20
    ATOM C CB SER A 3 . 21.236 34.463 16.492 1.00 22.67 . 1 3 21
    ATOM N N  VAL A 3 . 23.664 33.855 16.884 1.00 22.08 . 1 3 22
    ATOM C CA VAL A 3 . 22.623 34.850 17.093 1.00 23.44 . 1 3 23
    ATOM C C  VAL A 3 . 22.657 35.113 18.610 1.00 25.77 . 1 3 24
    ATOM O O  VAL A 3 . 23.123 34.250 19.406 1.00 26.28 . 1 3 25
    ATOM C CB VAL A 3 . 21.236 34.463 16.492 1.00 22.67 . 1 3 26

I particularly care about the fields:

    _atom_site.label_comp_id
    _atom_site.label_seq_id
    _atom_site.entity_seq_num

- -

Frances rightly points out that there is a problem with our current
mode of representing microheterogeneity. I thought we had done this
correctly when the data items were first created, but later I had one
of those horrible realizations that the pointers were not clean in this
regard. I'm still not sure how to solve the problem, but we will
eventually find a way.

- - - - -

Frances Bernstein writes:

In file

    http://ndbserver.rutgers.edu/mmcif/examples/BDLB13.cif

Helen has a residue +A in the field _atom_site.label_comp_id. She also
has a section for CHEM_COMP that includes +A and describes it.

The mmCIF dictionary description says that _atom_site.label_comp_id is
a pointer to _chem_comp.id in the CHEM_COMP category.

When I look at save__chem_comp.id in the dictionary it says

;              The value of _chem_comp.id must uniquely identify each
               item in the CHEM_COMP list. For protein polymer
               entities, this is the three-letter code for amino acids.
               For nucleic acid polymer entities, this is the
               one-letter code for the bases.

Thus I am puzzled by the fact that the entry used +A when the
dictionary appears to say that this field should be the one-letter
code. Or should the dictionary be modified to allow things like +A?

- -

Here we can probably solve the logical problem just by rewording the
definition, but I want a chance to consult with Helen about this before
doing something that still might not be correct.

- - - - -

Eldon Ulrich writes:

I have a few questions on constructing chemical structures.

1. How would a mixed polymer of nucleic acids and deoxynucleic acids be
described? Would one type of monomer be considered standard and the
others given non-standard ids that would then be linked to the standard
structures?

2. Within the ENTITY and CHEM_LINK_BOND sections it does not seem
possible to describe how a non-standard amino acid is linked to
adjacent monomers. For example, how would one describe an iso-aspartyl
group linked through the side-chain carboxyl to the following amino
acid? I could not find a way to get from this section back to a
specific set of two residues in the sequence of a polymer.

- -

I think I can answer Eldon's questions by just sitting calmly for a
moment and thinking about them, but I don't have that moment right now,
so this too will have to wait.

- - - - -

I also have pending a series of questions from Dale Tronrud, but since
I haven't even begun to think about them yet, I haven't included them
in this summary.

If you guys have thoughts about the issues outlined above, don't be shy
about letting us know.

Talk to you all soon.

Paula

********************************************************************************
Dr. Paula M. D. Fitzgerald  ______________ voice and FAX: (908) 594-5510
Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
P.O. Box 2000, Ry50-105     ______________    or  bean@merck.com
Rahway, NJ 07065 USA
(for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)
********************************************************************************