Discussion List Archives

(13) CIF and PDB: report from Paula

Dear Colleagues

The PDB at Brookhaven National Labs was site visited in June of this year.  
Results of the site visit are not yet official, but it is widely understood 
that the review was unfavorable.  There will be a call for proposals for an 
open competition for this facility.  This call has not yet gone out, but it is 
expected shortly.  Funding will be through a multi-agency mechanism, with 
funds coming from the National Science Foundation, the National Institutes of 
Health and the Department of Energy.  NSF will administer the grant evaluation 
and awarding process.

The current PDB operation at Brookhaven will be funded on continuing funds 
during the transition.  They of course will be able to, and are fully expected 
to, respond to the call for proposals.  In anticipation of a completely 
revised application (I assume), major changes have begun to happen.  Last week 
Tom Koetzle resigned as Director, and Dave Stampf has been appointed Interim 
Acting Director, or some such title.  We are told that interviews are being 
conducted for a new Director, but I know no details about that.

One of the technical problems that we have had in finalizing CIF as a new 
format for the Protein Data Bank has been less-than-effective communication 
with the staff at the PDB.  Judging from a meeting held last Friday, this is 
no longer a problem.  The meeting was attended by Phil Bourne, Helen Berman 
and me, representing mmCIF, and by Dave Stampf, Dejun Xue and Nancy Oeder, 
representing the PDB.

The goal was to see if we can identify the holes in the current mapping of PDB 
to CIF and vice versa using some of the current tools, Phil's pdb2cif awk 
script and Dejun's tool for going in the other direction.  It was a very 
productive meeting, particularly from my standpoint, as it revealed that most 
of the obvious problems are with the tools and not with the dictionary.

But one thing that came out of our discussion is very relevant to COMCIFS, and 
that is what we are currently calling the External Reference Files.  The 
particular issue is the keywords that we want to use at several levels for 
describing the asymmetric unit structure and the biological structure.  
Everyone agrees that keywords are useless unless a consistent set is employed.  
But we disagree about how to do this when mmCIF hits the streets.

The two viewpoints are the following:

Paula) The set of keywords will grow as science grows, and so we want to 
manage them in a way that will be flexible and easily expanded.  Therefore it 
would be a strategic error to embed them in the mmCIF dictionary as 
enumeration values for data items such as (for example) _struct_biol_keyword.  
Ultimately, the keywords will be managed through external reference files, but 
since that formalism does not yet exist, we should enter them in the current 
dictionary as examples, much as we have done with the least-squares 
restraints.

Everyone Else) The list of keywords will indeed grow, but not so fast that we 
cannot manage the list within the constraints of COMCIFS management of the 
dictionary.  We would like to use ERFs for keywords, but we don't have ERFs 
yet, and we have to do something now.  Therefore, keywords must be defined in 
enumeration lists.
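The practical difference between the two positions can be sketched in a few 
lines of Python.  This is only an illustration: no ERF formalism exists yet, 
so the one-keyword-per-line file layout, the function names and the sample 
keyword values are all assumptions made for this sketch, not part of mmCIF.

```python
import io

# Option A: keywords frozen into the dictionary as an enumeration list.
# (Illustrative values only; not taken from any released dictionary.)
EMBEDDED_ENUMERATION = frozenset({"HYDROLASE", "OXIDOREDUCTASE", "TRANSFERASE"})


def load_reference_file(handle):
    """Option B: read keywords from an external reference file.

    Assumed layout (hypothetical): one keyword per line; blank lines and
    lines starting with '#' are skipped.
    """
    keywords = set()
    for line in handle:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            keywords.add(entry.upper())
    return frozenset(keywords)


def is_valid_keyword(value, allowed):
    """Return True when value (compared case-insensitively) is allowed."""
    return value.strip().upper() in allowed


# Stand-in for a file on disk, so the sketch runs as written.
erf = io.StringIO(
    "# hypothetical keyword reference file\n"
    "HYDROLASE\n"
    "OXIDOREDUCTASE\n"
    "TRANSFERASE\n"
    "CHAPERONE PROTEIN\n"
)
external_keywords = load_reference_file(erf)

in_embedded = is_valid_keyword("chaperone protein", EMBEDDED_ENUMERATION)
in_external = is_valid_keyword("chaperone protein", external_keywords)
print(in_embedded, in_external)
```

With the embedded list, accepting a new keyword means issuing a revised 
dictionary; with the external file, it means adding one line to the file 
while the dictionary itself stays untouched.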

It really was all against one on this issue, and I see only two ways out of 
this:

1)  Solve the ERF problem *NOW*.
2)  Give in and put the keywords in the enumeration list.

As an exercise, Dave Stampf looked at the current PDB files, and found about 
250 different HEADER records (the only attempt at keywording current PDB 
files).  There is strong evidence here of the problems of non-enumerated 
keywords, and my guess is that the list will reduce to about 100 after it is
cleaned up.  I will forward a copy to you.
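Much of the variation in the HEADER list is purely typographic (spacing 
around parentheses and commas), and that part of the cleanup can be 
mechanized.  The following Python sketch is one possible approach, not the 
procedure Dave used; the normalization rules and function names are 
assumptions chosen for illustration.  The sample strings are taken from the 
list at the end of this message.

```python
import re
from collections import defaultdict


def normalize_header(text):
    """Remove purely typographic variation from a HEADER string."""
    s = text.strip().upper()
    s = re.sub(r"\s+", " ", s)        # collapse runs of whitespace
    s = re.sub(r"\s*\(\s*", " (", s)  # one space before "(", none after
    s = re.sub(r"\s*\)", ")", s)      # no space before ")"
    s = re.sub(r"\s*,\s*", ", ", s)   # one space after each comma
    return s


def group_variants(headers):
    """Map each normalized form to the original spellings that produce it."""
    groups = defaultdict(list)
    for header in headers:
        groups[normalize_header(header)].append(header)
    return dict(groups)


sample = [
    "HYDROLASE   (SERINE PROTEINASE)",
    "HYDROLASE (SERINE PROTEINASE)",
    "HYDROLASE(SERINE PROTEINASE)",
    "HYDROLASE (NUCLEIC ACID, RNA)",
    "HYDROLASE (NUCLEIC ACID,RNA)",
    "IMMUNOGLOBULIN",
    "IMUNOGLOBULIN",
]

groups = group_variants(sample)
for canonical, variants in sorted(groups.items()):
    print(canonical, "<-", variants)
```

The seven sample strings collapse to four groups.  Misspellings such as 
IMUNOGLOBULIN and true synonyms such as ACID PROTEASE versus ACID PROTEINASE 
are not merged by rules like these and would still need to be resolved by 
hand.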

A subtext in all of this is the question of updating enumeration lists.  To 
my thinking, doing this violates the assumed immutability of officially 
sanctioned CIF data definitions, but we have already agreed to do this with 
the core (for example, in adding y as a synonym for yes).  This is an issue 
that COMCIFS should address right away.


List of PDB Header records

ACID ANHYDRIDE HYDROLASE
ACYL-COENZYME A BINDING PROTEIN
AIDS-RELATED VIRUS GAG POLYPROTEIN
ALPHA-AMYLASE INHIBITOR
AMINO-ACID TRANSPORT
AMINOTRANSFERASE
ANTI-HYPERTENSIVE, ANTI-VIRAL PROTEIN
ANTIBACTERIAL PROTEIN
ANTIFREEZE PROTEINS
APOPROTEIN
BACTERIAL ENCAPSULATION
BILIN BINDING
BINDING PROTEIN
BINDING PROTEINS
BIOTIN BINDING PROTEIN
CALCIUM BINDING
CALCIUM BINDING PROTEIN
CALCIUM-BINDING PROTEIN
CALCIUM/PHOSPHOLIPID BINDING
CARBON-OXYGEN LYASE
CARBOXYLIC ESTER HYDROLASE
CARBOXYLIC ESTER HYDROLASE ZYMOGEN
CARDIO PICORNAVIRUS COAT PROTEIN
CHAPERONE PROTEIN
CHEMOTAXIS
CHROMOSOMAL PROTEIN
COAGULATION INHIBITOR
COAT PROTEIN (VIRAL)
COMPLEMENT FACTOR
COMPLEX (ANTIBODY-ANTIGEN)
COMPLEX (DEOXYRIBONUCLEIC ACID/SPERMINE)
COMPLEX (LIGASE-T/RNA$)
COMPLEX (PROTEINASE-INHIBITOR)
COMPLEX (PROTEINASE/INHIBITOR)
COMPLEX(ANTIBODY-ANTIGEN)
COMPLEX(SERINE PROTEINASE-INHIBITOR)
CONTRACTILE SYSTEM PROTEIN
CONTRACTILE SYSTEM PROTEINS
CYTOKINE
CYTOLYTIC PROTEIN
CYTOTOXIN
DEFENSIN
DEHALOGENASE
DEOXYRIBONUCLEIC ACID
DNA BINDING
DNA BINDING (VIRAL)
DNA BINDING REGULATORY PROTEIN
DNA-BINDING MOTIF
DNA/RNA
ELECTRON TRANSFER (FLAVOPROTEIN)
ELECTRON TRANSFER (IRON-SULFUR PROTEIN)
ELECTRON TRANSFER(CUPROPROTEIN)
ELECTRON TRANSFER(IRON-SULFUR PROTEIN)
ELECTRON TRANSFER(IRON-SULFUR)
ELECTRON TRANSPORT
ELECTRON TRANSPORT (COPPER BINDING)
ELECTRON TRANSPORT (CYTOCHROME)
ELECTRON TRANSPORT (HEME PROTEIN)
ELECTRON TRANSPORT PROTEIN
ELECTRON TRANSPORT PROTEIN (CYTOCHROME)
ELECTRON TRANSPORT PROTEIN(CUPROPROTEIN)
ELECTRON TRANSPORT(CYTOCHROME)
ELONGATION FACTOR
ENDODEOXYRIBONUCLEASE
ENDONUCLEASE
EXCITATION ENERGY TRANSFER
EYE LENS PROTEIN
FATTY ACID SYNTHESIS PROTEIN
FATTY ACID-BINDING PROTEIN
FELIX
GENE REGULATING PROTEIN
GENE REGULATORY PROTEIN
GLYCOGEN PHOSPHORYLASE
GLYCOPROTEIN
GLYCOSIDASE
GLYCOSIDASE INHIBITOR
GROWTH FACTOR
HEME PROTEIN OF ELECTRON TRANSPORT
HEPATIC LECTIN
HISTOCOMPATIBILITY ANTIGEN
HORMONE
HORMONE (MUSCLE RELAXANT)
HYDRO-LYASE
HYDROLASE
HYDROLASE   (SERINE PROTEINASE)
HYDROLASE (ACID PROTEASE)
HYDROLASE (ACID PROTEINASE)
HYDROLASE (ASPARTIC PROTEINASE)
HYDROLASE (C-TERMINAL PEPTIDASE)
HYDROLASE (CARBOXYLIC ESTER)
HYDROLASE (ENDORIBONUCLEASE)
HYDROLASE (METALLOPROTEINASE)
HYDROLASE (NUCLEIC ACID, RNA)
HYDROLASE (NUCLEIC ACID,RNA)
HYDROLASE (O-GLYCOSYL)
HYDROLASE (PEPTIDE)
HYDROLASE (PHOSPHORIC DIESTER)
HYDROLASE (PHOSPHORIC DIESTER, RNA)
HYDROLASE (PHOSPHORIC MONOESTER)
HYDROLASE (PROTEINASE)
HYDROLASE (SERINE PROTEINASE)
HYDROLASE (SULFHYDRYL PROTEINASE)
HYDROLASE ZYMOGEN
HYDROLASE ZYMOGEN (SERINE PROTEINASE)
HYDROLASE(ACID PROTEINASE ZYMOGEN)
HYDROLASE(ACID PROTEINASE)
HYDROLASE(ACTING IN CYCLIC AMIDES)
HYDROLASE(ACTING IN CYCLICAMIDINES)
HYDROLASE(ACTING ON ACID ANHYDRIDES)
HYDROLASE(ALPHA-AMINOACYLPEPTIDE)
HYDROLASE(ASPARTIC PROTEINASE)
HYDROLASE(ASPARTYL PROTEINASE)
HYDROLASE(CARBOXYL ESTER)
HYDROLASE(CARBOXYLIC ESTER)
HYDROLASE(CARBOXYLIC ESTERASE)
HYDROLASE(ENDORIBONUCLEASE)
HYDROLASE(O-GLYCOSYL)
HYDROLASE(SERINE PROTEINASE)
HYDROLASE(ZYMOGEN)
HYDROLASE-TRANSPEPTIDASE
HYDROPHOBIC SEED PROTEIN
IMMUNOGLOBULIN
IMMUNOGLOBULIN BINDING PROTEIN
IMMUNOGLOBULIN(PART)SEQUESTERS ANTIGENS
IMUNOGLOBULIN
INFLUENZA VIRUS HEMAGGLUTININ
INSULIN AND INSULIN-LIKE HORMONES
INTEGRAL MEMBRANE PROTEIN PORIN
INTRAMOLECULAR OXIDOREDUCTASE
IRON STORAGE
IRON TRANSPORT PROTEIN
ISOMERASE
ISOMERASE(INTRAMOLECULAR ALDOSE/KETOSE)
ISOMERASE(INTRAMOLECULAR LYASE)
ISOMERASE(INTRAMOLECULAR OXIDOREDUCTASE)
ISOMERASE(INTRAMOLECULAR OXIDOREDUCTSE)
LECTIN
LECTIN (AGGLUTININ)
LEUCINE ZIPPER
LIGASE (SYNTHETASE)
LIGASE(AMIDE SYNTHETASE)
LIGHT HARVESTING PROTEIN
LIPOCALIN
LIPOPROTEIN
LYASE (ALDEHYDE)
LYASE(ACTING ON POLYSACCHARIDES)
LYASE(CARBON-CARBON)
LYASE(CARBON-OXYGEN)
LYASE(OXO-ACID)
LYMPHOKINE
METALLOTHIONEIN
NEUROTOXIN
NEUROTOXIN (POST-SYNAPTIC)
NUCLEIC ACIDS (DNA/RNA)
NUCLEOTIDYLTRANSFERASE
ONCOGENE PROTEIN
OXIDO-REDUCTASE
OXIDO-REDUCTSE(ALDEHYDE/DONR,NAD/ACCPT)
OXIDOREDUCTASE
OXIDOREDUCTASE (/NAD$(A)-ALDEHYDE(D))
OXIDOREDUCTASE (CH-NH(D)-NAD OR NADP(A))
OXIDOREDUCTASE (CH-OH(D)-CYTOCHROME(A))
OXIDOREDUCTASE (FLAVOENZYME)
OXIDOREDUCTASE (H2O2 (A))
OXIDOREDUCTASE (H2O2 ACCEPTOR)
OXIDOREDUCTASE (H2O2(A))
OXIDOREDUCTASE (INCORPORATION OF O2)
OXIDOREDUCTASE (NAD(A)-CHOH(D))
OXIDOREDUCTASE (OXYGEN(A))
OXIDOREDUCTASE (QUINOPROTEIN)
OXIDOREDUCTASE (SUPEROXIDE ACCEPTOR)
OXIDOREDUCTASE(ACTING ON NADH OR NADPH)
OXIDOREDUCTASE(ALDEHYDE(D)-NAD(A))
OXIDOREDUCTASE(CHNH(D)-NAD+ OR NADP+(A))
OXIDOREDUCTASE(CHOH(D)-NAD(A))
OXIDOREDUCTASE(H2O2(A))
OXIDOREDUCTASE(NAD(A)-CHOH(D))
OXIDOREDUCTASE(NADP+(A),FERREDOXIN(A))
OXIDOREDUCTASE(NITRIC OXIDE(A))
OXIDOREDUCTASE(OXYGENASE)
OXIDOREDUCTASE, CHOH DONOR, NAD ACCEPTR
OXIDOREDUCTASE/ELECTRON TRANSPORT
OXO-ACID-LYASE
OXYGEN BINDING
OXYGEN STORAGE
OXYGEN TRANSPORT
OXYGEN TRANSPORT PROTEIN
PANCREATIC HORMONE
PARVOVIRUS COAT PROTEIN
PENTOSYLTRANSFERASE
PEPTIDE ANTIBIOTIC
PERIPLASMIC BINDING PROTEIN
PHOSPHORIC DIESTER HYDROLASE
PHOSPHOTRANSFERASE
PHOSPHOTRANSFERASE (CARBOXYL ACCEPTOR)
PHOSPHOTRANSFERASE(CARBOXYL AS ACCEPTOR)
PHOTORECEPTOR
PHOTOSYNTHETIC REACTION CENTER
PICORNAVIRUS
PLANT SEED PROTEIN
PLANT SEED STORAGE PROTEIN (VICILIN)
PLASMINOGEN ACTIVATOR
POSTSYNAPTIC NEUROTOXIN
PRELIMINARY
PRESYNAPTIC NEUROTOXIN
PROTEIN INHIBITOR
PROTEINASE INHIBITOR
PROTEINASE INHIBITOR (CHYMOTRYPSIN)
PROTEINASE INHIBITOR (KAZAL)
PROTEINASE INHIBITOR (SUBTILISIN BPN*)
PROTEINASE INHIBITOR (TRYPSIN)
REDUCTASE
RETINOL TRANSPORT
RHINOVIRUS COAT PROTEIN
RIBONUCLEIC ACID
RIBOSOMAL PROTEIN
SEA ANEMONE TOXIN
SERINE PROTEINASE
SERINE PROTEINASE INHIBITOR
SERPIN
SIGNAL TRANSDUCTION PROTEIN
SITE-SPECIFIC RECOMBINASE
STEROID BINDING
STRUCTURAL (CELL ENVELOPE COMPONENT)
SWEET TASTING PROTEIN
SYNTHETIC PROTEIN MODEL
T-CELL SURFACE GLYCOPROTEIN
TEXTURE OF CONNECTIVE TISSUE
THYMIDINE PHOSPHORYLASE
TOXIN
TOXIN (HEMOLYTIC POLYPEPTIDE)
TRANSCRIPTION REGULATION
TRANSFERASE
TRANSFERASE (ACYLTRANSFERASE)
TRANSFERASE (CARBAMOYL-P,ASPARTATE)
TRANSFERASE (METHYLTRANFERASE)
TRANSFERASE (METHYLTRANSFERASE)
TRANSFERASE (PHOSPHORYL)
TRANSFERASE (PHOSPHOTRANSFERASE)
TRANSFERASE(AMINOTRANSFERASE)
TRANSFERASE(PHOSPHO,ALCOHOL ACCEPTOR)
TRANSFERASE(PHOSPHORYL,ALCOHOL ACCEPTR)
TRANSFERASE(PHOSPHOTRANSFERASE)
TRANSFERASE(THIOSULFATE,CYANIDE SULFUR)
TRANSPORT (THYROXINE,RETINOL) IN SERUM
TRANSPORT AND PROTECTION PROTEIN
TYPE 1 COPPER PROTEIN
VIRUS
ZINC FINGER /DNA$ BINDING DOMAIN
ZINC FINGER DNA BINDING DOMAIN

- Paula