Discussion List Archives

(22) For information: NMR Information File

Dear Colleagues

Apologies again for the relative dormancy of COMCIFS over the last several
weeks (though I have not been aware of any severe complaints about this!).
This mailing reviews developments relating to the establishment of an NMR
information file; I hope to send off another before the Atlanta meeting to
tidy up some loose ends and summarise our current status.

First, I should like to put on the record here Peter's comments about the use
of CIF within Roche:

PMR> ...  I evangelised about CIF as a 
PMR> general format for molecular structure at the Molecular Graphics Society 
PMR> last year, and as a result of that Roche decided to use it as an internal 
PMR> de facto standard.  They are very pleased with it, and have written an 
PMR> extensive dictionary for molecular modelling.

While this is not an initiative we need become involved in at this stage, I
think it's useful to be aware of groups who are working on specific
applications, and to encourage cross-fertilization of ideas where
appropriate.

In the same spirit, we should be aware of the NMR Information File project
(NMRIF), and in this case our involvement may become more than passive. In
April, Eldon Ulrich, Director of the Biological Magnetic Resonance Data Bank
(based at University of Wisconsin-Madison) asked the Executive Committee of
the IUCr for formal permission to use the STAR format in developing a data
exchange file which was intended to be modelled on and compatible with CIF.
The EC considered this request at their meeting in Chester last week, and
gave approval for such use. The EC is keen that Eldon (or some other
technical representative from this project) be associated with the work of
COMCIFS to ensure that file compatibility is maintained, and to enhance good
relations between the IUCr and people working in relevant fields of NMR
structural work.

David Brown attended the Monterey workshop at which the technical issues
behind this were addressed, and he reported his reactions to the meeting
thus, in a message to Mike Dacombe last May:

D> Ulrich's organisation (the NMR databank) is
D> establishing a database of NMR-determined protein structures and they are
D> at about the stage that the Cambridge Database was in the late '60s -
D> struggling with a small budget, unsure where they are going, and
D> unfortunately not having a strong international organisation like the IUCr
D> to support them - in short, they need a lot of encouragement.  The PDB has
D> been archiving NMR structures, but their format concentrates only on
D> structure and misses many of the other features that NMR experiments
D> offer.  Further, the way in which an NMR structure is presented (in terms
D> of interatomic distances) does not fit well with the PDB way of
D> describing structure (in terms of atomic coordinates) - NMR measurements
D> are made in solution and the PDB is set up to describe crystal structures. 
D> We should encourage the development of the NMR database, and should ensure
D> the maximum compatibility between the information they have and the
D> information stored in the PDB.  Compatibility must be at two levels, the
D> level of concepts and the level of file structures.  Clearly the concept
D> level is the most important, because this allows information in one
D> database to be compared directly with information in the other. 
D> Compatibility of file structures is also desirable because it allows files
D> from both databases to be merged and read by a single application 
D> program...
D> 
D> 	It was clear at Monterey that the NMR people were very interested
D> in the STAR file structure, although they were careful to leave the option
D> open that they might go a different route.  I got the impression that
D> there was good collaboration between the PDB and the NMR database and the
D> NMR group were anxious to build on the extensive work that has gone into
D> the development of the macromolecular CIF dictionary...
D> 
D> 	In short, the IUCr should encourage the NMR database to adopt the 
D> STAR format and to work closely with the PDB and the macromolecular CIF 
D> working groups to develop a suitable dictionary of datanames.

I subsequently received a detailed report on the meeting from Eldon, which
should be of interest to those not present:

> The following report for the NMR data exchange and archiving workshop was
> written by the organizers.
>  
>      A workshop entitled "Biological Macromolecular NMR Data Exchange and
> Archiving" was held on April 10th at the Doubletree Hotel in Monterey, CA.
> Thirty-five scientists attended representing the NMR, crystallographic, and
> database communities.  The workshop was organized as a collaborative effort
> between the NMR and crystallographic communities to address the following
> issues:  1) Does the NMR community support the development of a uniform
> data transmission format?  2) What is the Crystallographic Information File
> (CIF) format, data dictionary, and data definition language?  3) Is the CIF
> format a suitable model for the development of an NMR data exchange
> mechanism?  4) What experimental, spectral, and derived NMR data items are
> important to transmit and archive?  5) What form of organized effort should
> the NMR community mount to develop the tools and file formats required for the
> exchange of NMR data?
>      The workshop consisted of two presentation sessions and an open
> discussion.  Opening the first session chaired by Joel Sussman, John Markley
> presented an overview of why the NMR community needs a standard NMR
> data transmission and archiving format.  Helen Berman followed with an
> overview of the efforts of the crystallographic community to create a standard
> format for crystallographic data. The rest of the morning session consisted of
> presentations on the background and current status of the CIF format and data
> structure description.  David Brown gave an introduction to CIF as a subset of
> the STAR (Self-defining Text Archival and Retrieval) format.  He also
> presented some of the enhancements of the format that have been the
> outgrowth of meetings of the macromolecular community to formalize the
> relationship of items and groups of items in a more rigorous way.  Keith
> Watenpaugh expanded on this in the mmCIF (macromolecular CIF), illustrating
> how a complete set of definitions is being developed that will allow for
> archiving of the complete crystallography experiment, including
> crystallization, data collection, structure solution, refinement, and
> structure description. Whether the atomic coordinates come from
> crystallography or n-dimensional NMR, a molecular structure description
> that allows computer programs and the non-crystallographer to extract a
> variety of valuable information will be one of the key elements of the
> 3-dimensional macromolecular data base.  Phil Bourne
> explained some of the efforts that are going into defining more carefully the
> data definition language and the development of software for creating,
> converting, and manipulating CIF files using the language and the
> crystallographic and structure definitions.  This software would benefit
> everyone who uses a common data format.  Before the workshop, Phil also gave
> participants drafts of two manuscripts he is preparing.  These manuscripts
> outline the CIF data dictionary and data definition language and their use
> with 3-dimensional macromolecular data.  David Stampf then described the
> Brookhaven Protein Data Bank's (PDB) progress on converting their original
> format, which has been used for over two decades, to the CIF format.  Current
> efforts are centered around cleaning up the PDB data sets.  Once the mmCIF
> standard is approved, the process of transferring the present PDB data into 
> CIF format can proceed.  While it will be possible to convert data back to the
> PDB format from the new mmCIF format, information will be lost, since the old
> format cannot accommodate much of the additional information that will be
> part of the CIF data files.
>      After lunch, Beverly Seavey described the current structure, content, and
> data entry methods of the BioMagResBank (BMRB).  Eldon Ulrich presented
> the list of information that BMRB ultimately should contain and proposed a
> procedure for creating an NMR data dictionary modeled on the macromolecular
> crystallographic data dictionary that could be used for user entry of data to
> BMRB.  Gerhard Wagner then led a discussion of the issues raised during the
> presentations.
>      Discussions over lunch, again over dinner, and at the Experimental NMR
> Conference which followed the workshop demonstrated a strong level of
> interest among the attendees in developing a standard for NMR data exchange.
> The presentations on CIF clearly defined its format, the labor that has gone
> into creating it, and the work still to be carried out to make the format
> useful to macromolecular crystallographers. mmCIF is still evolving and may
> continue to undergo incremental changes for some time to come.  The final
> adopted version of the mmCIF will represent a compromise and may not appear
> perfect from any single point of view.  However, many crystallographers are
> convinced that CIF provides a workable solution to the problem of standard
> data exchange.  The NMR spectroscopists definitely expressed a uniform
> sentiment that the CIF format used by crystallographers should be followed in
> constructing a compatible NMR data dictionary.  There was no stated interest
> in starting from scratch to build a unique NMR data file format.  With the PDB
> shifting to data submission utilizing a CIF format, the NMR community must
> move quickly to develop at least a minimal data dictionary that will allow the
> deposition and archiving of macromolecular structures derived from NMR data.
> In addition, the BMRB intends to expand its content to include experimental
> information, coupling constant, relaxation, kinetic, and thermodynamic data
> generated by NMR experiments.  To do this will require establishing methods
> for authors to submit their data in a format that can be automatically loaded
> into the database.  The CIF format will need to be exercised using examples
> encompassing this broad range of information to determine if the format can
> accommodate all of the data.  From the presentations and discussions, it was
> clear that if problems arise in describing NMR data in a CIF-like format, the
> crystallographic community would be amenable to alterations or extensions to
> CIF.  This will be very important in maintaining and expanding CIF as a
> standard representation for a wide variety of data.
>      The feeling was expressed several times that the crystallographic
> community has been better organized than the NMR community, and perhaps as a
> result it has been easier for the crystallographic community to agree upon
> a set of standards.  The presentations made it clear, however, that the
> creation of CIF and its macromolecular extensions has not been a
> community-wide effort.  It has been the product of a small dedicated
> group who have devised a reasonable
> set of standards, and are now revising them with input from the scientific
> community and are working to convince the community of the value of
> adopting the standards.  After dinner, Flemming Poulsen (Carlsberg
> Laboratory) and David Wishart (Univ. of Alberta) volunteered to join Eldon
> Ulrich (Univ. of Wisconsin) and John Markley (Univ. of Wisconsin) in writing
> an NMR data dictionary.  Jim Prestegard (Yale Univ.), Vladimir Basus
> (UCSF), and Jeffrey Hoch (Rowland Institute) have also expressed interest in
> critically reviewing the NMR data dictionary as it is being developed.  These
> individuals will provide the core needed to begin the next step in creating a
> standard format for the exchange of NMR data that is compatible with
> that presented by the crystallographic community.
>      This workshop has been a positive first step in the long process that will
> be required to create a standard format for data exchange.  It is only the
> first step, and much hard work will be needed to develop a dictionary and
> format simple enough and clean enough to encourage its community-wide
> acceptance as a standard.  The process needs to be presented to the NMR
> community at every possible opportunity, and the community needs to see
> that real and vigorous progress is being made in producing it. While the
> NMR data that are stored are different from the structure data stored
> in the Protein Data Bank, they are no more different than the x-ray diffraction
> and crystallization data are.  The true test of the STAR/CIF format, or any
> other standard file structure, will be its ability to accommodate
> the full range of data needed to describe both the x-ray and NMR
> experiments as well as the results they generate.
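
For those less familiar with the STAR/CIF file structure discussed in the
report, the following is a minimal sketch in Python of how a file of data
names and values might be read. It is an illustration under stated
assumptions, not a conforming parser: the data names (_entry_title,
_atom_label, _shift_ppm) are hypothetical and belong to no approved
dictionary, and only data_ blocks, single name/value pairs and loop_ lists
are handled. Tokenising is delegated to Python's shlex module, whose quoting
rules only approximate those of STAR.

```python
import shlex

# Hypothetical example file; the data names are invented for illustration.
EXAMPLE = """\
data_example
_entry_title   'hypothetical NMR entry'
loop_
_atom_label
_shift_ppm
HA 4.25
HB 1.90
"""


def is_keyword(token):
    """True for tokens that start a new item, loop or data block."""
    return token == "loop_" or token.startswith(("_", "data_"))


def parse_star(text):
    """Return {block_name: {data_name: value or list of values}}."""
    tokens = shlex.split(text, comments=True)  # '#' starts a comment
    blocks = {}
    items = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            items = blocks.setdefault(tok[len("data_"):], {})
            i += 1
        elif items is None:
            raise ValueError(f"{tok!r} appears before any data_ block")
        elif tok == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not is_keyword(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names):
                raise ValueError("loop_ values do not fill whole rows")
            # Values are listed row by row; slice them back into columns.
            for k, name in enumerate(names):
                items[name] = values[k::len(names)]
        elif tok.startswith("_") and i + 1 < len(tokens):
            items[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks


parsed = parse_star(EXAMPLE)
print(parsed)
```

Because every value is announced by its data name, a program can pick out
the items it understands and pass over the rest, which is what makes it
plausible for crystallographic and NMR entries to share one file structure.
Multi-line text fields, save frames and global blocks are deliberately left
out of this sketch.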

David has been in further e-mail correspondence with Eldon, encouraging this
initiative, and defining our terms of reference and the way in which the
NMRIF development committee might best work with COMCIFS. One point of
practical interest is that the NMRIF committee proposes to post drafts of its
work in progress on the ftp server yola.nmrfam.wisc.edu, in the directory
/NMRIF, and Eldon is inviting comment and criticism of this work (though, as
David points out to him, it is unlikely that any of us shall have time in
hand to be looking over his shoulder).

Regards
Brian