Macromolecular CIF dictionary

[Workshop participants] Attendees at the Macromolecular CIF Dictionary Workshop, from left: P. Fitzgerald, D. Xue, C. Pu, P. Murray-Rust, S. Wodak, H. Ohkawa, J. Cushing, P. Bourne, D. Stampf, J. Jiang, J. Zelinka, W. Steigemann, and M. Scharf; back row: K. Watenpaugh, L. TenEyck, N. Kolchanov, G. Kleywegt, H. Berman, S. Couch, A. Brunner, N. Spadaccini, B. McMahon, J. Richelle, W. Chang, J. Westbrook, S. Moodie, and I. Shindyalov. (Courtesy P. Bourne)

The first version of the macromolecular Crystallographic Information File (mmCIF) dictionary should be submitted to the IUCr COMCIFS Committee early in 1994. Currently the draft dictionary contains over 600 items describing the major features of the crystallographic experiment and the resultant macromolecular (and biological) structure. While mmCIF provides a means of maintaining a comprehensive record of an experiment, which can subsequently be archived or exported, wide use of mmCIF will only be possible if suitable software is available to produce an mmCIF data file from existing programs and to maintain it.

The National Science Foundation sponsored a workshop to develop a scheme for implementing software tools that handle mmCIF data, applying modern design principles and coordinating development efforts. At the workshop P. M. D. Fitzgerald described the current status of the mmCIF dictionary and B. Toby described the powder diffraction CIF dictionary that was submitted to COMCIFS. P. Bourne described the current Data Definition Language (DDL) and M. Scharf described shortcomings in the DDL from a software developer's perspective.

Discussion revealed a difference in philosophy between crystallographers and software designers. Crystallographers view CIF as a simple way of representing data that is easy for humans to read. Software designers view CIF as approaching a context-free grammar in which the relationships between data items, as well as the data items themselves, are explicitly represented: in other words, easily read by machine but not by humans. E. Dodson described her efforts to represent insulin structures as mmCIFs, and P. Murray-Rust, D. Stampf, and J. Westbrook described writing parsers for mmCIF. Beyond the basic parsing of mmCIF is the issue of how best to represent mmCIF in memory. Several approaches currently being taken were described, including three different C++ class libraries (PDBlib, PDBquery, and one emphasizing graphical representation).
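To make the parsing and in-memory representation questions concrete, here is a minimal sketch in Python. It is not any of the parsers or class libraries described at the workshop. It handles only a simplified subset of CIF syntax (data blocks, single-line name/value pairs, and loop_ tables), ignores multi-line text fields, and borrows shell-style quoting from the standard shlex module, which only approximates CIF quoting rules. The function name parse_cif, the nested-dictionary layout, and the item names in the sample are assumptions chosen for illustration.

```python
import shlex


def _is_structural(tok):
    """True for tokens that start a new item, loop, or data block."""
    low = tok.lower()
    return tok.startswith("_") or low == "loop_" or low.startswith("data_")


def parse_cif(text):
    """Parse a simplified CIF subset into nested dictionaries.

    Returns {block_name: {"items": {name: value}, "loops": [[row, ...], ...]}},
    where each row is a dict mapping item names to string values.
    """
    # Tokenize line by line, skipping blank lines and '#' comment lines.
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        tokens.extend(shlex.split(stripped))

    blocks = {}
    current = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().startswith("data_"):
            # Start a new data block named by the text after "data_".
            current = {"items": {}, "loops": []}
            blocks[tok[5:]] = current
            i += 1
            continue
        if current is None:
            raise ValueError("content found before any data_ block")
        if tok.lower() == "loop_":
            i += 1
            # Collect the column names, then the values that follow them.
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _is_structural(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names) != 0:
                raise ValueError("malformed loop_")
            rows = [
                dict(zip(names, values[j:j + len(names)]))
                for j in range(0, len(values), len(names))
            ]
            current["loops"].append(rows)
        elif tok.startswith("_"):
            # A single name/value pair.
            if i + 1 >= len(tokens):
                raise ValueError("item %s has no value" % tok)
            current["items"][tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError("unexpected token %r" % tok)
    return blocks


SAMPLE = """
# illustrative data only
data_demo
_cell.length_a 10.5
_struct.title 'Example structure'
loop_
_atom_site.id
_atom_site.type_symbol
_atom_site.Cartn_x
1 N 1.25
2 C 2.50
"""

blocks = parse_cif(SAMPLE)
```

Here every value is kept as a string and each loop becomes a list of row dictionaries. That is the "easy for humans" end of the spectrum discussed above; a dictionary-driven tool would go further and use the DDL to type the values and to record relationships between items.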

The mmCIF dictionary overlaps the small-molecule DCIF dictionary. B. McMahon described software tools being used at the IUCr for processing CIFs, including a tool to typeset Acta papers directly from CIFs, and N. Spadaccini (U. of Western Australia) described work with the Molecular Information File (MIF). The National Center for Biotechnology Information (NCBI) has adopted a different representation for DNA and protein sequence data based on Abstract Syntax Notation One (ASN.1), and S. Bryant contrasted this with the mmCIF representation.

A. Brunger described future plans for X-PLOR and how mmCIF might be included. J. Cushing described her work on integrating large computational chemistry codes written in FORTRAN into an object-oriented framework. At the conclusion of the workshop a draft of a minimum set of data items representing a macromolecular structure was prepared for consideration by the community at large. Further changes to the DDL were considered, including the ability to reference external CIFs (e.g., related structures, additional dictionaries, heterogens, ideal values for monomers). Referencing external CIFs would permit the use of local dictionaries upon which any tools resulting from the workshop could draw. Dictionaries might develop to the point of including precursive methods and procedures for common crystallographic calculations, which could then be read by a CIF parser, and subsequently a code generator, to produce code in a variety of programming languages.

If you are interested in knowing more about mmCIF, look into the gopher hole at Columbia (cuhhca.hhmi.columbia.edu, port 70). Some information is also available via anonymous ftp from the same site.

Philip E. Bourne