

IUCr 1995 Report - Commission on Crystallographic Data

Cambridge Structural Database

The CSD continues to be available to all academic institutions via the 32 affiliated National Data Centres. It is also supplied to industrial companies, predominantly in the pharmaceutical sector, where it is used in drug design. The availability of the CSD on CD has resulted in a steep increase in the number of academic sites registered. CCDC Web pages are being increasingly used, and the information is continually being upgraded (see http://www.ccdc.ac.uk).

(A1) Database contents. The database aims to include all published organic and metal-organic complex structures and now contains 152464 data entries (April 1996 release). The CSD main file has grown by 11194 new entries in the last year.

The 'CSD Usage' database continues to grow. It is a collection of abstracts of scientific papers that make significant use of the CSD, and it is searchable by Quest on keywords and on the text of the abstract. There are 538 abstract entries in the April 1996 release. A separate database is distributed for 3897 Protein Data Bank entries. Detailed statistics are available from CCDC on request.

(A2) Protein Data Bank (PDB). Since April 1995, the CSD release has included the coordinate data for the Brookhaven PDB entries in compressed format on CD-ROM. It should be emphasized that these coordinates are not searchable by Quest as part of the 3D-search facility; rather, they are provided as a convenient package for those users who search the PDB sequence information with Quest and wish to extract and view the relevant hits quickly. A utility program, pdbget, is supplied to retrieve the entries in PDB format. The PDB coordinates are provided on a separate CD for Unix and on a single CD for VMS systems.

The Quest program allows sequence searching via menu options and an automatic link to the coordinate file is provided in the form of a RASMOL button. This allows the user to view a current hit immediately in a RASMOL window.

(A3) Platforms. The CSD is supported on VAX/VMS, DEC Alpha VMS, DEC Alpha OS/F, SGI (Unix) and SUN (Unix) platforms. The distributed CDs also contain executables for IBM RS/6000, HP 700 Unix, DEC Alpha OS/F, which are created on machines remote to CCDC and not supported to the same degree. Instructions are provided that enable the software to be compiled on almost any Unix system.

Distribution is now predominantly on CD, though a few sites still require tape versions. The move to CD has greatly simplified the release procedure at CCDC; releases are scheduled for April and October of each year.

(A4) CSD-MDL Database. This is a version of CSD in the form of an MDL registered database, which is searchable by the MDL MACCS and ISIS software. It is provided to certain users who have licences for MACCS and CSD. Update procedures have been written so that CCDC supplies a file of updated records in the SD-file format to the user, who then updates the local copy of the CSD-MDL. Some users have been supplied with the SD file direct for the complete database for loading into their own database systems. CCDC has invested in extra hardware to support the production of CSD-MDL and releases are now up to date with the main CSD database.
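The update step described above, in which a file of updated records is applied to a local copy, can be sketched in a few lines of Python. This is only an illustration and not the CCDC update procedure. It relies on the SD-file convention that each record ends with a line containing `$$$$`, and it assumes (hypothetically) that the first line of each record holds a unique entry identifier; the function names are invented for this example.

```python
def split_sd_records(text):
    """Split SD-file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record even if its terminator is missing.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def record_key(record):
    """Assumption: the first line of a record is a unique entry identifier."""
    return record.split("\n", 1)[0].strip()


def apply_update(local_text, update_text):
    """Replace or append records from the update file in the local copy."""
    merged = {record_key(r): r for r in split_sd_records(local_text)}
    for record in split_sd_records(update_text):
        merged[record_key(record)] = record  # new key appends, old key replaces
    return "".join(record + "\n$$$$\n" for record in merged.values())
```

Records whose first line is blank would collide under this scheme, so a real update would need a more robust key, for example a dedicated data field.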

(A5) CSD-UNITY Database. This is a version of the CSD converted into a Tripos UNITY database. Software has been written that produces data entries in the Tripos SLN format, suitable for loading into UNITY. This version is available to registered users of the CSD who also have a Tripos UNITY licence.

(A6) Software. The CSD is provided with: Quest3D, the search program, including 3D searching, and intermolecular contacts; VISTA, a program for visual display of statistics on geometry parameters extracted by Quest3D; PLUTO, a program for visualization of structures, especially intermolecular contacts in the crystal.

Quest has been updated with a new output file format (MOL2) as used by the modelling program Sybyl (from Tripos) and widely used in other contexts. Quest now also allows output of simple coordinate lists for atoms, both orthogonal and fractional, and optionally by fragment selection, as in the old GSTAT program (COOR option).
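The relationship between the fractional and orthogonal coordinate lists mentioned above can be illustrated with a short sketch. The Python below is not taken from Quest. It assumes the common convention in which the a axis lies along x and the b axis lies in the xy plane, and the function names are hypothetical.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix mapping fractional to orthogonal coordinates.

    Cell lengths are in angstroms and angles in degrees. Assumed
    convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c.
    vol_factor = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * vol_factor / sg],
    ]


def frac_to_orth(frac, m):
    """Multiply the matrix by a fractional coordinate triple."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def orth_to_frac(xyz, m):
    """Invert the upper-triangular matrix by back substitution."""
    z = xyz[2] / m[2][2]
    y = (xyz[1] - m[1][2] * z) / m[1][1]
    x = (xyz[0] - m[0][1] * y - m[0][2] * z) / m[0][0]
    return (x, y, z)
```

For an orthorhombic cell the matrix is diagonal, so each fractional coordinate is simply scaled by the corresponding cell length.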

The VISTA program has been significantly improved in the last year, with spread-sheet facilities, polar histograms/scatter plots and PostScript output.

PLUTO now also allows output of coordinate lists in the GSTAT style. This is especially useful for dumping of packing programs.

(A7) PreQuest. This new program has the aim of providing a facility to the crystallographic community that accepts a variety of input formats for data and creates a local CSD database. It is also the program used for in-house validation of all new data entries at CCDC. This is due for release in October 1996 and has been through extensive beta testing at certain non-CCDC sites.

The main objective is to provide users with local databases in CSD format, which are searchable by the Quest program and can be concatenated with the main CSD if desired. This is particularly important to industry, where many companies hold as many as 1000 structures that cannot currently be released to the public domain. Academics, too, often need to perform on new compounds, before publication, the same geometry calculations that they carry out with the main CSD in their current research. PreQuest also gives users a definitive check of their data before deposition at CCDC or publication. Particular emphasis has been placed on the reading of the standard CIF format.
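To give a flavour of what reading the CIF format involves, the following Python sketch extracts simple single-line data items such as the cell lengths. It is not how PreQuest reads CIFs. It deliberately ignores `loop_` blocks, quoted strings and multi-line text fields, and the function names are hypothetical. The data names `_cell_length_a` and `_cell_length_b` used in the usage lines come from the CIF core dictionary; the values are made up.

```python
import re


def read_simple_items(cif_text):
    """Collect '_name value' pairs that sit on a single line.

    Looped data, quoted strings and semicolon-delimited text fields
    are not handled in this sketch.
    """
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:  # a name alone on a line belongs to a loop
            items[parts[0].lower()] = parts[1].strip()
    return items


def numeric_value(text):
    """Strip a trailing standard uncertainty, e.g. '10.234(5)' gives 10.234."""
    return float(re.sub(r"\(\d+\)$", "", text.strip()))


# Usage with an invented fragment:
example = "data_example\n_cell_length_a 10.234(5)\n_cell_length_b 7.5\n"
cell_a = numeric_value(read_simple_items(example)["_cell_length_a"])
```

A production reader would use a full CIF parser, since real files rely heavily on loops for atom-site data.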

The PreQuest program also allows input of non-crystallographic Cartesian data from whatever source, e.g. MO calculations, molecular modelling.

(A8) Data deposition. Since August 1995, the CCDC has been inviting the electronic deposition of structures as private communications. There has been no restriction on the permitted format, and a 'deposition form' has been made available on the CCDC WWW page (see http://www.ccdc.ac.uk). In practice, there has been a steady growth in the percentage of depositions made in CIF format (33%), which is the format preferred by CCDC. These files may be read directly by the PreQuest program and, after checking, can be archived to the main database. It is hoped that the number of depositions will continue to grow; the current estimate is about 500 per annum.

(A9) The IsoStar Project. A new development project is under way. This is an organized library of intermolecular contacts for a selected range of chemical groups of biological interest. Automatic scans are made of the CSD for each group and the contacts stored so that they may be viewed interactively. The resulting 3D distribution of contacts is very valuable in providing a picture of the most likely interactions to occur between groups, which is relevant to drug design and protein-ligand interaction. It is intended to add contact distributions taken from ligands in the protein data from the PDB. After consultation with collaborators, it is likely that the results should be available in an easy-to-use format such as a WWW browser. The scatter plots and vector plots will probably be accessed by menu under Netscape and the 3D visualization performed by RASMOL. It should be noted that this project is still at the prototype stage but should be substantially complete by the end of 1996.

(A10) Version 6. A development of the CSD system is under way, under the title Version 6. This will provide a more flexible interface for the user, especially in the areas of query formation and browsing of hits. New software is being written for the interface, which will drive the non-graphic search engine Quest with the same functionality as Version 5. There will be emphasis on ease of transfer of results to other software.

A second aspect of Version 6 will be a redesign of the internal storage of the database fields. This is so that the CSD will be able to accommodate new data fields if necessary and remove most of the internal hard-coded limitations of the present system (e.g. the maximum number of atom sites is 1000).

This is at an early stage of development, which will run over a two-year time scale.

International Centre for Diffraction Data

(B1) Databases. The set 45 Powder Diffraction File (PDF) products were released in August 1995. The set 46 products are on schedule for release in August 1996. The 1994 release of NIST Crystal Data is available, and the 1995 release is currently undergoing alpha testing. CD-ROMs containing both the PDF and NIST Crystal Data are no longer available because the size of the combined databases exceeds the capacity of a single CD-ROM.

The (MS Windows) search/retrieval program PCPDFWIN has become very popular and a new PC Search Index, to assist in phase identification using the Hanawalt and Fink techniques, has been developed (available in August 1996). Plans are to discontinue development and support of DOS-based software and make the source code for discontinued programs available.

The ICDD continues to archive powder patterns in CIF/STAR format. A pilot full-pattern data set of clay patterns is undergoing testing.

The ICDD awarded 37 Grants-in-Aid worldwide, with the expectation of generating new entries for the Powder Diffraction File, and welcomes additional grant applications.

(B2) Science. Poster sessions highlighting new developments in powder diffraction were held at the October 1995 and March 1996 ICDD meetings. Clinics in X-ray diffraction and X-ray fluorescence were held at ICDD headquarters in June 1995 and 1996. Four $2000 crystallographic scholarships were awarded. Plans have been made to expand the coverage of the PDF by the inclusion of calculated powder patterns. Recommendations for the calculation of powder patterns have been derived.

An ICDD WWW site (http://www.icdd.com:7999/) has been established. It is planned to develop it into the electronic centre for powder diffraction - including product information, bulletin boards, discussion groups, public domain software and other features. Its development is being guided by an Electronic PDF Committee.

(B3) Organizational items. A new Board of Directors, chaired by R. L. Snyder, was elected. One of the major objectives of the ICDD is to increase collaboration with other database organizations. R. Jenkins has been named General Manager of the ICDD. The ICDD hopes to expand greatly its worldwide membership.

NIST Crystal and Electron Diffraction Data Center

(C1) Database. The NIST Crystal and Electron Diffraction Data Center is concerned with the collection, evaluation and dissemination of data on solid-state materials. The Data Center maintains a comprehensive database with chemical, physical and crystallographic information on all types of well characterized substances. These materials fall into the following categories: inorganics, organics, organometallics, metals, intermetallics and minerals. During this year, the master database has been significantly augmented with respect to all categories of materials and now contains approximately 227000 entries. From this central database, two distribution databases are produced: (1) NIST Crystal Data and (2) the Electron Diffraction Database. These databases are made available through computer-oriented modes of dissemination including PC, scientific instruments and on-line searching.

The major project in 1995 was the organization of the NIST-sponsored Workshop on Crystallographic Databases, an international meeting of representatives of all crystallographic data centers, journal editors, instrument manufacturers and database users. Manuscripts from all speakers have been collected and edited and will be published in the NIST Journal of Research. A number of the manuscripts describe the use of NIST data products including, for example, a manuscript entitled 'Using NIST Crystal Data within Siemens Software for Four-Circle and Smart CCD Diffractometers'. In this paper, it is emphasized that the diffractometer/NIST-database combination creates a new analytical tool for materials research and analysis.

(C2) NIST Workshop on Crystallographic Databases. The NIST Workshop on Crystallographic Databases was one in a series of NIST-sponsored workshops, each focusing on a particular type of data including crystallographic, thermodynamic, phase diagram and mass spectral data. Scientific databases are becoming critical to research in the industrial and academic communities. By bringing together top scientists involved in producing crystallographic data with users of the resulting databases, this Workshop served as a forum to examine how well the scientific community is being served and what data activities the community feels are important in the future. This Meeting was sponsored by the National Institute of Standards and Technology and laid the groundwork for future directions with respect to crystallographic databases.

A main goal of the Workshop was to foster interactions between users and providers of crystallographic databases and between the communities that use the different databases. During the Workshop, three sessions of scientific presentations were held: Formal Data Activities; Scientific Uses of the Databases; Data Transfer: Ensuring State of the Art Technology.

In the first session (Chair: D. Watson), a representative from each of the data centres covered present activities and projected future activities of the data centre. In the second session (Chair: C. Brock), the focus was on using crystallographic databases in analysis, in the production of materials properties and in the design of new chemicals, pharmaceuticals and materials. In the third session (Chair: B. McMahon), speakers addressed issues related to data transfer such as: (1) data exchange standards (CIF etc.); (2) the role of journals in the evaluation of published data; (3) data exchange between journals and crystallographic data centres; (4) computerized modes of data dissemination. Following the presentations, a discussion session (Chair: J. Flippen-Anderson) focused on Barriers to the Use of Crystallographic Data and on Partnerships for the Future. Workshop proceedings will be published in a special issue of the NIST Journal of Research.

As anticipated, the Workshop was of special interest to those who use crystallographic data in their research or are involved with these data in some other capacity, such as managers of scientific projects, journal editors, on-line system designers, instrument manufacturers and librarians, among others. Although the crystallographic databases have been in existence for many years, the Workshop represents the first time that all parties have met to discuss common issues. Many attendees commented that the Workshop was very instructive and extremely useful, and that it should be repeated in a few years. In addition to the invitees, many scientists have expressed a strong interest in the subject and have requested a copy of the Workshop Proceedings.

During the Workshop, representatives from the data centres as well as speakers in the third session noted that the modes of data collection and evaluation are in transition. Data transfer is facilitated by the rapid acceptance of standardized Crystallographic Information Files (CIF). Both IUCr and American Chemical Society journals use CIFs as an integral part of their publication process. With respect to data and software exchange, the data centres are establishing close cooperative ties with each other as well as with instrument manufacturers, scientific journals and users. These interactions are deemed critical to ensure the production and availability of high-quality data at an ever increasing rate. To further coordinate such efforts and to meet future challenges, the possibility of establishing a more formal federation between the data centres was discussed.

It was clear to all participants that the crystallographic databases have become critical to research and analysis in many diverse areas of science. The recognized importance of such data is the result of a number of factors including the intrinsic value of evaluated crystallographic data, the vast amount of data and the evolution of modern computerized delivery systems. From the users' point of view, several points were clear. First, the cost of searching the databases should be as low as possible. Second, search commands for all the databases should be simple to use and, since one often wishes to search multiple databases, the command structure should be standardized. Third, with respect to scientific software, many users stated the need for new scientific algorithms to seek and recognize structural patterns, motifs, structure types etc.

CRYSTMET: NRC Metals Data File

(D1) Database. CRYSTMET - NRC's Metals Data File presently has 55000 entries. NRC has decided to discontinue the production of the database. J. Rodgers offered to take over production of CRYSTMET; NRC has agreed and has given him the exclusive rights to CRYSTMET. Funding for the continued production of this database is being sought. The on-line availability of this database will continue until the end of March 1996. J. Rodgers has submitted a proposal to NRC to acquire the rights to the on-line service to the crystallographic databases; this future on-line system will be developed by J. Rodgers and others.

ICSD: Inorganic Crystal Structure Database

(E1) Database. The ICSD now contains 41476 entries (1656 have been added or corrected, 1220 are completely new). The database is distributed as before by Fachinformationszentrum (FIZ) Karlsruhe, Germany, together with the retrieval system RETRIEVE and the 3D graphics programs CRYSTAL VISUALIZER, STRUCTURE TIDY and LAZY PULVERIX. For use in academic institutions, special conditions can be arranged as part of the IUCr/FIZ/Gmelin Institut Agreement.



Updated 11th February 1997

Copyright © 1995 International Union of Crystallography
