The Collaborative Computational Project No. 4 (CCP4) is a BBSRC-funded project to promote collaboration in the development and use of software in macromolecular crystallography. As part of its remit, CCP4 produces a suite of some 160 programs covering functionality for structure solution. The majority of the suite consists of Fortran 77 programs which can be built and run on most UNIX platforms, Windows and Macintosh OS X. Each Fortran program reads in control data on standard input together with a small number of standard format data files, and writes out log information to standard output together with modified data files. Traditionally, shell scripts are used to run a series of programs, linking the output of one program to the input of a following program.
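The run model described above (control data on standard input, log on standard output) can be sketched in Python. This is only an illustration under stated assumptions: `run_step` is a hypothetical helper, the keywords in the usage note are invented, and the Python interpreter stands in for a real CCP4 executable.

```python
import subprocess
import sys

def run_step(command, control_lines):
    """Run one program, feeding control data on stdin and returning its log (stdout)."""
    result = subprocess.run(
        command,
        input="\n".join(control_lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout

# Stand-in for a real program (hypothetical): it copies each control line
# it reads from standard input into its "log" on standard output.
ECHO_PROGRAM = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print('read:', line.strip())",
]
```

Calling `run_step(ECHO_PROGRAM, ["RESOLUTION 2.0", "END"])` returns the log text of the stand-in program. A traditional shell script would repeat this pattern for each program in turn, naming the output file of one step as the input file of the next.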
A few years ago, it was decided that a Graphical User Interface (GUI) was required for the CCP4 suite. The decision was partly driven by the increasing proportion of non-crystallographers using the suite, for whom guidance in using the programs would be very valuable. In addition, students are now more familiar with a point-and-click environment than with command-line driven programs. The CCP4 GUI, named "ccp4i", was designed and written by Liz Potterton at the University of York over the period 1997-2001, with the first public release in 1999. Since early 2001, ccp4i has been maintained and developed by the CCP4 team at Daresbury Laboratory, in particular Peter Briggs.
ccp4i was written with a number of fundamental design principles in mind. Most importantly, the GUI was to be a separate layer from the CCP4 programs, and was not to impact on the latter in any way. Thus, users could continue to use CCP4 from the command line if they so wished. At one level, therefore, ccp4i simply prepares input for CCP4 programs and presents the output to the user. In fact, computations in ccp4i are centred on "tasks" rather than individual programs, where a task is a sequence of programs (specified by the developer) corresponding to a particular step in structure solution. Each task is associated with a single window where the user enters filenames for data files and other control input. [Figure: a typical task window]
Task windows are typically launched from the main window of ccp4i, which is laid out as follows.
On the left-hand side is a list of available tasks. Clicking on a task launches the aforementioned task window. In the centre panel is a list of jobs that have been run or are running, together with some status information. Finally, on the right-hand side are tools for viewing the output of jobs, for administering jobs and for customising ccp4i. Output files can be viewed by a variety of utilities which are made available through a plug-in approach. Help is available at several levels in ccp4i – a simple on-screen message line is bound to the cursor location, while clicking on the "Help" button in each window takes the user to the appropriate section of the HTML documentation pages.
Thus ccp4i facilitates the preparation of program input, provides tools for viewing output, and gives guidance to the user through an integrated help system and through the provision of well-defined crystallography tasks. In addition, ccp4i includes a set of basic project management tools. A user’s work is categorised according to "project", corresponding roughly to an individual structure solution. The user works in one project at a time, and each project has information on all jobs run within that project. Previous jobs can be reviewed and if desired re-run with slightly modified parameters.
ccp4i is written in standard Tcl/Tk, the only extension used being BLT for drawing line plots. Tcl/Tk was chosen since it was at the time well-established, widely ported and perceived to be easy to use. Java was considered but Tcl/Tk was thought to be a more natural choice for the intended purpose. Today, we would probably use Python, but while Tcl has some limitations as a scripting language, it continues to serve us well for ccp4i. The latest GUI developments in CCP4 will also use Tcl/Tk for the sake of compatibility with existing tools, but will in addition make use of [incr Tcl], the object-orientated extension.
Jobs are run from a separate tclsh process. This process reads the job parameters into the local scope, sources the appropriate Tcl run script, and finally launches the program executable with exec. The job parameters are held in a file, which is written by the task interface, and which can be re-used or modified at a later date. More recently, ccp4i has been adapted to run Python scripts, although the interface remains coded in Tcl/Tk.
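The idea of holding job parameters in a file that can later be re-used or modified can be sketched as below. The one-pair-per-line layout and the function names are assumptions made for illustration; they are not the actual ccp4i parameter file format.

```python
def write_params(path, params):
    """Write job parameters as one 'NAME value' pair per line (assumed layout)."""
    with open(path, "w") as handle:
        for name, value in params.items():
            handle.write(f"{name} {value}\n")

def read_params(path):
    """Read parameters back into a dict; every value is returned as a string."""
    params = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, _, value = line.partition(" ")
            params[name] = value
    return params

def rerun_with_changes(old_path, new_path, **changes):
    """Start from an earlier job's parameters and override a few of them."""
    params = read_params(old_path)
    params.update({name: str(value) for name, value in changes.items()})
    write_params(new_path, params)
    return params
```

Keeping the parameters in a plain file, separate from the interface, is what allows a previous job to be reviewed and re-run with slightly modified values.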
Because ccp4i was designed as a separate layer to the CCP4 suite, it is relatively easy to use it to interface to third-party programs, provided they can be run from scripts. To create a new task interface, at most four files need to be written: 1) a Tcl file defining the layout of the task interface, 2) a file defining the parameters required and default values if appropriate, 3) a run script to run the program with the parameters defined by the interface, and optionally 4) a template file for the program input. Library procedures are available to facilitate writing these files. Once written, these files can simply be added to the CCP4 distribution, and the new task registered via a tool in the main interface window.
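How a parameter definition with defaults and an input template might combine to produce program input can be sketched as follows. The parameter names, the keywords and the `$NAME` placeholder syntax are all hypothetical; ccp4i's own file formats and library procedures are not reproduced here.

```python
from string import Template

# Hypothetical parameter definitions with default values.
DEFAULTS = {"TITLE": "untitled", "NCYCLES": "5", "RESOLUTION": "2.0"}

# Hypothetical template for the program's keyword input.
INPUT_TEMPLATE = Template("TITLE $TITLE\nNCYC $NCYCLES\nRESO $RESOLUTION\nEND\n")

def build_input(user_values):
    """Merge user-supplied values over the defaults and fill in the template."""
    params = {**DEFAULTS, **user_values}
    return INPUT_TEMPLATE.substitute(params)
```

For example, `build_input({"NCYCLES": "10"})` keeps the default title and resolution and overrides only the number of cycles.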
Interfaces have now been written for some non-CCP4 protein crystallography software, namely ARP/wARP and SHELXD. ARP/wARP (http://www.arp-warp.org/) is a software suite for the automatic building of macromolecular models. SHELXD (http://shelx.uni-ac.gwdg.de/SHELX/) is a program for structure solution of small molecules and heavy atom substructures of macromolecules. Further third-party interfaces are under development.
As well as providing a friendly user interface, ccp4i provides an environment for scripting. The run scripts referred to above take the place of traditional shell scripts and allow for the sequential running of programs with a limited amount of logic applied (i.e. with the actual sequence depending on the parameters supplied). The run scripts are written by the developer, and the user is not encouraged to change them, though this is possible.
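The kind of limited logic a run script applies, where the sequence of programs depends on the parameters supplied, might look like the sketch below. The parameter names and program names are invented for illustration and do not correspond to any real ccp4i task.

```python
def plan_programs(params):
    """Return the ordered list of program names to run for the given parameters."""
    sequence = []
    if params.get("CONVERT_INPUT", False):
        # Optional first step, only when the user asked for a format conversion.
        sequence.append("convert_format")
    sequence.append("main_calculation")  # always run
    if params.get("MAKE_MAP", False):
        # Optional final step, selected by a switch in the task window.
        sequence.append("map_generation")
    return sequence
```

Note that the sequence here is fixed entirely by the parameters known before the job starts; nothing depends on what the programs actually produce.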
A small amount of processing also takes place at the interface level. Utility programs are run at various points to check the contents of files, to perform format conversions, and to check the consistency of parameters. From the point of view of the user, ccp4i hides much of the administration of data files.
Nowadays, there is a drive towards increased automation, with the user expecting to make fewer decisions, except those of a scientific nature. Automation occurs at the level of individual programs, with more sophisticated algorithms, but also at the scripting layer. As mentioned above, the ccp4i run scripts embody some logic, but this is principally in terms of pre-defined parameters. Clearly what is needed is decision making based on the outcomes of previous steps in the pipeline. Steps are now being taken to develop CCP4 and the ccp4i environment in this direction.
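By contrast with pre-defined parameters, outcome-driven decision making means that each step reports a result and that result selects the next step. The following is a toy sketch only: the step names, the integer "score" and the threshold are all invented, and no real crystallographic criterion is implied.

```python
def run_adaptive(first_step, steps, max_steps=10):
    """Run steps until one returns no successor, recording (name, outcome) pairs.

    `steps` maps a step name to a function that takes the history so far and
    returns (outcome, name_of_next_step_or_None).
    """
    history = []
    current = first_step
    while current is not None and len(history) < max_steps:
        outcome, next_step = steps[current](history)
        history.append((current, outcome))
        current = next_step
    return history

def refine(history):
    """Toy step: a made-up score drops by 10 on each call, starting from 40."""
    calls_so_far = sum(1 for name, _ in history if name == "refine")
    score = 50 - 10 * (calls_so_far + 1)
    # Decide from the outcome: repeat while the score is above the threshold.
    return score, ("refine" if score > 25 else "report")

def report(history):
    """Toy final step: nothing follows it."""
    return "done", None

TOY_STEPS = {"refine": refine, "report": report}
```

Running `run_adaptive("refine", TOY_STEPS)` repeats the toy refinement until its score falls to the threshold and then moves on to the report step. The `max_steps` limit guards against a decision rule that never terminates.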
As described above, ccp4i performs several functions, including those of graphical interface, project management, scripting, data management and job control. While ccp4i performs these functions well within CCP4, it is desirable to adopt a more modular system. This would be more appropriate for distributed systems, with interface, programs and data located on geographically dispersed systems. It would also facilitate linking to external resources, such as remote databases.
The first stage in developing a more modular system is to separate out the existing functionality of ccp4i. The intention is to develop the underlying architecture, while preserving the functionality that the user sees. The existing ccp4i will be split into three components: 1) a Database Handler for controlling access to the job database, 2) a Graphical User Interface, and 3) a Resource Handler for linking the other components to each other, to CCP4 applications and to external resources. Once this separation is achieved, the components can be developed further quasi-independently. The Graphical User Interface will be kept as purely a graphical layer, with the minimum of additional functionality. Likewise, the Resource Handler should be a thin layer whose principal job is to pass messages to the appropriate resource. An additional component, not yet implemented, is an Expert System which embodies knowledge and logic for the purpose of automating structure solution.
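The division of labour between a thin Resource Handler and a Database Handler can be sketched as below. The class names, the dictionary message layout and the action names are assumptions made for illustration, not the planned CCP4 design.

```python
class ResourceHandler:
    """Thin routing layer: it only passes messages to the named resource."""

    def __init__(self):
        self._resources = {}

    def register(self, name, handler):
        """Make a resource reachable under `name`; `handler` is any callable."""
        self._resources[name] = handler

    def send(self, target, message):
        if target not in self._resources:
            raise KeyError(f"no resource registered as {target!r}")
        return self._resources[target](message)

class DatabaseHandler:
    """Controls all access to an in-memory job list (toy job database)."""

    def __init__(self):
        self._jobs = []

    def handle(self, message):
        action = message["action"]
        if action == "add_job":
            self._jobs.append(message["task"])
            return len(self._jobs)  # job number, counting from 1
        if action == "list_jobs":
            return list(self._jobs)  # return a copy, not the internal list
        raise ValueError(f"unknown action {action!r}")
```

In this arrangement a graphical layer would never touch the job list directly; it would call something like `handler.send("database", {"action": "list_jobs"})`, so the database could later move to another machine without the interface changing.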
Many of these ideas have been tested in recent developments in Mosflm and the associated DNA project. See http://www.mrc-lmb.cam.ac.uk/harry/mosflm/ and http://www.dna.ac.uk/ for further details. Communication between modules in these projects is via XML-encoded messages passed through sockets, and CCP4 is likely to adopt a similar scheme.
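A minimal sketch of passing XML-encoded messages through sockets follows. The element and attribute names and the 4-byte length prefix are assumptions chosen for this example; they are not the message schema used by Mosflm or the DNA project.

```python
import socket
import xml.etree.ElementTree as ET

def encode_message(sender, action, fields):
    """Build an XML message as bytes (hypothetical schema)."""
    root = ET.Element("message", {"sender": sender, "action": action})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="utf-8")

def decode_message(data):
    """Parse bytes produced by encode_message into (sender, action, fields)."""
    root = ET.fromstring(data)
    fields = {f.get("name"): f.text for f in root.findall("field")}
    return root.get("sender"), root.get("action"), fields

def send_message(sock, payload):
    """Send one message, prefixed by its length as 4 big-endian bytes."""
    sock.sendall(len(payload).to_bytes(4, "big") + payload)

def _recv_exact(sock, count):
    """Read exactly `count` bytes, since recv may return fewer."""
    buffer = bytearray()
    while len(buffer) < count:
        chunk = sock.recv(count - len(buffer))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buffer.extend(chunk)
    return bytes(buffer)

def recv_message(sock):
    """Receive one length-prefixed message and return its payload bytes."""
    length = int.from_bytes(_recv_exact(sock, 4), "big")
    return _recv_exact(sock, length)
```

The length prefix is one simple way to mark message boundaries on a stream socket; a module at the other end calls `recv_message` and then `decode_message` to recover the sender, the action and the fields.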