Discussion List Archives

Re: Items for the Agenda of the COMCIFS closed meeting

I have four comments arising from David's items for discussion.

(1) Increasingly, large IT-driven projects such as the European
Bioinformatics Institute are handling data transfer through application
programming interfaces (APIs) rather than specific formats. That is, one
provides programming handles into abstract data structures
("return(Molecule), get3DPosition(Molecule->Atoms)") and relies on a set
of library functions or filters to map these data structures onto the
format or formats of the day. In practice the data structures are
initially built against a particular software implementation (probably the
program most used within the domain, or that happens to be best known by
the programmers). My suspicion is that there is no guarantee that these
data structures will map onto any arbitrary format (but the current
generation of software engineers is pretty good at forcing the best
possible fit).
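
As a minimal sketch of the API-centred approach described above: client
code calls handles on an abstract structure, and separate filters map that
structure onto whatever format is wanted. All class and function names
here (Atom, Molecule, get_3d_positions, to_cif, to_csv) are hypothetical
and are not taken from any EBI or IUCr library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Atom:
    label: str
    x: float  # fractional coordinates are assumed for this sketch
    y: float
    z: float


@dataclass
class Molecule:
    name: str
    atoms: List[Atom] = field(default_factory=list)


def get_3d_positions(molecule: Molecule) -> List[Tuple[float, float, float]]:
    """API handle: callers see coordinates, never a file format."""
    return [(a.x, a.y, a.z) for a in molecule.atoms]


def to_cif(molecule: Molecule) -> str:
    """Filter mapping the abstract structure onto a CIF atom_site loop."""
    lines = [
        f"data_{molecule.name}",
        "loop_",
        "_atom_site_label",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    for a in molecule.atoms:
        lines.append(f"{a.label} {a.x:.4f} {a.y:.4f} {a.z:.4f}")
    return "\n".join(lines)


def to_csv(molecule: Molecule) -> str:
    """A second filter: same structure, different 'format of the day'."""
    rows = ["label,x,y,z"]
    rows += [f"{a.label},{a.x},{a.y},{a.z}" for a in molecule.atoms]
    return "\n".join(rows)


mol = Molecule("demo", [Atom("C1", 0.1, 0.2, 0.3)])
cif_text = to_cif(mol)
csv_text = to_csv(mol)
```

The sketch also illustrates the caveat: the Molecule structure fixes what
can be expressed, so a format with concepts it lacks cannot be mapped
without extending the structure itself.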

In this context the CIF dictionaries remain essential in providing a
reference data model for crystallography. I find CIF itself a good
format (or language) for developing new data structures, because it is
syntactically very simple and has restricted internal structures (no
nested loops, for instance). Even so, the complexities of parent-child
relationships are stretching our ad-hoc dictionary development
procedures, and we should explore carefully whether the dictionaries
should continue to be developed in CIF (with appropriate development of
tools to ensure their self-consistency), or whether environments such as
RDF or UML can be helpful in maintaining or at least checking the
consistency of the model. To my mind, a strong argument in favour of
continuing development based on CIF/STAR is that the CIF
dictionaries already offer more properties of an object that can be validated
against a dictionary than are found in most (if not all) XML
environments.  Syd's proposed DDL3/dREL will take this even further.
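
To illustrate what validating an item against a dictionary involves, here
is a minimal sketch. The DICTIONARY entries are hand-written for
illustration, loosely modelled on DDL1 attributes such as _type,
_enumeration and _enumeration_range; they are not copied from the core
dictionary, and the validate function is hypothetical.

```python
# Hand-written, illustrative definitions (an assumption, not dictionary text).
DICTIONARY = {
    "_cell_length_a": {"type": "numb", "range": (0.0, None)},
    "_symmetry_cell_setting": {
        "type": "char",
        "enumeration": {
            "triclinic", "monoclinic", "orthorhombic", "tetragonal",
            "rhombohedral", "trigonal", "hexagonal", "cubic",
        },
    },
}


def validate(name, value):
    """Return a list of problems found for one data item (empty if none)."""
    definition = DICTIONARY.get(name)
    if definition is None:
        return [f"{name}: not defined in the dictionary"]

    problems = []
    if definition["type"] == "numb":
        # Standard uncertainties such as 5.431(2) are not handled here.
        try:
            number = float(value)
        except ValueError:
            return [f"{name}: {value!r} is not a number"]
        low, high = definition.get("range", (None, None))
        if low is not None and number < low:
            problems.append(f"{name}: {number} is below minimum {low}")
        if high is not None and number > high:
            problems.append(f"{name}: {number} is above maximum {high}")

    allowed = definition.get("enumeration")
    if allowed is not None and value not in allowed:
        problems.append(f"{name}: {value!r} is not an allowed value")
    return problems
```

For example, validate("_cell_length_a", "-1") reports a range violation,
while validate("_symmetry_cell_setting", "cubic") returns an empty list.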

(2) Whatever the chosen ontology development route, we will at some
level need to be able to interface with XML to achieve
cross-disciplinary interoperability. I believe COMCIFS should develop a
canonical mapping of CIF data files to an XML data structure. The
Bilbao symmetry database group has produced XML-based applications from
their existing collection of symmetry data sets in CIF format, and is
keen to develop more. The RCSB's mmCIF/XML mapping is already used by
the worldwide Protein Data Bank, and to my mind has a strong claim as a
de facto standard since it's already tested in use. Is it suitable for
smaller-scale applications like the Bilbao symmetry server?  We should
explore whether this is a suitable mapping for universal application: it
is surely desirable to avoid a proliferation of ad hoc mappings to XML.
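
For concreteness, a mapping of a CIF loop to XML might look something like
the sketch below. This is a hypothetical layout chosen for illustration
(element names "datablock", "<category>Category" and the function
cif_loop_to_xml are assumptions); it is not the RCSB mmCIF/XML mapping,
which should be consulted directly before any decision.

```python
import xml.etree.ElementTree as ET


def cif_loop_to_xml(block_name, category, columns, rows):
    """Map one CIF loop onto nested XML elements.

    block_name: name of the CIF data block
    category:   category of the looped items, e.g. "atom_site"
    columns:    item names with the category prefix removed
    rows:       list of rows, each a list of string values
    """
    root = ET.Element("datablock", {"name": block_name})
    container = ET.SubElement(root, category + "Category")
    for row in rows:
        record = ET.SubElement(container, category)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = value
    return root


root = cif_loop_to_xml(
    "demo",
    "atom_site",
    ["label", "fract_x"],
    [["C1", "0.1000"], ["O1", "0.2500"]],
)
xml_text = ET.tostring(root, encoding="unicode")
```

Even this toy version shows where a canonical mapping has to make choices:
whether item values become elements or attributes, and how category and
item names are split.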

(3) I am rather against trying to develop the DDL1 model further - it
accommodates pdCIF (just), and it's simple enough to be understood by
programmers in languages such as Fortran whose only interest in CIF is
to feed in the numbers required for crunching. Is there a case for
freezing the current DDL1 dictionaries at the current revision and going
over to DDL2 for any new content in the core dictionary?  (Or, what may
amount to the same thing, freezing the "core dictionary" at this
revision, except for trivial additions like _publ_author_email, and
developing new content such as molecular descriptions and extended
diffraction density in DDL2 dictionaries?)

(4) As matters stand, the existence of the DDL2 version of the core
dictionary within mmCIF saddles the mmCIF maintainers with a heavy
burden of maintenance. I am sure the RCSB would be happy to see the IUCr
adopt DDL2 for small-molecule CIFs and COMCIFS take over responsibility
for maintaining the DDL2 core dictionary. However, I can't see a case for
the IUCr to drop existing support for DDL1-based CIFs (tools such as
enCIFer are heavily DDL1/core-based, and pdCIF may simply not work in DDL2).
But we may be able to provide support for small-molecule structures
in DDL2 as well, though it may take some effort to persuade the
small-molecule community to migrate to DDL2.

Best wishes
