(75) mmCIF Workshop; imgCIF/CBF Workshop; review of COMCIFS

  • To: COMCIFS@iucr.ac.uk
  • Subject: (75) mmCIF Workshop; imgCIF/CBF Workshop; review of COMCIFS
  • From: bm
  • Date: Fri, 24 Oct 1997 13:16:23 +0100
Dear Colleagues

It was good to see some of you again at the mmCIF and image workshops this
past week. I would like to offer a brief synopsis of the workshops (from my
viewpoint) and then make some comments on how I see them affecting the
proposed COMCIFS review. People who were at the workshops but who see things
differently are of course welcome to send in their view of things.

D75.1 mmCIF Workshop
--------------------
In many respects, this felt much more a CIF workshop than the earlier
meetings in this series. While the York, Tarrytown and Brussels meetings had
identified, analysed and ultimately resolved structural difficulties in the
data model underlying the mmCIF dictionary, this meeting established an
atmosphere of purposeful adoption and extension of the standard. The only
disappointment was the lack of obvious progress in developing software tools
beyond those so well established at the NDB and developed by Phil Bourne and
Herbert Bernstein; but there was evidence that some of the crystallographic
software authors are beginning to make suitable provision for implementing
the new standard. The absolute need to manage future dictionary extensions
effectively came up in a number of discussions, and there seemed to be some
support for the establishment of a central register of local and extension
dictionaries along the lines being developed by the Dictionary Maintenance
Working Group. (Reminder: the discussions so far are at
http://www.iucr.org/cif/comcifs/wg1/; we also made progress on the technical
issues of dictionary merging.)
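
As a rough illustration of what dictionary merging involves, the sketch below
combines a base dictionary with a local extension. It is not the Working
Group's protocol: the function name, the three conflict modes and the
dict-of-dicts representation of a dictionary are all assumptions made for
this example.

```python
def merge_dictionaries(base, extension, mode="strict"):
    """Merge an extension dictionary into a base dictionary.

    Both arguments map an item name to a dict of its attributes.
    The modes are hypothetical names chosen for this sketch:
      strict  - refuse to merge if an item is defined in both
      replace - the extension's definition replaces the base one
      overlay - the extension's attributes are added to the base ones
    """
    if mode not in ("strict", "replace", "overlay"):
        raise ValueError(f"unknown merge mode: {mode}")

    # Copy the base so that neither input is modified.
    merged = {name: dict(attrs) for name, attrs in base.items()}

    for name, attrs in extension.items():
        if name not in merged or mode == "replace":
            merged[name] = dict(attrs)
        elif mode == "strict":
            raise ValueError(f"item defined in both dictionaries: {name}")
        else:  # overlay
            merged[name].update(attrs)
    return merged
```

A central register would let software locate the extension dictionaries to
be merged; how conflicts ought to be resolved is exactly the kind of question
the Working Group discussions address.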

Helen Berman, John Westbrook and their co-workers are to be congratulated on
the smooth running of this meeting.

D75.2 Image Workshop
--------------------
Although I wasn't able to stay until the end of the image workshop at
Brookhaven, there seemed to be very substantial progress in defining the
goals of the project and establishing a working framework for their
implementation. This workshop was particularly notable for its level of
commercial sponsorship and support. Equipment manufacturers declared
themselves ready to adopt and endorse a hard standard if one emerged. In
that light, the meeting addressed two particular issues of interest to
COMCIFS: the mechanism for handling binary data in a CIF framework, and the
general structure of the data model for describing the relevant
instrumentation.

The binary/ASCII quandary was resolved as follows. The need to handle very
large data sets at high efficiency (a requirement that could involve taking
advantage of machine-dependent features) makes the adoption of an ASCII-only
format unacceptable. Consequently, a new file format will be developed,
called CBF (Crystallographic Binary File), with header and blocksize
properties suited to efficient handling on existing machine architectures.
The contents of the file will be held in tag/data
associative groupings that follow the CIF model, and the tags will be CIF
ASCII tokens. The tokens will be defined in a standard DDL2 dictionary file,
so that a dictionary of CIF terms suitable for full assimilation into other
CIF data files may be constructed. This dictionary might be known as the
imgCIF dictionary. Software will be made available to convert a CBF to a
fully-compliant CIF, where the binary data streams (representing image data)
will be suitably ASCII encoded (perhaps using MIME or some subset of MIME
encoding as a publicly documented standard).
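
The ASCII-encoding step can be sketched as follows. This is only an
illustration under stated assumptions: base64 (the MIME transfer encoding)
stands in for whatever encoding is finally chosen, the tag
_example_image.data is hypothetical, and the pixels are assumed to be
unsigned 16-bit integers stored little-endian.

```python
import base64
import struct
import textwrap


def encode_image_as_text_field(pixels, tag="_example_image.data"):
    """Return a CIF-style semicolon-delimited text field holding the pixels."""
    # Pack the pixel values as little-endian unsigned 16-bit integers.
    raw = struct.pack(f"<{len(pixels)}H", *pixels)
    b64 = base64.b64encode(raw).decode("ascii")
    # Wrap to short lines; the base64 alphabet never contains ';',
    # so no body line can be mistaken for the closing delimiter.
    body = "\n".join(textwrap.wrap(b64, 76))
    return f"{tag}\n;\n{body}\n;\n"


def decode_text_field(field):
    """Recover the pixel list from a field made by the function above."""
    lines = field.splitlines()
    start = lines.index(";")                        # opening delimiter
    end = len(lines) - 1 - lines[::-1].index(";")   # closing delimiter
    raw = base64.b64decode("".join(lines[start + 1:end]))
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))
```

Encoding a list of pixel values and decoding the result returns the original
list, and the intermediate text is pure ASCII, which is the property the
CBF-to-CIF conversion needs.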

The contents of the imgCIF dictionary will include descriptors of the binary
data array, and descriptions of its size, ordering, dynamic compression and
other technical details needed to rebuild the image. But there will also be
a very rich set of instrumental descriptions, and the definition set drawn
up for this purpose will greatly extend the current sparse DIFFRN_DETECTOR
and *_SOURCE and *_MEASUREMENT categories. 
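
To make concrete why such array descriptors are needed, here is a minimal
sketch of rebuilding an image from a raw byte stream. The class and field
names are invented for this example and are not proposed imgCIF data names;
only uncompressed, unsigned integer pixels are handled.

```python
from dataclasses import dataclass


@dataclass
class ArrayDescriptor:
    """Hypothetical stand-in for the dictionary's array description."""
    n_fast: int             # pixels along the fastest-varying axis
    n_slow: int             # pixels along the slowest-varying axis
    byte_order: str         # "little" or "big"
    bytes_per_pixel: int
    compression: str = "none"


def rebuild_image(raw, desc):
    """Turn a raw byte stream into a list of rows of pixel values."""
    if desc.compression != "none":
        raise NotImplementedError("this sketch handles uncompressed data only")
    expected = desc.n_fast * desc.n_slow * desc.bytes_per_pixel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")

    step = desc.bytes_per_pixel
    pixels = [
        int.from_bytes(raw[i:i + step], desc.byte_order)
        for i in range(0, len(raw), step)
    ]
    # Cut the flat pixel list into rows along the fast axis.
    return [
        pixels[row * desc.n_fast:(row + 1) * desc.n_fast]
        for row in range(desc.n_slow)
    ]
```

Without the size, ordering and compression information the same bytes could
be read as many different images, which is why these items belong in the
dictionary alongside the data.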

It seems essential to me that this work should not be held up because of
concerns over the binary implementation. The binary file is not, and will
not be called, CIF; but its content is fully describable in a CIF dictionary
and it may be converted easily - if not trivially - to a fully compliant CIF
if and when true archival or machine-independent transfer is required.
A delay of the kind that followed early resistance to the DDL2-style
extensions to mmCIF would harm this project and its full integration with
mmCIF and coreCIF requirements for instrument characterization.

So I would recommend that COMCIFS be prepared to endorse fully the imgCIF
dictionary, and to consider whether an extension of its remit to cover and
protect the CBF format is also appropriate.

Note that I do not regard CBF as necessarily a general solution to the
question of how to include binary information in the CIF framework. There
are different issues involved in the incorporation of graphics file formats
for publication, or generic multimedia annotations, that are probably best
addressed separately from this application-specific approach.

==========================

Based on the energy underlying both workshops, and on their complementary
yet diverse aims, I am prompted to offer a few suggestions of my own to the
current review of the future of COMCIFS:


D70.1 COMCIFS Review
--------------------
Syd proposes a three-tier system of executive, project subcommittees and
project working parties. I think this is on the right lines, but the
inter-relationship between the project subcommittees and project working
parties isn't completely clear to me. I think instead that the three
necessary tiers should be:
     (1) executive
     (2) dictionary subcommittees
     (3) technical working groups
where (2) and (3) both report to (1). The distinction would be that
dictionary subcommittees (2) shall have responsibility for the maintenance
of dictionaries, and have essentially indefinite duration; but technical
working groups (3) should be established to address specific questions posed
by the executive and should have limited lifetimes. Only members of the
executive would have voting rights; both dictionary subcommittees and
technical working groups should have at least one member belonging to the
executive. If you prefer more traditional bureaucratic nomenclature,
(2) constitutes Standing Committees, (3) ad-hoc Committees.

Membership: The executive committee should be small enough to be effective,
large enough to bring viewpoints from across the discipline, and indeed from
other disciplines (I have in mind that there should be at least one member
who is knowledgeable about informatics). Six sounds about right. The members
should be chosen for their technical expertise, and not ex officio in
consequence of their standing on other IUCr bodies. Hence, I question the
wisdom of Syd's suggestion that the Chair of the Database Committee should -
ex officio - be COMCIFS Vice-Chair (I hasten to add that's a structural
criticism, not a reference to any individual!).

On the other hand, there are several bodies that feel they have a right to
representation in this forum, and they might be given Observer status.
Likely candidates would be the IUCr President and representatives of the
Database Committee, Electronic Publishing Committee, Journals Commission and
Nomenclature Commission.

A case might be made to have an additional type of membership - call it
Associate Member, perhaps - for the Chairs of subcommittees and
technical working groups. Such associate members would have the
responsibility for reporting their progress to the executive through the 
Full Member appointed to liaise with them.

The executive should conduct open discussion through a mailing list
(membership of which should be restricted to the executive and, perhaps, the
Observers) and not through the current moderated discussion. In like manner,
the subcommittees and technical working groups may conduct their business
through separate mailing lists. It would be beneficial to manage all the
mailing lists through Chester, or at least to mirror the discussions there.

A coordinating secretary would assist the executive in formulating an agenda,
in liaising between the various subcommittees and working groups, and in
posting public notices and summaries at the direction of the executive. The
coordinating secretary need not, of course, have executive rights. (Nor need
the post be held by the current incumbent. I feel it very much an honour and
a privilege to serve in this role, but it's rather like the privilege
accorded to the little unarmed drummer-boy of leading the regiment into the
teeth of battle!)

These are some of the matters that I think the executive would need to
address in the near future:
  - The effective management of a distributed dictionary system
  - The responsibility for managing DDL and not solely CIF dictionaries
  - The continuing overhead (or not) of maintaining two DDLs
  - Related to that, the implications of permitting nested loops and other
       STAR constructs
  - Implementation of _type_construct and of methods
  - Identification of areas of crystallography still lacking a dictionary
  - Adoption of the CBF format as an IUCr standard
  - Management of binary attachments to archival CIFs
Though many are issues of policy, they need to be thoroughly investigated
technically before the policy is enunciated. For example, it is generally
felt that nested loops would be A Good Thing; but it is essential, I think,
to have working applications that can handle the greater complexity of
nested loops before the current flat file representation can be abandoned. A
technical working group would write a library for nested loop manipulation
*before* nested loops were adopted as a policy shift.
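
The extra complexity can be seen in even the simplest operation on a nested
loop, namely flattening it back into the current flat representation. The
sketch below is an assumption-laden illustration, not a proposed library
interface: it handles one level of nesting only, and the tag names and the
packet layout are invented for the example.

```python
def flatten_nested_loop(outer_tags, inner_tags, packets):
    """Flatten a two-level loop into one flat table.

    packets is a list of (outer_values, inner_rows) pairs, where
    inner_rows is a list of value tuples for the inner loop.
    The outer values are repeated on every inner row.
    """
    header = list(outer_tags) + list(inner_tags)
    rows = []
    for outer_values, inner_rows in packets:
        for inner_values in inner_rows:
            rows.append(list(outer_values) + list(inner_values))
    return header, rows


# Hypothetical data: each atom carries its own inner loop of bonds.
packets = [
    (("C1",), [("O1", "1.23"), ("N1", "1.47")]),
    (("O1",), [("C1", "1.23")]),
]
header, rows = flatten_nested_loop(
    ["_example_atom.label"],
    ["_example_bond.partner", "_example_bond.distance"],
    packets,
)
```

Note that an outer packet with an empty inner loop vanishes from the flat
table in this sketch, one of the details a real library would have to settle
before the flat representation could safely be given up.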

In any case, several of these issues will come before COMCIFS in the near
future, whatever its structure after review.

With good wishes
Brian