Discussion List Archives


Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.

Dear Colleagues,

   Brian's points are well taken, and we all, I am certain, would
appreciate hearing "from this wider circle of COMCIFS advisers with
all your experience of your respective communities."  Peter's comments
have been very useful.
Please, if you will, take the time to review the messages in this
thread and to contribute your thoughts on what is important, what is
not, and which way we should go.  You are all stakeholders.  We need to know
your thoughts.  CIF is at a critical juncture.  We would like to get
this right.

You will find the threads of COMCIFS discussion at


and the related DDLm group discussion at


Please, if you can, take the time.


At 12:53 PM +0000 3/5/11, Brian McMahon wrote:
>I should like to make a number of comments to explain my perspective
>on the choices we face in continuing to develop CIF in the direction
>of greater functionality. (Think of me in this context as an Editor of
>International Tables G, with a specific desire to document a coherent
>and consistent standard sponsored by the IUCr.)
>(1) "Stakeholder buy-in" is not something we'll be able to guarantee,
>whatever approach we take. Herbert makes the well-established point
>that demonstrably "better" solutions are not always those that
>flourish. Peter's experiences with CML are also illuminating. On the
>face of it, one would expect CML to attract a huge number of
>stakeholders based on the potential returns from an information
>exchange standard in chemistry, and the adoption of a standard data
>format (XML) that is common currency. And yet, the uptake of CML still
>seems to me modest - I would go so far as to say disappointing.
>(2) There is a real danger that a poor choice for a novel syntactic
>feature could lead to some confusion. But (a) there are relatively few
>available constructs with the necessary aesthetic appeal (if we want
>to keep files reasonably human readable), and any one choice may
>conflict in some way with some other existing convention; and (b)
>people can learn to cope with potential ambiguities when they have
>to. Who are such "people" and when will they "have to"? That will
>depend in part on how the results are used in the real world. If all
>CIF input/output is through computerised pipelines, the onus is on the
>developers writing the standard parsers. If it's envisaged that
>general cut and paste will frequently be involved, then the incidence
>of mistakes is likely to be higher. There's going to be considerable
>scope for judgement in trying to assess the impact of a syntactic
>change in the real world.
>(3) That's related to Peter's notion of toolchains, which is well
>made. The CIF data format is idiosyncratic (a turn-off for many), but
>well specified and relatively lightweight. On the other hand, it is
>supported by a rather small set of tools. Building new bespoke tools
>will be burdensome, because it must be undertaken by a small
>community. In my view, extending that toolchain within a small group,
>if that is what needs to be done, is helped by applying Occam's razor
>("entities should not be unnecessarily multiplied") to new syntactic
>elements - they should contain the smallest amount of complexity needed
>to achieve the purpose for which they have been introduced.
>(4) Then the question arises of whether - and what - we gain by
>introducing syntactic features, not strictly necessary but familiar to
>a larger constituency, in the hope of encouraging greater direct
>involvement in extending the toolchain. Frankly, I am sceptical that
>by itself this would attract many new active programmers, but I'd be
>interested in hearing counter-views, especially from this wider circle
>of COMCIFS advisers with all your experience of your respective
>communities.
>I think it *might* be more appealing to groups that decline to engage
>directly with CIF as it now stands if there were widespread traffic of
>CIFs expressed in a less idiosyncratic format (e.g. XML). But only
>"might" - there's nothing actually preventing people from developing
>XML files based on CIF tokens and dictionary attributes within the
>"Crystallographic Information Framework". Indeed, there are already
>instances of this: PDBML, the crystal structure content within CML
>files, the symmetry database of the Bilbao group. But none have really
>taken off, and I'm not aware of any community demand for delivery of
>CIF data in XML format, certainly among the people who interact with
>the IUCr journals. My conclusion is that it is not primarily the
>format that is inhibiting greater stakeholder buy-in, certainly to CIF
>as it now exists.
>There might be greater interest from people external to this group if
>and when we have functioning dictionary-based methods evaluation
>through dREL, but on past experience I wouldn't automatically expect
>that to happen. However, we do have within this CIF developer
>community significant experience in implementing dictionary-based
>methods. Doing this properly over the complex objects described by
>related dictionary categories is not at all an easy task. This is why
>we have needed to develop dREL, not as yet another general-purpose
>language, but as one specifically tuned to the data structures
>expressible in CIF. If the consequence of this is the need to continue
>extending our own toolchain, that argues against putting additional
>obstacles in the path of doing so.
>Brian McMahon                                       tel: +44 1244 342878
>Research and Development Officer                    fax: +44 1244 314888
>International Union of Crystallography            e-mail:  bm@iucr.org
>5 Abbey Square, Chester CH1 2HU, England

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

