RE: magCIF - policy advice requested
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <firstname.lastname@example.org>
- Subject: RE: magCIF - policy advice requested
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Wed, 28 May 2014 08:58:09 -0500
When you said “the authors wish it to be a single, coherent document,” did you not mean that the mag_CIF dictionary should contain its own entries for all the wanted data items already covered by the core, modulated, and symmetry dictionaries (naming convention aside)? If that’s indeed the case then it seems there is already a commitment to duplicate/convert some definitions. I can’t say that I’m altogether thrilled by that idea, but there is well-established precedent. Whichever DDL is employed, naming conventions and aliases are probably a lesser issue for the conversion, relative to translating other details of the modulated dictionary (into DDL2) or the symmetry dictionary (into DDL1).
I suppose that using non-conventional data names in a DDL2 dictionary might be an amusing exercise to shake out bugs in CIF software, but that sort of exercise is unlikely to be well received if it indeed does shake out any bugs. Mag CIF is going to be an interesting enough beast already if typical instance documents can be expected to use mixed data name conventions. And really, dealing with aliases is nothing new even with DDL1 dictionaries. There are several data names even in the core dictionary that have been deprecated in favor of preferred alternatives. Resilient CIF software must deal with both (all) alternatives.
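To make the point about alternatives concrete, here is a minimal sketch of how reading software might accept either a preferred data name or a deprecated one. It is an illustration only, not taken from any existing CIF library: the `ALTERNATIVES` table and the `lookup` helper are hypothetical, the name pairs are examples that should be checked against the core dictionary itself, and a data block is assumed to be a plain dict keyed by lower-cased data names.

```python
# Hypothetical table: preferred data name -> older alternatives still seen in files.
# The pairs are illustrative; the core dictionary is the authority.
ALTERNATIVES = {
    "_space_group_name_h-m_alt": ["_symmetry_space_group_name_h-m"],
    "_space_group_crystal_system": ["_symmetry_cell_setting"],
}


def lookup(block, preferred):
    """Return the value stored under the preferred name or any known alternative.

    `block` is assumed to be a dict keyed by lower-cased data names.
    Returns None when none of the candidate names is present.
    """
    key = preferred.lower()  # CIF data names are case-insensitive
    for name in [key, *ALTERNATIVES.get(key, [])]:
        if name in block:
            return block[name]
    return None


# A file written with the older name is still readable through the preferred one.
old_style = {"_symmetry_space_group_name_h-m": "P 21/c"}
print(lookup(old_style, "_space_group_name_H-M_alt"))
```

The preferred name is tried first, so a file that carries both forms yields the value stored under the current name.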
Anyway, what is the point of creating a dictionary using any given DDL formalism at all, if not to allow for dictionary-aware applications to interpret, validate, and otherwise process CIF documents written against that dictionary? If the point is to serve dictionary-driven software, then should not the dictionary’s form be chosen to work as smoothly as possible with such software? For a DDL2 dictionary, I think that means following DDL2 convention for data names.
An alternative that bears consideration, however, is to use DDL1 for mag CIF. There would be only 29 items to convert from the symmetry dictionary, if even all of them were needed. The names could be converted or not. That seems an easier route for compiling the dictionary; the question is whether the resulting dictionary would serve all the purposes for which it is being built. Would it?
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of James Hester
I believe you are advocating duplicating part or all of the modulated structures dictionary (~100 datanames) within the magnetic structures dictionary, with aliases as necessary. As far as I can see, this buys us no more than 'a loop can be written so that all datanames have dots in them'. I do not even say 'must be written', because the aliases mean that you could continue to use the old-style datanames.
Regarding confusion, and the lack of it in the case of mmCIF/core CIF: the presence of two datanames (the mmCIF version and the core CIF version) for each concept in core CIF has not caused confusion and extra work for the simple reason that there are clear workflow and software demarcations when doing macromolecular work and chemical crystallography. Programmers, being aware of this divide, work with the appropriate datanames. This demarcation is not a result of anything that COMCIFS have done and therefore the lack of confusion is not something that can be taken for granted when moving to a different community.
In contrast to the macromolecular/small molecule case, the modulated structures community and the magnetic structures community are closely intertwined to the extent that the same programs are used (e.g. JANA). Unlike the macromolecular/core CIF case, the program user does not in general know whether they are reading/writing a CIF intended for a modulated structures or a magnetic structures or plain core CIF consumer. Therefore, if ms_cif is rewritten in DDL2, all programs must now be rewritten/recompiled/redistributed to read and write both styles of datanames. And what about the fact that many programs that ingest these magCIFs will be ordinary non-magnetic-aware programs expecting core CIF DDL1-style datanames for e.g. the atom positions?
A first cost-benefit analysis then looks like this:
Costs: rewriting 100 definitions and any software that inputs/outputs those datanames and core CIF datanames
Benefits: all datanames in a loop can have dots in them
On the face of it, these costs outweigh the benefit by several orders of magnitude.
As a postscript, I don't know if we quite appreciate the fact that once we have defined a dataname, it is almost impossible to winkle it out of software. Changing a dictionary from DDL1 style to dotted datanames has never been done before. (I would assert that mmCIF started with a clean slate, as their community path was PDB -> mmCIF, not core CIF -> mmCIF. And it has only taken 15 years to get that to start to happen.) The best I think we can do is to provide a solid and widely adopted CIF API that can apply aliases behind the scenes, in which case we can have a little more confidence in adoption of replacement datanames.
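What "apply aliases behind the scenes" could look like is sketched below. This is a rough illustration under stated assumptions, not the design of any existing CIF API: the `ALIASES` table, `canonical` and `normalize` are hypothetical names, the table holds only a few example core/mmCIF name pairs, and a data block is modelled as a plain dict of data name to value.

```python
# Illustrative alias table: DDL1-style core names -> DDL2-style dotted names.
# A real API would presumably build this from the alias entries in the dictionaries.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_atom_site_fract_x": "_atom_site.fract_x",
    "_atom_site_fract_y": "_atom_site.fract_y",
    "_atom_site_fract_z": "_atom_site.fract_z",
}


def canonical(name, aliases=ALIASES):
    """Map a data name to its canonical form; unknown names pass through."""
    key = name.lower()  # CIF data names are case-insensitive
    return aliases.get(key, key)


def normalize(block, aliases=ALIASES):
    """Return a copy of `block` keyed by canonical names.

    Raises ValueError if two aliases of one item carry different values,
    since silently choosing one would hide an inconsistent file.
    """
    out = {}
    for name, value in block.items():
        target = canonical(name, aliases)
        if target in out and out[target] != value:
            raise ValueError(f"conflicting values for {target}")
        out[target] = value
    return out


# A block mixing both naming styles comes out in one style.
mixed = {"_cell_length_a": "5.431", "_atom_site.fract_x": ["0.0", "0.25"]}
print(normalize(mixed))
```

With a layer like this, application code would ask only for the canonical name, and the choice of which style a given file was written in stays inside the API.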
all the best,
On Wed, May 28, 2014 at 1:43 PM, Herbert J. Bernstein <email@example.com> wrote:
It need not cause any confusion. The core names already in the mmCIF
dictionary have not. Small molecule people use the undotted names.
Macromolecular people use the dotted names. If we simply added aliases
for the modulated structures to the mmCIF dictionary (which probably
should be done anyway) we end up with nice clean magCIF loops and
little or no confusion for modulated structure cifs.
I expect that the magCIF writers would write their datanames to match that part of mmCIF that reproduces core CIF. The only issue then becomes the (DDL1) modulated structures dictionary. As you suggest, the modulated structures dictionary could be rewritten with DDL2-style names, but I don't believe that this additional work is necessary. It would also create unwelcome confusion in the community as to which modulated structure datanames should be used.
On Tue, May 27, 2014 at 10:36 PM, Herbert J. Bernstein <firstname.lastname@example.org> wrote:
My own inclination would be to follow the approach followed by mmcif which provides a rather complete dotted notation mapping of the core so you end up with much cleaner looking loop headers.
Dear COMCIFS members and advisers: