
Re: magCIF - policy advice requested

Dear Colleagues,
  We have three DDLs:  DDL1, DDL2 and DDLm.  I thought the ultimate objective was to move to DDLm so that we are working with a common DDL.  The advantage of DDL2 over DDL1 is that it encourages SQL-friendly normalization of tables.  DDLm attempts to merge DDL1 with DDL2 while retaining good SQL support.  If a new dictionary is not going to be a DDL2 dictionary, then I would urge that it be a DDLm dictionary, rather than a DDL1 dictionary.
  Regards,    Herbert
On Fri, May 30, 2014 at 11:47 AM, David Brown <idbrown@mcmaster.ca> wrote:
> James,
>
> I would just like to echo John B's suggestion that we write magCIF in DDL1.
> magCIF will be used largely for inorganic materials rather than proteins and
> so fits more naturally into the same group as coreCIF which already contains
> some symmetry items transferred and converted from symCIF.
>
> David
>
> On 5/30/2014 10:08 AM, James Hester wrote:
>
> Hello John and others:
>
> On Wed, May 28, 2014 at 11:58 PM, Bollinger, John C
> <John.Bollinger@stjude.org> wrote:
>>
>> Hello James,
>>
>> When you said “the authors wish it to be a single, coherent document,” did
>> you not mean that the mag_CIF dictionary should contain its own entries for
>> all the wanted data items already covered by the core, modulated, and
>> symmetry dictionaries (naming convention aside)?  If that’s indeed the case
>> then it seems there is already a commitment to duplicate/convert some
>> definitions.  I can’t say that I’m altogether thrilled by that idea, but
>> there is well-established precedent.  Whichever DDL is employed, naming
>> conventions and aliases are probably a lesser issue for the conversion,
>> relative to translating other details of the modulated dictionary (into
>> DDL2) or the symmetry dictionary (into DDL1).
>
> No, what I meant was that the magCIF definitions should be contained in a
> single dictionary.  You see, one foolproof solution to the current problem
> would be to make the magCIF dictionary 3 dictionaries: one an extension to
> coreCIF, one an extension to ms_CIF, and one an extension to symCIF.  My
> talk of a coherent document was in contrast to that alternative - sorry for
> the confusion.  Duplicating datanames is something I think we should avoid
> (will discuss in separate post).
>
>> I suppose that using non-conventional data names in a DDL2 dictionary
>> might be an amusing exercise to shake out bugs in CIF software, but that
>> sort of exercise is unlikely to be well received if it indeed does shake out
>> any bugs.  Mag CIF is going to be an interesting enough beast already if
>> typical instance documents can be expected to use mixed data name
>> conventions.  And really, dealing with aliases is nothing new even with DDL1
>> dictionaries.  There are several data names even in the core dictionary that
>> have been deprecated in favor of preferred alternatives.  Resilient CIF
>> software must deal with both (all) alternatives.
>
> The only way for resilient CIF software to deal with all alternatives is to
> have an update mechanism linked to the IUCr dictionary register, and parse
> any new or updated dictionaries to find out what new names an older name
> might appear as, and presumably cache the results if there is access to
> local storage.  I think this is within the realms of possibility for
> well-designed, network-connected applications, and perhaps even our standard
> CIFAPI might be able to build in some similar functionality.  Absent such an
> API, we are expecting CIF authors to do quite a lot more than just parse a
> file to pull out a dataname.  Let's not forget that the majority of
> scientific software is written by one or two people whose main concern is
> not dealing with getting the data in and out but with doing something fancy
> in between.
>
> I agree that there may be issues with DDL2 dictionary software (see below),
> but I'm guessing that this is a much smaller user base than the application
> software.
>
>> Anyway, what is the point of creating a dictionary using any given DDL
>> formalism at all, if not to allow for dictionary-aware applications to
>> interpret, validate, and otherwise process CIF documents written against
>> that dictionary?  If the point is to serve dictionary-driven software, then
>> should not the dictionary’s form be chosen to work as smoothly as possible
>> with such software?  For a DDL2 dictionary, I think that means following
>> DDL2 convention for data names.
>
> I'm happy to chalk up inability to use DDL2 dictionary-aware software that
> relies on dots as a cost of what I'm proposing.  I have only a poor idea of
> what this software might be, and how useful it might be to the magCIF
> community, and how easy it would be to replace/rewrite to ignore dots, so
> how great a cost this is needs some input from those who know about such
> software.
>
>> An alternative that bears consideration, however, is to use DDL1 for mag
>> CIF.  There would be only 29 items to convert from the symmetry dictionary,
>> if even all of them were needed.  The names could be converted or not.  That
>> seems an easier route for compiling the dictionary; the question is whether
>> the resulting dictionary would serve all the purposes for which it is being
>> built.  Would it?
>
> An intriguing idea. The essential problem evaporates, as symCIF-aware
> programs would also be coreCIF-aware programs and happy with mixed dataname
> conventions, and DDL1-dictionary-aware programs are blind to dots. I am
> strongly against dataname duplication (separate post coming) so would not
> want to see any redefinitions of symCIF datanames.
>
> An interesting alternative would be a DDLm dictionary, where we can
> establish our own convention and are not breaking any software.  In my
> opinion, the 'dot' convention should be a choice of the dictionary authors,
> not dictated by the DDL but I'm open to explanations of why this should not
> be the case.
>
> James.
>
>> John
>>
>> --
>> John C. Bollinger, Ph.D.
>> Computing and X-Ray Scientist
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>> John.Bollinger@StJude.org
>> (901) 595-3166 [office]
>> www.stjude.org
>>
>> From: comcifs-bounces@iucr.org [mailto:comcifs-bounces@iucr.org] On Behalf
>> Of James Hester
>> Sent: Wednesday, May 28, 2014 1:30 AM
>> To: Discussion list of the IUCr Committee for the Maintenance of the CIF
>> Standard (COMCIFS)
>> Subject: Re: magCIF - policy advice requested. .
>>
>> Hi Herbert,
>>
>> I believe you are advocating duplicating part or all of the modulated
>> structures dictionary (~100 datanames) within the magnetic structures
>> dictionary, with aliases as necessary.  As far as I can see, this buys us no
>> more than 'a loop can be written so that all datanames have dots in them'.
>> I do not even say 'must be written', because the aliases mean that you could
>> continue to use the old-style datanames.
>>
>> Regarding confusion, and the lack of it in the case of mmCIF/core CIF: the
>> presence of two datanames (the mmCIF version and the core CIF version) for
>> each concept in core CIF has not caused confusion and extra work for the
>> simple reason that there are clear workflow and software demarcations when
>> doing macromolecular work and chemical crystallography. Programmers, being
>> aware of this divide, work with the appropriate datanames. This demarcation
>> is not a result of anything that COMCIFS have done and therefore the lack of
>> confusion is not something that can be taken for granted when moving to a
>> different community.
>>
>> In contrast to the macromolecular/small molecule case, the modulated
>> structures community and the magnetic structures community are closely
>> intertwined to the extent that the same programs are used (e.g. JANA).
>> Unlike the macromolecular/core CIF case, the program user does not in
>> general know whether they are reading/writing a CIF intended for a modulated
>> structures or a magnetic structures or plain core CIF consumer.  Therefore,
>> if ms_cif is rewritten in DDL2, all programs must now be
>> rewritten/recompiled/redistributed to read and write both styles of
>> datanames.  And what about the fact that many programs that ingest these
>> magCIFs will be ordinary non-magnetic-aware programs expecting core CIF
>> DDL1-style datanames for e.g. the atom positions?
>>
>> A first cost benefit analysis then looks like this:
>>
>> Costs: rewriting 100 definitions and any software that inputs/outputs
>> those datanames and core CIF datanames
>>
>> Benefits: all datanames in a loop can have dots in them
>>
>> On the face of it, these costs outweigh the benefit by several orders of
>> magnitude.
>>
>> As a postscript, I don't know if we quite appreciate the fact that once we
>> have defined a dataname, it is almost impossible to winkle it out of
>> software.  Changing a dictionary from DDL1 style to dotted datanames has
>> never been done before (I would assert that mmCIF started with a clean slate
>> as their community path was PDB -> mmCIF, not core CIF -> mmCIF.  And it has
>> only taken 15 years to get that to start to happen.)  The best I think we
>> can do is to provide a solid and widely-adopted CIF API that can apply
>> aliases behind the scenes, in which case we can have a little more
>> confidence in adoption of replacement datanames.
>>
>> all the best,
>> James.
>>
>> On Wed, May 28, 2014 at 1:43 PM, Herbert J. Bernstein <yayahjb@gmail.com>
>> wrote:
>>
>>>> Dear James,
>>>>
>>>>   It need not cause any confusion.  The core names already in the mmCIF
>>>> dictionary have not.  Small molecule people use the undotted names.
>>>> Macromolecular people use the dotted names.  If we simply added aliases
>>>> for the modulated structures to the mmCIF dictionary (which probably
>>>> should be done anyway) we end up with nice clean magCIF loops and
>>>> little or no confusion for modulated structure cifs.
>>>>
>>>>   Regards,
>>>>     Herbert
>>
>> On Tuesday, May 27, 2014, James Hester <jamesrhester@gmail.com> wrote:
>>
>> I expect that the magCIF writers would write their datanames to match that
>> part of mmCIF that reproduces core CIF.  The only issue then becomes the
>> (DDL1) modulated structures dictionary. As you suggest, the modulated
>> structures dictionary could be rewritten with DDL2-style names, but I don't
>> believe that this additional work is necessary.  It would also create
>> unwelcome confusion in the community as to which modulated structure
>> datanames should be used.
>>
>> On Tue, May 27, 2014 at 10:36 PM, Herbert J. Bernstein <yayahjb@gmail.com>
>> wrote:
>>
>> My own inclination would be to follow the approach followed by mmcif which
>> provides a rather complete dotted notation mapping of the core so you end up
>> with much cleaner looking loop headers.
>>
>> Regards,
>> Herbert
>>
>> Sent from my Xperia™ smartphone
>>
>> James Hester <jamesrhester@gmail.com> wrote:
>>
>> Dear COMCIFS members and advisers:
>>
>> I am pleased to advise that a CIF dictionary for description of
>> magnetic structures (magCIF) is currently in preparation and it is
>> expected that a final draft could be ready before the IUCr meeting.
>> This has raised a policy issue for COMCIFS that we need to deal with
>> in a timely way.
>>
>> By its nature, the magCIF dictionary builds on the definitions in the
>> core CIF dictionary, modulated structures CIF dictionary, and symmetry
>> CIF dictionary (including extending looped categories).  At the same
>> time, the authors wish it to be a single, coherent document.  Core CIF
>> and the modulated structures dictionary use DDL1 naming conventions,
>> whereas symCIF is a DDL2 dictionary with DDL2 naming conventions. For
>> coherence and convenience, the authors of magCIF should clearly use a
>> single DDL and naming convention.
>>
>> My inclination is to recommend writing magCIF using DDL2.
>> Semantically, this will mean that certain DDL2 concepts (e.g. 'key')
>> will be implicitly imposed on DDL1 datanames.  This mapping is however
>> straightforward and implied by the presence of 'aliases' in mmCIF and
>> other DDL2 dictionaries
>>
>> More trivially, this approach will result in some loops that have
>> names not containing a period mixed with names that do contain a
>> period, and non-looped datanames in the CIF data file will also
>> contain mixtures of such names. I note that the use of a period to
>> separate category and item is purely conventional and is not
>> syntactically or semantically required by the DDL that the dictionary
>> is written in, so I do not consider this to be a problem.
>>
>> A further advantage of DDL2-style names is that when magCIF is
>> translated into DDLm at some not-too-distant point, the same names can
>> be used (as DDLm naming conventions are the same as DDL2 naming
>> conventions) and software written with the DDL2 magCIF dictionary in
>> mind will not require updating to handle files written against the
>> 'new' DDLm magCIF.
>>
>> Does anybody see any issues with this recommendation?
>>
>> James.
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>>
>> _______________________________________________
>> comcifs mailing list
>> comcifs@iucr.org
>> http://mailman.iucr.org/mailman/listinfo/comcifs
>>
>> ________________________________
>> Email Disclaimer: www.stjude.org/emaildisclaimer
>> Consultation Disclaimer: www.stjude.org/consultationdisclaimer
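
The alias handling discussed in the quoted thread (a CIF API that "can apply aliases behind the scenes") could look roughly like the Python sketch below. It is an illustration only, not an existing CIF API: the function names and the hard-coded ALIASES table are hypothetical, and a real implementation would presumably build the table by parsing the alias entries of the relevant dictionaries. The two example pairs are the core CIF and mmCIF spellings of the same items.

```python
# Hypothetical alias table mapping DDL1-style names to dotted names.
# In practice this would be generated from dictionary alias entries,
# not written by hand.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_atom_site_fract_x": "_atom_site.fract_x",
}


def canonical_name(dataname, aliases=ALIASES):
    """Return the preferred spelling of a dataname.

    CIF datanames are case-insensitive, so the lookup is done on the
    lower-cased name. Unknown names are returned lower-cased but
    otherwise unchanged.
    """
    key = dataname.lower()
    return aliases.get(key, key)


def normalize_block(block, aliases=ALIASES):
    """Rewrite the keys of a {dataname: value} mapping to canonical names.

    Raises ValueError if two aliases of the same item carry different
    values, since the sketch has no way to decide which one is right.
    """
    out = {}
    for name, value in block.items():
        canon = canonical_name(name, aliases)
        if canon in out and out[canon] != value:
            raise ValueError(f"conflicting values for {canon}")
        out[canon] = value
    return out
```

With a layer of this kind, application code can ask for one spelling of a dataname regardless of which convention the file author used, which is the property the thread is asking of a shared API.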
