
Re: magCIF - policy advice requested

Hi John W,

I agree that all the new definitions should be collected together in a single file with a single dictionary dialect.  I do not see that repeating definitions from core CIF/ms_CIF/symCIF is necessary, given that we do not do that for other extensions to core CIF.  Our point of contention then becomes whether or not, if the dictionary dialect chosen is DDL2, we have to include periods in the datanames. I say it is not necessary.  If you think that it would be necessary, can you give me some idea of why?  Note that I am not at all against a particular dictionary (e.g. mmCIF, pdbx) adopting 'dot' notation as a rigorous convention.

thanks,
James.

On Wed, May 28, 2014 at 9:25 PM, John Westbrook <jwest@rcsb.rutgers.edu> wrote:
Hi James,

I completely agree with Herbert in this case.  Having all of the data names in a single dictionary with a common
dialect makes the content much more accessible for users as well as existing software tools.   mmCIF incorporated the core
definitions with DDL2 naming conventions and aliases.   The core naming conventions were preserved to the extent possible.
Splitting the definitions across DDL dialects is going to be a significant obstacle to many users at this time.

Regards,

John



On 5/28/14, 2:29 AM, James Hester wrote:
Hi Herbert,

I believe you are advocating duplicating part or all of the modulated structures dictionary (~100 datanames) within the magnetic
structures dictionary, with aliases as necessary.  As far as I can see, this buys us no more than 'a loop /can/ be written so that
all datanames have dots in them'.  I do not even say 'must be written', because the aliases mean that you could continue to use the
old-style datanames.

Regarding confusion, and the lack of it in the case of mmCIF/core CIF: the presence of two datanames (the mmCIF version and the core
CIF version) for each concept in core CIF has not caused confusion and extra work for the simple reason that there are clear
workflow and software demarcations when doing macromolecular work and chemical crystallography. Programmers, being aware of this
divide, work with the appropriate datanames. This demarcation is not a result of anything that COMCIFS have done and therefore the
lack of confusion is not something that can be taken for granted when moving to a different community.

In contrast to the macromolecular/small molecule case, the modulated structures community and the magnetic structures community are
closely intertwined to the extent that the same programs are used (e.g. JANA). Unlike the macromolecular/core CIF case, the program
user does not in general know whether they are reading/writing a CIF intended for a modulated structures or a magnetic structures or
plain core CIF consumer.  Therefore, if ms_cif is rewritten in DDL2, all programs must now be rewritten/recompiled/redistributed to
read and write both styles of datanames.  And what about the fact that many programs that ingest these magCIFs will be ordinary
non-magnetic-aware programs expecting core CIF DDL1-style datanames for e.g. the atom positions?

A first cost-benefit analysis then looks like this:
Costs: rewriting 100 definitions and any software that inputs/outputs those datanames and core CIF datanames
Benefits: all datanames in a loop can have dots in them

On the face of it, these costs outweigh the benefit by several orders of magnitude.

As a postscript, I don't know if we quite appreciate the fact that once we have defined a dataname, it is almost impossible to
winkle it out of software.  Changing a dictionary from DDL1 style to dotted datanames has never been done before (I would assert
that mmCIF started with a clean slate as their community path was PDB -> mmCIF, not core CIF -> mmCIF.  And it has only taken 15
years to get /that/ to start to happen.)  The best I think we can do is to provide a solid and widely-adopted CIF API that can apply
aliases behind the scenes, in which case we can have a little more confidence in adoption of replacement datanames.
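
A minimal sketch of what "applying aliases behind the scenes" could look like, assuming a hypothetical in-memory alias table and hypothetical helper names (canonical, normalise_block). The handful of entries below are illustrative only; a real API would presumably read the alias declarations from the dictionary itself rather than hard-code them.

```python
# Hypothetical alias table: DDL1-style (undotted) datanames mapped to the
# DDL2-style (dotted) names treated here as canonical. Illustrative entries
# only; a real implementation would load these from a dictionary file.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_atom_site_fract_x": "_atom_site.fract_x",
    "_atom_site_fract_y": "_atom_site.fract_y",
    "_atom_site_fract_z": "_atom_site.fract_z",
}


def canonical(name, aliases=ALIASES):
    """Return the canonical dataname for `name`.

    CIF datanames are case-insensitive, so the lookup is done in lower case.
    Names without a known alias are returned unchanged (lower-cased).
    """
    key = name.lower()
    return aliases.get(key, key)


def normalise_block(block, aliases=ALIASES):
    """Re-key a {dataname: value} mapping onto canonical datanames.

    Raises ValueError if two spellings of the same item carry different values.
    """
    out = {}
    for name, value in block.items():
        canon = canonical(name, aliases)
        if canon in out and out[canon] != value:
            raise ValueError(f"conflicting values for {canon}")
        out[canon] = value
    return out


# A file written with old-style names and one written with dotted names
# look identical to the calling program after normalisation.
old_style = {"_cell_length_a": "5.43", "_atom_site_fract_x": ["0.0", "0.5"]}
new_style = {"_cell.length_a": "5.43", "_atom_site.fract_x": ["0.0", "0.5"]}
same = normalise_block(old_style) == normalise_block(new_style)
```

With a layer like this sitting between the file and the application, the application only ever sees one spelling of each dataname, whichever style the file's author happened to use.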

all the best,
James.



On Wed, May 28, 2014 at 1:43 PM, Herbert J. Bernstein <yayahjb@gmail.com> wrote:

    Dear James,

       It need not cause any confusion.  The core names already in the mmCIF
    dictionary have not.  Small molecule people use the undotted names.
    Macromolecular people use the dotted names.  If we simply added aliases
    for the modulated structures to the mmCIF dictionary (which probably
    should be done anyway) we end up with nice clean magCIF loops and
    little or no confusion for modulated structure cifs.

       Regards,
         Herbert


    On Tuesday, May 27, 2014, James Hester <jamesrhester@gmail.com> wrote:

        I expect that the magCIF writers would write their datanames to match that part of mmCIF that reproduces core CIF.  The only
        issue then becomes the (DDL1) modulated structures dictionary. As you suggest, the modulated structures dictionary could be
        rewritten with DDL2-style names, but I don't believe that this additional work is necessary.  It would also create unwelcome
        confusion in the community as to which modulated structure datanames should be used.


        On Tue, May 27, 2014 at 10:36 PM, Herbert J. Bernstein <yayahjb@gmail.com> wrote:

            My own inclination would be to follow the approach taken by mmCIF, which provides a rather complete dotted-notation
            mapping of the core, so you end up with much cleaner-looking loop headers.

            Regards,
            Herbert

            Sent from my Xperia™ smartphone


            James Hester <jamesrhester@gmail.com> wrote:

            Dear COMCIFS members and advisers:

            I am pleased to advise that a CIF dictionary for description of
            magnetic structures (magCIF) is currently in preparation and it is
            expected that a final draft could be ready before the IUCr meeting.
            This has raised a policy issue for COMCIFS that we need to deal with
            in a timely way.

            By its nature, the magCIF dictionary builds on the definitions in the
            core CIF dictionary, modulated structures CIF dictionary, and symmetry
            CIF dictionary (including extending looped categories).  At the same
            time, the authors wish it to be a single, coherent document.  Core CIF
            and the modulated structures dictionary use DDL1 naming conventions,
            whereas symCIF is a DDL2 dictionary with DDL2 naming conventions. For
            coherence and convenience, the authors of magCIF should clearly use a
            single DDL and naming convention.

            My inclination is to recommend writing magCIF using DDL2.
            Semantically, this will mean that certain DDL2 concepts (e.g. 'key')
            will be implicitly imposed on DDL1 datanames.  This mapping is however
            straightforward and implied by the presence of 'aliases' in mmCIF and
            other DDL2 dictionaries.

            More trivially, this approach will result in some loops that have
            names not containing a period mixed with names that do contain a
            period, and non-looped datanames in the CIF data file will also
            contain mixtures of such names. I note that the use of a period to
            separate category and item is purely conventional and is not
            syntactically or semantically required by the DDL that the dictionary
            is written in, so I do not consider this to be a problem.

            A further advantage of DDL2-style names is that when magCIF is
            translated into DDLm at some not-too-distant point, the same names can
            be used (as DDLm naming conventions are the same as DDL2 naming
            conventions) and software written with the DDL2 magCIF dictionary in
            mind will not require updating to handle files written against the
            'new' DDLm magCIF.

            Does anybody see any issues with this recommendation?

            James.


            --
            T +61 (02) 9717 9907
            F +61 (02) 9717 3145
            M +61 (04) 0249 4148


            _______________________________________________
            comcifs mailing list
            comcifs@iucr.org
            http://mailman.iucr.org/mailman/listinfo/comcifs







--

John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (848) 445-4290 Fax: (732) 445-4320

