Discussion List Archives

RE: magCIF - policy advice requested


In fact, element-by-element tags are needed regardless of choice of DDL, as they are the only way for CIF1 applications to consume those data.  If per-element tags can be aliased to individual elements of a separate vector item then well and good, but if not then such a vector item would probably constitute an inappropriate duplicate.

This bears on the question of whether CIF2 can or should be allowed for instance documents at this time or in the near future.  That's a separate issue, because a DDLm dictionary can certainly define data that are expressible completely in CIF1 form.  If the magCIF definitions are restricted to such forms, however, then I see little practical advantage to writing the dictionary in DDLm over writing it, with due care to proper normalization, in DDL1.  On the contrary, I reiterate that DDLm might not even be a viable choice if current dictionary-driven software must be accommodated.

This isn't about moving forward or backward or sideways.  It is about choosing the best tool for the job.  Certainly DDLm is intended to replace both DDL1 and DDL2, but I question whether in practice it is ready to do so just yet.  Perhaps magCIF could serve as a lever to move the DDL-consuming world in that direction, but unless we are willing to use it that way, I have serious reservations about writing it (exclusively) in DDLm.


John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital


-----Original Message-----
From: yayahjb [mailto:yayahjb@gmail.com]
Sent: Monday, June 02, 2014 9:11 AM
To: Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)
Cc: Bollinger, John C
Subject: Re: magCIF - policy advice requested

Dear Colleagues,

   The only addition I can see that a DDL1 dictionary would make for a DDLm magCIF would be to provide non-vector element-by-element tags for the things that would be naturally presented as vectors in DDLm.  Let's just keep things simple and provide the element-by-element tags directly in the DDLm dictionary with the methods to build and unbuild the vector presentations.  Then there is just one dictionary to maintain, but software that is not yet CIF2-aware can be presented with fully CIF 1.1 compliant files without the need for a duplicate DDL1 dictionary.
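
   To make the build/unbuild idea concrete, here is a minimal sketch in Python rather than in a dictionary method language.  The data names are hypothetical and are not taken from any existing or draft dictionary; the point is only that the per-element items and the vector item carry the same information and can be converted in either direction.

```python
# Hypothetical data names, for illustration only; not from any dictionary.
COMPONENT_TAGS = ("_example_moment_x", "_example_moment_y", "_example_moment_z")


def build_vector(block):
    """Assemble a vector value from the three per-element (CIF1-style) items."""
    return [float(block[tag]) for tag in COMPONENT_TAGS]


def unbuild_vector(vector):
    """Split a three-element vector back into per-element items."""
    if len(vector) != len(COMPONENT_TAGS):
        raise ValueError("expected exactly three components")
    return dict(zip(COMPONENT_TAGS, vector))


# A CIF1-style data block holds each component as a separate string value.
cif1_block = {
    "_example_moment_x": "0.0",
    "_example_moment_y": "1.5",
    "_example_moment_z": "-2.0",
}
vector = build_vector(cif1_block)
round_trip = unbuild_vector(vector)
```

   Software that only understands CIF 1.1 would read and write the three separate items; vector-aware software could work with the assembled value and unbuild it again on output.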

   On the issue of dotted notation -- it is perfectly legal in a CIF1 file, so there is no need to create extra names with underscores as aliases for any new tags in the dotted notation.  However, to really use DDLm fully, we should try to define dotted notation names clarifying the identification of categories for any existing underscore-only name in a way that lets software function well even when cut off from other dictionaries.  Most of that job was already done by John W. in the core-cif mapping into the mmCIF dictionary.  Let's just keep adding to that very useful tool.
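
   As a rough illustration of how software might use such a mapping, the sketch below resolves underscore-only names to their dotted equivalents and then reads the category off the dotted name.  The two table entries are the familiar core/mmCIF pairs; the function names and the table itself are assumptions made for this example, not part of any existing CIF API.

```python
# Small illustrative alias table: underscore-only name -> dotted name.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_atom_site_fract_x": "_atom_site.fract_x",
}


def canonical_name(data_name):
    """Return the dotted form of a data name if an alias is known.

    CIF data names are case-insensitive, so compare in lower case.
    """
    lowered = data_name.lower()
    return ALIASES.get(lowered, lowered)


def split_category(dotted_name):
    """Split '_category.item' into (category, item).

    The category is None when the name contains no dot.
    """
    stripped = dotted_name.lstrip("_")
    if "." not in stripped:
        return None, stripped
    category, item = stripped.split(".", 1)
    return category, item


name = canonical_name("_Cell_Length_A")
category, item = split_category(name)
```

   With the table held locally, the category can be identified without consulting any other dictionary, which is the property asked for above.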

   On the SQL issue.  The real issue is not so much the dots, but the important conceptual exercise of normalizing the schema which has major practical implications in the design and implementation of databases.  As we move more and more into cross-disciplinary data-mining, keeping the dictionaries SQL-friendly will be increasingly important.
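
   A small sketch of what normalization buys in practice, using Python's built-in sqlite3 module.  The table and column names are hypothetical and only loosely modelled on CIF categories: each category becomes a table, the category key becomes the primary key, and a dependent category refers back to its parent by that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ask SQLite to enforce the foreign key declared below.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE atom_site (
        label   TEXT PRIMARY KEY,
        fract_x REAL,
        fract_y REAL,
        fract_z REAL
    );
    CREATE TABLE atom_site_moment (
        label    TEXT PRIMARY KEY REFERENCES atom_site(label),
        moment_x REAL,
        moment_y REAL,
        moment_z REAL
    );
    """
)
conn.execute("INSERT INTO atom_site VALUES ('Fe1', 0.0, 0.0, 0.0)")
conn.execute("INSERT INTO atom_site_moment VALUES ('Fe1', 0.0, 0.0, 2.2)")

# Because both tables share the same key, combining them is a plain join.
rows = conn.execute(
    "SELECT a.label, a.fract_x, m.moment_z "
    "FROM atom_site AS a JOIN atom_site_moment AS m ON m.label = a.label"
).fetchall()
```

   The same structure can be defined in any of the DDLs; what matters for database use is that every looped category has a well-defined key.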

   I urge that we try to move forward, not backwards, and try to use DDLm as our primary vehicle for presentation of new dictionaries, and that we do so in database-friendly ways.

   Regards,
     Herbert

On 6/2/14 9:46 AM, Bollinger, John C wrote:
> It is certainly true that DDL2 and DDLm provide great support for defining relational data, in the sense of the data definition subset of SQL.  However, let us not conflate CIF DDL capabilities for defining data structure and relationships with the actual structure and relationships, explicit and implicit, in any given dictionary.  One can define poorly-structured data in DDL2 or DDLm, and one can define well-structured data in DDL1.
>
> To me, the importance of the choice of CIF DDL hinges on the current and mid-term future importance of dictionary-driven software to the target community.  If such software is immediately relevant, such as for publication processes, then a DDL that has good current support is the best choice now.  With a modicum of care devoted to the dictionary's structure, it should be possible to write it so that it can be easily converted to DDLm when the time comes.
>
> On the other hand, if dictionary-driven software is not a significant current concern for magCIF, then DDLm with suitable aliases to existing DDL1-defined items is a viable choice.  I suppose the expectation here would be that most instance documents would make heavy use of the aliases, thus yielding some dotted names and some undotted.  That doesn't bother me.
>
> On the third hand, would it be useful to write dual definitions of magCIF from the get-go, one in DDL1 and the other in DDLm?  That should be less onerous than converting an existing dictionary, and it would ensure a DDLm-compatible data model.  At the same time, it would work with current DDL1 software, and it would avoid any need for performing a full conversion in the future.
>
>
> John
>
> --
> John C. Bollinger, Ph.D.
> Computing and X-Ray Scientist
> Department of Structural Biology
> St. Jude Children's Research Hospital
> John.Bollinger@StJude.org
> (901) 595-3166 [office]
> www.stjude.org
>
>
>
>
> -----Original Message-----
> From: comcifs-bounces@iucr.org [mailto:comcifs-bounces@iucr.org] On
> Behalf Of Herbert J. Bernstein
> Sent: Saturday, May 31, 2014 2:48 PM
> To: Discussion list of the IUCr Committee for the Maintenance of the
> CIF Standard (COMCIFS)
> Subject: Re: magCIF - policy advice requested
>
> Dear Colleagues,
>
>    We have three DDLs:  DDL1, DDL2 and DDLm.  I thought the ultimate objective was to move to DDLm so that we are working with a common
> DDL.  The advantage of DDL2 over DDL1 is that it encourages
> SQL-friendly normalization of tables.  DDLm attempts to merge DDL1 with DDL2 while retaining good SQL support.  If a new dictionary is not going to be a DDL2 dictionary, then I would urge that it be a DDLm dictionary, rather than a DDL1 dictionary.
>
>    Regards,
>      Herbert
>
> On Fri, May 30, 2014 at 11:47 AM, David Brown<idbrown@mcmaster.ca>  wrote:
>
>> James,
>>
>> I would just like to echo John B's suggestion that we write magCIF in DDL1.
>> magCIF will be used largely for inorganic materials rather than
>> proteins and so fits more naturally into the same group as coreCIF
>> which already contains some symmetry items transferred and converted from symCIF.
>>
>> David
>>
>>
>>
>>
>> On 5/30/2014 10:08 AM, James Hester wrote:
>>
>> Hello John and others:
>>
>> On Wed, May 28, 2014 at 11:58 PM, Bollinger, John C
>> <John.Bollinger@stjude.org>  wrote:
>>
>>> Hello James,
>>>
>>>
>>>
>>> When you said “the authors wish it to be a single, coherent
>>> document,” did you not mean that the mag_CIF dictionary should
>>> contain its own entries for all the wanted data items already
>>> covered by the core, modulated, and symmetry dictionaries (naming
>>> convention aside)?  If that’s indeed the case then it seems there is
>>> already a commitment to duplicate/convert some definitions.  I can’t
>>> say that I’m altogether thrilled by that idea, but there is
>>> well-established precedent.  Whichever DDL is employed, naming
>>> conventions and aliases are probably a lesser issue for the
>>> conversion, relative to translating other details of the modulated
>>> dictionary (into DDL2) or the symmetry dictionary (into DDL1).
>>>
>>
>> No, what I meant was that the magCIF definitions should be contained
>> in a single dictionary.  You see, one foolproof solution to the
>> current problem would be to make the magCIF dictionary 3 dictionaries:
>> one an extension to coreCIF, one an extension to ms_CIF, and one an
>> extension to symCIF.  My talk of a coherent document was in contrast
>> to that alternative - sorry for the confusion.  Duplicating datanames
>> is something I think we should avoid (will discuss in separate post).
>>
>>>
>>>
>>> I suppose that using non-conventional data names in a DDL2
>>> dictionary might be an amusing exercise to shake out bugs in CIF
>>> software, but that sort of exercise is unlikely to be well received
>>> if it indeed does shake out any bugs.  Mag CIF is going to be an
>>> interesting enough beast already if typical instance documents can
>>> be expected to use mixed data name conventions.  And really, dealing
>>> with aliases is nothing new even with DDL1 dictionaries.  There are
>>> several data names even in the core dictionary that have been
>>> deprecated in favor of preferred alternatives.  Resilient CIF software must deal with both (all) alternatives.
>>>
>>
>> The only way for resilient CIF software to deal with all alternatives
>> is to have an update mechanism linked to the IUCr dictionary
>> register, and parse any new or updated dictionaries to find out what
>> new names an older name might appear as, and presumably cache the
>> results if there is access to local storage.  I think this is within
>> the realms of possibility for well-designed, network-connected
>> applications, and perhaps even our standard CIFAPI might be able to
>> build in some similar functionality.  Absent such an API, we are
>> expecting CIF authors to do quite a lot more than just parse a file
>> to pull out a dataname.  Let's not forget that the majority of
>> scientific software is written by one or two people whose main
>> concern is not dealing with getting the data in and out but with doing something fancy in between.
>>
>> I agree that there may be issues with DDL2 dictionary software (see
>> below), but I'm guessing that this is a much smaller user base than
>> the application software.
>>
>>
>>> Anyway, what is the point of creating a dictionary using any given
>>> DDL formalism at all, if not to allow for dictionary-aware
>>> applications to interpret, validate, and otherwise process CIF
>>> documents written against that dictionary?  If the point is to serve
>>> dictionary-driven software, then should not the dictionary’s form be
>>> chosen to work as smoothly as possible with such software?  For a
>>> DDL2 dictionary, I think that means following
>>> DDL2 convention for data names.
>>>
>>
>> I'm happy to chalk up inability to use DDL2 dictionary-aware software
>> that relies on dots as a cost of what I'm proposing.  I have only a
>> poor idea of what this software might be, and how useful it might be
>> to the magCIF community, and how easy it would be to replace/rewrite
>> to ignore dots, so how great a cost this is needs some input from
>> those who know about such software.
>>
>>
>>> An alternative that bears consideration, however, is to use DDL1 for
>>> mag CIF.  There would be only 29 items to convert from the symmetry
>>> dictionary, if even all of them were needed.  The names could be
>>> converted or not.  That seems an easier route for compiling the
>>> dictionary; the question is whether the resulting dictionary would
>>> serve all the purposes for which it is being built.  Would it?
>>>
>>>
>>>
>> An intriguing idea.  The essential problem evaporates, as symCIF-aware
>> programs would also be coreCIF-aware programs and happy with mixed
>> dataname conventions, and DDL1-dictionary-aware programs are blind to
>> dots. I am strongly against dataname duplication (separate post
>> coming) so would not want to see any redefinitions of symCIF datanames.
>>
>> An interesting alternative would be a DDLm dictionary, where we can
>> establish our own convention and are not breaking any software.  In
>> my opinion, the 'dot' convention should be a choice of the dictionary
>> authors, not dictated by the DDL, but I'm open to explanations of why
>> this should not be the case.
>>
>> James.
>>
>>>
>>> John
>>>
>>> --
>>> John C. Bollinger, Ph.D.
>>> Computing and X-Ray Scientist
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>> John.Bollinger@StJude.org
>>> (901) 595-3166 [office]
>>> www.stjude.org
>>>
>>>
>>> From: comcifs-bounces@iucr.org [mailto:comcifs-bounces@iucr.org] On
>>> Behalf Of James Hester
>>> Sent: Wednesday, May 28, 2014 1:30 AM
>>> To: Discussion list of the IUCr Committee for the Maintenance of the
>>> CIF Standard (COMCIFS)
>>> Subject: Re: magCIF - policy advice requested
>>>
>>>
>>>
>>> Hi Herbert,
>>>
>>> I believe you are advocating duplicating part or all of the
>>> modulated structures dictionary (~100 datanames) within the magnetic
>>> structures dictionary, with aliases as necessary.  As far as I can
>>> see, this buys us no more than 'a loop can be written so that all datanames have dots in them'.
>>> I do not even say 'must be written', because the aliases mean that
>>> you could continue to use the old-style datanames.
>>>
>>> Regarding confusion, and the lack of it in the case of mmCIF/core
>>> CIF: the presence of two datanames (the mmCIF version and the core
>>> CIF version) for each concept in core CIF has not caused confusion
>>> and extra work for the simple reason that there are clear workflow
>>> and software demarcations when doing macromolecular work and
>>> chemical crystallography. Programmers, being aware of this divide,
>>> work with the appropriate datanames. This demarcation is not a
>>> result of anything that COMCIFS have done and therefore the lack of
>>> confusion is not something that can be taken for granted when moving to a different community.
>>>
>>> In contrast to the macromolecular/small molecule case, the modulated
>>> structures community and the magnetic structures community are
>>> closely intertwined to the extent that the same programs are used (e.g. JANA).
>>> Unlike the macromolecular/core CIF case, the program user does not
>>> in general know whether they are reading/writing a CIF intended for
>>> a modulated structures or a magnetic structures or plain core CIF
>>> consumer.  Therefore, if ms_cif is rewritten in DDL2, all programs
>>> must now be rewritten/recompiled/redistributed to read and write
>>> both styles of datanames.  And what about the fact that many
>>> programs that ingest these magCIFs will be ordinary
>>> non-magnetic-aware programs expecting core CIF DDL1-style datanames for e.g. the atom positions?
>>>
>>> A first cost benefit analysis then looks like this:
>>>
>>> Costs: rewriting 100 definitions and any software that
>>> inputs/outputs those datanames and core CIF datanames
>>>
>>> Benefits: all datanames in a loop can have dots in them
>>>
>>>
>>>
>>> On the face of it, these costs outweigh the benefit by several
>>> orders of magnitude.
>>>
>>> As a postscript, I don't know if we quite appreciate the fact that
>>> once we have defined a dataname, it is almost impossible to winkle
>>> it out of software.  Changing a dictionary from DDL1 style to dotted
>>> datanames has never been done before (I would assert that mmCIF
>>> started with a clean slate as their community path was PDB ->
>>> mmCIF, not core CIF ->  mmCIF.  And it has only taken 15 years to
>>> get that to start to happen.)  The best I think we can do is to
>>> provide a solid and widely-adopted CIF API that can apply aliases
>>> behind the scenes, in which case we can have a little more confidence in adoption of replacement datanames.
>>>
>>> all the best,
>>> James.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, May 28, 2014 at 1:43 PM, Herbert J. Bernstein
>>> <yayahjb@gmail.com>
>>> wrote:
>>>
>>> Dear James,
>>>
>>>
>>>
>>>    It need not cause any confusion.  The core names already in the
>>> mmCIF dictionary have not.  Small molecule people use the undotted names.
>>> Macromolecular people use the dotted names.  If we simply added
>>> aliases for the modulated structures to the mmCIF dictionary (which probably
>>> should be done anyway) we end up with nice clean magCIF loops and
>>> little or no confusion for modulated structure cifs.
>>>
>>>    Regards,
>>>      Herbert
>>>
>>>
>>>
>>> On Tuesday, May 27, 2014, James Hester<jamesrhester@gmail.com>  wrote:
>>>
>>> I expect that the magCIF writers would write their datanames to
>>> match that part of mmCIF that reproduces core CIF.  The only issue
>>> then becomes the (DDL1) modulated structures dictionary. As you suggest, the
>>> modulated structures dictionary could be rewritten with DDL2-style
>>> names, but I don't believe that this additional work is necessary.
>>> It would also create unwelcome confusion in the community as to
>>> which modulated structure datanames should be used.
>>>
>>>
>>>
>>> On Tue, May 27, 2014 at 10:36 PM, Herbert J. Bernstein
>>> <yayahjb@gmail.com>
>>> wrote:
>>>
>>> My own inclination would be to follow the approach followed by mmcif
>>> which provides a rather complete dotted notation mapping of the core
>>> so you end up with much cleaner looking loop headers.
>>>
>>> Regards,
>>> Herbert
>>>
>>>
>>>
>>> James Hester<jamesrhester@gmail.com>  wrote:
>>>
>>> Dear COMCIFS members and advisers:
>>>
>>> I am pleased to advise that a CIF dictionary for description of
>>> magnetic structures (magCIF) is currently in preparation and it is
>>> expected that a final draft could be ready before the IUCr meeting.
>>> This has raised a policy issue for COMCIFS that we need to deal with
>>> in a timely way.
>>>
>>> By its nature, the magCIF dictionary builds on the definitions in
>>> the core CIF dictionary, modulated structures CIF dictionary, and
>>> symmetry CIF dictionary (including extending looped categories).  At
>>> the same time, the authors wish it to be a single, coherent document.
>>> Core CIF and the modulated structures dictionary use DDL1 naming
>>> conventions, whereas symCIF is a DDL2 dictionary with DDL2 naming
>>> conventions. For coherence and convenience, the authors of magCIF
>>> should clearly use a single DDL and naming convention.
>>>
>>> My inclination is to recommend writing magCIF using DDL2.
>>> Semantically, this will mean that certain DDL2 concepts (e.g. 'key')
>>> will be implicitly imposed on DDL1 datanames.  This mapping is
>>> however straightforward and implied by the presence of 'aliases' in
>>> mmCIF and other DDL2 dictionaries.
>>>
>>> More trivially, this approach will result in some loops that have
>>> names not containing a period mixed with names that do contain a
>>> period, and non-looped datanames in the CIF data file will also
>>> contain mixtures of such names. I note that the use of a period to
>>> separate category and item is purely conventional and is not
>>> syntactically or semantically required by the DDL that the
>>> dictionary is written in, so I do not consider this to be a problem.
>>>
>>> A further advantage of DDL2-style names is that when magCIF is
>>> translated into DDLm at some not-too-distant point, the same names
>>> can be used (as DDLm naming conventions are the same as DDL2 naming
>>> conventions) and software written with the DDL2 magCIF dictionary in
>>> mind will not require updating to handle files written against the
>>> 'new' DDLm magCIF.
>>>
>>> Does anybody see any issues with this recommendation?
>>>
>>> James.
>>>
>>>
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148
>>>
>>>
>>> _______________________________________________
>>> comcifs mailing list
>>> comcifs@iucr.org
>>> http://mailman.iucr.org/mailman/listinfo/comcifs

