Re: [ddlm-group] Relationship among CIF2, STAR, CIF1 and Python

I strongly reject Herbert's suggestion for a zero-based review or for
an intermediate "DDLm-2011" version that will help us iron out
supposed problems.  Nick, Syd and Ian developed a coherent, working
system 10 years ago, including rewriting the core dictionary in DDLm
and developing attendant software. Everything worked as promised.  I
was shown this system in Florence in 2005, then asked to review it in
2007, in the process producing my own proof-of-concept system which
operated in a different way to Nick's using the same dictionaries.  A
number of modifications resulted from my review. David Brown has
produced a draft coreCIF dictionary written in DDLm and has not
encountered any show-stopping issues.  In other words, this system has
been reviewed and tested extensively and to propose yet another review
shows a cavalier disregard for the amount of time that has already
been put into DDLm.  The system works.

Furthermore, we do not need a DDLm to DDL1/2 translator, as pointed
out tirelessly by David Brown last year and this year.  The alias
mechanism used in DDL2 is adequate to the task.

A CIF1 to CIF2 data converter is trivial, involving at most reading in
a file as CIF1 and then outputting it as CIF2.

In short, I have yet to see any justification for returning CIF2 to committee.

James.
On Sun, Jan 16, 2011 at 12:30 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Dear Colleagues,
>
>   I have read both of Brian's latest messages, and find them very helpful.
> It now appears that we have several distinct, but related languages to
> understand, define or refine (please pardon the notational changes):
>
>   STAR1
>   CIF1
>   dREL
>   DDLm
>   STAR2
>   CIF2
>
> in such a way that the existing bases of data and software for STAR1 and
> CIF1 can, in some sense, be brought forward into a world of STAR2 and CIF2
> and in which CIF2 works "well" with dREL and DDLm.  I think (or at least
> hope) that we agree on that objective.
>
> The point at which I disagree with some of Brian's remarks is whether it
> is best to go forward from this point with a bias towards accepting the
> changes to make CIF2 that were previously agreed, or whether it is best to
> go forward from this point without such a presumption, reviewing the
> entire structure with an eye towards best functionality.  My view does not
> mean we ignore what has been done.  Some or all of the earlier decisions
> may well turn out to be the best final decisions, but they may well turn
> out not to be.  We won't know until and unless we make such a zero-based
> review.
>
> Certainly we have a lot to gain in that process from working with what has
> been done.  I use James's parser and it is very useful, but I think, in
> view of the fact that there will be major changes to existing datasets and
> software involved in going to the currently proposed version of CIF2, we
> have an obligation to try to make that part of the overall transition as
> close to "right" as we can make it.
>
> On the other hand, the DDLm-based dictionaries themselves impact a much
> smaller community, provided we adopt one very important dictum, the one
> already posted on the IUCr web site:
>
> "No changes are required in existing archival data files in order to apply
> domain dictionaries written in DDLm"
>
> We need to have a chunk of software that will allow existing coreCIF and
> mmCIF and imgCIF and other CIF1 data files to be validated using DDLm/dREL
> based dictionaries.  There are many possible ways in which to skin that
> particular cat:
>
> 1.  We can make DDLm to DDL1 and DDL2 dictionary converters
>
> 2.  We can make CIF1 to CIF2 data converters
>
> 3.  We can make APIs that will allow existing CIF1-based applications to
> become CIF2 and DDLm-aware.
>
> ...
>
> I hope and expect that we will do all of the above, but item 1, while the
> least powerful, is a critical necessity to truly meet the promise on the
> IUCr web page, and, because a DDLm dictionary will be in a strong sense a
> CIF2 document, will help us to really work with CIF2 and get it right.
>
> Therefore, I suggest we _not_ put CIF2 forward for general use at this
> time, but try to pull together just enough of CIF2, under the name
> DDLm-2011, to be able to prototype the first round of DDLm dictionaries
> along with a DDLm-2011 to DDL1 translator and a DDLm-2011 to DDL2
> translator and explicitly tell the community that the DDLm-2011 format is
> _not_ yet recommended for general use for data files because it is subject
> to possibly significant changes in the future.  This will help us to gain
> experience with a CIF2-candidate within a limited community and to try to
> get it "right," but will also allow the entire community to start gaining
> a benefit from the work done thus far without a major conversion of
> existing data sets to a format that seems highly likely to change.
>
> Regards,
>     Herbert
>
>
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
>
> On Sat, 15 Jan 2011, Brian McMahon wrote:
>
>> OK, the nature of my particular misunderstanding about the STAR/CIF
>> relationship that came to light in our offline discussions is roughly
>> the following:
>>
>> CIF1 is essentially a proper subset of the STAR format published as:
>>  Hall, S. R. (1991). The STAR File: a new format for electronic data
>>      transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326-333;
>>  Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed
>>      specifications. J. Chem. Inf. Comput. Sci. 34, 505-508
>> and described in detail in Chapter 2.1 of International Tables Volume G.
>>
>> This version of STAR is used in the molecular information file, also
>> documented in Volume G ("used" is probably overstating the case; the
>> only application I know that outputs MIF content is the CCDC, which
>> uses tokens from the MIF and CIF core dictionaries but ignores
>> saveframe pointers and nested loops to create files that are
>> syntactically perfectly valid CIFs). There is also nmrSTAR used
>> extensively by BioMagResBank that has supporting libraries and database
>> applications. Also some small-scale experiments in the botanical field
>> (Syd's association with FloraBase) and a couple of demonstrator
>> applications that, so far as I am aware, were never developed (e.g.
>> in quantum chemistry).
>>
>> In prototyping dREL and a DDLm, Syd, Nick and Ian Castleden made
>> ad hoc changes to the STAR syntax to get a workable implementation.
>> (Since their prototyping engine used Jython, they achieved runtime
>> efficiencies by implementing changes that were practicable with
>> Python, echoes of which we're seeing and actively discussing today.
>> Whether their choice was farsighted or purely accidental I don't
>> know.) Let us call this ad hoc version STAR+1 - it was a set of
>> practical syntactic features that would be used mostly in dREL methods
>> but also in proto-DDLm dictionaries, proto-dREL and appropriately
>> modified data files to test the novel methods approach. Most of this
>> work dates back about 10 years. The syntactic changes were not
>> formally published - they were practical "work in progress", though by
>> the end of this cycle it was conceivable that they could have been
>> systematised and written up as a proper "STAR+1".
>>
>> Since COMCIFS took on the task of developing CIF2/DDLm for
>> crystallography (i.e. the work of this group), we have discussed and
>> agreed many further changes from the original STAR syntax, much of
>> this with active involvement from Nick. When, some time back, Nick
>> said (whether just to me or on the list I don't now remember) that
>> he was focussing on writing up for publication a revised STAR paper, I
>> took that to mean that he wanted to freeze the further modifications
>> that had been agreed to that point as a "STAR+2". From that point I
>> was reluctant to see CIF diverge further from the then-current syntax,
>> and was looking forward to Nick's preprint which would document
>> clearly what that was. I was mistaken - Nick's current project is to
>> write up "STAR+1", leaving open the prospect of further changes to
>> "STAR+2" as required.
>>
>> Note that even "STAR+1" never existed - Nick's paper will be a
>> retrospective consolidation of one set of changes adopted for practical
>> prototyping. In the same way, "STAR+2" need not exist until we
>> actually have a satisfactory CIF2 format that we can retrofit -
>> if that's actually required - to a second-generation STAR complete
>> with saveframes and the rest. Such a "requirement", in my mind, would
>> have to do with an actual need to retain compatibility with those
>> other STAR applications (MIF, FloraBase etc.) that I mentioned before.
>> Realistically, that's probably not going to happen.
>>
>> I think that most people on this list have been much quicker than me
>> to see that demonstrably useful syntax changes should still be made
>> without undue conservatism. The result is that we have been pulling
>> together roughly in the same direction (not always *exactly* in the
>> same direction) and have made real progress.
>>
>> I'm embarrassed by my misunderstanding, and were we to revisit some of
>> our discussions I might now take another view (but only "might").
>> But as I argue elsewhere I think we're better moving on to test the
>> consequences of the solutions we've agreed to adopt, and being open to
>> future revisions in the light of experience, rather than re-running
>> past hypotheticals.
>>
>> Best wishes
>> Brian
>>
>> On Thu, Jan 13, 2011 at 12:17:41PM -0500, Herbert J. Bernstein wrote:
>>> James has requested that I formally send a message to this list
>>> about a matter discussed recently in independent email in order
>>> to ensure a record.  At first I declined to do so, but after
>>> reflection, I have decided to do as James has asked.
>>>
>>> I have withdrawn my vote in COMCIFS in support of CIF2 going
>>> forward at this time.  I have done so because, after emails
>>> from Nick and Brian, it has become clear to me that I was
>>> making false assumptions about the relationship between
>>> CIF2 and STAR.  I believe that a zero-based discussion is
>>> now needed on what the relationship should be among CIF2,
>>> STAR, CIF1 and Python to best serve the interests
>>> of the crystallographic community.  I do not know what
>>> is best and do not know how long such a discussion may take.
>>> I leave it to James, Nick and Brian to decide if Nick's and
>>> Brian's messages should be posted on this list for the record.
>>>
>>> =====================================================
>>>   Herbert J. Bernstein, Professor of Computer Science
>>>     Dowling College, Kramer Science Center, KSC 121
>>>          Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                   +1-631-244-3035
>>>                   yaya@dowling.edu
>>> =====================================================
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

