

Re: [ddlm-group] Relationship among CIF2, STAR, CIF1 and Python

Dear Colleagues,

   I have read both of Brian's latest messages, and find them very helpful. 
It now appears that we have several distinct, but related languages to 
understand, define or refine (please pardon the notational changes):

   STAR1
   CIF1
   dREL
   DDLm
   STAR2
   CIF2

in such a way that the existing bases of data and software for STAR1 and 
CIF1 can, in some sense, be brought forward into a world of STAR2 and CIF2 
and in which CIF2 works "well" with dREL and DDLm.  I think (or at least 
hope) that we agree on that objective.

The point at which I disagree with some of Brian's remarks is whether it 
is best to go forward from this point with a bias towards accepting the 
changes to make CIF2 that were previously agreed, or whether it is best to 
go forward from this point without such a presumption, reviewing the 
entire structure with an eye towards best functionality.  My view does not 
mean we ignore what has been done.  Some or all of the earlier decisions 
may well turn out to be the best final decisions, but they may well turn 
out not to be.  We won't know until and unless we make such a zero-based 
review.

Certainly we have a lot to gain in that process from working with what has 
been done.  I use James's parser and it is very useful, but I think, in 
view of the fact that there will be major changes to existing datasets and 
software involved in going to the currently proposed version of CIF2, we 
have an obligation to try to make that part of the overall transition as 
close to "right" as we can make it.

On the other hand, the DDLm-based dictionaries themselves impact a much 
smaller community, provided we adopt one very important dictum, the one 
already posted on the IUCr web site:

"No changes are required in existing archival data files in order to apply 
domain dictionaries written in DDLm"

We need to have a chunk of software that will allow existing coreCIF and 
mmCIF and imgCIF and other CIF1 data files to be validated using DDLm/dREL 
based dictionaries.  There are many possible ways in which to skin that 
particular cat:

1.  We can make DDLm to DDL1 and DDL2 dictionary converters

2.  We can make CIF1 to CIF2 data converters

3.  We can make APIs that will allow existing CIF1-based applications to 
become CIF2 and DDLm-aware.

...

I hope and expect that we will do all of the above, but item 1, while the 
least powerful, is a critical necessity to truly meet the promise on the 
IUCr web page, and, because a DDLm dictionary will be in a strong sense a 
CIF2 document, will help us to really work with CIF2 and get it right.

Therefore, I suggest we _not_ put CIF2 forward for general use at this 
time, but try to pull together just enough of CIF2, under the name 
DDLm-2011, to be able to prototype the first round of DDLm dictionaries 
along with a DDLm-2011 to DDL1 translator and a DDLm-2011 to DDL2 
translator, and explicitly tell the community that the DDLm-2011 format is 
_not_ yet recommended for general use for data files because it is subject 
to possibly significant changes in the future.  This will help us to gain 
experience with a CIF2-candidate within a limited community and to try to 
get it "right," but will also allow the entire community to start gaining 
a benefit from the work done thus far without a major conversion of 
existing data sets to a format that seems highly likely to change.

Regards,
     Herbert



=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Sat, 15 Jan 2011, Brian McMahon wrote:

> OK, the nature of my particular misunderstanding about the STAR/CIF
> relationship that came to light in our offline discussions is roughly
> the following:
>
> CIF1 is essentially a proper subset of the STAR format published as:
>  Hall, S. R. (1991). The STAR File: a new format for electronic data
>      transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326-333;
>  Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed
>      specifications. J. Chem. Inf. Comput. Sci. 34, 505-508
> and described in detail in Chapter 2.1 of International Tables Volume G.
>
> This version of STAR is used in the molecular information file, also
> documented in Volume G ("used" is probably overstating the case; the
> only application I know that outputs MIF content is the CCDC, which
> uses tokens from the MIF and CIF core dictionaries but ignores
> saveframe pointers and nested loops to create files that are
> syntactically perfectly valid CIFs). There is also nmrSTAR used
> extensively by BioMagResBank that has supporting libraries and database
> applications. There were also some small-scale experiments in the botanical field
> (Syd's association with FloraBase) and a couple of demonstrator
> applications that, so far as I am aware, were never developed (e.g.
> in quantum chemistry).
>
> In prototyping dREL and a DDLm, Syd, Nick and Ian Castleden made
> ad hoc changes to the STAR syntax to get a workable implementation.
> (Since their prototyping engine used Jython, they achieved runtime
> efficiencies by implementing changes that were practicable with
> Python, echoes of which we're seeing and actively discussing today.
> Whether their choice was farsighted or purely accidental I don't
> know.) Let us call this ad hoc version STAR+1 - it was a set of
> practical syntactic features that would be used mostly in dREL methods
> but also in proto-DDLm dictionaries, proto-dREL and appropriately
> modified data files to test the novel methods approach. Most of this
> work dates back about 10 years. The syntactic changes were not
> formally published - they were practical "work in progress", though by
> the end of this cycle it was conceivable that they could have been
> systematised and written up as a proper "STAR+1".
>
> Since COMCIFS took on the task of developing CIF2/DDLm for
> crystallography (i.e. the work of this group), we have discussed and
> agreed many further changes from the original STAR syntax, much of
> this with active involvement from Nick. When, some time back, Nick
> said (whether just to me or on the list I don't now remember) that
> he was focussing on writing up for publication a revised STAR paper, I
> took that to mean that he wanted to freeze the further modifications
> that had been agreed to that point as a "STAR+2". From that point I
> was reluctant to see CIF diverge further from the then-current syntax,
> and was looking forward to Nick's preprint which would document
> clearly what that was. I was mistaken - Nick's current project is to
> write up "STAR+1", leaving open the prospect of further changes to
> "STAR+2" as required.
>
> Note that even "STAR+1" never existed - Nick's paper will be a
> retrospective consolidation of one set of changes adopted for practical
> prototyping. In the same way, "STAR+2" need not exist until we
> actually have a satisfactory CIF2 format that we can retrofit -
> if that's actually required - to a second-generation STAR complete
> with saveframes and the rest. Such a "requirement", in my mind, would
> have to do with an actual need to retain compatibility with those
> other STAR applications (MIF, FloraBase etc.) that I mentioned before.
> Realistically, that's probably not going to happen.
>
> I think that most people on this list have been much quicker than me
> to see that demonstrably useful syntax changes should still be made
> without undue conservatism. The result is that we have been pulling
> together roughly in the same direction (not always *exactly* in the
> same direction) and have made real progress.
>
> I'm embarrassed by my misunderstanding, and were we to revisit some of
> our discussions I might now take another view (but only "might").
> But as I argue elsewhere I think we're better moving on to test the
> consequences of the solutions we've agreed to adopt, and being open to
> future revisions in the light of experience, rather than re-running
> past hypotheticals.
>
> Best wishes
> Brian
>
> On Thu, Jan 13, 2011 at 12:17:41PM -0500, Herbert J. Bernstein wrote:
>> James has requested that I formally send a message to this list
>> about a matter discussed recently in independent email in order
>> to ensure a record.  At first I declined to do so, but after
>> reflection, I have decided to do as James has asked.
>>
>> I have withdrawn my vote in COMCIFS in support of CIF2 going
>> forward at this time.  I have done so because, after emails
>> from Nick and Brian, it has become clear to me that I was
>> making false assumptions about the relationship between
>> CIF2 and STAR.  I believe that a zero-based discussion is
>> now needed on what the relationship should be among CIF2,
>> STAR, CIF1 and Python to best serve the interests
>> of the crystallographic community.  I do not know what
>> is best and do not know how long such a discussion may take.
>> I leave it to James, Nick and Brian to decide if Nick's and
>> Brian's messages should be posted on this list for the record.
>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
