
Re: [ddlm-group] CIF-2 changes

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] CIF-2 changes
From: "Herbert J. Bernstein" <[email protected]>
Date: Wed, 11 Nov 2009 12:49:18 -0500 (EST)
In-Reply-To: <[email protected]>
References: <C7208339.123F8%[email protected]><[email protected]>

Dear Colleagues,

   For many years to come, we will be dealing with both data sets and 
dictionaries in a mixture of CIF 1, CIF 1.1 and CIF 2 conventions, with 
dictionaries conforming to DDL1, DDL2, DDLm-2007, DDLm-2008, and 
DDLm-2009.  We will need software that is multi-lingual, and David has 
suggested one good use of the alias mechanism to help in that process. 
That having been said, we still need to define cleanly and clearly where 
we want to end up after things are cleaned up and organized. To that end, 
I think it is reasonable to just define what is needed for CIF2/DDLm-2009 
conformance, and then, as a separate issue, work out how best to provide 
the necessary multi-lingual software infrastructure.

While I, as more of an incrementalist, would have preferred not to have 
gone the "maximally disruptive" route, that is what this group decided on. 
Having made that fundamental decision, it really is time to make some 
final (at least for a few years) decisions on what is properly in 
CIF2/DDLm-2009, tell the community about it and see if we can really use 
it.  Right now what is up for the community to see (the August 2008 
version) is clearly very far from what we are now discussing, and the web 
page http://www.iucr.org/resources/cif/ddl/ddlm has the explicit 
bold-faced statement

"No changes are required in existing archival data files in order to apply 
domain dictionaries written in DDLm"

David's third option will allow us to adopt Nick's changes and still 
deliver on that promise.  There are some minor problems with random
data sets that may have non-conforming non-delimited strings, or be
using the CIF 1.1 line folding protocol.  If that proves to be an
issue we can provide front-ends that, in addition to doing the alias
conversions, also quote non-compliant non-delimited strings and
unfold folded lines, but most of the practical issues for journal
CIFs will be resolved by just honoring the aliases at an early stage.
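
As a rough illustration of what honoring the aliases at an early stage
could look like, a reader might rename each item as it is parsed, so that
everything downstream sees only CIF2 datanames.  The sketch below is in
Python; the table ALIASES and the functions canonical_name and
normalise_items are hypothetical names, not part of any existing CIF
library, and the single alias pair shown is the one David mentions in the
quoted message below.  A real table would presumably be built from the
alias loops of the CIF2 dictionaries.

```python
# Hypothetical alias table mapping legacy datanames to CIF2 datanames.
# Only this one pair is taken from the thread; a real table would be
# loaded from the alias loops of a CIF2 dictionary.
ALIASES = {
    "_sint/lambda": "_sintoverlambda",
}


def canonical_name(name, aliases=ALIASES):
    """Return the CIF2 dataname for a name read from a CIF1 or CIF2 file."""
    key = name.lower()  # CIF datanames are compared case-insensitively
    return aliases.get(key, key)


def normalise_items(items, aliases=ALIASES):
    """Rename (dataname, value) pairs to CIF2 datanames as they are read."""
    out = {}
    for name, value in items:
        canon = canonical_name(name, aliases)
        if canon in out:
            # An old name and its CIF2 replacement both appear in the file.
            raise ValueError(f"duplicate item after alias resolution: {canon}")
        out[canon] = value
    return out
```

With this arrangement only the reading routine ever sees a legacy name,
which is the property that makes the third option attractive.  Quoting
non-compliant strings and unfolding folded lines would be separate
front-end steps and are not attempted here.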

I would suggest we both adopt Nick's changes and adopt David's third 
option, and do so promptly.

Regards,
   Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Wed, 11 Nov 2009, David Brown wrote:

> I must be missing something.  I have followed all the discussion about 
> allowed and disallowed characters, which I find fascinating, but what I think 
> seems to be missing from this discussion is an understanding of how a CIF 
> datafile is read using a dictionary.  The problem of reading the dictionary 
> is different.  It contains only CIF2 datanames including those used in dREL. 
> Period.
>
> If you think it is necessary to be able to use CIF1 datanames in dREL, then 
> you must be expecting to write each method using CIF2 datanames, CIF1.0 
> datanames and CIF1.1 datanames, for a total of three different versions of 
> the same expression.  This does not include extra expressions using datanames 
> that have been deprecated in favour of more suitable names.
>
> Nick seems to feel that we must abandon the idea that a CIF2 application 
> should be able to read the earlier CIFs directly although the ability to read 
> both CIF1.0 and CIF1.1 was the primary requirement that drove COMCIFS to 
> accept DDLm.  It should not be abandoned lightly - if anything we should 
> abandon dREL first.  It was to accommodate the ability to read the archive 
> CIFs that _aliases were introduced into DDLm.
>
> So from my viewpoint (as a dictionary writer) we have the following options.
>
> 1. Abandon compatibility with CIF1 and require all the CIF1 datafiles to be 
> converted to CIF2 files (if such a conversion is possible) before being fed 
> into a CIF2 application.  I.e., we abandon the primary reason for introducing 
> DDLm.
>
> 2. Allow CIF2 applications to read in CIF1 datafiles with all their 
> non-conforming datanames, and duplicate all the methods to capture all 
> possible combinations of CIF1 and CIF2 datanames (in general at least three 
> versions of each method would be needed).
>
> 3. Make use of the _aliases in the CIF2 dictionaries to allow a CIF2 
> application to recognize any of the earlier CIF1 datanames and internally 
> convert the name to the standard CIF2 dataname, which is also the (only) name 
> that will appear in the dREL method.  That is, we accept the multitude of 
> earlier datanames and clean them up as soon as the old name is recognized.
>
> Options 1 and 3 are similar, the difference being that option 1 requires a 
> separate program to generate a CIF2 datafile which is then read in, while 
> option 3 does the same thing as part of the CIF reading routine.  Under 
> option 3 therefore, the ONLY time that CIF1 datanames would need to be read 
> would be during the input of the CIF1 datafile.  After that all references 
> would use the CIF2 datanames.  A parser that could recognize the earlier 
> datanames could certainly be used to read a CIF2 dictionary as well as a CIF2 
> datafile.
>
> Option 3 is the most elegant way of handling the problem.  In that way dREL 
> never has to be concerned about embedded characters that CIF2 does not like.
>
> Option 2 is the only option that would require datanames with the 
> disallowed characters in dREL, but it is the absurd case of cutting off your 
> nose to spite your face.  It is a wonderfully complex solution to a problem 
> that does not even exist.
>
> David
>
>
> Nick Spadaccini wrote:
>
>> 
>> Unfortunately, David, there seems to be a (yet to be confirmed) expectation that 
>> the existing CIF1 data names can be used in a new DDLm/dREL world. Hence 
>> the dilemma.
>> 
>> On 10/11/09 10:53 PM, "David Brown" <[email protected]> wrote:
>>
>>     Surely dREL is not compromised by what have been used as datanames
>>     in the past.  dREL appears only in CIF2 dictionaries and uses
>>     only the standard datanames that appear in the CIF2 dictionaries.
>>     The only place where it is necessary to be concerned about [] and
>>     / appearing in datanames is when reading in CIF1 data files.  All
>>     the datanames that appear in the CIF1 dictionaries are aliased in
>>     the CIF2 dictionaries. This means that the ability to read
>>     datanames containing [] and / is only required when reading in
>>     CIF1 data files, not when reading dictionaries (the old datanames
>>     only appear in CIF2 dictionaries as delimited values in the _alias
>>     loops).  At the point where the CIF value of _sint/lambda is read
>>     in, its internal name has in any case to be equivalenced to the
>>     CIF2 dictionary name (_sintoverlambda) which is the data name used in
>>     the dREL statements.  Thus we are still free to place any
>>     limitations we choose on the datanames used in dREL (except for _
>>     and . which, being punctuation, may cause problems with the names
>>     used in programming languages).  However, the CIF2 dictionaries
>>     also define a _description.common dataname that contains only
>>     letters (and numbers?) and these names could be used just as
>>     easily in dREL if that were an advantage.
>>
>>     David
>>
>>     ------------------------------------------------------------------------
>>     _______________________________________________
>>     ddlm-group mailing list
>>     [email protected]
>>     http://scripts.iucr.org/mailman/listinfo/ddlm-group
>> 
>> 
>> cheers
>> 
>> Nick
>> 
>> --------------------------------
>> Associate Professor N. Spadaccini, PhD
>> School of Computer Science & Software Engineering
>> 
>> The University of Western Australia    t: +61 (0)8 6488 3452
>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>> MBDP  M002
>> 
>> CRICOS Provider Code: 00126G
>> 
>> e: [email protected]
>> 
>> 
>> 
>> 
>
>
>