
Re: [ddlm-group] CIF-2 changes

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] CIF-2 changes
From: David Brown <idbrown@mcmaster.ca>
Date: Thu, 12 Nov 2009 16:56:49 -0500
In-Reply-To: <4AFC7619.3040008@rcsb.rutgers.edu>
References: <C7208339.123F8%nick@csse.uwa.edu.au> <4AFAEE38.2020502@mcmaster.ca> <20091111122545.V9195@epsilon.pair.com><4AFC7619.3040008@rcsb.rutgers.edu>


John Westbrook wrote:

Do I understand that your summary suggestion is to normalize/standardize naming based on new CIF2 naming conventions using the rewrite rules stored in the dictionary? This would permit recoding special characters like square braces [] so that these could become core parts of the syntax. This would then avoid having to make the curly brace substitutions for square braces.

Would you envision that you would first pass through any file with a CIF1x parser to perform name substitutions and then re-parse using CIF2 supporting software? This approach will also require processing software to interpret dictionary content to get the name remapping. While dictionary development is the focus of this group, prior CIF applications have been able to parse data without access to dictionary content.

If you wish to parse without consulting the dictionary, you need to hard code the datanames in the application. This can always be done, but keeping an application up to date requires adding new datanames as the dictionaries change. If you intend to work this way, the dictionaries become irrelevant. If you want to access methods you have to use the CIF2 dictionary, so you might as well use it for locating the appropriate aliases.

The whole point of using CIF dictionaries is that CIF can develop and change without invalidating earlier software. The change from CIF1 to CIF2 will require new software, but if we do the job properly the CIF2 applications should have a long life.

David

Herbert J. Bernstein wrote:

Dear Colleagues,

For many years to come, we will be dealing with both data sets and dictionaries in a mixture of CIF 1, CIF 1.1 and CIF 2 conventions, with dictionaries conforming to DDL1, DDL2, DDLm-2007, DDLm-2008, and DDLm-2009. We will need software that is multi-lingual, and David has suggested one good use of the alias mechanism to help in that process.

That having been said, we still need to define cleanly and clearly where we want to end up after things are cleaned up and organized. To that end, I think it is reasonable to just define what is needed for CIF2/DDLm-2009 conformance, and then, as a separate issue, work out how best to provide the necessary multi-lingual software infrastructure.

While I, as more of an incrementalist, would have preferred not to have gone the "maximally disruptive" route, that is what this group decided on. Having made that fundamental decision, it really is time to make some final (at least for a few years) decisions on what is properly in CIF2/DDLm-2009, tell the community about it and see if we can really use it. Right now what is up for the community to see (the August 2008 version) is clearly very far from what we are now discussing, and the web page

http://www.iucr.org/resources/cif/ddl/ddlm

has the explicit bold-faced statement "No changes are required in existing archival data files in order to apply domain dictionaries written in DDLm".

David's third option will allow us to adopt Nick's changes and still deliver on that promise. There are some minor problems with random data sets that may have non-conforming non-delimited strings, or be using the CIF 1.1 line folding protocol. If that proves to be an issue we can provide front-ends that, in addition to doing the alias conversions, also quote non-compliant non-delimited strings and unfold folded lines, but most of the practical issues for journal CIFs will be resolved by just honoring the aliases at an early stage.

I would suggest we both adopt Nick's changes and adopt David's third option, and do so promptly.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Wed, 11 Nov 2009, David Brown wrote:

I must be missing something. I have followed all the discussion about allowed and disallowed characters, which I find fascinating, but what I think seems to be missing from this discussion is an understanding of how a CIF datafile is read using a dictionary.

The problem of reading the dictionary is different. It contains only CIF2 datanames, including those used in dREL. Period. If you think it is necessary to be able to use CIF1 datanames in dREL, then you must be expecting to write each method using CIF2 datanames, CIF1.0 datanames and CIF1.1 datanames, for a total of three different versions of the same expression. This does not include extra expressions using datanames that have been deprecated in favour of more suitable names.

Nick seems to feel that we must abandon the idea that a CIF2 application should be able to read the earlier CIFs directly, although the ability to read both CIF1.0 and CIF1.1 was the primary requirement that drove COMCIFS to accept DDLm. It should not be abandoned lightly - if anything we should abandon dREL first. It was to accommodate the ability to read the archive CIFs that _aliases were introduced into DDLm.

So from my viewpoint (as a dictionary writer) we have the following options.

1. Abandon compatibility with CIF1 and require all the CIF1 datafiles to be converted to CIF2 files (if such a conversion is possible) before being fed into a CIF2 application. I.e., we abandon the primary reason for introducing DDLm.

2. Allow CIF2 applications to read in CIF1 datafiles with all their non-conforming datanames, and duplicate all the methods to capture all possible combinations of CIF1 and CIF2 datanames (in general at least three versions of each method would be needed).

3. Make use of the _aliases in the CIF2 dictionaries to allow a CIF2 application to recognize any of the earlier CIF1 datanames and internally convert the name to the standard CIF2 dataname, which is also the (only) name that will appear in the dREL method. That is, we accept the multitude of earlier datanames and clean them up as soon as the old name is recognized.

Options 1 and 3 are similar, the difference being that option 1 requires a separate program to generate a CIF2 datafile which is then read in, while option 3 does the same thing as part of the CIF reading routine. Under option 3, therefore, the ONLY time that CIF1 datanames would need to be read would be during the input of the CIF1 datafile. After that all references would use the CIF2 datanames. A parser that could recognize the earlier datanames could certainly be used to read a CIF2 dictionary as well as a CIF2 datafile.

Option 3 is the most elegant way of handling the problem. In that way dREL never has to be concerned about embedded characters that CIF2 does not like. Option 2 is the only option that would require datanames with the disallowed characters in dREL, but it is the absurd case of cutting off your nose to spite your face. It is a wonderfully complex solution to a problem that does not even exist.

David

Nick Spadaccini wrote:

Unfortunately David there seems to be a (yet to be confirmed) expectation that the existing CIF1 data names can be used in a new DDLm/dREL world. Hence the dilemma.

On 10/11/09 10:53 PM, "David Brown" <idbrown@mcmaster.ca> wrote:

Surely dREL is not compromised by what have been used as datanames in the past. dREL appears only in CIF2 dictionaries and uses only the standard datanames that appear in the CIF2 dictionaries. The only place where it is necessary to be concerned about [] and / appearing in datanames is when reading in CIF1 data files.

All the datanames that appear in the CIF1 dictionaries are aliased in the CIF2 dictionaries. This means that the ability to read datanames containing [] and / is only required when reading in CIF1 data files, not when reading dictionaries (the old datanames only appear in CIF2 dictionaries as delimited values in the _alias loops). At the point where the CIF value of _sint/lambda is read in, its internal name has in any case to be equivalenced to the CIF2 dictionary name (_sintoverlambda), which is the data name used in the dREL statements.

Thus we are still free to place any limitations we choose on the datanames used in dREL (except for _ and . which, being punctuation, may cause problems with the names used in programming languages). However, the CIF2 dictionaries also define a _description.common dataname that contains only letters (and numbers?) and these names could be used just as easily in dREL if that were an advantage.

David

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au
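
The alias handling described under option 3 in the thread above (recognize an old dataname while the data file is being read, then use only the CIF2 name afterwards) might be sketched roughly as follows. This is an illustration only, not code from any CIF library: the function names and the ALIASES table are hypothetical, the table would in practice be built from the _alias loops of a CIF2 dictionary, and the only name pair taken from the thread is _sint/lambda and _sintoverlambda. Case-insensitive comparison of datanames is assumed.

```python
# Hypothetical alias table mapping earlier datanames to CIF2 datanames.
# In a real reader this would be filled from the _alias loops of the
# CIF2 dictionary; the single entry here is the example from the thread.
ALIASES = {
    "_sint/lambda": "_sintoverlambda",
}


def canonical_name(name, aliases=ALIASES):
    """Return the CIF2 dataname for a name met in a data file."""
    key = name.lower()  # assumption: datanames compare case-insensitively
    return aliases.get(key, key)


def normalise_items(items, aliases=ALIASES):
    """Map (name, value) pairs from a CIF1 or CIF2 file onto CIF2 names.

    After this step nothing downstream (for example a dREL method)
    sees a dataname containing '/' or '[]'.
    """
    resolved = {}
    for name, value in items:
        cif2_name = canonical_name(name, aliases)
        if cif2_name in resolved:
            # An old name and its CIF2 replacement both appear in one file.
            raise ValueError(f"{name!r} duplicates {cif2_name!r}")
        resolved[cif2_name] = value
    return resolved
```

For instance, normalise_items([("_sint/lambda", "0.5")]) would yield a mapping keyed by _sintoverlambda, so the old name is needed only at input time, as the thread argues.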

begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
