Re: [ddlm-group] CIF-2 changes

John Westbrook wrote:
Do I understand correctly that your summary suggestion is to
normalize/standardize naming based on the new CIF2 naming conventions, using
the rewrite rules stored in the dictionary?  This would permit recoding
special characters like square brackets [] so that these could become core
parts of the syntax.  This would then avoid having to make the curly brace
substitutions for square brackets.

Would you envision that you would first pass through any
file with a CIF1x parser to perform name substitutions and then
re-parse using CIF2 supporting software?   This approach will also
require processing software to interpret dictionary content to get
the name remapping.   While dictionary development is the focus of
this group, prior CIF applications have been able to parse data without
access to dictionary content.

If you wish to parse without consulting the dictionary, you need to hard code the datanames in the application.  This can always be done, but keeping an application up to date requires adding new datanames as the dictionaries change.  If you intend to work this way, the dictionaries become irrelevant.  If you want to access methods you have to use the CIF2 dictionary so you might as well use it for locating the appropriate aliases.  The whole point of using CIF dictionaries is that CIF can develop and change without invalidating earlier software.  The change from CIF1 to CIF2 will require new software, but if we do the job properly the CIF2 applications should have a long life.

David


Herbert J. Bernstein wrote:
Dear Colleagues,

   For many years to come, we will be dealing with both data sets and 
dctionaries in a mixture of CIF 1, CIF 1.1 and CIF 2 conventions, with 
dictionaries conforming to DDL1, DDL2, DDLm-2007, DDLm-2008, and 
DDLm-2009.  We will need software that is multi-lingual, and David has 
suggested one good use of the alias mechanism to help in that process. 
That having been said, we still need to define cleanly and clearly where 
we want to end up after things are cleaned up and organized. To that end, 
I think it is reasonable to just define what is needed for CIF2/DDLm-2009 
conformance, and then, as a separate issue, work out how best to provide 
the necessary multi-lingual software infrastructure.

While I, as more of an incrementalist, would have preferred not to have 
gone the "maximally disruptive" route, that is what this group decided on. 
Having made that fundamental decision, it really is time to make some 
final (at least for a few years) decisions on what is properly in 
CIF2/DDLm-2009, tell the community about it and see if we can really use 
it.  Right now what is up for the community to see (the August 2008 
version) is clearly very far from what we are now discussing, and the web 
page http://www.iucr.org/resources/cif/ddl/ddlm has the explicit 
bold-faced statement

"No changes are required in existing archival data files in order to apply 
domain dictionaries written in DDLm"

David's third option will allow us to adopt Nick's changes and still 
deliver on that promise.  There are some minor problems with random
data sets that may have non-conforming non-delimited strings, or be
using the CIF 1.1 line folding protocol.  If that proves to be an
issue we can provide front-ends that, in addition to doing the alias
conversions, also quote non-compliant non-delimited strings and
unfold folded lines, but most of the practical issues for journal
CIFs will be resolved by just honoring the aliases at an early stage.
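
As an illustration of the kind of front-end step mentioned above, here is a minimal Python sketch of unfolding a folded text field. The function name unfold_text_field is hypothetical, and the code assumes a simplified reading of the CIF 1.1 line-folding protocol: the text field opens with a lone backslash on its first line, and any later line whose last non-blank character is a backslash is joined to the line that follows. It is a sketch under those assumptions, not a complete implementation of the protocol.

```python
def unfold_text_field(lines):
    """Unfold the lines of a semicolon-delimited text field.

    `lines` holds the content of the field, starting with whatever follows
    the opening ';'. Assumption: a field is folded only if that first line
    is a lone backslash (simplified reading of the CIF 1.1 protocol).
    """
    if not lines or lines[0].rstrip() != "\\":
        return list(lines)  # not a folded field: return it unchanged

    unfolded = []
    pending = ""  # text carried over from lines that ended in a backslash
    for line in lines[1:]:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the trailing backslash and continue onto the next line.
            pending += stripped[:-1]
        else:
            unfolded.append(pending + line)
            pending = ""
    if pending:
        # The field ended on a continuation line; keep what was collected.
        unfolded.append(pending)
    return unfolded
```

A real front-end would also have to quote non-compliant non-delimited strings and apply the alias conversions; those steps are not shown here.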

I would suggest we both adopt Nick's changes and adopt David's third 
option, and do so promptly.

Regards,
   Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 11 Nov 2009, David Brown wrote:

I must be missing something.  I have followed all the discussion about 
allowed and disallowed characters, which I find fascinating, but what I think 
is missing from this discussion is an understanding of how a CIF 
datafile is read using a dictionary.  The problem of reading the dictionary 
is different.  It contains only CIF2 datanames including those used in dREL. 
Period.

If you think it is necessary to be able to use CIF1 datanames in dREL, then
you must be expecting to write each method using CIF2 datanames, CIF1.0
datanames and CIF1.1 datanames, for a total of three different versions of
the same expression.  This does not include extra expressions using datanames
that have been deprecated in favour of more suitable names.

Nick seems to feel that we must abandon the idea that a CIF2 application
should be able to read the earlier CIFs directly, although the ability to read
both CIF1.0 and CIF1.1 was the primary requirement that drove COMCIFS to
accept DDLm.  It should not be abandoned lightly - if anything we should
abandon dREL first.  It was to accommodate the ability to read the archive
CIFs that _aliases were introduced into DDLm.

So from my viewpoint (as a dictionary writer) we have the following options.

1. Abandon compatibility with CIF1 and require all the CIF1 datafiles to be
converted to CIF2 files (if such a conversion is possible) before being fed
into a CIF2 application.  I.e., we abandon the primary reason for introducing
DDLm.

2. Allow CIF2 applications to read in CIF1 datafiles with all their 
non-conforming datanames, and duplicate all the methods to capture all 
possible combinations of CIF1 and CIF2 datanames (in general at least three 
versions of each method would be needed).

3. Make use of the _aliases in the CIF2 dictionaries to allow a CIF2 
application to recognize any of the earlier CIF1 datanames and internally 
convert the name to the standard CIF2 dataname, which is also the (only) name 
that will appear in the dREL method.  That is, we accept the multitude of 
earlier datanames and clean them up as soon as the old name is recognized.

Options 1 and 3 are similar, the difference being that option 1 requires a 
separate program to generate a CIF2 datafile which is then read in, while 
option 3 does the same thing as part of the CIF reading routine.  Under 
option 3 therefore, the ONLY time that CIF1 datanames would need to be read 
would be during the input of the CIF1 datafile.  After that all references 
would use the CIF2 datanames.  A parser that could recognize the earlier 
datanames could certainly be used to read a CIF2 dictionary as well as a CIF2 
datafile.
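
Option 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ALIASES table stands in for whatever a reader would build from the _alias loops of a CIF2 dictionary, the function names canonical_name and normalize_items are hypothetical, and the single entry uses the _sint/lambda example from this thread.

```python
# Hypothetical alias table, as a reader might build it from the _alias
# loops of a CIF2 dictionary: old CIF1 dataname -> standard CIF2 dataname.
ALIASES = {
    "_sint/lambda": "_sintoverlambda",
}


def canonical_name(name, aliases=ALIASES):
    """Return the CIF2 dataname for `name`, resolving any known alias."""
    key = name.lower()  # CIF datanames are case-insensitive
    return aliases.get(key, key)


def normalize_items(items, aliases=ALIASES):
    """Rename the items read from a datafile as they are read in.

    After this step only CIF2 datanames remain, so later code (including
    any dREL method) never sees the older names.
    """
    return {canonical_name(name, aliases): value for name, value in items.items()}
```

Under option 1 the same mapping would be applied by a separate conversion program that writes out a CIF2 datafile; under option 3 it is applied inside the reading routine, as here.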

Option 3 is the most elegant way of handling the problem.  In that way dREL 
never has to be concerned about embedded characters that CIF2 does not like.

Option 2 is the only option that would require datanames with the
disallowed characters in dREL, but it is the absurd case of cutting off your
nose to spite your face.  It is a wonderfully complex solution to a problem
that does not even exist.

David


Nick Spadaccini wrote:

Unfortunately, David, there seems to be a (yet to be confirmed) expectation that 
the existing CIF1 data names can be used in a new DDLm/dREL world. Hence 
the dilemma.

On 10/11/09 10:53 PM, "David Brown" <idbrown@mcmaster.ca> wrote:

    Surely dREL is not compromised by what have been used as datanames
    in the past.  dREL appears only in CIF2 dictionaries and uses
    only the standard datanames that appear in the CIF2 dictionaries.
     The only place where it is necessary to be concerned about [] and
    / appearing in datanames is when reading in CIF1 data files.  All
    the datanames that appear in the  CIF1 dictionaries are aliased in
    the CIF2 dictionaries. This means that the ability to read
    datanames containing [] and / is only required when reading in
    CIF1 data files, not when reading dictionaries (the old datanames
    only appear in CIF2 dictionaries as delimited values in the _alias
    loops).  At the point where the CIF value of _sint/lambda is read
    in, its internal name has in any case to be equivalenced to the
    CIF2 dictionary name (_sintoverlambda), which is the data name used in
    the dREL statements.  Thus we are still free to place any
    limitations we choose on the datanames used in dREL (except for _
    and . which, being punctuation, may cause problems with the names
    used in programming languages).  However, the CIF2 dictionaries
    also define a _description.common dataname that contains only
    letters (and numbers?) and these names could be used just as
    easily in dREL if that were an advantage.

    David

    ------------------------------------------------------------------------
    _______________________________________________
    ddlm-group mailing list
    ddlm-group@iucr.org
    http://scripts.iucr.org/mailman/listinfo/ddlm-group


cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au





