Discussion List Archives

Re: [ddlm-group] CIF-2 changes

Dear Colleagues,

   Would somebody please put one firm, final and complete specification of 
the character set, syntax and semantics of what may appear in a CIF 
document to be processed against a CIF2 dictionary on the table?

   At the very least, may we have one firm, final and complete 
specification of the character set, syntax and semantics of the tags that 
may appear in a CIF document to be processed against a CIF2 dictionary?

   Then, we should have a meeting to decide if we all understand and agree 
on whatever it is that is being proposed.

   Coding against vague and changing specifications is one classic cause of 
failure of software engineering efforts.

   Regards,
     Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Tue, 17 Nov 2009, David Brown wrote:

> I support Nick's summary and I don't see any problem with datanames.  In CIF2 
> we use only the restricted ASCII set.  When reading a CIF1 file we need to be 
> aware that [] may appear in a dataname but using the alias list we 
> immediately convert it to the corresponding CIF2 dataname for all further 
> purposes.  A CIF2 file will only have fully conforming datanames.
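
A minimal sketch of the alias lookup described above, in Python. The alias table and the function name `to_cif2_name` are hypothetical, and the entry shown is invented for illustration; a real table would be built from the alias attributes in the dictionary.

```python
# Hypothetical alias table: legacy CIF1 data name -> CIF2 data name.
# The single entry below is invented for illustration only.
ALIASES = {
    "_legacy_name[1]": "_legacy.name_1",
}


def to_cif2_name(name, aliases=ALIASES):
    """Return the CIF2 data name to use for a tag read from a file.

    Data names are compared case-insensitively, so the tag is lowercased
    before the lookup. Tags with no alias are returned unchanged (lowercased).
    """
    key = name.lower()
    return aliases.get(key, key)
```

After this conversion, the rest of the program only ever sees conforming CIF2 names, which is the point of doing it at read time.
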
>
> I am not sure why Nick is so concerned about implicit recognition of 
> mislabelled CIF1 files.  All CIF2 files are required to be so identified in 
> the first line.  Anything without the identifier is presumed to be CIF1.  Any 
> other combination is illegal, but if one wants to recover from such an error 
> there are two possibilities:
>
> 1. A CIF2 file is not so identified.  This should not raise any problems since 
> the datanames will all be recognized and reading can carry on as normal.  Or 
> would this stumble over lists etc. which it might not be expecting even 
> though it would find all the necessary information in the dictionary?
>
> 2. A CIF1 file is labelled as CIF2.  This would seem to be highly unlikely and 
> would be in danger of crashing in certain situations.  This is the only case 
> where Nick's rescue package would be used but it would be rarely needed.
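
The first-line rule can be sketched in Python as follows. The thread only says that the file is "identified in the first line"; the exact identifier string used here, `#\#CIF_2.0`, is an assumption taken from the later published CIF 2.0 specification, and the function name `declared_version` is hypothetical.

```python
# Assumption: the CIF2 identifier is the comment "#\#CIF_2.0" at the very
# start of the file (the string comes from the published CIF 2.0 spec,
# not from this thread).
CIF2_MAGIC = "#\\#CIF_2.0"


def declared_version(text):
    """Classify a file by its first line only.

    Anything that does not start with the CIF2 identifier is presumed CIF1,
    as stated above.
    """
    # Tolerate a leading byte-order mark and Windows line endings.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0].rstrip("\r")
    return "CIF2" if first_line.startswith(CIF2_MAGIC) else "CIF1"
```

This does no recovery at all: a mislabelled file is simply taken at its word, which is the behaviour cases 1 and 2 above are worrying about.
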
>
> David
>
>
>
> Herbert J. Bernstein wrote:
>
>> Dear Nick,
>>
>>   The question on which we are circling is the valid data names.  Look over 
>> the chain of emails -- every possible combination is still on the
>> table.  We need to get everybody to sign on to one clear, complete and
>> final specification.
>>
>>   We need a meeting.
>>
>>   Regards,
>>     Herbert
>> 
>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>> 
>>> 
>>> Sorry Herb, how is this full circle with no agreement? I have suggested we
>>> adopt almost all of the changes we discussed except that with David's option
>>> we can now simply enforce a more limited character set on data names so that
>>> the parsing problems within dREL for names with included [] are now
>>> eliminated, while still making it possible to handle legacy names. As a
>>> consequence of not needing to support [] in names we can now revert back to
>>> using them for list delimiters.
>>> 
>>> The latter is the only circle, otherwise what was generally agreed in
>>> discussion is still there. My discussion below was for STAR which is the
>>> superset of CIF. The CIF2 specific stuff is still on the table.
>>> 
>>> Have I missed something?
>>> 
>>> On 17/11/09 9:11 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>> wrote:
>>> 
>>>> Dear Colleagues,
>>>>
>>>>    We have now come full circle with no agreement on anything.  I for one,
>>>> for the reasons outlined in many prior messages, do not think this
>>>> latest (=oldest) approach to be a good idea.
>>>>
>>>>    Clearly, if we are ever to resolve this, we need to get all the players
>>>> into a meeting at one time and work things out.  I suspect we will not be
>>>> able to arrange a timely physical meeting.  Perhaps some sort of an
>>>> emeeting (Ajax, Skype or somesuch) would work.
>>>>
>>>>    Regards,
>>>>      Herbert
>>>> 
>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>> 
>>>>> David's Option 3 is the simplest way forward, and actually revisits much of
>>>>> what was discussed back in 2007-08. Somehow those discussions were locked
>>>>> far back in my brain, only to be awakened by David's summary. Thanks for
>>>>> that.
>>>>> 
>>>>> So now I return to the STAR syntax. DDLm is part of STAR and hence
>>>>> restrictions on data names so they can be parsed etc is a STAR issue. I am
>>>>> brought around to Joe's idea that STAR accepts any 8 bit character sequence
>>>>> since that is the most complete set - and that this will be restricted to
>>>>> UTF-8 within the CIF specification. Any other adopter of STAR can choose
>>>>> whatever restricted encoding they wish.
>>>>> 
>>>>> I still need to treat data names as programming identifiers within dREL so
>>>>> accordingly I propose we restrict the data names in STAR (and all variants)
>>>>> to be ASCII [A-Za-z0-9_.] as we have used in the sample dictionaries, DDLm
>>>>> and dREL.
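
The proposed restriction is easy to check mechanically. A minimal Python sketch, assuming a data name is a leading underscore followed by one or more characters from the proposed set; the function name `is_restricted_data_name` is hypothetical.

```python
import re

# Assumption: a data name is "_" followed by one or more characters
# drawn from the proposed ASCII set [A-Za-z0-9_.].
DATA_NAME = re.compile(r"_[A-Za-z0-9_.]+")


def is_restricted_data_name(tag):
    """Return True if the whole tag conforms to the restricted set."""
    return DATA_NAME.fullmatch(tag) is not None
```

Under this check a legacy name such as `_name[1]` is rejected, and so is anything containing `/`.
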
>>>>> 
>>>>> The data values will be represented as discussed in previous threads, and
>>>>> the reverse solidus and the token delimiters discussed will be ASCII
>>>>> characters. We can now return to [] as the list delimiters, and {} as the
>>>>> associative array delimiters.
>>>>> 
>>>>> Backward compatibility to CIF1 names is handled by exploiting the _alias
>>>>> attributes in the definition. A CIF2 parser with dictionary can handle
>>>>> everything. Any CIF1 parser can handle CIF1 data files (also CIF2 data files
>>>>> up to a point, but won't know what the data names mean - unless they have
>>>>> hardcoded it).
>>>>> 
>>>>> A CIF2 parser would like a leading comment to tell it what sort of file it
>>>>> is parsing. In the absence of that comment, a pre-scan will need to be done.
>>>>> The telltale indicators it is a CIF1 data file are multiple occurrences of,
>>>>> 
>>>>> (1) data names that potentially contain [] or /
>>>>> (2) unquoted strings with illegal characters
>>>>> (3) quoted strings that result in parse failure (typically because they must
>>>>> have an embedded [but not elided] quote character as allowed in CIF1).
>>>>> 
>>>>> It needs to be a pre-scan because all 3 of the above in an identified CIF2
>>>>> data file would result in something quite different since there are coercion
>>>>> rules for when the whitespace separator is missing.
>>>>> 
>>>>> For instance IF I KNOW it is a CIF2 file and I read
>>>>> 
>>>>> _name[1]
>>>>> 
>>>>> Then this can only be an error and I coerce into
>>>>> 
>>>>> _name   [1]
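
One way to picture this coercion, as a hedged Python sketch: take the longest prefix that is a legal data name under the proposed set [A-Za-z0-9_.] and treat whatever follows as the start of the value. The function name `split_name_and_value` is hypothetical, and real coercion rules may well cover more cases than this.

```python
import re

# Assumption: a legal data name is "_" followed by characters from the
# proposed set [A-Za-z0-9_.]; anything after that belongs to the value.
_NAME_PREFIX = re.compile(r"_[A-Za-z0-9_.]+")


def split_name_and_value(token):
    """Split a token like '_name[1]' into ('_name', '[1]').

    The second element is an empty string when the token is a plain name.
    """
    match = _NAME_PREFIX.match(token)
    if match is None:
        raise ValueError("token does not start with a data name: %r" % token)
    return match.group(0), token[match.end():]
```

Here `split_name_and_value("_name[1]")` yields the name `_name` and the remainder `[1]`, which a CIF2 parser would then read as a list.
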
>>>>> 
>>>>> IF I DON'T KNOW the file type, the occurrence of _name[1] flags it as
>>>>> potentially a CIF1 file. If _name[1] is in an alias list, this reinforces
>>>>> the likelihood of CIF1. Multiple instances of these "errors" (or any others
>>>>> in the above list) indicate it is a CIF1 file (my only other conclusion
>>>>> would be it is a VERY BADLY written CIF2).
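
A rough Python sketch of such a pre-scan, covering only indicator (1), data names containing [] or /. It tokenizes naively on whitespace and does not understand quoted strings or text fields, so it is a heuristic only. The function names and the threshold of 2 for "multiple instances" are assumptions made for illustration.

```python
def count_suspect_names(text):
    """Count tokens that look like data names containing '[', ']' or '/'."""
    count = 0
    for line in text.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # treat the rest of the line as a comment
            if token.startswith("_") and any(c in token for c in "[]/"):
                count += 1
    return count


def looks_like_cif1(text, threshold=2):
    """Guess CIF1 when the indicator occurs at least `threshold` times."""
    return count_suspect_names(text) >= threshold
```

A single hit is left as "potentially CIF1"; only repeated hits tip the guess, in line with the reasoning above.
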
>>>>> 
>>>>> I think this takes us back to a very simple rule set, and I don't think the
>>>>> restriction in the character set for data names will cause problems. For all
>>>>> the excitement of UTF-8 etc I know of programming languages that support
>>>>> reading and writing data in such encodings but I haven't seen one that
>>>>> allows/encourages one to write programmes declaring identifiers in UTF-8
>>>>> character sets. (They may well exist, I just haven't seen them.)
>>>>> 
>>>>> 
>>>>> On 17/11/09 12:04 AM, "David Brown" <idbrown@mcmaster.ca> wrote:
>>>>> 
>>>>>> James,
>>>>>> 
>>>>>> There seems to be a lull in the discussions on CIF2 syntax so this would be
>>>>>> a good time for you, or an appointee chosen by you, to summarize where we
>>>>>> are at and propose a set of rules that we can work with as we move forward.
>>>>>> I realize that much of the work I have already done on dictionaries will
>>>>>> need to be revisited, and Herbert also seems anxious to have some decisions
>>>>>> on the various topics that have been discussed.
>>>>>> 
>>>>>> I believe we have a consensus on a number of points, but these need to be
>>>>>> written down clearly and need our formal agreement so we can move ahead.
>>>>>> 
>>>>>> David
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> ddlm-group mailing list
>>>>>> ddlm-group@iucr.org
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>> 
>>>>> 
>>>>> cheers
>>>>> 
>>>>> Nick
>>>>> 
>>>>> --------------------------------
>>>>> Associate Professor N. Spadaccini, PhD
>>>>> School of Computer Science & Software Engineering
>>>>> 
>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>>>> MBDP  M002
>>>>> 
>>>>> CRICOS Provider Code: 00126G
>>>>> 
>>>>> e: Nick.Spadaccini@uwa.edu.au
>>> 
>>> cheers
>>> 
>>> Nick
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group