
Re: [ddlm-group] CIF-2 changes

Dear Nick,

   Thank you.  What we need is a clean, clear, precise document that will
serve two purposes:

   1.  Show software writers the rules for the documents that their 
software will have to process; and
   2.  Show users the rules for the documents that they will be allowed to 
write.

With the divergence between CIF 1.1 and CIF 2 and the application of the
aliases, this has now become challenging.  We want the software writers to 
handle both CIF 1.1 and CIF 2 documents, but we want users to be 
encouraged to write clean CIF 2 documents, not relying on aliases except
when necessary.
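
As a rough illustration of what handling both kinds of document through
aliases might look like in software, here is a minimal sketch.  The alias
table entry and the function name are hypothetical; in practice the table
would be built from the _alias attributes in a dictionary.

```python
# Hypothetical alias table: legacy CIF 1.1 data name -> CIF 2 data name.
# The single entry below is invented for illustration only.
ALIASES = {
    "_legacy_name[1]": "_legacy.name_1",
}


def canonical_name(name, aliases=ALIASES):
    """Return the CIF 2 form of a data name, mapping legacy aliases.

    Names are lower-cased first because CIF data names are
    case-insensitive.
    """
    key = name.lower()
    return aliases.get(key, key)
```

A reader would call canonical_name() on every data name it encounters, so
that the rest of the software only ever sees CIF 2 names.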

   Regards,
     Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 18 Nov 2009, Nick Spadaccini wrote:

> I will do this shortly. I will write it for CIF2, and separately for STAR,
> which will be slightly different with regard to accepted encodings
> (attempting not to preclude any other encodings - but I am sure we finally
> agreed UTF-8 would be the ONLY CIF2 encoding). The other STAR application I
> will need to specify (broadly at this stage because the detail is enormous)
> is DDLm, which will be syntactically identical to CIF2 (I can see no
> reason for it not to be).
>
>
> On 17/11/09 11:23 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> Dear Colleagues,
>>
>>    Would somebody please put one firm, final and complete specification of
>> the character set, syntax and semantics of what may appear in a CIF
>> document to be processed against a CIF2 dictionary on the table?
>>
>>    At the very least, may we have one firm, final and complete
>> specification of the character set, syntax and semantics of the tags that
>> may appear in a CIF document to be processed against a CIF2 dictionary?
>>
>>    Then, we should have a meeting to decide if we all understand and agree
>> on whatever it is that is being proposed.
>>
>>    Coding against vague and changing specifications is one classic cause of
>> failure of software engineering efforts.
>>
>>    Regards,
>>      Herbert
>>
>> On Tue, 17 Nov 2009, David Brown wrote:
>>
>>> I support Nick's summary and I don't see any problem with datanames.  In CIF2
>>> we use only the restricted ASCII set.  When reading a CIF1 file we need to be
>>> aware that [] may appear in a dataname but using the alias list we
>>> immediately convert it to the corresponding CIF2 dataname for all further
>>> purposes.  A CIF2 file will only have fully conforming datanames.
>>>
>>> I am not sure why Nick is so concerned about implicit recognition of
>>> mislabelled CIF1 files.  All CIF2 files are required to be so identified in
>>> the first line.  Anything without the identifier is presumed to be CIF1.  Any
>>> other combination is illegal, but if one wants to recover from such an error
>>> there are two possibilities:
>>>
>>> 1. A CIF2 file is not so identified.  This should not raise any problems since the
>>> datanames will all be recognized and reading can carry on as normal.  Or
>>> would this stumble over lists etc. which it might not be expecting even
>>> though it would find all the necessary information in the dictionary?
>>>
>>> 2. A CIF1 file is labelled as CIF2.  This would seem to be highly unlikely and
>>> would be in danger of crashing in certain situations.  This is the only case
>>> where Nick's rescue package would be used but it would be rarely needed.
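
The first-line rule described above can be sketched as follows.  The
identifier string used here is an assumption: it is the one in the later
published CIF 2.0 specification and is not stated anywhere in this thread.
The function name is hypothetical.

```python
# Assumption: the CIF2 identifier is the magic comment "#\#CIF_2.0",
# taken from the published CIF 2.0 specification, not from this thread.
CIF2_MAGIC = "#\\#CIF_2.0"


def declared_version(text):
    """Classify a file by its first line only.

    Anything that does not start with the CIF2 identifier is presumed
    to be CIF1, as proposed in the message above.
    """
    first_line = text.split("\n", 1)[0]
    # Tolerate a leading byte-order mark before the identifier.
    first_line = first_line.lstrip("\ufeff")
    return "CIF2" if first_line.startswith(CIF2_MAGIC) else "CIF1"
```

This only reports what the file claims to be; recovering from a mislabelled
file is a separate step.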
>>>
>>> David
>>>
>>>
>>>
>>> Herbert J. Bernstein wrote:
>>>
>>>> Dear Nick,
>>>>
>>>>   The question on which we are circling is the valid data names.  Look over
>>>> the chain of emails -- every possible combination is still on the
>>>> table.  We need to get everybody to sign on to one clear, complete and
>>>> final specification.
>>>>
>>>>   We need a meeting.
>>>>
>>>>   Regards,
>>>>     Herbert
>>>>
>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>
>>>>>
>>>>> Sorry Herb, how is this full circle with no agreement? I have suggested we
>>>>> adopt almost all of the changes we discussed except that with David's
>>>>> option we can now simply enforce a more limited character set on data names
>>>>> so that the parsing problems within dREL for names with included [] are now
>>>>> eliminated, while still making it possible to handle legacy names. As a
>>>>> consequence of not needing to support [] in names we can now revert back to
>>>>> using them for list delimiters.
>>>>>
>>>>> The latter is the only circle, otherwise what was generally agreed in
>>>>> discussion is still there. My discussion below was for STAR which is the
>>>>> superset of CIF. The CIF2 specific stuff is still on the table.
>>>>>
>>>>> Have I missed something?
>>>>>
>>>>> On 17/11/09 9:11 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>>    We have now come full circle with no agreement on anything.  I for one,
>>>>>> for the reasons outlined in many prior messages, do not think this
>>>>>> latest (=oldest) approach to be a good idea.
>>>>>>
>>>>>>    Clearly, if we are ever to resolve this, we need to get all the players
>>>>>> into a meeting at one time and work things out.  I suspect we will not be
>>>>>> able to arrange a timely physical meeting.  Perhaps some sort of an
>>>>>> emeeting (Ajax, Skype or somesuch) would work.
>>>>>>
>>>>>>    Regards,
>>>>>>      Herbert
>>>>>>
>>>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>>>
>>>>>>> David's Option 3 is the simplest way forward, and actually revisits much of
>>>>>>> what was discussed back in 2007-08. Somehow those discussions were locked
>>>>>>> far back in my brain, only to be awakened by David's summary. Thanks for
>>>>>>> that.
>>>>>>>
>>>>>>> So now I return to the STAR syntax. DDLm is part of STAR and hence
>>>>>>> restrictions on data names so they can be parsed etc. is a STAR issue. I am
>>>>>>> brought around to Joe's idea that STAR accepts any 8 bit character sequence
>>>>>>> since that is the most complete set - and that this will be restricted to
>>>>>>> UTF-8 within the CIF specification. Any other adopter of STAR can choose
>>>>>>> whatever restricted encoding they wish.
>>>>>>>
>>>>>>> I still need to treat data names as programming identifiers within dREL so
>>>>>>> accordingly I propose we restrict the data names in STAR (and all variants)
>>>>>>> to be ASCII [A-Za-z0-9_.] as we have used in the sample dictionaries, DDLm
>>>>>>> and dREL.
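
The proposed restriction on data names can be expressed as a simple check.
This is only a sketch: it assumes a data name starts with an underscore, as
in existing CIF files, followed by one or more characters from the proposed
set.  The function name is hypothetical.

```python
import re

# Assumption: a data name is "_" followed by one or more characters
# from the proposed set [A-Za-z0-9_.].
DATA_NAME = re.compile(r"_[A-Za-z0-9_.]+")


def is_restricted_name(name):
    """Return True if the whole name fits the restricted character set."""
    return DATA_NAME.fullmatch(name) is not None
```

Under this check a name such as _name[1] is rejected, which is what allows
[] to be used again as list delimiters.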
>>>>>>>
>>>>>>> The data values will be represented as discussed in previous threads, and
>>>>>>> the reverse solidus and the token delimiters discussed will be ASCII
>>>>>>> characters. We can now return to [] as the list delimiters, and {} as the
>>>>>>> associative array delimiters.
>>>>>>>
>>>>>>> Backward compatibility to CIF1 names is handled by exploiting the _alias
>>>>>>> attributes in the definition. A CIF2 parser with dictionary can handle
>>>>>>> everything. Any CIF1 parser can handle CIF1 data files (also CIF2 data files
>>>>>>> up to a point, but won't know what the data names mean - unless they have
>>>>>>> hardcoded it).
>>>>>>>
>>>>>>> A CIF2 parser would like a leading comment to tell it what sort of file it
>>>>>>> is parsing. In the absence of that comment, a pre-scan will need to be done.
>>>>>>> The telltale indicators that it is a CIF1 data file are multiple occurrences
>>>>>>> of,
>>>>>>>
>>>>>>> (1) data names that potentially contain [] or /
>>>>>>> (2) unquoted strings with illegal characters
>>>>>>> (3) quoted strings that result in parse failure (typically because they must
>>>>>>> have an embedded [but not elided] quote character as allowed in CIF1).
>>>>>>>
>>>>>>> It needs to be a pre-scan because all 3 of the above in an identified CIF2
>>>>>>> data file would result in something quite different since there are coercion
>>>>>>> rules for when the whitespace separator is missing.
>>>>>>>
>>>>>>> For instance IF I KNOW it is a CIF2 file and I read
>>>>>>>
>>>>>>> _name[1]
>>>>>>>
>>>>>>> Then this can only be an error and I coerce into
>>>>>>>
>>>>>>> _name   [1]
>>>>>>>
>>>>>>> IF I DON'T KNOW the file type, the occurrence of _name[1] flags it as
>>>>>>> potentially a CIF1 file. If _name[1] is in an alias list, this reinforces
>>>>>>> the likelihood of CIF1. Multiple instances of these "errors" (or any others
>>>>>>> in the above list) indicate it is a CIF1 file (my only other conclusion
>>>>>>> would be it is a VERY BADLY written CIF2).
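
A much simplified sketch of the pre-scan idea follows.  It covers only
indicator (1), data names containing [] or /, and it assumes that
"multiple" means two or more.  The tokenising is naive (it ignores quoted
strings and text fields), and the function name and threshold are
hypothetical.

```python
def looks_like_cif1(text, threshold=2):
    """Guess whether an unlabelled file is CIF1.

    Counts lines whose first token looks like a data name (starts with
    "_") and contains "[", "]" or "/".  Returns True once the count
    reaches the threshold.
    """
    hits = 0
    for line in text.splitlines():
        tokens = line.split()
        # Only the first token on a line is treated as a possible name.
        if tokens and tokens[0].startswith("_"):
            if any(ch in tokens[0] for ch in "[]/"):
                hits += 1
    return hits >= threshold
```

A single suspicious name is not enough to return True, which matches the
point that one _name[1] only flags the file as potentially CIF1.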
>>>>>>>
>>>>>>> I think this takes us back to a very simple rule set, and I don't think the
>>>>>>> restriction in the character set for data names will cause problems. For all
>>>>>>> the excitement of UTF-8 etc. I know of programming languages that support
>>>>>>> reading and writing data in such encodings but I haven't seen one that
>>>>>>> allows/encourages one to write programmes declaring identifiers in UTF-8
>>>>>>> character sets. (They may well exist, I just haven't seen them.)
>>>>>>>
>>>>>>>
>>>>>>> On 17/11/09 12:04 AM, "David Brown" <idbrown@mcmaster.ca> wrote:
>>>>>>>
>>>>>>>> James,
>>>>>>>>
>>>>>>>> There seems to be a lull in the discussions on CIF2 syntax so this would be
>>>>>>>> a good time for you, or an appointee chosen by you, to summarize where we
>>>>>>>> are at and propose a set of rules that we can work with as we move forward.
>>>>>>>> I realize that much of the work I have already done on dictionaries will
>>>>>>>> need to be revisited, and Herbert also seems anxious to have some decisions
>>>>>>>> on the various topics that have been discussed.
>>>>>>>>
>>>>>>>> I believe we have a consensus on a number of points, but these need to be
>>>>>>>> written down clearly and need our formal agreement so we can move ahead.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ddlm-group mailing list
>>>>>>>> ddlm-group@iucr.org
>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>
>>>>>>>
>>>>>>> cheers
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> --------------------------------
>>>>>>> Associate Professor N. Spadaccini, PhD
>>>>>>> School of Computer Science & Software Engineering
>>>>>>>
>>>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>>>>>> MBDP  M002
>>>>>>>
>>>>>>> CRICOS Provider Code: 00126G
>>>>>>>
>>>>>>> e: Nick.Spadaccini@uwa.edu.au
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> cheers
>>>>>
>>>>> Nick
>>>
>>>
>>>
>
> cheers
>
> Nick
