Re: [ddlm-group] CIF-2 changes
- To: [email protected], Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] CIF-2 changes
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Wed, 18 Nov 2009 07:16:05 -0500 (EST)
- In-Reply-To: <C7297866.124AC%[email protected]>
- References: <C7297866.124AC%[email protected]>
Dear Nick,
Thank you. What we need is a clean, clear, precise document that will
serve two purposes:
1. Show software writers the rules for the documents that their
software will have to process; and
2. Show users the rules for the documents that they will be allowed to
write.
With the divergence between CIF 1.1 and CIF 2 and the application of the
aliases, this has now become challenging. We want the software writers to
handle both CIF 1.1 and CIF 2 documents, but we want users to be
encouraged to write clean CIF 2 documents, not relying on aliases except
when necessary.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Wed, 18 Nov 2009, Nick Spadaccini wrote:
> I will do this shortly. I will write it for CIF2, and separately for STAR,
> which will be slightly different with regard to accepted encodings
> (attempting not to preclude any other encodings - but I am sure we finally
> agreed UTF-8 would be the ONLY CIF2 encoding). The other STAR application I
> will need to specify (broadly at this stage because the detail is enormous)
> is DDLm, which will be syntactically identical to CIF2 (I can see no reason
> for it not to be).
>
>
> On 17/11/09 11:23 PM, "Herbert J. Bernstein" <[email protected]>
> wrote:
>
>> Dear Colleagues,
>>
>> Would somebody please put one firm, final and complete specification of
>> the character set, syntax and semantics of what may appear in a CIF
>> document to be processed against a CIF2 dictionary on the table?
>>
>> At the very least, may we have one firm, final and complete
>> specification of the character set, syntax and semantics of the tags that
>> may appear in a CIF document to be processed against a CIF2 dictionary?
>>
>> Then, we should have a meeting to decide if we all understand and agree
>> on whatever it is that is being proposed.
>>
>> Coding against vague and changing specifications is one classic cause of
>> failure of software engineering efforts.
>>
>> Regards,
>> Herbert
>>
>> On Tue, 17 Nov 2009, David Brown wrote:
>>
>>> I support Nick's summary and I don't see any problem with datanames. In CIF2
>>> we use only the restricted ASCII set. When reading a CIF1 file we need to be
>>> aware that [] may appear in a dataname but using the alias list we
>>> immediately convert it to the corresponding CIF2 dataname for all further
>>> purposes. A CIF2 file will only have fully conforming datanames.
>>>
>>> I am not sure why Nick is so concerned about implicit recognition of
>>> mislabelled CIF1 files. All CIF2 files are required to be so identified in
>>> the first line. Anything without the identifier is presumed to be CIF1. Any
>>> other combination is illegal, but if one wants to recover from such an error
>>> there are two possibilities:
>>>
>>> 1. A CIF2 is not so identified. This should not raise any problems since the
>>> datanames will all be recognized and reading can carry on as normal. Or
>>> would this stumble over lists etc. which it might not be expecting even
>>> though it would find all the necessary information in the dictionary?
>>>
>>> 2. A CIF1 is labelled as CIF2. This would seem to be highly unlikely and
>>> would be in danger of crashing in certain situations. This is the only case
>>> where Nick's rescue package would be used but it would be rarely needed.
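
The identification rule David describes can be sketched as follows. This is only an illustration: the thread does not spell out the identifier, so the sketch assumes the first-line comment `#\#CIF_2.0` (the form used in the published CIF 2.0 specification), and `cif_version` is a hypothetical name.

```python
# Assumed first-line identifier for CIF2 files (not stated in this thread).
CIF2_MAGIC = "#\\#CIF_2.0"


def cif_version(text: str) -> int:
    """Return 2 if the first line carries the CIF2 identifier, else 1.

    Anything without the identifier is presumed to be CIF1, as proposed.
    """
    first_line = text.split("\n", 1)[0]
    return 2 if first_line.startswith(CIF2_MAGIC) else 1
```

A file beginning with `#\#CIF_1.1`, or with no identifying comment at all, is treated as CIF1 by this rule.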
>>>
>>> David
>>>
>>>
>>>
>>> Herbert J. Bernstein wrote:
>>>
>>>> Dear Nick,
>>>>
>>>> The question on which we are circling is the valid data names. Look over
>>>> the chain of emails -- every possible combination is still on the
>>>> table. We need to get everybody to sign on to one clear, complete and
>>>> final specification.
>>>>
>>>> We need a meeting.
>>>>
>>>> Regards,
>>>> Herbert
>>>>
>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>
>>>>>
>>>>> Sorry Herb, how is this full circle with no agreement? I have suggested we
>>>>> adopt almost all of the changes we discussed except that with David's
>>>>> option we can now simply enforce a more limited character set on data
>>>>> names so that the parsing problems within dREL for names with included []
>>>>> are now eliminated, while still making it possible to handle legacy names.
>>>>> As a consequence of not needing to support [] in names we can now revert
>>>>> to using them for list delimiters.
>>>>>
>>>>> The latter is the only circle, otherwise what was generally agreed in
>>>>> discussion is still there. My discussion below was for STAR which is the
>>>>> superset of CIF. The CIF2 specific stuff is still on the table.
>>>>>
>>>>> Have I missed something?
>>>>>
>>>>> On 17/11/09 9:11 PM, "Herbert J. Bernstein" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>> We have now come full circle with no agreement on anything. I for one,
>>>>>> for the reasons outlined in many prior messages, do not think this
>>>>>> latest (=oldest) approach to be a good idea.
>>>>>>
>>>>>> Clearly, if we are ever to resolve this, we need to get all the players
>>>>>> into a meeting at one time and work things out. I suspect we will not be
>>>>>> able to arrange a timely physical meeting. Perhaps some sort of an
>>>>>> emeeting (Ajax, Skype or somesuch) would work.
>>>>>>
>>>>>> Regards,
>>>>>> Herbert
>>>>>>
>>>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>>>
>>>>>>> David's Option 3 is the simplest way forward, and actually revisits much
>>>>>>> of what was discussed back in 2007-08. Somehow those discussions were
>>>>>>> locked far back in my brain, only to be awakened by David's summary.
>>>>>>> Thanks for that.
>>>>>>>
>>>>>>> So now I return to the STAR syntax. DDLm is part of STAR and hence
>>>>>>> restrictions on data names so they can be parsed etc. is a STAR issue. I
>>>>>>> am brought around to Joe's idea that STAR accepts any 8-bit character
>>>>>>> sequence since that is the most complete set, and that this will be
>>>>>>> restricted to UTF-8 within the CIF specification. Any other adopter of
>>>>>>> STAR can choose whatever restricted encoding they wish.
>>>>>>>
>>>>>>> I still need to treat data names as programming identifiers within dREL,
>>>>>>> so accordingly I propose we restrict the data names in STAR (and all
>>>>>>> variants) to be ASCII [A-Za-z0-9_.] as we have used in the sample
>>>>>>> dictionaries, DDLm and dREL.
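
As a rough illustration of the proposed restriction, a data name could be checked with a single regular expression. This is a sketch under assumptions: it takes a data name to begin with an underscore followed by one or more characters from [A-Za-z0-9_.], and `is_restricted_data_name` is a hypothetical helper, not part of any existing CIF library.

```python
import re

# Assumption: "_" followed by at least one character from the proposed
# ASCII set [A-Za-z0-9_.]; the thread does not fix any further rules.
DATA_NAME = re.compile(r"_[A-Za-z0-9_.]+")


def is_restricted_data_name(name: str) -> bool:
    """Return True if `name` uses only the proposed restricted character set."""
    return DATA_NAME.fullmatch(name) is not None
```

Under this check a name such as `_atom_site.label` passes, while legacy forms like `_name[1]` or names containing `/` are rejected and would have to be reached through an alias.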
>>>>>>>
>>>>>>> The data values will be represented as discussed in previous threads, and
>>>>>>> the reverse solidus and the token delimiters discussed will be ASCII
>>>>>>> characters. We can now return to [] as the list delimiters, and {} as
>>>>>>> the associative array delimiters.
>>>>>>>
>>>>>>> Backward compatibility to CIF1 names is handled by exploiting the _alias
>>>>>>> attributes in the definition. A CIF2 parser with dictionary can handle
>>>>>>> everything. Any CIF1 parser can handle CIF1 data files (also CIF2 data
>>>>>>> files up to a point, but won't know what the data names mean, unless they
>>>>>>> have hardcoded it).
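
The alias mechanism can be pictured as a lookup from legacy names to their CIF2 equivalents. The sketch below is illustrative only: the table entry is invented, a real table would be built from the _alias attributes of a dictionary, and `canonical_name` is a hypothetical helper.

```python
# Hypothetical alias table: legacy CIF1 name -> CIF2 name. The entry is
# made up for illustration and does not come from any real dictionary.
ALIASES = {
    "_example_name[1]": "_example.name_1",
}


def canonical_name(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Map a legacy data name to its CIF2 form; unknown names pass through."""
    key = name.lower()  # CIF data names are case-insensitive
    return aliases.get(key, key)
```

After this conversion, all further processing can work with CIF2 names only, which is the behaviour described for a CIF2 parser with a dictionary.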
>>>>>>>
>>>>>>> A CIF2 parser would like a leading comment to tell it what sort of file
>>>>>>> it is parsing. In the absence of that comment, a pre-scan will need to be
>>>>>>> done. The telltale indicators that it is a CIF1 data file are multiple
>>>>>>> occurrences of:
>>>>>>>
>>>>>>> (1) data names that potentially contain [] or /
>>>>>>> (2) unquoted strings with illegal characters
>>>>>>> (3) quoted strings that result in parse failure (typically because they
>>>>>>> must have an embedded [but not elided] quote character as allowed in CIF1).
>>>>>>>
>>>>>>> It needs to be a pre-scan because all 3 of the above in an identified
>>>>>>> CIF2 data file would result in something quite different since there are
>>>>>>> coercion rules for when the whitespace separator is missing.
>>>>>>>
>>>>>>> For instance IF I KNOW it is a CIF2 file and I read
>>>>>>>
>>>>>>> _name[1]
>>>>>>>
>>>>>>> Then this can only be an error and I coerce into
>>>>>>>
>>>>>>> _name [1]
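
The coercion in this example could look like the sketch below. It assumes data names are limited to [A-Za-z0-9_.], so a "[" fused onto a name can only be the start of a list; `coerce_missing_space` is a hypothetical name and the rule shown covers this one case only.

```python
import re

# A data name immediately followed by "[" with no whitespace between them.
FUSED = re.compile(r"^(_[A-Za-z0-9_.]+)(\[)")


def coerce_missing_space(token: str) -> str:
    """Insert the missing separator between a data name and a list opener."""
    return FUSED.sub(r"\1 \2", token, count=1)
```

Tokens that are already well formed, such as a bare data name or a list on its own, are returned unchanged.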
>>>>>>>
>>>>>>> IF I DON'T KNOW the file type, the occurrence of _name[1] flags it as
>>>>>>> potentially a CIF1 file. If _name[1] is in an alias list, this
>>>>>>> reinforces the likelihood of CIF1. Multiple instances of these "errors"
>>>>>>> (or any others in the above list) indicate it is a CIF1 file (my only
>>>>>>> other conclusion would be it is a VERY BADLY written CIF2).
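
A very crude version of such a pre-scan is sketched below. It checks only indicator (1) from the list above, looks only at the first token of each line, and ignores multi-line text fields; the threshold of 2 is an arbitrary stand-in for "multiple occurrences", and `looks_like_cif1` is a hypothetical name.

```python
import re

# A token that starts like a data name but contains "[", "]" or "/".
LEGACY_NAME = re.compile(r"^_\S*[\[\]/]\S*$")


def looks_like_cif1(text: str, threshold: int = 2) -> bool:
    """Guess CIF1 when at least `threshold` legacy-style data names occur."""
    hits = 0
    for line in text.splitlines():
        tokens = line.split()
        # Only the first token is examined, so bracketed values that
        # follow a data name on the same line are not counted.
        if tokens and LEGACY_NAME.match(tokens[0]):
            hits += 1
    return hits >= threshold
```

A fuller pre-scan would also have to look for indicators (2) and (3), and could consult the alias list to confirm that a flagged name is a known CIF1 name.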
>>>>>>>
>>>>>>> I think this takes us back to a very simple rule set, and I don't think
>>>>>>> the restriction in the character set for data names will cause problems.
>>>>>>> For all the excitement of UTF-8 etc. I know of programming languages that
>>>>>>> support reading and writing data in such encodings but I haven't seen one
>>>>>>> that allows/encourages one to write programmes declaring identifiers in
>>>>>>> UTF-8 character sets. (They may well exist; I just haven't seen them.)
>>>>>>>
>>>>>>>
>>>>>>> On 17/11/09 12:04 AM, "David Brown" <[email protected]> wrote:
>>>>>>>
>>>>>>>> James,
>>>>>>>>
>>>>>>>> There seems to be a lull in the discussions on CIF2 syntax so this
>>>>>>>> would be a good time for you, or someone appointed by you, to summarize
>>>>>>>> where we are at and propose a set of rules that we can work with as we
>>>>>>>> move forward. I realize that much of the work I have already done on
>>>>>>>> dictionaries will need to be revisited, and Herbert also seems anxious
>>>>>>>> to have some decisions on the various topics that have been discussed.
>>>>>>>>
>>>>>>>> I believe we have a consensus on a number of points, but these need to
>>>>>>>> be written down clearly and need our formal agreement so we can move
>>>>>>>> ahead.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ddlm-group mailing list
>>>>>>>> [email protected]
>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>
>>>>>>>
>>>>>>> cheers
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> --------------------------------
>>>>>>> Associate Professor N. Spadaccini, PhD
>>>>>>> School of Computer Science & Software Engineering
>>>>>>>
>>>>>>> The University of Western Australia t: +61 (0)8 6488 3452
>>>>>>> 35 Stirling Highway f: +61 (0)8 6488 1089
>>>>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
>>>>>>> MBDP M002
>>>>>>>
>>>>>>> CRICOS Provider Code: 00126G
>>>>>>>
>>>>>>> e: [email protected]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group