Re: [ddlm-group] CIF-2 changes
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF-2 changes
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 18 Nov 2009 07:16:05 -0500 (EST)
- In-Reply-To: <C7297866.124AC%nick@csse.uwa.edu.au>
- References: <C7297866.124AC%nick@csse.uwa.edu.au>
Dear Nick,

Thank you. What we need is a clean, clear, precise document that will
serve two purposes:

1. Show software writers the rules for the documents that their
   software will have to process; and
2. Show users the rules for the documents that they will be allowed
   to write.

With the divergence between CIF 1.1 and CIF 2 and the application of
the aliases, this has now become challenging. We want the software
writers to handle both CIF 1.1 and CIF 2 documents, but we want users
to be encouraged to write clean CIF 2 documents, not relying on aliases
except when necessary.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Wed, 18 Nov 2009, Nick Spadaccini wrote:

> I will do this shortly. I will write it for CIF2, and separately for STAR
> which will be slightly different with regard to accepted encoding
> (attempting to not preclude any other encodings - but I am sure we finally
> agreed UTF-8 would be the ONLY CIF2 encoding). The other STAR application I
> will need to specify (broadly at this stage because the detail is enormous)
> is DDLm which will be syntactically identical to CIF2 (I can see no other
> reason for it not to be).
>
> On 17/11/09 11:23 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> Dear Colleagues,
>>
>> Would somebody please put one firm, final and complete specification of
>> the character set, syntax and semantics of what may appear in a CIF
>> document to be processed against a CIF2 dictionary on the table?
>>
>> At the very least, may we have one firm, final and complete
>> specification of the character set, syntax and semantics of the tags that
>> may appear in a CIF document to be processed against a CIF2 dictionary?
>>
>> Then, we should have a meeting to decide if we all understand and agree
>> on whatever it is that is being proposed.
>>
>> Coding against vague and changing specifications is one classic cause of
>> failure of software engineering efforts.
>>
>> Regards,
>> Herbert
>>
>> On Tue, 17 Nov 2009, David Brown wrote:
>>
>>> I support Nick's summary and I don't see any problem with datanames. In
>>> CIF2 we use only the restricted ASCII set. When reading a CIF1 file we
>>> need to be aware that [] may appear in a dataname, but using the alias
>>> list we immediately convert it to the corresponding CIF2 dataname for all
>>> further purposes. A CIF2 file will only have fully conforming datanames.
>>>
>>> I am not sure why Nick is so concerned about implicit recognition of
>>> mislabelled CIF1 files. All CIF2 files are required to be so identified in
>>> the first line. Anything without the identifier is presumed to be CIF1.
>>> Any other combination is illegal, but if one wants to recover from such an
>>> error there are two possibilities:
>>>
>>> 1. A CIF2 is not so identified. This should not raise any problems since
>>> the datanames will all be recognized and reading can carry on as normal.
>>> Or would this stumble over lists etc. which it might not be expecting even
>>> though it would find all the necessary information in the dictionary?
>>>
>>> 2. A CIF1 is labelled as CIF2. This would seem to be highly unlikely and
>>> would be in danger of crashing in certain situations. This is the only
>>> case where Nick's rescue package would be used but it would be rarely
>>> needed.
>>>
>>> David
>>>
>>> Herbert J. Bernstein wrote:
>>>
>>>> Dear Nick,
>>>>
>>>> The question on which we are circling is the valid data names. Look over
>>>> the chain of emails -- every possible combination is still on the
>>>> table. We need to get everybody to sign on to one clear, complete and
>>>> final specification.
>>>>
>>>> We need a meeting.
>>>>
>>>> Regards,
>>>> Herbert
>>>>
>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>
>>>>> Sorry Herb, how is this full circle with no agreement? I have suggested
>>>>> we adopt almost all of the changes we discussed except that with David's
>>>>> option we can now simply enforce a more limited character set on data
>>>>> names so that the parsing problems within dREL for names with included
>>>>> [] are now eliminated, while still making it possible to handle legacy
>>>>> names. As a consequence of not needing to support [] in names we can now
>>>>> revert back to using them for list delimiters.
>>>>>
>>>>> The latter is the only circle, otherwise what was generally agreed in
>>>>> discussion is still there. My discussion below was for STAR which is the
>>>>> superset of CIF. The CIF2 specific stuff is still on the table.
>>>>>
>>>>> Have I missed something?
>>>>>
>>>>> On 17/11/09 9:11 PM, "Herbert J. Bernstein"
>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>> We have now come full circle with no agreement on anything. I for one,
>>>>>> for the reasons outlined in many prior messages, do not think this
>>>>>> latest (=oldest) approach to be a good idea.
>>>>>>
>>>>>> Clearly, if we are ever to resolve this, we need to get all the players
>>>>>> into a meeting at one time and work things out. I suspect we will not
>>>>>> be able to arrange a timely physical meeting. Perhaps some sort of an
>>>>>> emeeting (Ajax, Skype or somesuch) would work.
>>>>>>
>>>>>> Regards,
>>>>>> Herbert
>>>>>>
>>>>>> On Tue, 17 Nov 2009, Nick Spadaccini wrote:
>>>>>>
>>>>>>> David's Option 3 is the simplest way forward, and actually revisits
>>>>>>> much of what was discussed back in 2007-08. Somehow those discussions
>>>>>>> were locked far back in my brain, only to be awakened by David's
>>>>>>> summary. Thanks for that.
>>>>>>>
>>>>>>> So now I return to the STAR syntax. DDLm is part of STAR and hence
>>>>>>> restrictions on data names so they can be parsed etc is a STAR issue.
>>>>>>> I am brought around to Joe's idea that STAR accepts any 8 bit
>>>>>>> character sequence since that is the most complete set - and that this
>>>>>>> will be restricted to UTF-8 within the CIF specification. Any other
>>>>>>> adoptee of STAR can choose whatever restricted encoding they wish.
>>>>>>>
>>>>>>> I still need to treat data names as programming identifiers within
>>>>>>> dREL, so accordingly I propose we restrict the data names in STAR (and
>>>>>>> all variants) to be ASCII [A-Za-z0-9_.] as we have used in the sample
>>>>>>> dictionaries, DDLm and dREL.
>>>>>>>
>>>>>>> The data values will be represented as discussed in previous threads,
>>>>>>> and the reverse solidus and the token delimiters discussed will be
>>>>>>> ASCII characters. We can now return to [] as the list delimiters, and
>>>>>>> {} as the associative array delimiters.
>>>>>>>
>>>>>>> Backward compatibility to CIF1 names is handled by exploiting the
>>>>>>> _alias attributes in the definition. A CIF2 parser with dictionary can
>>>>>>> handle everything. Any CIF1 parser can handle CIF1 data files (also
>>>>>>> CIF2 data files up to a point, but won't know what the data names
>>>>>>> mean - unless they have hardcoded it).
>>>>>>>
>>>>>>> A CIF2 parser would like a leading comment to tell it what sort of
>>>>>>> file it is parsing. In the absence of that comment, a pre-scan will
>>>>>>> need to be done. The telltale indicators it is a CIF1 data file are
>>>>>>> multiple occurrences of,
>>>>>>>
>>>>>>> (1) data names that potentially contain [] or /
>>>>>>> (2) unquoted strings with illegal characters
>>>>>>> (3) quoted strings that result in parse failure (typically because
>>>>>>> they must have an embedded [but not elided] quote character as allowed
>>>>>>> in CIF1).
>>>>>>>
>>>>>>> It needs to be a pre-scan because all 3 of the above in an identified
>>>>>>> CIF2 data file would result in something quite different since there
>>>>>>> are coercion rules for when the whitespace separator is missing.
>>>>>>>
>>>>>>> For instance IF I KNOW it is a CIF2 file and I read
>>>>>>>
>>>>>>> _name[1]
>>>>>>>
>>>>>>> Then this can only be an error and I coerce into
>>>>>>>
>>>>>>> _name [1]
>>>>>>>
>>>>>>> IF I DON'T KNOW the file type, the occurrence of _name[1] flags it as
>>>>>>> potentially a CIF1 file. If _name[1] is in an alias list, this
>>>>>>> reinforces the likelihood of CIF1. Multiple instances of these
>>>>>>> "errors" (or any others in the above list) indicate it is a CIF1 file
>>>>>>> (my only other conclusion would be it is a VERY BADLY written CIF2).
>>>>>>>
>>>>>>> I think this takes us back to a very simple rule set, and I don't
>>>>>>> think the restriction in the character set for data names will cause
>>>>>>> problems. For all the excitement of UTF-8 etc I know of programming
>>>>>>> languages that support reading and writing data in such encodings but
>>>>>>> I haven't seen one that allows/encourages one to write programmes
>>>>>>> declaring identifiers in UTF-8 character sets. (They may well exist, I
>>>>>>> just haven't seen them.)
>>>>>>>
>>>>>>> On 17/11/09 12:04 AM, "David Brown" <idbrown@mcmaster.ca> wrote:
>>>>>>>
>>>>>>>> James,
>>>>>>>>
>>>>>>>> There seems to be a lull in the discussions on CIF2 syntax so this
>>>>>>>> would be a good time for you, or an appointee chosen by you, to
>>>>>>>> summarize where we are at and propose a set of rules that we can work
>>>>>>>> with as we move forward. I realize that much of the work I have
>>>>>>>> already done on dictionaries will need to be revisited, and Herbert
>>>>>>>> also seems anxious to have some decisions on the various topics that
>>>>>>>> have been discussed.
>>>>>>>>
>>>>>>>> I believe we have a consensus on a number of points, but these need
>>>>>>>> to be written down clearly and need our formal agreement so we can
>>>>>>>> move ahead.
>>>>>>>>
>>>>>>>> David
>>>>>>>
>>>>>>> cheers
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> --------------------------------
>>>>>>> Associate Professor N. Spadaccini, PhD
>>>>>>> School of Computer Science & Software Engineering
>>>>>>>
>>>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
>>>>>>> MBDP M002
>>>>>>>
>>>>>>> CRICOS Provider Code: 00126G
>>>>>>>
>>>>>>> e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
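
For software writers following the thread, the rules proposed in Nick's
quoted message (data names restricted to ASCII [A-Za-z0-9_.], legacy
names resolved through an alias list, coercion of _name[1] into
_name [1] in a known CIF2 file, and a pre-scan when the file type is
unknown) can be sketched roughly as below. This is only an illustration
under stated assumptions, not an agreed specification: the function
names, the alias table and its entries, the hit weighting and the
threshold are all hypothetical, tokens are split on whitespace only, and
just indicator (1) of the pre-scan list is checked.

```python
import re

# Proposed restriction: data names use only ASCII [A-Za-z0-9_.].
# The leading underscore and the exact grammar are assumptions here.
CIF2_NAME = re.compile(r"_[A-Za-z0-9_.]+\Z")

# Hypothetical alias table: legacy CIF1 name -> CIF2 name. In practice it
# would be built from the _alias attributes of a dictionary; both names
# below are made up for illustration.
ALIASES = {"_name[1]": "_name.first"}


def is_cif2_name(token):
    """True if token is a data name within the restricted character set."""
    return CIF2_NAME.match(token) is not None


def resolve_name(token, aliases=ALIASES):
    """Return the conforming CIF2 name for token, or None if unknown."""
    if is_cif2_name(token):
        return token
    return aliases.get(token)


def coerce_missing_space(token):
    """In a file KNOWN to be CIF2, split '_name[1]' into '_name' and '[1]'."""
    m = re.match(r"(_[A-Za-z0-9_.]+)(\[.*)\Z", token, re.S)
    return [m.group(1), m.group(2)] if m else [token]


def looks_like_cif1(text, aliases=ALIASES, threshold=2):
    """Pre-scan for data names containing [] or / (indicator 1 only).

    A hit that also appears in the alias list counts double, since that
    reinforces the likelihood of CIF1. Weights and threshold are assumed.
    """
    hits = 0
    for tok in text.split():
        if tok.startswith("_") and not is_cif2_name(tok):
            if re.search(r"[\[\]/]", tok):
                hits += 2 if tok in aliases else 1
    return hits >= threshold
```

A reader would check for the leading identifying comment first and fall
back to looks_like_cif1() only when it is absent; in a file already
identified as CIF2, coerce_missing_space() applies instead.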
- Follow-Ups:
- Re: [ddlm-group] CIF-2 changes (David Brown)
- References:
- Re: [ddlm-group] CIF-2 changes (Nick Spadaccini)