Re: [ddlm-group] Space as a list item separator
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Space as a list item separator
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Mon, 30 Nov 2009 16:01:44 -0500 (EST)
- In-Reply-To: <436862.35640.qm@web87015.mail.ird.yahoo.com>
- References: <C735A4E4.12669%nick@csse.uwa.edu.au><773849.42639.qm@web87014.mail.ird.yahoo.com><alpine.BSF.2.00.0911270628060.81324@epsilon.pair.com><434207.86524.qm@web87015.mail.ird.yahoo.com><183781.58939.qm@web87001.mail.ird.yahoo.com><279aad2a0911281945v4a7a3b37tf39ca4b45baf3478@mail.gmail.com><455583.44145.qm@web87005.mail.ird.yahoo.com><alpine.BSF.2.00.0911290910310.2441@epsilon.pair.com><4B14236E.10202@mcmaster.ca><436862.35640.qm@web87015.mail.ird.yahoo.com>
The problem is more a matter of legacy people and legacy experimental
practices than legacy data sets. These are legacies I think we should
retain and respect. These legacy people doing things with legacy
practices do very new and exciting science, for which CIF 2 will,
hopefully, be a useful tool, if we make it relatively easy for them to
integrate CIF 2 into their workflows. CIF 1.5 will help some of them
to do that.

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Mon, 30 Nov 2009, SIMON WESTRIP wrote:

> Dear all
>
> One point I read in David's comments is that there are no legacy
> issues with respect to lists, associative arrays etc.
> Does anyone disagree? Obviously it makes life easier when considering
> lists etc. if the 'legacy' word doesn't rear its head.
>
> ______________________________________________________________________
> From: David Brown <idbrown@mcmaster.ca>
> To: Group finalising DDLm and associated dictionaries
>     <ddlm-group@iucr.org>
> Sent: Monday, 30 November, 2009 19:56:30
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Please forgive me, everyone, but what is all this CIF1.5 about?
>
> Why do we need it?
>
> If a DDLm application is presented with a CIF data file written using
> a DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why
> can't we continue to use CIF1.1, since this works just fine for these
> files? Why do we need CIF1.5?
>
> CIF data files written using DDL1 and DDL2 dictionaries do not contain
> lists and arrays because lists and arrays were not invented when these
> files were written, and any data files written with these dictionaries
> in the future (and there may be many of them) will still use the
> CIF1.1 syntax. There is no danger of arrays slipping into these data
> files unnoticed because they are not defined (and never will be) in
> DDL1 and DDL2 dictionaries (CIF1.1 does not allow it).
>
> Of course our DDLm application (if we ever get it off the ground) will
> need to be able to read data files written with CIF1.1 syntax because
> we are required to ensure that this application can read in any
> existing CIF data file. It will also need to be able to read files
> written in CIF2 syntax because CIF2 will be needed for reading in the
> DDLm dictionaries (the only dictionaries that contain dREL) and the
> CIF2 data files (which may, unlike the CIF1.1 data files, also contain
> arrays and lists).
>
> As I pointed out earlier (and it seems to have come as something of a
> shock or epiphany to some), the DDLm dictionaries include very nice
> lists of aliases that contain every data name that was ever used for a
> given item. The data names in this alias list are, of course, quoted
> data values within the DDLm dictionary, and some contain characters
> that CIF2 would not recognize in a data name, but that is fine because
> they appear only in data values, and quoted data values no less.
>
> When confronted with a data file written in CIF1.1, our hypothetical
> application would switch on its CIF1.1 lexer to read in the CIF1 data
> file, and pass the results into a preparser which would match the data
> name in the CIF1.1 data file with an alias name in the DDLm
> dictionary, and immediately substitute the DDLm data name for the
> original DDL1 or DDL2 data name. Now all the problems with the old
> data names have disappeared. The preparser might have to make other
> changes to the data value (I am not sure that there are any, perhaps
> adding delimiters to all strings so they could be stripped away by the
> parser?). At this point you have a fully compliant CIF2-DDLm data set,
> which you can dREL to your heart's content. In particular, if dREL
> calls for an array, the item associated with that array will contain a
> dREL method for assembling the array from the individual data items
> that were originally stored in the input CIF and are now stored under
> a DDLm defined name. The only thing that would be difficult to do
> would be to reconstruct a DDL1 or DDL2 compliant data output file, but
> even this could be done if it was thought necessary.
>
> Please let's not make this exercise more confusing than necessary.
>
> You guys need to get on with defining what you want in CIF2. CIF1 can
> then look after itself using the existing tools together with the
> aliases for renaming the items.
>
> David
>
> Herbert J. Bernstein wrote:
>
> Dear Colleagues,
>
> Instead of looking at the minimally disruptive approach as a
> modification to CIF 2, in order to in fact be minimally disruptive, I
> would suggest looking at CIF 1.5 in terms of what would need to be
> changed in CIF 1.1 in order to support DDLm.
>
> I think the following will do it:
>
> For data values, only, recognize three new initial string delimiters
> in addition to the existing single quote ("'"), double quote ("\"")
> and newline-semicolon ("\n;"):
>
>   left brace ("{")
>   left square bracket ("[")
>
> Unless these are encountered in a left to right scan at a point at
> which the first character of a data value is expected, the parse
> remains the same as for CIF 1.1.
>
> Once the left brace or left square bracket is encountered, then
> whatever the formally agreed rules for the CIF2 parse are would apply
> until the balancing terminal right brace or right square bracket. It
> is only the top level terminal right brace or right square bracket
> that would be required to be followed by whitespace.
>
> The new dictionaries would _not_ be written in CIF 1.5, only in full
> CIF 2, but parsers would be expected to process any CIF not clearly
> self-identifying as a CIF 2 file as a CIF 1.5 file.
>
> This means that the only major use of CIF 2 constructs in CIF 1.5
> would be to allow users to provide list, matrix and vector data
> values.
>
> This also means, for example, as per David's suggestion, that the only
> way a tag with embedded square brackets or embedded braces would be
> handled in a new dictionary would be as an alias, but the formality of
> CIF 1.5 would give applications a clean way to make use of those
> aliases in parsing data files.
>
> If we follow this approach, then we would be honoring the published
> commitment to be able to keep essentially all existing data files
> unchanged, and still be able to handle them with DDLm. The only
> exception would be data files that happen to include data values that
> begin with '{' or '[', which would now have to be quoted. I do not
> believe that there are many such cases, and I believe that there would
> be acceptance of the need to add such quoting if encountered.
>
> To summarize:
>
>   Development of CIF 2 with DDLm support would continue and be used
>   for new dictionaries; and
>
>   Development of CIF 1.5 to serve as a bridge between CIF 1.1 and
>   DDLm, primarily giving users the ability to provide list, matrix and
>   vector data values, would be started to allow for a smooth
>   transition to wider use of DDLm and CIF 2.
>
> Regards,
> Herbert
>
> On Sun, 29 Nov 2009, SIMON WESTRIP wrote:
>
> Yes, that summarizes the differences. Unfortunately, the single-byte
> non-delimited strings have to be separated by white space in this
> approach, which is perhaps counter-intuitive and might have some
> legacy issues?
>
> ______________________________________________________________________
> From: James Hester <jamesrhester@gmail.com>
> To: Group finalising DDLm and associated dictionaries
>     <ddlm-group@iucr.org>
> Sent: Sunday, 29 November, 2009 3:45:18
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Hi Simon: I'm trying to read between the lines here as to how the
> syntax we have been discussing diverges from what you have described,
> and have come up with the following list:
>
> 1. Presumably the []{} characters must be surrounded by whitespace in
>    your version
> 2. We have restricted the character sets of the non-delimited strings
>    and tags more than strictly necessary.
> 3. Comma might be included in the single-byte non-delimited string
>    list
>
> Are there any other differences that you would identify?
>
> On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP
> <simonwestrip@btinternet.com> wrote:
>
> Dear all
>
> I was chatting with the man who 'writes the cheques' yesterday about
> some of the changes he might expect with CIF2, and based on this I
> feel I ought to at least have a go at exploring a 'minimally
> disruptive' approach, so at the risk of being shouted at, here goes at
> a slightly different way of looking at CIF:
>
> CIF contains a list of strings separated by whitespace.
>
> A string can be nondelimited or delimited.
>
> Nondelimited strings have a restricted character set (minimally,
> whitespace is excluded).
>
> A nondelimited string cannot start with any of the delimiters
> (obviously).
>
> Nondelimited strings can have special meaning governing what follows
> them:
>
>   reserved words, e.g. loop_
>
>   tags, e.g. data_ , _foo
>
>   single-byte nondelimited strings, e.g. [ ] { } :
>
> All other strings are treated as raw data values.
>
> There, at least I can say I tried :-)
>
> Cheers
>
> Simon
>
> ______________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries
>     <ddlm-group@iucr.org>
> Sent: Saturday, 28 November, 2009 10:01:38
> Subject: Re: [ddlm-group] Space as a list item separator
>
> I had been under the assumption that the separation of list items by a
> comma was 'set in stone' (and was one reason for dropping the CIF1
> syntax of requiring space after data values), but if it's up for
> negotiation I would opt for using the space as a separator as
> elsewhere in the CIF, partly because then a list can essentially be
> treated much like a single-item loop, i.e. same basic parsing of
> <value><space><value><space>...
>
> Cheers
>
> Simon
>
> ______________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries
>     <ddlm-group@iucr.org>
> Cc: Nick.Spadaccini@uwa.edu.au
> Sent: Friday, 27 November, 2009 11:43:10
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Dear Colleagues,
>
> I have no objection to accepting either comma or whitespace as a valid
> separator in a list. I can't object -- I have been coding to that
> standard since 1997, and now would only have to remove the message
> generated for the case of the space. We already accept multiple glyphs
> as valid separators at all levels:
>
> Whitespace itself is one of several character sequences in rather
> complex combinations: any number of blanks, tabs, newlines and
> comments.
>
> The comma itself is handled in a complex way. We accept (or should
> accept) any whitespace before and after a comma as valid, as in {a,b}
> versus {a , b }. Adding the option of leaving out the comma itself and
> just having the whitespace as the separator makes just as much sense.
>
> I see nothing to be gained by now forbidding the comma. The meaning of
> {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new (and
> I think more sensible and realistic) approach, {a . b .} or {a ? b ?}.
>
> The blank reads particularly well in dealing with vectors and
> matrices. The comma reads well when dealing with strings.
>
> I think we would do best with both as valid alternatives (no error, no
> warning for either one).
>
> Regards,
> Herbert
>
> On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>
> > At first glance, you're considering using space instead of commas as
> > list separators?
> > which is not so far away from the CIF1 requirement of space
> > following a delimiter?
> >
> > But I'm only on my first cup of coffee this morning :-)
> >
> > ____________________________________________________________________
> > From: Nick Spadaccini <nick@csse.uwa.edu.au>
> > To: Group finalising DDLm and associated dictionaries
> >     <ddlm-group@iucr.org>
> > Sent: Friday, 27 November, 2009 7:46:44
> > Subject: Re: [ddlm-group] Space as a list item separator
> >
> > On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
> >
> > > See comments below:
> > >
> > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini
> > > <nick@csse.uwa.edu.au> wrote:
> > >> Timely email, come in just after the one I sent.
> > >>
> > >> My position is if we specify the syntax then we encourage its
> > >> correct use but acknowledge that there may be cases where one
> > >> might be able to recover intent. But I wouldn't encourage those
> > >> cases.
> > >
> > > Absolutely, which is why I would like to elevate space-separated
> > > list items to be correct syntax rather than 'wrong but intent is
> > > clear' syntax.
> > >>
> > >> You could say that token separators in lists are a or b or c, but
> > >> that just adds a level of complexity for very little gain. The
> > >> choice of comma makes it seamless to translate from the raw CIF
> > >> data straight in to most language specific data declarations. The
> > >> only language I know that accepts one or the other or both is
> > >> MatLab.
> > >
> > > Re ease of translation: you speak as if a viable approach to a CIF
> > > data file is to take whole text chunks and throw them at some
> > > language interpreter, without doing your own parse. Quite apart
> > > from being a rather unlikely approach, this is impossible, as
> > > without parsing you won't know where the list finishes. If you do
> > > do your own parse, you can populate your datastructures directly
> > > during the parse, and what list separator was originally used in
> > > the data file is completely irrelevant.
> > >
> > > Re complexity: not sure how you are planning to deal with
> > > whitespace in the formal grammar, but consider the following,
> > > where I have assumed that each token 'eats up' the following
> > > whitespace.
> > >
> > > <dataitem> = <dataname><whitespace>+<datavalue>
> > > <datavalue> = {<list>|<string>}<whitespace>+
> > > <listdatavalue> = {<list>|<string>}<whitespace>*
> > > <list> = '[' <whitespace>* {<listdatavalue>
> > >          {<comma><whitespace>*<listdatavalue>}*}* ']'
> > >
> > > If we make comma or whitespace possible separators, the last
> > > production becomes:
> > >
> > > <list> = '[' <whitespace>* {<listdatavalue>
> > >          {<comma or whitespace><listdatavalue>}*}* ']'
> > >
> > > This looks like no extra complexity, and from a user's point of
> > > view whitespace as an alternative separator is simple to
> > > understand and consistent with space as a token separator used
> > > everywhere else in CIF. Anyway, if reduction of grammar complexity
> > > is your goal, you can just completely exclude commas as list
> > > separators!
> >
> > Why not? Make them spaces only, and you become consistent across the
> > board. I have to think about the possibility of pathological cases
> > where spaces won't work. I can't think of any at the moment.
> >
> > > Some questions about how commas behave:
> > > 1. is a trailing comma e.g. [1,2,3,4,] a syntax error?
> > > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
> >
> > I would say yes to syntax error. I can easily determine there may
> > need to be an additional list value, but can't determine what.
> >
> > > Note the above productions assume that the answer to both is yes.
> > >
> > >> What big advantage to a language is there to specify you can use
> > >> a comma or whitespace as a token separator? Will you be happy
> > >> with the first person who interprets this as being ok
> > >>
> > >> loop_
> > >> _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of
> > >>                              # severalvalues
> > >>
> > > Not sure what you are getting at here: I am proposing the
> > > following:
> > >
> > > _nicelist [1 2 3 4 5 6 7]
> > >
> > > being the same as
> > >
> > > _nicelist [1,2,3,4,5,6,7]
> > >
> > > Don't see how this relates to loops.
> >
> > The point was, once you say a space and comma are equivalent token
> > separators then will it be an interpretation that they are always so
> > even in loops? My example was not a list, just 7 values that were
> > separated by commas not spaces.
> >
> > > James.
> > > ------
> > >>
> > >> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com>
> > >> wrote:
> > >>
> > >>> Dear All: looking over the list I posted previously of items
> > >>> left to resolve, I see only one serious one outstanding: whether
> > >>> or not to allow space as a separator between list items. Nick
> > >>> has stated:
> > >>>
> > >>> "I will propose it has to be a comma, but make the coercion rule
> > >>> that space separated values in a list-type object be coerced
> > >>> into comma separated values. That is, read spaces as you want,
> > >>> but don't encourage them."
> > >>>
> > >>> I would like to counter-propose, as Joe did originally, that
> > >>> whitespace be elevated to equal status with comma as a valid
> > >>> list separator. I see no downside to this. Would anyone else
> > >>> like to speak to this issue before we vote? In particular, I
> > >>> would be interested to hear why Nick doesn't want to encourage
> > >>> spaces.
> > >>
> > >> cheers
> > >>
> > >> Nick
> > >>
> > >> --------------------------------
> > >> Associate Professor N. Spadaccini, PhD
> > >> School of Computer Science & Software Engineering
> > >>
> > >> The University of Western Australia    t: +61 (0)8 6488 3452
> > >> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> > >> CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
> > >> MBDP M002
> > >>
> > >> CRICOS Provider Code: 00126G
> > >>
> > >> e: Nick.Spadaccini@uwa.edu.au
> >
> > cheers
> >
> > Nick
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
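The comma-or-whitespace list production discussed in the quoted messages can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the agreed CIF2 grammar: the function name parse_list is hypothetical, list items are restricted to bare (non-delimited) strings and nested square-bracket lists, values are kept as strings, and a trailing or doubled comma is rejected, following the "yes to syntax error" answer given in the thread.

```python
import re

# Hypothetical tokenizer: a bracket, a comma, or a run of other
# non-whitespace characters. Quoted strings are deliberately not handled.
_TOKEN = re.compile(r"\s*([\[\],]|[^\s\[\],]+)")


def parse_list(text):
    """Parse '[a b, c [d e]]' style text into nested Python lists.

    Items may be separated by whitespace, by a comma, or by a comma
    surrounded by whitespace. Empty items ('[1,,2]', '[1,2,]') raise
    ValueError.
    """
    tokens = _TOKEN.findall(text.strip())
    pos = 0

    def parse_items():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        items = []
        after_comma = False  # True when the previous token was a comma
        while True:
            if pos >= len(tokens):
                raise ValueError("unterminated list")
            tok = tokens[pos]
            if tok == "]":
                if after_comma:
                    raise ValueError("trailing comma")
                pos += 1
                return items
            if tok == ",":
                if not items or after_comma:
                    raise ValueError("empty list item")
                after_comma = True
                pos += 1
                continue
            if tok == "[":
                items.append(parse_items())  # nested list
            else:
                items.append(tok)  # bare string value
                pos += 1
            after_comma = False

    result = parse_items()
    if pos != len(tokens):
        raise ValueError("unexpected text after closing ']'")
    return result


if __name__ == "__main__":
    print(parse_list("[1 2 3 4 5 6 7]"))
    print(parse_list("[1,2,3,4,5,6,7]"))
```

Because the separator is consumed during the parse, the resulting data structure is the same whichever separator the file used, which is the point made in the thread about populating data structures directly.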
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://scripts.iucr.org/mailman/listinfo/ddlm-group
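The alias-based renaming step that David Brown describes (a preparser that replaces each DDL1 or DDL2 data name with the DDLm name under which it is listed as an alias) might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the function names, the shape of the alias table, the sample names in it, and the case-insensitive matching of data names are not taken from any actual DDLm tool or dictionary.

```python
def build_alias_index(dictionary):
    """Invert {ddlm_name: [alias, ...]} into {lowercased name: ddlm_name}.

    The DDLm name itself is also indexed so that files already using it
    pass through unchanged.
    """
    index = {}
    for ddlm_name, aliases in dictionary.items():
        index[ddlm_name.lower()] = ddlm_name
        for alias in aliases:
            index[alias.lower()] = ddlm_name
    return index


def rename_items(items, index):
    """Return a copy of {data_name: value} keyed by DDLm names.

    Names with no known alias are left as they are (an assumption; a
    real preparser might prefer to report them).
    """
    return {index.get(name.lower(), name): value
            for name, value in items.items()}


if __name__ == "__main__":
    # Hypothetical alias table; the second entry shows an old-style name
    # containing square brackets appearing only as an alias value.
    aliases = {
        "_cell.length_a": ["_cell_length_a"],
        "_demo.value": ["_demo_value[1]"],
    }
    index = build_alias_index(aliases)
    old_items = {"_cell_length_a": "10.5", "_demo_value[1]": "x"}
    print(rename_items(old_items, index))
```

In this arrangement the bracketed name never has to be a legal CIF2 data name; it only ever appears as a lookup key taken from the alias list.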