
Re: [ddlm-group] Space as a list item separator

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Space as a list item separator
From: SIMON WESTRIP <[email protected]>
Date: Mon, 30 Nov 2009 20:29:18 +0000 (GMT)
In-Reply-To: <[email protected]>
References: <C735A4E4.12669%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>


Dear all

One point I read in David's comments is that there are no legacy issues with respect to lists, associative arrays etc.
Does anyone disagree? Obviously it makes life easier when considering lists etc. if the 'legacy' word doesn't rear its head.

From: David Brown <[email protected]>
To: Group finalising DDLm and associated dictionaries <[email protected]>
Sent: Monday, 30 November, 2009 19:56:30
Subject: Re: [ddlm-group] Space as a list item separator

Please forgive me, everyone, but what is all this CIF1.5 about? Why do we need it? If a DDLm application is presented with a CIF data file written using a DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why can't we continue to use CIF1.1 since this works just fine for these files? Why do we need CIF1.5?

CIF data files written using DDL1 and DDL2 dictionaries do not contain lists and arrays because lists and arrays were not invented when these files were written, and any data files written with these dictionaries in the future (and there may be many of them) will still use the CIF1.1 syntax. There is no danger of arrays slipping into these data files unnoticed because they are not defined (and never will be) in DDL1 and DDL2 dictionaries (CIF1.1 does not allow it).

Of course our DDLm application (if we ever get it off the ground) will need to be able to read data files written with CIF1.1 syntax because we are required to ensure that this application can read in any existing CIF data file. It will also need to be able to read files written in CIF2 syntax because CIF2 will be needed for reading in the DDLm dictionaries (the only dictionaries that contain dREL) and the CIF2 data files (which may, unlike the CIF1.1 data files, also contain arrays and lists).

As I pointed out earlier (and it seems to have come as something of a shock or epiphany to some), the DDLm dictionaries include very nice lists of aliases that contain every data name that was ever used for a given item. The data names in this alias list are, of course, quoted data values within the DDLm dictionary, and some contain characters that CIF2 would not recognize in a data name, but that is fine because they appear only in data values, and quoted data values no less.

When confronted with a data file written in CIF1.1, our hypothetical application would switch on its CIF1.1 lexer to read in the CIF1 data file, and pass the results into a preparser which would match the data name in the CIF1.1 data file with an alias name in the DDLm dictionary, and immediately substitute the DDLm data name for the original DDL1 or DDL2 data name. Now the whole problem with the old data names has disappeared. The preparser might have to make other changes to the data value (I am not sure that there are any, perhaps adding delimiters to all strings so they could be stripped away by the parser?). At this point you have a fully compliant CIF2-DDLm data set, which you can dREL to your heart's content. In particular, if dREL calls for an array, the item associated with that array will contain a dREL method for assembling the array from the individual data items that were originally stored in the input CIF and are now stored under a DDLm defined name.

The only thing that would be difficult to do would be to reconstruct a DDL1 or DDL2 compliant data output file, but even this could be done if it was thought necessary.

Please let's not make this exercise more confusing than necessary. You guys need to get on with defining what you want in CIF2. CIF1 can then look after itself using the existing tools together with the aliases for renaming the items.

David

Herbert J. Bernstein wrote:

Dear Colleagues,

Instead of looking at the minimally disruptive approach as a modification to CIF 2, in order to in fact be minimally disruptive, I would suggest looking at CIF 1.5 in terms of what would need to be changed in CIF 1.1 in order to support DDLm.
I think the following will do it:

For data values, only, recognize three new initial string delimiters in addition to the existing single quote ("'"), double quote ("\"") and newline-semicolon ("\n;"):

  left brace ("{")
  left square bracket ("[")

Unless these are encountered in a left to right scan at a point at which the first character of a data value is expected, the parse remains the same as for CIF 1.1. Once the left brace or left square bracket is encountered, then whatever the formally agreed rules for the CIF2 parse are would apply until the balancing terminal right brace or right square bracket. It is only the top level terminal right brace or right square bracket that would be required to be followed by whitespace.

The new dictionaries would _not_ be written in CIF 1.5, only in full CIF 2, but parsers would be expected to process any CIF not clearly self-identifying as a CIF 2 file as a CIF 1.5 file. This means that the only major use of CIF 2 constructs in CIF 1.5 would be to allow users to provide list, matrix and vector data values. This also means, for example, as per David's suggestion, that the only way a tag with embedded square brackets or embedded braces would be handled in a new dictionary would be as an alias, but the formality of CIF 1.5 would give applications a clean way to make use of those aliases in parsing data files.

If we follow this approach, then we would be honoring the published commitment to be able to keep essentially all existing data files unchanged, and still be able to handle them with DDLm. The only exception would be data files that happen to include data values that begin with '{' or '[', which would now have to be quoted. I do not believe that there are many such cases, and I believe that there would be acceptance of the need to add such quoting if encountered.
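As an illustration of the rule just described, here is a minimal Python sketch of the scan at the point where a data value is expected. The names `scan_value` and `CLOSER` are hypothetical and not taken from any existing CIF software. The sketch assumes bare values only on the CIF 1.1 side (quoted strings and semicolon text fields are omitted), and it counts nesting without checking that brace and bracket types match or skipping quoted strings inside the compound value, which a real parser would have to do under whatever CIF2 rules are finally agreed.

```python
from typing import Tuple

# Hypothetical table: the two opening delimiters named in the proposal.
CLOSER = {"{": "}", "[": "]"}


def scan_value(text: str, pos: int) -> Tuple[str, int]:
    """Scan one data value starting at text[pos]; return (kind, end_index).

    Simplification: anything not starting with '{' or '[' is treated as a
    bare CIF 1.1 value that runs to the next whitespace.
    """
    if text[pos] in CLOSER:
        depth = 0
        for i in range(pos, len(text)):
            ch = text[i]
            if ch in CLOSER:
                depth += 1
            elif ch in CLOSER.values():
                depth -= 1
                if depth == 0:
                    end = i + 1
                    # Only the top-level closer must be followed by whitespace.
                    if end < len(text) and not text[end].isspace():
                        raise ValueError(
                            "top-level closing delimiter must be followed by whitespace"
                        )
                    return "compound", end
        raise ValueError("unbalanced delimiter")
    # A '[' or '{' that is not the first character does not change the parse.
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1
    return "cif1.1", end
```

With this sketch, a value such as `a[1]` is still read as an ordinary CIF 1.1 value, because the bracket is not its first character; only a value that begins with `[` or `{` switches to the compound scan.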
To summarize:

  Development of CIF 2 with DDLm support would continue and be used for new dictionaries; and

  Development of CIF 1.5, to serve as a bridge between CIF 1.1 and DDLm, would start, primarily giving users the ability to provide list, matrix and vector data values, to allow for a smooth transition to wider use of DDLm and CIF 2

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
[email protected]
=====================================================

On Sun, 29 Nov 2009, SIMON WESTRIP wrote:

Yes that summarizes the differences. Unfortunately, the single-byte non-delimited strings have to be separated by white space in this approach, which is perhaps counter-intuitive and might have some legacy issues?

____________________________________________________________________________
From: James Hester <[email protected]>
To: Group finalising DDLm and associated dictionaries <[email protected]>
Sent: Sunday, 29 November, 2009 3:45:18
Subject: Re: [ddlm-group] Space as a list item separator

Hi Simon: I'm trying to read between the lines here as to how the syntax we have been discussing diverges from what you have described, and have come up with the following list:

1. Presumably the []{} characters must be surrounded by whitespace in your version
2. We have restricted the character sets of the non-delimited strings and tags more than strictly necessary.
3. Comma might be included in the single-byte non-delimited string list

Are there any other differences that you would identify?
On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP <[email protected]> wrote:

Dear all

I was chatting with the man who 'writes the cheques' yesterday about some of the changes he might expect with CIF2, and based on this I feel I ought to at least have a go at exploring a 'minimally disruptive' approach, so at the risk of being shouted at, here goes at a slightly different way of looking at CIF:

CIF contains a list of strings separated by whitespace.

A string can be nondelimited or delimited.

Nondelimited strings have a restricted character set (minimally whitespace is excluded)

A nondelimited string cannot start with any of the delimiters (obviously)

Nondelimited strings can have special meaning governing what follows them:

  reserved words, e.g. loop_
  tags, e.g. data_ , _foo
  single-byte nondelimited strings, e.g. [ ] { } :

All other strings are treated as raw data values

There, least I can say I tried :-)

Cheers

Simon

____________________________________________________________________________
From: SIMON WESTRIP <[email protected]>
To: Group finalising DDLm and associated dictionaries <[email protected]>
Sent: Saturday, 28 November, 2009 10:01:38
Subject: Re: [ddlm-group] Space as a list item separator

I had been under the assumption that the separation of list items by a comma was 'set in stone' (and was one reason for dropping the CIF1 syntax of requiring space after data values), but if it's up for negotiation I would opt for using the space as a separator as elsewhere in the CIF, partly because then a list can essentially be treated much like a single-item loop - i.e. same basic parsing of <value><space><value><space>...

Cheers

Simon

____________________________________________________________________________
From: Herbert J. Bernstein <[email protected]>
To: Group finalising DDLm and associated dictionaries <[email protected]>
Cc: [email protected]
Sent: Friday, 27 November, 2009 11:43:10
Subject: Re: [ddlm-group] Space as a list item separator

Dear Colleagues,

I have no objection to accepting either comma or whitespace as a valid separator in a list. I can't object -- I have been coding to that standard since 1997, and now would only have to remove the message generated for the case of the space.

We already accept multiple glyphs as valid separators at all levels: whitespace itself is one of several character sequences in rather complex combinations: any number of blanks, tabs, newlines and comments. The comma itself is handled in a complex way. We accept (or should accept) any whitespace before and after a comma as valid, as in {a,b} versus {a , b }. Adding the option of leaving out the comma itself and just having the whitespace as the separator makes just as much sense.

I see nothing to be gained by now forbidding the comma. The meaning of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new (and I think more sensible and realistic approach) {a . b .} or {a ? b ?}.

The blank reads particularly well in dealing with vectors and matrices. The comma reads well when dealing with strings. I think we would do best with both as valid alternatives (no error, no warning for either one).

Regards,
Herbert

On Fri, 27 Nov 2009, SIMON WESTRIP wrote:

> At first glance, you're considering using space instead of commas as list separators?
> which is not so far away from the CIF1 requirement of space following a delimiter?
>
> But I'm only on my first cup of coffee this morning :-)
>
> ____________________________________________________________________________
> From: Nick Spadaccini <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Friday, 27 November, 2009 7:46:44
> Subject: Re: [ddlm-group] Space as a list item separator
>
> On 27/11/09 2:32 PM, "James Hester" <[email protected]> wrote:
>
> > See comments below:
> >
> > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <[email protected]> wrote:
> >> Timely email, come in just after the one I sent.
> >>
> >> My position is if we specify the syntax then we encourage its correct use but acknowledge that there may be cases where one might be able to recover intent. But I wouldn't encourage those cases.
> >
> > Absolutely, which is why I would like to elevate space-separated list items to be correct syntax rather than 'wrong but intent is clear' syntax.
> >>
> >> You could say that token separator in lists are a or b or c, but that just adds a level of complexity for very little gain. The choice of comma makes it seamless to translate from the raw CIF data straight in to most language specific data declaration. The only language I know that accepts one or the other or both is MatLab.
> >
> > Re ease of translation: you speak as if a viable approach to a CIF data file is to take whole text chunks and throw them at some language interpreter, without doing your own parse. Quite apart from being a rather unlikely approach, this is impossible, as without parsing you won't know where the list finishes. If you do do your own parse, you can populate your datastructures directly during the parse, and what list separator was originally used in the data file is completely irrelevant.
> >
> > Re complexity: not sure how you are planning to deal with whitespace in the formal grammar, but consider the following, where I have assumed that each token 'eats up' the following whitespace.
> >
> > <dataitem> = <dataname><whitespace>+<datavalue>
> > <datavalue> = {<list>|<string>}<whitespace>+
> > <listdatavalue> = {<list>|<string>}<whitespace>*
> > <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
> >
> > If we make comma or whitespace possible separators, the last production becomes:
> > <list> = '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
> >
> > This looks like no extra complexity, and from a user's point of view whitespace as an alternative separator is simple to understand and consistent with space as a token separator used everywhere else in CIF. Anyway, if reduction of grammar complexity is your goal, you can just completely exclude commas as list separators!
>
> Why not? Make them spaces only, and you become consistent across the board. I have to think about the possibility of pathological cases where spaces won't work. I can't think of any at the moment.
>
> > Some questions about how commas behave:
> > 1. is a trailing comma e.g. [1,2,3,4,] a syntax error?
> > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
>
> I would say yes to syntax error. I can easily determine they may need to be an additional list value, but can't determine what.
>
> > Note the above productions assume that the answer to both is yes.
> >
> >>
> >> What big advantage to a language is there to specify you can use a comma or whitespace as a token separator? Will you be happy with the first person who interprets this as being ok
> >>
> >> loop_
> >> _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
> >>
> > Not sure what you are getting at here: I am proposing the following:
> >
> > _nicelist [1 2 3 4 5 6 7]
> >
> > being the same as
> >
> > _nicelist [1,2,3,4,5,6,7]
> >
> > Don't see how this relates to loops.
>
> The point was, once you say a space and comma are equivalent token separators then will it be an interpretation that they are always so even in loops? My example was not a list, just 7 values that were separated by commas not spaces.
>
> > James.
> > ------
> >>
> >> On 27/11/09 11:41 AM, "James Hester" <[email protected]> wrote:
> >>
> >>> Dear All: looking over the list I posted previously of items left to resolve, I see only one serious one outstanding: whether or not to allow space as a separator between list items. Nick has stated:
> >>>
> >>> " I will propose it has to be a comma, but make the coercion rule that space separated values in a list-type object be coerced into comma separated values. That is, read spaces as you want, but don't encourage them."
> >>>
> >>> I would like to counter-propose, as Joe did originally, that whitespace be elevated to equal status with comma as a valid list separator. I see no downside to this. Would anyone else like to speak to this issue before we vote? In particular, I would be interested to hear why Nick doesn't want to encourage spaces.
> >>
> >> cheers
> >>
> >> Nick
> >>
> >> --------------------------------
> >> Associate Professor N. Spadaccini, PhD
> >> School of Computer Science & Software Engineering
> >>
> >> The University of Western Australia    t: +61 (0)8 6488 3452
> >> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> >> CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
> >> MBDP M002
> >>
> >> CRICOS Provider Code: 00126G
> >>
> >> e: [email protected]
> >>
> >> _______________________________________________
> >> ddlm-group mailing list
> >> [email protected]
> >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> cheers
>
> Nick

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
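
The list productions quoted in the thread above, in the variant where either a comma or whitespace separates items, can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `parse_list` is a hypothetical name, items are bare (unquoted) strings or nested square-bracket lists, and trailing or doubled commas are rejected, which follows the "yes to syntax error" answer in the quoted exchange rather than the alternative reading in which an empty slot stands for `.` or `?`. Because whitespace is discarded during tokenising, the sketch does not check that a nested list is separated from its neighbours.

```python
import re
from typing import List, Union

Value = Union[str, List["Value"]]

# Tokens: '[', ']', ',' or a run of characters that is none of those
# and not whitespace. Whitespace itself is dropped, so it acts purely
# as a separator.
_TOKEN = re.compile(r"[\[\],]|[^\s\[\],]+")


def parse_list(text: str) -> Value:
    """Parse a bracketed list whose items are separated by commas or whitespace."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def parse_value() -> Value:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok == "[":
            pos += 1
            items: List[Value] = []
            after_comma = False  # True when an item must follow
            while True:
                if pos >= len(tokens):
                    raise SyntaxError("unterminated list")
                tok = tokens[pos]
                if tok == "]":
                    if after_comma:
                        raise SyntaxError("trailing comma")
                    pos += 1
                    return items
                if tok == ",":
                    if after_comma or not items:
                        raise SyntaxError("empty list item")
                    after_comma = True
                    pos += 1
                    continue
                items.append(parse_value())
                after_comma = False
        if tok in ("]", ","):
            raise SyntaxError("unexpected " + tok)
        pos += 1
        return tok

    result = parse_value()
    if pos != len(tokens):
        raise SyntaxError("text after end of value")
    return result
```

Under these assumptions `[1 2 3 4 5 6 7]` and `[1,2,3,4,5,6,7]` parse to the same Python list, so the separator used in the file is not visible to the application once the parse is done.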

_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group


