Re: [ddlm-group] Space as a list item separator
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Space as a list item separator
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Mon, 30 Nov 2009 15:08:13 +0800
- In-Reply-To: <279aad2a0911280347m2e7b4261m2f75af20ddfc8387@mail.gmail.com>
On 28/11/09 7:47 PM, "James Hester" <jamesrhester@gmail.com> wrote:

> It seems that most of us are in favour of spaces as separators, with Herbert
> at least in favour of spaces and commas. Some things to decide:
>
> 1. Do we want spaces and commas, or only spaces?
> 2. Space syntax: while all primitive values should be separated from
> neighbouring primitive values by spaces, what about compound values (i.e.
> lists)? So for example, is
>
> [[1 2 3][4 5 6]]
>
> acceptable or should it be
>
> [[1 2 3] [4 5 6]] ?

Many languages will simply return a fatal error. However Herb quite correctly
says we can infer what was intended. This is equivalent to the case of
[[1,2,3][4,5,6]], from which we infer [[1,2,3],[4,5,6]].

> (I have added a space between the neighbouring lists in the second version).
>
> 3. If commas are acceptable, we need to decide on the two cases that I
> brought up recently: are multiple commas in a row acceptable (like [1,,2,3])?
> Are trailing commas acceptable - [1,2,3,]? Herbert appears to favour
> inferring a missing value in these cases, and Nick thinks they should both be
> syntax errors. I favour Nick's interpretation, and Herbert's interpretations
> could then be coercion rules. Of course, if we drop commas altogether, this
> is a moot point.

Spaces would make this discussion a moot point. However the comma has support
so addressing this question is important. Herb suggests the values inserted
are '?' or '.'. These are two quite different entities. The '?' indicates
there is a value but it is unknown, and often the default can be used. The '.'
indicates there is no value that makes sense here.

Consider [1.2(3),"a flag",,key1,'Hello']. In this case I simply have no idea
what the type could be, never mind the value. In such a case what would '?'
actually mean? There are defaults for defined data items, but none for types.

My view of this is [1,,2,3] is an error because you have no way to know what
to substitute in for a value.
[1,2,3,] can be handled as an error, though a common coercion is into [1,2,3],
where the trailing , is treated as an unnecessary extra character.

> My votes would be:
>
> For 1: prefer spaces only, but absolutely no problems with including commas if
> that is what is preferred by the rest of you. My preference for spaces only
> is entirely for simplicity and consistency with the rest of the CIF syntax.
> For 2: allow non-primitive values to have no space between them
> For 3: as I said, these examples should be syntax errors.
>
> Simon: I sincerely hope we have not dropped space as a separator in CIF2; we
> have reduced its role as a delimiter, which makes it possible to recover from
> certain syntax errors and ever so slightly simplifies the grammar.
>
> On Sat, Nov 28, 2009 at 9:01 PM, SIMON WESTRIP <simonwestrip@btinternet.com>
> wrote:
>> I had been under the assumption that the separation of list items by a comma
>> was 'set in stone' (and was one reason for dropping the CIF1 syntax of
>> requiring space after data values), but if it's up for negotiation I would
>> opt for using the space as a separator as elsewhere in the CIF, partly
>> because then a list can essentially be treated much like a single-item
>> loop - i.e. same basic parsing of <value><space><value><space>...
>>
>> Cheers
>>
>> Simon
>>
>>
>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>> Cc: Nick.Spadaccini@uwa.edu.au
>> Sent: Friday, 27 November, 2009 11:43:10
>> Subject: Re: [ddlm-group] Space as a list item separator
>>
>> Dear Colleagues,
>>
>> I have no objection to accepting either comma or whitespace
>> as a valid separator in a list. I can't object -- I have been
>> coding to that standard since 1997, and now would only have to
>> remove the message generated for the case of the space.
>> We already accept multiple glyphs as valid separators at all levels:
>>
>> whitespace itself is one of several character sequences in rather
>> complex combinations: any number of blanks, tabs, newlines and comments.
>> The comma itself is handled in a complex way. We accept (or should accept)
>> any whitespace before and after a comma as valid, as in
>> {a,b} versus {a , b }. Adding the option of leaving out the comma
>> itself and just having the whitespace as the separator makes just
>> as much sense.
>>
>> I see nothing to be gained by now forbidding the comma. The meaning of
>> {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new (and I think
>> more sensible and realistic) approach, {a . b .} or {a ? b ?}.
>>
>> The blank reads particularly well in dealing with vectors and matrices. The
>> comma reads well when dealing with strings.
>>
>> I think we would do best with both as valid alternatives (no error, no
>> warning for either one).
>>
>> Regards,
>> Herbert
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>>
>>> At first glance, you're considering using space instead of commas as list
>>> separators? which is not so far away from the CIF1 requirement of space
>>> following a delimiter?
>>>
>>> But I'm only on my first cup of coffee this morning :-)
>>>
>>> ____________________________________________________________________________
>>> From: Nick Spadaccini <nick@csse.uwa.edu.au>
>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>>> Sent: Friday, 27 November, 2009 7:46:44
>>> Subject: Re: [ddlm-group] Space as a list item separator
>>>
>>> On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>
>>>> See comments below:
>>>>
>>>> On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <nick@csse.uwa.edu.au>
>>>> wrote:
>>>>> Timely email, come in just after the one I sent.
>>>>>
>>>>> My position is if we specify the syntax then we encourage its correct use
>>>>> but acknowledge that there may be cases where one might be able to recover
>>>>> intent. But I wouldn't encourage those cases.
>>>>
>>>> Absolutely, which is why I would like to elevate space-separated list items
>>>> to be correct syntax rather than 'wrong but intent is clear' syntax.
>>>>>
>>>>> You could say that token separators in lists are a or b or c, but that
>>>>> just adds a level of complexity for very little gain. The choice of comma
>>>>> makes it seamless to translate from the raw CIF data straight into most
>>>>> language-specific data declarations. The only language I know that accepts
>>>>> one or the other or both is MatLab.
>>>>
>>>> Re ease of translation: you speak as if a viable approach to a CIF data file
>>>> is to take whole text chunks and throw them at some language interpreter,
>>>> without doing your own parse. Quite apart from being a rather unlikely
>>>> approach, this is impossible, as without parsing you won't know where the
>>>> list finishes. If you do do your own parse, you can populate your
>>>> datastructures directly during the parse, and what list separator was
>>>> originally used in the data file is completely irrelevant.
>>>>
>>>> Re complexity: not sure how you are planning to deal with whitespace in the
>>>> formal grammar, but consider the following, where I have assumed that each
>>>> token 'eats up' the following whitespace.
>>>>
>>>> <dataitem> = <dataname><whitespace>+<datavalue>
>>>> <datavalue> = {<list>|<string>}<whitespace>+
>>>> <listdatavalue> = {<list>|<string>}<whitespace>*
>>>> <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
>>>>
>>>> If we make comma or whitespace possible separators, the last production
>>>> becomes:
>>>> <list> = '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
>>>>
>>>> This looks like no extra complexity, and from a user's point of view
>>>> whitespace as an alternative separator is simple to understand and
>>>> consistent with space as a token separator used everywhere else in CIF.
>>>> Anyway, if reduction of grammar complexity is your goal, you can just
>>>> completely exclude commas as list separators!
>>>
>>> Why not? Make them spaces only, and you become consistent across the board.
>>> I have to think about the possibility of pathological cases where spaces
>>> won't work. I can't think of any at the moment.
>>>
>>>>
>>>> Some questions about how commas behave:
>>>> 1. is a trailing comma e.g. [1,2,3,4,] a syntax error?
>>>> 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
>>>
>>> I would say yes to syntax error. I can easily determine there may need to be
>>> an additional list value, but can't determine what.
>>>
>>>> Note the above productions assume that the answer to both is yes.
>>>>
>>>>>
>>>>> What big advantage to a language is there to specify you can use a comma
>>>>> or whitespace as a token separator?
>>>>> Will you be happy with the first person who interprets this as being ok
>>>>>
>>>>> loop_
>>>>> _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
>>>>>
>>>> Not sure what you are getting at here: I am proposing the following:
>>>>
>>>> _nicelist [1 2 3 4 5 6 7]
>>>>
>>>> being the same as
>>>>
>>>> _nicelist [1,2,3,4,5,6,7]
>>>>
>>>> Don't see how this relates to loops.
>>>
>>> The point was, once you say a space and comma are equivalent token
>>> separators then will it be an interpretation that they are always so even in
>>> loops? My example was not a list, just 7 values that were separated by
>>> commas not spaces.
>>>
>>>>
>>>> James.
>>>> ------
>>>>>
>>>>> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>>>
>>>>>> Dear All: looking over the list I posted previously of items left to
>>>>>> resolve, I see only one serious one outstanding: whether or not to allow
>>>>>> space as a separator between list items. Nick has stated:
>>>>>>
>>>>>> " I will propose it has to be a comma, but make the coercion rule that
>>>>>> space separated values in a list-type object be coerced into comma
>>>>>> separated values. That is, read spaces as you want, but don't encourage
>>>>>> them."
>>>>>>
>>>>>> I would like to counter-propose, as Joe did originally, that whitespace be
>>>>>> elevated to equal status with comma as a valid list separator. I see no
>>>>>> downside to this. Would anyone else like to speak to this issue before we
>>>>>> vote? In particular, I would be interested to hear why Nick doesn't want
>>>>>> to encourage spaces.
>>>>>
>>>>> cheers
>>>>>
>>>>> Nick
>>>>>
>>>>> --------------------------------
>>>>> Associate Professor N. Spadaccini, PhD
>>>>> School of Computer Science & Software Engineering
>>>>>
>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
>>>>> MBDP M002
>>>>>
>>>>> CRICOS Provider Code: 00126G
>>>>>
>>>>> e: Nick.Spadaccini@uwa.edu.au
>>>>>
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>
>>> cheers
>>>
>>> Nick
>>>
>>> --------------------------------
>>> Associate Professor N. Spadaccini, PhD
>>> School of Computer Science & Software Engineering
>>>
>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>> CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
>>> MBDP M002
>>>
>>> CRICOS Provider Code: 00126G
>>>
>>> e: Nick.Spadaccini@uwa.edu.au
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
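
The separator rules debated in this thread can be illustrated with a minimal
Python sketch. It is not taken from any CIF implementation and is only one
possible reading of the positions above: the names tokenize, parse_list and
allow_trailing_comma are hypothetical, values are returned as raw token
strings, and the requirement that neighbouring primitive values be separated
by whitespace is not checked. Under those assumptions it accepts whitespace,
commas or both between items, lets neighbouring lists touch, rejects doubled
or leading commas, and treats a trailing comma as an error unless the
coercion is switched on.

```python
import re

# A token is a bracket, a comma, a quoted string, or a bare run of
# characters containing no whitespace, comma or bracket.
TOKEN_RE = re.compile(r"""[\[\],]|'[^']*'|"[^"]*"|[^\s,\[\]]+""")


def tokenize(text):
    """Split text into tokens; whitespace only separates tokens."""
    return TOKEN_RE.findall(text)


def parse_list(text, allow_trailing_comma=False):
    """Parse a bracketed list whose items are separated by spaces or commas.

    Hypothetical helper: nested lists become nested Python lists and
    primitive values stay as raw token strings.
    """
    tokens = tokenize(text)
    pos = 0

    def parse_value():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            return parse_items()
        if tok in ("]", ","):
            raise ValueError(f"unexpected {tok!r}")
        return tok

    def parse_items():
        nonlocal pos
        items = []
        after_comma = False
        while True:
            if pos >= len(tokens):
                raise ValueError("missing ']'")
            tok = tokens[pos]
            if tok == "]":
                # [1,2,3,] is an error unless coerced to [1,2,3].
                if after_comma and not allow_trailing_comma:
                    raise ValueError("trailing comma")
                pos += 1
                return items
            if tok == ",":
                # [1,,2,3] and [,1] give no way to know the missing value.
                if after_comma or not items:
                    raise ValueError("comma without a preceding value")
                after_comma = True
                pos += 1
                continue
            items.append(parse_value())
            after_comma = False

    value = parse_value()
    if pos != len(tokens):
        raise ValueError("unexpected text after value")
    return value
```

With this sketch, parse_list("[[1 2 3][4 5 6]]") and
parse_list("[[1,2,3],[4,5,6]]") give the same nested result, while
parse_list("[1,,2,3]") raises ValueError.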