Re: [ddlm-group] Space as a list item separator

On 28/11/09 7:47 PM, "James Hester" <jamesrhester@gmail.com> wrote:

> It seems that most of us are in favour of spaces as separators, with Herbert
> at least in favour of spaces and commas.  Some things to decide:
> 1. Do we want spaces and commas, or only spaces?
> 2. Space syntax: while all primitive values should be separated from
> neighbouring primitive values by spaces, what about compound values (i.e.
> lists).  So for example, is
> [[1 2 3][4 5 6]] 
> acceptable or should it be
> [[1 2 3] [4 5 6]] ?

Many languages will simply return a fatal error. However, Herb quite
correctly says we can infer what was intended. This is equivalent to the
case of [[1,2,3][4,5,6]], from which we infer [[1,2,3],[4,5,6]].
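
As a rough illustration of that inference, here is a minimal Python sketch.
It assumes a toy syntax of integers and square brackets only, and the name
parse_list is hypothetical (it is not part of any CIF library). A value that
starts immediately after a closing bracket is read as if a separator had been
present. The sketch is deliberately lenient about separators and does not
address the repeated-comma question.

```python
def parse_list(text):
    """Parse a toy bracketed list of integers.

    Separators are any run of whitespace and/or commas. Adjacent lists with
    no separator at all, e.g. [[1 2 3][4 5 6]], are accepted as well.
    """
    pos = 0

    def skip_separators():
        nonlocal pos
        while pos < len(text) and (text[pos].isspace() or text[pos] == ","):
            pos += 1

    def parse_value():
        nonlocal pos
        if text[pos] == "[":
            pos += 1  # consume '['
            items = []
            skip_separators()
            while pos < len(text) and text[pos] != "]":
                # A '[' directly after a ']' lands here with no separator
                # skipped, so the missing separator is simply inferred.
                items.append(parse_value())
                skip_separators()
            if pos >= len(text):
                raise ValueError("unterminated list")
            pos += 1  # consume ']'
            return items
        # Primitive value: read up to the next separator or bracket.
        start = pos
        while pos < len(text) and text[pos] not in "[], \t\n":
            pos += 1
        return int(text[start:pos])

    return parse_value()


print(parse_list("[[1 2 3][4 5 6]]"))
print(parse_list("[[1,2,3][4,5,6]]"))
```

Both calls yield the same nested structure as the explicitly separated forms.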

> (I have added a space between the neighbouring lists in the second version).
> 3.  If commas are acceptable, we need to decide on the two cases that I
> brought up recently: are multiple commas in a row acceptable (like [1,,2,3])? 
> Are trailing commas acceptable - [1,2,3,]?  Herbert appears to favour
> inferring a missing value in these cases, and Nick thinks they should both be
> syntax errors.  I favour Nick's interpretation, and Herbert's interpretations
> could then be coercion rules.  Of course, if we drop commas altogether, this
> is a moot point.

Spaces would make this discussion a moot point. However, the comma has
support, so addressing this question is important. Herb suggests the values
inserted are ? or . but these are two quite different entities. The ? indicates
there is a value but it is unknown, and often the default can be used. The .
indicates there is no value that makes sense here.

Consider [1.2(3),"a flag",,key1,'Hello']. In this case I simply have no
idea what the type could be, never mind the value. In such a case what would
? actually mean? There are defaults for defined data items, but none for

My view is that [1,,2,3] is an error because you have no way to know what
to substitute for the missing value. [1,2,3,] can be handled as an error,
though a common coercion is into [1,2,3], where the trailing , is treated as
an unnecessary extra character.
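
To make the distinction concrete, here is a rough Python sketch. It assumes a
flat list of bare tokens with no nesting and no quoted strings, and the names
split_comma_list and coerce_trailing are hypothetical, chosen only for this
illustration. An empty slot between two commas is always rejected, since
nothing indicates what to substitute. A single trailing comma is either
rejected or dropped, depending on the flag.

```python
def split_comma_list(text, coerce_trailing=False):
    """Split a flat comma-separated list such as [1,2,3] into string tokens.

    Raises ValueError for an empty item such as the gap in [1,,2,3].
    With coerce_trailing=True, one trailing comma is silently dropped.
    """
    body = text.strip()
    if len(body) < 2 or not (body.startswith("[") and body.endswith("]")):
        raise ValueError("not a bracketed list")
    inner = body[1:-1].strip()
    if not inner:
        return []
    items = [item.strip() for item in inner.split(",")]
    if coerce_trailing and items[-1] == "":
        items.pop()  # treat the trailing ',' as an unnecessary extra character
    if "" in items:
        raise ValueError("empty list item: no way to know what to substitute")
    return items


print(split_comma_list("[1,2,3,]", coerce_trailing=True))
```

Without the flag, [1,2,3,] raises the same error as [1,,2,3], which matches
the strict reading of both cases as syntax errors.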

> My votes would be:
> For 1: prefer spaces only, but absolutely no problems with including commas if
> that is what is preferred by the rest of you.  My preference for spaces only
> is entirely for simplicity and consistency with the rest of the CIF syntax.
> For 2: allow non-primitive values to have no space between them
> For 3: as I said, these examples should be syntax errors.
> Simon: I sincerely hope we have not dropped space as a separator in CIF2; we
> have reduced its role as a delimiter, which makes it possible to recover from
> certain syntax errors and ever so slightly simplifies the grammar.
> On Sat, Nov 28, 2009 at 9:01 PM, SIMON WESTRIP <simonwestrip@btinternet.com>
> wrote:
>> I had been under the assumption that the separation of list items by a comma
>> was 'set in stone'
>> (and was one reason for dropping the CIF1 syntax of requiring space after
>> data values),
>> but if it's up for negotiation I would opt for using the space as a separator
>> as elsewhere in the CIF,
>> partly because then a list can essentially be treated much like a single-item
>> loop - i.e. same basic parsing
>> of <value><space><value><space>...
>> Cheers
>> Simon
>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>> Cc: Nick.Spadaccini@uwa.edu.au
>> Sent: Friday, 27 November, 2009 11:43:10
>> Subject: Re: [ddlm-group] Space as a list item separator
>> Dear Colleagues,
>>    I have no objection to accepting either comma or whitespace
>> as a valid separator in a list.  I can't object -- I have been
>> coding to that standard since 1997, and now would only have to
>> remove the message generated for the case of the space.  We already
>> accept multiple glyphs as valid separators at all levels:
>>   whitespace itself is one of several character sequences in rather
>> complex combinations:  any number of blanks, tabs, newlines and comments.
>> The comma itself is handled in a complex way.  We accept (or should accept)
>> any whitespace before and after a comma as valid, as in
>> {a,b} versus {a , b }.  Adding the option of leaving out the comma
>> itself and just having the whitespace as the separator makes just
>> as much sense.
>>   I see nothing to be gained by now forbidding the comma.  The meaning of
>> {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new (and I think
>> more sensible and realistic approach) {a . b .} or {a ? b ?}.
>>   The blank reads particularly well in dealing with vectors and matrices. The
>> comma reads well when dealing with strings.
>>   I think we would do best with both as valid alternatives (no error, no
>> warning for either one).
>>   Regards,
>>     Herbert
>> On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>>> At first glance, you're considering using space instead of commas as list
>>> separators?
>>> which is not so far away from the CIF1 requirement of space following a
>>> delimiter?
>>> But I'm only on my first cup of coffee this morning :-)
>>> ____________________________________________________________________________
>>> From: Nick Spadaccini <nick@csse.uwa.edu.au>
>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>>> Sent: Friday, 27 November, 2009 7:46:44
>>> Subject: Re: [ddlm-group] Space as a list item separator
>>> On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>> See comments below:
>>>> On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
>>>>> Timely email, come in just after the one I sent.
>>>>> My position is if we specify the syntax then we encourage its correct use
>>>>> but acknowledge that there may be cases where one might be able to recover
>>>>> intent. But I wouldn't encourage those cases.
>>>> Absolutely, which is why I would like to elevate space-separated list
>>>> items to be correct syntax rather than 'wrong but intent is clear' syntax.
>>>>> You could say that token separators in lists are a or b or c, but that
>>>>> just adds a level of complexity for very little gain. The choice of comma
>>>>> makes it seamless to translate from the raw CIF data straight into most
>>>>> language-specific data declarations. The only language I know that accepts
>>>>> one or the other or both is MATLAB.
>>>> Re ease of translation: you speak as if a viable approach to a CIF data
>>>> file is to take whole text chunks and throw them at some language
>>>> interpreter, without doing your own parse.  Quite apart from being a rather
>>>> unlikely approach, this is impossible, as without parsing you won't know
>>>> where the list finishes.  If you do do your own parse, you can populate your
>>>> data structures directly during the parse, and what list separator was
>>>> originally used in the data file is completely irrelevant.
>>>> Re complexity: not sure how you are planning to deal with whitespace in
>>>> the formal grammar, but consider the following, where I have assumed that
>>>> each token 'eats up' the following whitespace.
>>>> <dataitem> = <dataname><whitespace>+<datavalue>
>>>> <datavalue> = {<list>|<string>}<whitespace>+
>>>> <listdatavalue> = {<list>|<string>}<whitespace>*
>>>> <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
>>>> If we make comma or whitespace possible separators, the last production
>>>> becomes:
>>>> <list> =  '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
>>>> This looks like no extra complexity, and from a user's point of view
>>>> whitespace as an alternative separator is simple to understand and
>>>> consistent with space as a token separator used everywhere else in CIF.
>>>> Anyway, if reduction of grammar complexity is your goal, you can just
>>>> completely exclude commas as list separators!
>>> Why not? Make them spaces only, and you become consistent across the board.
>>> I have to think about the possibility of pathological cases where spaces
>>> won't work. I can't think of any at the moment.
>>>> Some questions about how commas behave:
>>>> 1: is a trailing comma e.g. [1,2,3,4,] a syntax error?
>>>> 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
>>> I would say yes to syntax error. I can easily determine there may need to be
>>> an additional list value, but can't determine what.
>>>> Note the above productions assume that the answer to both is yes.
>>>>> What big advantage to a language is there to specify you can use a comma
>>>>> or whitespace as a token separator? Will you be happy with the first
>>>>> person who interprets this as being ok
>>>>> loop_
>>>>>   _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
>>>> Not sure what you are getting at here: I am proposing the following:
>>>> _nicelist      [1 2 3 4 5 6 7]
>>>> being the same as
>>>> _nicelist      [1,2,3,4,5,6,7]
>>>>  Don't see how this relates to loops.
>>> The point was, once you say a space and comma are equivalent token
>>> separators then will it be an interpretation that they are always so even in
>>> loops? My example was not a list, just 7 values that were separated by
>>> commas not spaces.
>>>> James.
>>>> ------
>>>>> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>>>>>> Dear All: looking over the list I posted previously of items left to
>>>>>> resolve, I see only one serious one outstanding: whether or not to allow
>>>>>> space as a separator between list items.  Nick has stated:
>>>>>> " I will propose it has to be a comma, but make the coercion rule that
>>>>>> space separated values in a list-type object be coerced into comma
>>>>>> separated values. That is, read spaces as you want, but don't encourage
>>>>>> them."
>>>>>> I would like to counter-propose, as Joe did originally, that whitespace
>>>>>> be elevated to equal status with comma as a valid list separator.  I see
>>>>>> no downside to this.  Would anyone else like to speak to this issue
>>>>>> before we vote?  In particular, I would be interested to hear why Nick
>>>>>> doesn't want to encourage spaces.
>>>>> cheers
>>>>> Nick
>>>>> --------------------------------
>>> cheers
>>> Nick
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au


