Re: [ddlm-group] Revisiting list delimiters. .. .

Please - one or the other - comma or whitespace - not both. Having to accept both whitespace-delimited
and comma-delimited lists is far too confusing and a totally unnecessary burden!



From: James Hester <jamesrhester@gmail.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Friday, 8 April, 2011 1:27:10
Subject: Re: [ddlm-group] Revisiting list delimiters. .. .

I agree that focussing on the new information is appropriate.

On Fri, Apr 8, 2011 at 3:31 AM, Bollinger, John C
<John.Bollinger@stjude.org> wrote:


> I agree that it is not worthwhile to repeat the earlier discussion, but that is no reason to jump directly to a vote.  It seems reasonable to instead focus on any new information or insights that did not inform the previous discussion, and then to consider whether their combination with the considerations already discussed leads anyone to change their previous opinion.
> So, what is the new information we should consider?  James raised these points:
> JH> the only fully-functional software for processing DDLm domain dictionaries (Nick, Syd and Ian's demonstration software) expects a comma separator
> JH> [James's] understanding is that Syd and Nick (now) are strongly in favour of sticking with comma as the list separator for STAR2
> JH> other non-CIF domains are already using comma as a list separator in STAR2 data files.
> JH> for some, a comma may be a useful visual aid for distinguishing looped items and listed items.
> I responded to each of those points in my first message yesterday: http://www.iucr.org/__data/iucr/lists/ddlm-group/msg01244.html.  The short form is: (a) the direction of STAR2 is not persuasive (and I now add that James's proposal still diverges from STAR2), (b) the demo software will have to be changed anyway, including in this area, and (c) syntactically distinguishing looped and listed items has significant drawbacks directly associated with it.

We are not in a situation where we would produce a standard that
matched exactly either STAR2 or Nick's program.  I am happy to drop
the STAR2 conformance argument in (a), but for (b) there is practical
value in reducing the mismatch with the current software.  For
example, with commas reinstated I believe that it would be possible to
write a CIF2 syntax, DDLm-based dictionary that could be processed by
Nick et al.'s software as is.  There are alternative workarounds, of
course, such as preprocessing CIF2 syntax into STAR2 syntax.

Regarding (c):  I reproduce your points from yesterday below:

JB> I don't personally see a significant advantage in visually
distinguishing looped items from list elements.  Indeed, there are
disadvantages springing directly from such a distinction, among them:

JB> a) any visual distinction places a burden on parsers to make the
same distinction
JB> b) a distinction here seems arbitrary and inconsistent.  Why
should CIF use differing syntax for the same function (delimiting a
sequence of values)?
JB> c) if adding comma delimiters means further restricting the
character set for whitespace-delimited values, then we thereby
increase CIF2's incompatibility with CIF1

The semantic meaning of a sequence of list values is fundamentally
different from that of a sequence of looped values.  There is no
inherent order for looped values, as the column and row order is
completely arbitrary.  This is not true for lists.  There is
admittedly a certain duality in the separation of values in a list: at
the very basic level they are a sequence of tokens, so whitespace
would be the CIF2 way of separating them (as I argued previously,
before my road to Damascus moment); but on the other hand, unlike most
(all?) other values in a CIF block, the actual order that they are
presented in the data file must be preserved, so it would be desirable
to indicate this.

So, in response to (a) I would say: yes, but the parsers must
distinguish loops and lists anyway, and may store them differently.
For example, a loop value might go directly into a database table, but
a list value must be accumulated somewhere first.

In response to (b) I repeat that a sequence of looped values and a
sequence of listed values are semantically different.  If anything,
the tendency to see them as identical would suggest a comma is a
useful reminder that they are not.
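
To make the loop/list distinction concrete, here is a minimal Python
sketch.  It is only an illustration under stated assumptions: the
function names and the data names `_label` and `_x` are hypothetical,
and loop packets are modelled as plain dicts of string values rather
than anything taken from an existing CIF library.

```python
def loops_equal(rows_a, rows_b):
    """Compare two loops, ignoring row order and column order.

    Each row (packet) is a dict mapping a data name to a string value.
    """
    def canonical(rows):
        # Sort the columns inside each row, then sort the rows themselves.
        return sorted(sorted(row.items()) for row in rows)
    return canonical(rows_a) == canonical(rows_b)


def lists_equal(list_a, list_b):
    """Compare two list values; here the order of elements is significant."""
    return list(list_a) == list(list_b)


# Hypothetical loop with the same packets presented in a different order.
loop_1 = [{"_label": "C1", "_x": "0.1"}, {"_label": "O1", "_x": "0.2"}]
loop_2 = [{"_x": "0.2", "_label": "O1"}, {"_x": "0.1", "_label": "C1"}]

same_loop = loops_equal(loop_1, loop_2)               # reordering is harmless
same_list = lists_equal(["1", "2", "3"], ["3", "2", "1"])  # reordering matters
```

In this sketch `same_loop` is true and `same_list` is false, which is
the asymmetry I am arguing the syntax could usefully signal.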

As for (c), I think we might in any case want to remove the comma from
the non-delimited string character set, because if we stick with the
current CIF2 spec, the following is a legitimate single-element list:

JB> Are you?  I know you favor allowing both whitespace and comma
separators, but I think you misread James' productions when you
assert (elsewhere) that [,,] would match them.  I don't read them that
way, and James previously wrote that it was not his intention to
allow that sort of construct.

You are right, it was not my intention (although I have no particular
issue with allowing it). I think that can be cleared up during a
semantic tidy-up phase rather than right now.

JB> Furthermore, the productions as currently written are flawed at
least because they do not permit tables as list items.  They also
yield odd results for where whitespace is allowed relative to commas
(allowed before, but not after).  Those issues can be addressed
with relative ease, of course, but they're a good reason to defer
voting on specific productions.

Indeed. Try these:

<list> = '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
<listdatavalue> = <whitespace>*{<list>|<string>|<table>}<whitespace>*
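
As a rough check of how these productions behave, here is a minimal
recursive-descent sketch in Python.  It rests on several assumptions
that are not part of the productions: <string> is simplified to a bare
token containing no whitespace, comma or bracket; <table> is omitted;
and the helper names `skip_ws` and `parse_list` are hypothetical.

```python
WHITESPACE = " \t\r\n"


def skip_ws(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def parse_list(text, pos=0):
    """Parse a bracketed list starting at text[pos].

    Returns (items, next_pos).  Items are bare-token strings or nested lists.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    items = []
    after_comma = False  # True when a comma has been seen and a value must follow
    while True:
        pos = skip_ws(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated list")
        ch = text[pos]
        if ch == "]":
            if after_comma:
                raise ValueError("comma must be followed by a value")
            return items, pos + 1
        if ch == ",":
            # A comma is only a separator: it needs a value on both sides.
            if not items or after_comma:
                raise ValueError(f"unexpected comma at position {pos}")
            after_comma = True
            pos += 1
            continue
        if ch == "[":
            value, pos = parse_list(text, pos)
        else:
            start = pos
            while pos < len(text) and text[pos] not in WHITESPACE + ",[]":
                pos += 1
            value = text[start:pos]
        items.append(value)
        after_comma = False
```

With this sketch, `parse_list("[a, b c]")` and `parse_list("[a b c]")`
yield the same three items, whitespace is accepted on either side of a
comma, and `[,,]` is rejected, which matches the intention stated
above.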
ddlm-group mailing list