Discussion List Archives

Re: [ddlm-group] Revisiting list delimiters

On Wednesday, April 06, 2011 1:36 AM, James Hester wrote:

>In the process of preparing for a vote on accepting the DDLm
>dictionary, I have come to the conclusion that we need to revisit the
>question of the separator character for lists.  This is because the
>only fully-functional software for processing DDLm domain dictionaries
>(Nick, Syd and Ian's demonstration software) expects a comma
>separator, and my understanding is that Syd and Nick (now) are
>strongly in favour of sticking with comma as the list separator for
>STAR2.  Furthermore, other non-CIF domains collaborating with Nick and
>Syd are already using comma as a list separator in STAR2 data files.
>Additionally, I've formed the view that a comma is a useful visual aid for
>distinguishing looped items and listed items.

I have much less regard for STAR compatibility in CIF2 than I once did, given that we don't have it anyway, so the direction of STAR2 in this area is not persuasive.  The behavior of the existing demonstration software is a more viable consideration, but

1) Would it really be so hard to modify the demo software to conform to the CIF2 syntax specs as they are currently defined?  Isn't it true that some such modifications will be required anyway?

2) If complying with the current behavior of the demo software is the alternative being proposed, then there are a host of technical details that need to be specified, large among them:

a) is whitespace allowed before and/or after a delimiting comma?
b) will the comma be forbidden from appearing in *all* whitespace-delimited values, or just those in lists?
c) are doubled commas a syntax error or delimiters of empty values?
d) are trailing commas a syntax error, delimiters of trailing empty values, or ignorable?
e) are commas used as separators only in lists, or also in tables?
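
To make the questions under (2) concrete, here is a minimal Python sketch of a comma-only list splitter in which each undecided detail becomes an explicit switch. It does not describe the demonstration software or anything in the CIF2 draft. The names `CommaPolicy` and `split_list_body` are hypothetical, quoted strings and nesting are ignored, and questions (b) and (e) are not modelled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommaPolicy:
    """Hypothetical switches, one per undecided detail."""
    allow_space_around_comma: bool = True   # question (a)
    empty_is_error: bool = True             # questions (c) and (d)
    ignore_trailing_comma: bool = False     # question (d)


def split_list_body(body: str, policy: CommaPolicy) -> List[str]:
    """Split the text between a list's brackets on commas.

    Simplifying assumptions: bare values only, no quoted strings,
    no nested lists or tables. Interior whitespace in a value is
    passed through unchanged.
    """
    if body.strip() == "":
        return []                           # an empty list has no values
    pieces = body.split(",")
    if policy.ignore_trailing_comma and pieces[-1].strip() == "":
        pieces.pop()                        # drop the slot after a final comma
    values = []
    for piece in pieces:
        value = piece.strip()
        if value != piece and not policy.allow_space_around_comma:
            raise ValueError(f"whitespace next to a comma: {piece!r}")
        if value == "" and policy.empty_is_error:
            raise ValueError("no value between delimiters")
        values.append(value)                # "" marks an empty value
    return values
```

With these switches the single input `a,,b,` is rejected, read as four values, or read as three values, depending only on the policy chosen. That is the ambiguity a specification would have to close.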

I don't personally see a significant advantage in visually distinguishing looped items from list elements.  Indeed, there are disadvantages springing directly from such a distinction, among them:

a) any visual distinction places a burden on parsers to make the same distinction
b) a distinction here seems arbitrary and inconsistent.  Why should CIF use differing syntax for the same function (delimiting a sequence of values)?
c) if adding comma delimiters means further restricting the character set for whitespace-delimited values, then we thereby increase CIF2's incompatibility with CIF1
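
Point (c) can be illustrated with a short Python sketch. It assumes, for the sake of the example, a CIF1-style row holding the bare value `1,2-dichloroethane`, and it compares plain whitespace splitting with a splitter that also treats commas as delimiters. Both function names are hypothetical, and neither is a real CIF tokenizer (no quoting, no comments).

```python
import re
from typing import List


def split_whitespace_only(text: str) -> List[str]:
    """Whitespace is the only delimiter; commas are ordinary characters."""
    return text.split()


def split_comma_or_whitespace(text: str) -> List[str]:
    """Commas and whitespace both delimit, so a comma can never be
    part of a bare value."""
    return [token for token in re.split(r"[,\s]+", text) if token]


row = "1,2-dichloroethane 0.5"
print(split_whitespace_only(row))       # two values, comma kept inside the first
print(split_comma_or_whitespace(row))   # three values, the name is broken apart
```

Under the second rule the value would have to be quoted to survive, which is the additional restriction relative to CIF1 that the point refers to.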

>I've reviewed our previous discussion starting at message:
>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00338.html and
>culminating in a tally at
>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00406.html (with a
>late vote after this from John W. for spaces only).  It seems that the
>strongest preferences expressed were from Herb (for comma and space)
>and from John W (for space only in order to avoid mixed-delimiter

I don't think much has changed from a technical perspective.  Here are some highlights of the previous discussion:

"You could say that token separator in lists are a or b or c, but that just adds a level of complexity for very little gain."  (Nick, before the alternative of only space separators was proposed)

"Make [list delimiters] spaces only, and you become consistent across the board."  (Nick)

"[It is an open question that] once you say a space and comma are equivalent token separators then will it be an interpretation that they are always so even in loops?"  (Nick)

"I would opt for using the space as a separator as elsewhere in the CIF, partly because then a list can essentially be treated much like a single-item loop - i.e. same basic parsing of <value><space><value><space>... "  (Simon)

"This more consistent approach led to grammar rules that were the same whether tokens were inside the new compound data types or not."  (Nick)

"The blank reads particularly well in dealing with vectors and matrices.  The comma reads well when dealing with strings."  (Herbert, arguing for allowing both space and comma as delimiters)

>I would therefore like to propose that we switch to allowing comma
>*or* space as list item delimiters.  This will considerably simplify
>the work needed to adapt the current DDLm/dREL software and
>documentation.  I am also open to switching back to comma only, but think
>that that might meet with some resistance.

Could you explain that a bit more?  It sounds like you are proposing to allow syntax (optional space delimiters) that doesn't exactly match the existing software anyway.  What, then, do we actually stand to gain?  And how does the work to adapt the software and documentation stand to be lessened by the proposed change?

>I apologise for reopening this old discussion, but it looks like
>reintroducing commas will produce the best practical outcome.  Note
>that I would propose keeping the behaviour that was generally accepted
>in the previous discussion, i.e.
>* two commas without an intervening value is a syntax error, as is a
>trailing comma
>* lists may use a combination of comma and whitespace separation
>(although one might expect that to be vanishingly rare in practice)
>but this should be discouraged.

Those are only some of the details that need to be documented.  What about the others listed under (2) above (which may not be an exhaustive list)?

Also, I don't read the previous discussion as agreeing on those details.  The question became moot when commas were dropped as delimiters, but Herbert had written "I see nothing to be gained by now forbidding the comma.  The meaning of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new (and I think more sensible and realistic approach) {a . b .} or {a ? b ?}."  In other words, (as he viewed it then) a value always logically follows a delimiting comma, even when none is literally present.  That's a potentially viable option, and I'm inclined to like its logical consistency.  There was an argument against, but no real conclusion.
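
As a sketch of that reading, the following Python function fills every empty slot around a comma with a placeholder value, so that `a,,b,` comes out the same as the space-delimited `a . b .`. The function name is hypothetical, only bare values are handled, and treating a leading empty slot the same way as the others is my assumption, not something the earlier discussion settled.

```python
from typing import List


def values_with_implied_placeholders(body: str, placeholder: str = ".") -> List[str]:
    """Read a comma-delimited list body so that a value always
    logically follows a comma, even when none is written.

    Assumptions: bare values only, no quoting or nesting; every
    empty slot (including a leading one) becomes `placeholder`.
    """
    if body.strip() == "":
        return []                           # nothing between the brackets
    values = []
    for piece in body.split(","):
        value = piece.strip()
        values.append(value if value else placeholder)
    return values


# The comma form and the space-only form then agree:
comma_form = values_with_implied_placeholders("a,,b,")
space_form = "a . b .".split()
print(comma_form == space_form)
```

The alternative reading, in which the same input is a syntax error, differs only in raising an error where this sketch substitutes the placeholder.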

Furthermore, I am not at all persuaded that the proposal will produce a better outcome than the syntax as presently defined.  Pending further clarification, I am inclined to believe it would produce a worse outcome.

>If I hear no strong dissenting voices, I will produce some draft text
>for your comment
>then edit it into the draft standard when it next comes before COMCIFS.

I think that would be premature at this point.


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

ddlm-group mailing list
International Union of Crystallography