Discussion List Archives


Re: [ddlm-group] String concatenation operator in CIF2

The problem with the following options is that each of them is itself a valid non-delimited (whitespace-delimited) string:

(a) // (double forward slash, as in Fortran)
(b) >> (double greater than)
(c) |concat|
(d) =concat=

So, although unlikely, they might appear as data values in a loop and lead to ambiguity?
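To make the concern concrete, here is a minimal sketch, not taken from any CIF specification or existing parser. It assumes a simplified tokenizer that splits a loop body on whitespace, and the helper name read_values is hypothetical. The same four tokens are read once with every token treated as a data value, and once with // treated as the concatenation operator.

```python
def read_values(tokens, concat_op=None):
    """Turn whitespace-delimited tokens into data values.

    If concat_op is given, 'left OP right' is folded into one value.
    If concat_op is None, every token is an ordinary data value.
    This is an illustrative simplification, not real CIF grammar.
    """
    values = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        is_operator = (
            concat_op is not None
            and tok == concat_op
            and values                 # needs a left operand
            and i + 1 < len(tokens)    # needs a right operand
        )
        if is_operator:
            values[-1] = values[-1] + tokens[i + 1]
            i += 2
        else:
            values.append(tok)
            i += 1
    return values


# Body of a hypothetical two-column loop.
loop_body = "abc // def ghi".split()

# Reading 1: '//' is just a bare string, giving two rows of two values.
plain = read_values(loop_body)

# Reading 2: '//' is the operator, giving one row of two values.
joined = read_values(loop_body, concat_op="//")

print(plain)   # ['abc', '//', 'def', 'ghi']
print(joined)  # ['abcdef', 'ghi']
```

Under these assumptions the two readings differ in both the values and the number of loop rows, which is the ambiguity in question.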



From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Thursday, 14 October, 2010 13:47:48
Subject: [ddlm-group] String concatenation operator in CIF2

I've started a new thread on this for simplicity.

There are three separate issues: (1) do we want a string concatenation
operator? (2) if so, what is the grammar for this operator? (3) what
character(s) will be used for this operator?

Regarding (1), only Nick has expressed opposition (I should advise
you all that he has asked to be unsubscribed from the group, so we do
not have an opportunity to discuss this with him). I do not find his
objections particularly convincing. For the record, I support adding a
concatenation operator because:
(i) it is one solution to the 'long regex' problem
(ii) it solves the theoretical problem of not being able to represent
arbitrary strings in CIF2 (including CIF in CIF)
(iii) it simplifies some problems Simon has been having with CIF text processing
(iv) it is easy to implement, even in simple-minded parsers
(v) the concatenation operation is simple for human readers to carry
out, so readability is not hindered (indeed, in many cases it is

I believe these considerations outweigh the following objections:
(i) added complexity
(ii) no longer a plain tag-value format (Nick's objection)
(iii) adding operators to data format files (not sure exactly why this is bad)

So, unless somebody objects soon, I think we can declare that such an
operator will be included in the spec.

Regarding the particular grammar of concatenation, I believe we are
all in agreement that the concatenation operator should be separated
by whitespace from all neighbouring tokens.  Again: unless there are
objections soon, we can also declare the grammar for the operator to
have been accepted.  Incidentally, do we all agree that _item_q and
_item_s in the following have the same value?

_item_q        3  <concat> 4
_item_s        34
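As a rough illustration of the reading I have in mind, here is a minimal sketch. It assumes that values are handled as plain strings before any type interpretation and that a data item fits on one line. The function name parse_item is hypothetical, and <concat> is only the placeholder used above, not a proposed spelling.

```python
CONCAT = "<concat>"  # placeholder token; the actual characters are undecided


def parse_item(line, concat_op=CONCAT):
    """Parse one 'tag value [OP value ...]' line into (tag, value).

    The operator must be a separate whitespace-delimited token, and the
    operands are joined as strings. Illustrative sketch only.
    """
    tag, *tokens = line.split()
    if not tokens:
        raise ValueError(f"no value given for {tag}")
    value = tokens[0]
    rest = tokens[1:]
    while rest:
        # Every further operand must be introduced by the operator.
        if len(rest) < 2 or rest[0] != concat_op:
            raise ValueError(f"unexpected token {rest[0]!r} after value")
        value += rest[1]
        rest = rest[2:]
    return tag, value


q = parse_item("_item_q        3  <concat> 4")
s = parse_item("_item_s        34")

print(q)  # ('_item_q', '34')
print(s)  # ('_item_s', '34')
```

Under this string-first assumption the two items carry the same value. A parser that interpreted 3 and 4 as numbers before concatenating would have to decide what the result means, which is why the question is worth settling.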

Regarding the particular characters to use to represent concatenation:
'+' is a poor choice given its possible use as a data value in its own
right, and I find that '_' is a little unintuitive and unnecessarily
overloads underscore. Note that there is no reason to limit ourselves
to a single character as we expect this operator to be used very
sparingly: we can use a digraph or trigraph if we so desire,
especially if it makes the meaning clearer to a naive user.  How about
one of the following options?

(a) // (double forward slash, as in Fortran)
(b) >> (double greater than)
(c) |concat|
(d) =concat=

I find (a) most appealing, but perhaps I'm showing my age?

ddlm-group mailing list
