Discussion List Archives


Re: [ddlm-group] String concatenation operator in CIF2

Dear all

I believe that a string concatenation operator should meet the following criteria:

1) totally unambiguous (not open to interpretation as any other CIF element)
2) not require further restrictions on character sets etc.

To this end, I think we are left with a couple of options:

i) work with the characters that currently have no syntactic meaning

ii) introduce a new keyword

Approach (i) boils down to using characters that cannot commence a non-delimited string.
Obviously the delimiter characters ' and " are of no use, nor are the [ and { list delimiters, since these already have syntactic meaning.
This leaves the underscore or the dollar sign to commence the operator.
If we use an underscore followed by any other characters, it could be read as a dataname
(note that *dictionaries* place restrictions on the character sets of datanames, i.e.
at a higher level, beyond syntax).
So that leaves us with the 'lonely' underscore (which to my mind works unquestionably).
If we use the dollar followed by other characters, we do open up the possibility of defining as
many operators as we like (I've mentioned before that I have plans for the $ in this respect, though
more along the lines of its perhaps familiar role in identifying variables :-).
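The argument above amounts to a small decision procedure on the first character of a candidate token. A hypothetical sketch of it follows; the `classify` helper is illustrative only, its leading-character set is taken from the argument above plus '#' (which starts a comment) and the closing brackets ']' and '}' (an assumption, not a transcription of the CIF2 grammar), and it ignores the special case of ';' in column 1.

```python
# Hypothetical helper: how might a candidate operator token be read by a
# CIF2 tokenizer?  ASSUMPTION: leading-character rules follow the argument
# above (quote and bracket delimiters, '_', '$'), plus '#' (comment start)
# and the closing brackets ']' and '}'.

RESERVED_LEADING = set("'\"[]{}#")

def classify(token: str) -> str:
    """Return 'candidate', 'data name', 'reserved' or 'value' for one token."""
    if not token or any(ch.isspace() for ch in token):
        raise ValueError("expected a single non-empty, whitespace-free token")
    if token == "_":
        return "candidate"   # nothing follows the underscore, so no data name
    if token.startswith("_"):
        return "data name"   # e.g. _concat would be read as a tag
    if token.startswith("$"):
        return "candidate"   # '$' cannot commence a non-delimited string
    if token[0] in RESERVED_LEADING:
        return "reserved"    # already has syntactic meaning
    return "value"           # a legal non-delimited (bare) data value

for tok in ["_", "$", "$//", "_concat", "//", ">>", "|concat|", "=concat=", "+"]:
    print(f"{tok:10} {classify(tok)}")
```

Only the lone underscore and the dollar-led tokens come out as candidates; the sketch says nothing about CIF1 compatibility, which is a separate question.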

Approach (ii) again opens up possibilities to define all sorts of operators; however, I think
there should be a distinction between 'keywords' in the traditional STAR/CIF sense and
these operators (i.e. such an 'operator' does not really have the same fundamental significance as a

So, as I see it, we're left with:

(a) _ (i.e. solitary underscore)

(b) $ (solitary)

(c) $ followed by some other character(s) (e.g. $// ...)

Options (b) and (c) still have the drawback that they may be valid CIF1 values - so if we use the dollar
I would suggest using it as in (c), to create a token that is highly unlikely to be found in
legacy CIFs (i.e. respecting that legacy as we have tried to do in many other aspects of CIF2).
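To put a number on "highly unlikely", one could scan a collection of legacy CIFs for bare occurrences of a proposed token. The function below is a hypothetical and deliberately rough sketch: it skips comments and semicolon-delimited text fields (including the whole of each line with ';' in column 1), but it does not parse quoted strings, so a token inside 'a // b' is counted too.

```python
def count_bare_tokens(cif_text: str, token: str) -> int:
    """Roughly count whitespace-delimited occurrences of `token`.

    Skips comments and semicolon-delimited text fields.  Quoted strings are
    NOT parsed, so a token inside 'a // b' is (wrongly) counted as well.
    """
    count = 0
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A ';' in column 1 opens or closes a text field; for simplicity
            # the remainder of that line is skipped too.
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        for word in line.split():
            if word.startswith("#"):
                break  # rest of the line is a comment
            if word == token:
                count += 1
    return count

sample = """data_test
_note_a  //
_note_b  'a // b'   # quoted
;
// inside a text field: ignored
;
"""
print(count_bare_tokens(sample, "//"))   # 2 (one of them is inside quotes)
print(count_bare_tokens(sample, "$//"))  # 0
```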



From: SIMON WESTRIP <simonwestrip@btinternet.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Thursday, 14 October, 2010 14:20:10
Subject: Re: [ddlm-group] String concatenation operator in CIF2

The problem with the following is that they are all valid non-delimited (whitespace-delimited) strings:

(a) //  (double forward slash, as in fortran)
(b) >> (double greater than)
(c) |concat|
(d) =concat=

So, though unlikely, they might appear in a loop and lead to ambiguity?
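To make the loop ambiguity concrete: in a hypothetical two-column loop whose values read A // B C, a CIF1 reader sees four values (two rows), while a reader treating // as concatenation sees two values (one row). Both readings are well formed, so the file alone cannot settle which was meant. A minimal sketch; `rows` and `fold_concat` are illustrative helpers, not part of any existing parser.

```python
def rows(values, ncols):
    """Group a flat list of loop values into rows of `ncols` columns."""
    if len(values) % ncols:
        raise ValueError("value count is not a multiple of the column count")
    return [tuple(values[i:i + ncols]) for i in range(0, len(values), ncols)]

def fold_concat(values, op):
    """Join 'x op y' runs into one value, treating every `op` as an operator."""
    out = []
    i = 0
    while i < len(values):
        if values[i] == op and out and i + 1 < len(values):
            out[-1] += values[i + 1]
            i += 2
        else:
            out.append(values[i])
            i += 1
    return out

# Hypothetical loop body:  loop_ _a _b  A // B C
values = ["A", "//", "B", "C"]
print(rows(values, 2))                     # [('A', '//'), ('B', 'C')]
print(rows(fold_concat(values, "//"), 2))  # [('AB', 'C')]
```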



From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Thursday, 14 October, 2010 13:47:48
Subject: [ddlm-group] String concatenation operator in CIF2

I've started a new thread on this for simplicity.

There are three separate issues: (1) do we want a string concatenation
operator? (2) If so, what is the grammar for this operator? (3) What
character(s) will be used for this operator?

Regarding (1), only Nick has expressed opposition (I should advise
you all that he has asked to be unsubscribed from the group, so we do
not have an opportunity to dialogue with him).  I do not find his
objections particularly convincing. For the record, I support adding a
concatenation operator because:
(i) it is one solution to the 'long regex' problem
(ii) it solves the theoretical problem of not being able to represent
arbitrary strings in CIF2 (including CIF in CIF)
(iii) it simplifies some problems Simon has been having with CIF text processing
(iv) it is simple to implement, even in simple-minded parsers
(v) the concatenation operation is simple for human readers to carry
out, so readability is not hindered (indeed, in many cases it is

I believe these considerations outweigh the following objections:
(i) added complexity
(ii) no longer a plain tag-value format (Nick's objection)
(iii) adding operators to data format files (I am not sure exactly why this is bad)

So, unless somebody objects soon I think we can declare that such an
operator will be included in the spec.
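As an illustration of point (ii), the sketch below writes any string, including one that contains both ''' and """, as CIF2 triple-quoted pieces joined by the operator. It assumes that a triple-quoted string may span lines and ends at the first matching triple quote, and it uses // only as a placeholder spelling, since the choice of characters is still open. Runs of apostrophes go inside """...""" and everything else inside '''...''', so no piece ever contains its own delimiter. The function name `to_cif2_expression` is hypothetical.

```python
import itertools

def to_cif2_expression(s: str, op: str = "//") -> str:
    """Write any string as triple-quoted pieces joined by `op`.

    ASSUMPTION: a CIF2 triple-quoted string may contain newlines and ends
    at the first occurrence of its own triple-quote delimiter.
    """
    if not s:
        return "''"  # the empty string needs no concatenation
    pieces = []
    # Alternate between runs of apostrophes and runs of everything else.
    for is_apos, run in itertools.groupby(s, key=lambda ch: ch == "'"):
        text = "".join(run)
        if is_apos:
            pieces.append('"""' + text + '"""')  # run contains no double quote
        else:
            pieces.append("'''" + text + "'''")  # run contains no apostrophe
    return f" {op} ".join(pieces)

print(to_cif2_expression('a\'\'\'b"""c'))
# '''a''' // """'''""" // '''b"""c'''
```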

Regarding the particular grammar of concatenation, I believe we are
all in agreement that the concatenation operator should be separated
by whitespace from all neighbouring tokens.  Again: unless there are
objections soon, we can also declare the grammar for the operator to
have been accepted.  Incidentally, do we all agree that _item_q and
_item_s in the following have the same value?

_item_q        3  <concat> 4
_item_s        34
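Under the whitespace-separated grammar, a parser could fold the operator away right after tokenizing. The sketch below assumes (hypothetically) that the tokenizer tags each token with a kind, so that only a bare operator token triggers concatenation, while a quoted value spelled like the operator stays an ordinary value. It joins both operands as text, which is what makes _item_q and _item_s come out equal; whether numbers should concatenate this way is exactly the question being asked. `apply_concat` and the token kinds are illustrative names.

```python
from typing import List, Tuple

# (kind, text) pairs; kind is "name", "value" or "op" (hypothetical tokenizer output)
Token = Tuple[str, str]

def apply_concat(tokens: List[Token]) -> List[Token]:
    """Fold each  value op value [op value ...]  run into one value token."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "op":
            has_left = bool(out) and out[-1][0] == "value"
            has_right = i + 1 < len(tokens) and tokens[i + 1][0] == "value"
            if not (has_left and has_right):
                raise SyntaxError(f"operator at token {i} needs a value on each side")
            # Operands are joined as text, so 3 <concat> 4 gives "34".
            out[-1] = ("value", out[-1][1] + tokens[i + 1][1])
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out

q = [("name", "_item_q"), ("value", "3"), ("op", "<concat>"), ("value", "4")]
s = [("name", "_item_s"), ("value", "34")]
print(apply_concat(q)[1] == apply_concat(s)[1])  # True
```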

Regarding the particular characters to use to represent concatenation:
'+' is a poor choice given its possible uses as a datavalue in its own
right, and I find that '_' is a little unintuitive and unnecessarily
overloads underscore. Note that there is no reason to limit ourselves
to a single character, as we expect this operator to be used very
sparingly: we can use a digraph or trigraph if we so desire,
especially if it makes the meaning clearer to a naive user.  How about
one of the following options?

(a) //  (double forward slash, as in fortran)
(b) >> (double greater than)
(c) |concat|
(d) =concat=

I find (a) most appealing, but perhaps I'm showing my age?

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
ddlm-group mailing list

International Union of Crystallography

Scientific Union Member of the International Science Council (admitted 1947). Member of CODATA, the ISC Committee on Data. Partner with UNESCO, the United Nations Educational, Scientific and Cultural Organization in the International Year of Crystallography 2014.

International Science Council Scientific Freedom Policy

The IUCr observes the basic policy of non-discrimination and affirms the right and freedom of scientists to associate in international scientific activity without regard to such factors as ethnic origin, religion, citizenship, language, political stance, gender, sex or age, in accordance with the Statutes of the International Council for Science.