Discussion List Archives


Re: [ddlm-group] String concatenation operator in CIF2

On Thursday, October 14, 2010 7:48 AM, James Hester wrote:

>There are three separate issues: (1) do we want a string concatenation
>operator? (2) If so, what is the grammar for this operator (3) what
>character(s) will be used for this operator?

>Regarding (1), only Nick has expressed opposition (I should advise
>you all that he has asked to be unsubscribed from the group, so we do
>not have an opportunity to dialogue with him).

I did not express opposition, but I did recommend tabling the issue for reconsideration as part of the next revision to CIF.  Overall, I do not like the proposal very much, but I'm struggling with whether I have reasonable grounds actually to oppose it.  On that basis, then:

>  I do not find his
>objections particularly convincing. For the record, I support adding a
>concatenation operator because:
>(i) it is one solution to the 'long regex' problem

Indeed it is, but the CIF language does not need to be altered to solve that problem.  In my view, an all-around better solution would be to change the data type for regex items to one that permits non-significant whitespace.  Aside from avoiding a language change, the resultant regex expressions would be even easier to read than ones constructed with use of string concatenation.
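To illustrate the kind of data type meant here: Python's `re.VERBOSE` flag already gives regex text non-significant whitespace and `#` comments, so a long pattern can be laid out over several lines with no concatenation at all. This is only an analogy. The pattern below is an arbitrary example rather than one taken from a CIF dictionary, and a CIF regex type would need its own definition of which whitespace is ignored.

```python
import re

# A compact pattern: optional sign, digits, optional fraction, and an
# optional standard uncertainty in parentheses, e.g. "-1.234(5)".
COMPACT = r"^[+-]?\d+(\.\d*)?(\(\d+\))?$"

# The same pattern with non-significant whitespace and comments.
# Under re.VERBOSE, unescaped whitespace outside character classes is
# ignored, and "#" starts a comment that runs to the end of the line.
SPACED = r"""
    ^
    [+-]?         # optional sign
    \d+           # integer part
    (\.\d*)?      # optional fractional part
    (\(\d+\))?    # optional standard uncertainty, e.g. (5)
    $
"""

compact = re.compile(COMPACT)
spaced = re.compile(SPACED, re.VERBOSE)

# Both forms accept and reject exactly the same inputs.
for text in ["-1.234(5)", "42", "1.5(", "abc"]:
    assert bool(compact.match(text)) == bool(spaced.match(text))
    print(text, bool(spaced.match(text)))
```

The spaced form stays a single value throughout, so nothing about it depends on how a CIF processor chooses to re-emit concatenated pieces.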

Alternatively, the existing line-folding mechanism already serves in this capacity. Herbert expressed concern about continuing to use that mechanism for this purpose, on the basis that it was not adopted into the CIF2 syntax. I understand that as a reservation about using the existing mechanism for general purposes, but for specific items or data types I don't see why it cannot be incorporated into the relevant dictionary or DDL.

Any program that uses regex values *as* regexes must already recognize and specially handle the items or types to which regex semantic conventions apply, so I can't see how this would be any harder to manage than a syntactic-level concatenation operator.
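A minimal sketch of what item-level handling could look like, assuming a simplified folding rule modelled on the existing line-folding mechanism: a backslash immediately before a line break removes both, joining the two lines. The tag names in `REGEX_ITEMS` are hypothetical placeholders, not real dictionary items, and the rule ignores any further details of the actual folding convention.

```python
import re

# Hypothetical set of items whose values follow regex conventions; the
# names are placeholders, not real dictionary definitions.
REGEX_ITEMS = {"_example.pattern", "_example.other_pattern"}

def unfold(text: str) -> str:
    # Simplified folding rule (assumption): remove every backslash that
    # immediately precedes a line break, together with that line break.
    return re.sub(r"\\\n", "", text)

def item_value(tag: str, raw: str) -> str:
    # Only items known to carry regex semantics are unfolded; every other
    # value is passed through unchanged, so the core parser is untouched.
    return unfold(raw) if tag in REGEX_ITEMS else raw

folded = "^[A-Z][a-z]?\\\n[0-9]*$"
print(item_value("_example.pattern", folded))
print(item_value("_example.title", folded) == folded)
```

The point of the design is that the decision lives in the application, which already has to know which items are regexes, rather than in the CIF grammar.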

>(ii) it solves the theoretical problem of not being able to represent
>arbitrary strings in CIF2 (including CIF in CIF)

Yes it does, however:
(a) it is not clear to me that this is a problem that needs to be solved at this point
(b) as with (i), there are other solutions.  In particular, I would find a character escape syntax preferable for this purpose.  Such a thing need not be added as a general feature of the CIF language: it, too, could be an aspect of one or more particular data types.
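As a sketch of (b), an escape convention attached to a particular data type could be decoded by the application after ordinary parsing. The escape set below (`\\`, `\n`, `\t`, `\"`, `\'` and `\uXXXX`) is entirely hypothetical and chosen only for illustration; with something like it, a value could represent characters or delimiter sequences that no CIF quoting form could otherwise enclose.

```python
import re

# Hypothetical escape convention for an "escaped string" data type.
SIMPLE = {"\\": "\\", "n": "\n", "t": "\t", '"': '"', "'": "'"}

# A backslash followed by a 4-hex-digit \u escape, any single character,
# or the end of the string (a dangling backslash, reported as an error).
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.|\Z)", re.DOTALL)

def decode_escapes(value: str) -> str:
    def replace(m: re.Match) -> str:
        code = m.group(1)
        if code.startswith("u") and len(code) == 5:
            return chr(int(code[1:], 16))
        if code in SIMPLE:
            return SIMPLE[code]
        raise ValueError(f"invalid escape sequence \\{code!r}")
    return ESCAPE.sub(replace, value)

print(decode_escapes(r'say \"hi\"\n\u00e9'))
```

Because decoding happens only for items of this type, values of every other type are unaffected and no change to the core CIF syntax is implied.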

>(iii) it simplifies some problems Simon has been having with CIF text processing

Simon has spoken in favor of the proposal, and described some ideas for related future CIF extensions, but I must have missed it if he mentioned specific practical problems this would solve for him.

In fact, I do not see how the proposal simplifies *anything* processing-wise.  It incrementally complicates processing in order to provide human readability and theoretical expressiveness benefits.

In its capacity as an enabler for future extensions, nothing would be lost by deferring this proposed feature to the point where the extensions are adopted.

>(iv) it is simple to implement in even simple-minded parsers

This is not a reason to support the feature.  Rather, it is a rejection of an argument against doing so.  I can imagine a multitude of things that would be simple to implement, but that we would not want in the language.

Furthermore, although I don't think implementation would be exactly hard, I do think it might be more complicated than you suppose.  In particular, it poses a problem for event-based parsers that, under the proposal, would need to provide some means to handle comments that appear inside data values.  In some reasonable parser designs, that impact would not be limited to the parser itself, but rather would also need to be accommodated by parser clients.
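To make the event-based concern concrete, here is a minimal sketch of a scanner for a single concatenated value. It assumes `_` (followed by whitespace) as the operator, although the character is still open under issue (3), quoted pieces without embedded quotes, and `#` comments running to end of line. None of this is the proposed grammar itself; the function name `value_events` is hypothetical.

```python
import re

# Hypothetical token grammar for illustration only.
TOKEN = re.compile(r'''
    (?P<ws>\s+)
  | (?P<comment>\#[^\n]*)
  | (?P<concat>_(?!\S))
  | (?P<dq>"[^"\n]*")
  | (?P<sq>'[^'\n]*')
''', re.VERBOSE)

def value_events(text):
    """Scan one (possibly concatenated) value and yield parser events."""
    pieces = []
    expect_piece = True
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "ws":
            continue
        if kind == "comment":
            # The value is still incomplete here, yet the comment has to be
            # reported (or deliberately dropped) now.
            yield ("comment", m.group()[1:].strip())
        elif kind == "concat":
            if expect_piece:
                raise ValueError("operator without a left operand")
            expect_piece = True
        else:
            if not expect_piece:
                raise ValueError("missing operator between pieces")
            pieces.append(m.group()[1:-1])
            expect_piece = False
    if expect_piece:
        raise ValueError("value is empty or ends with an operator")
    yield ("value", "".join(pieces))

print(list(value_events('"abc" _ # first part\n  "def"')))
```

The comment event is emitted before the value that surrounds it is complete, so a client that expects comments only between values must either be told which value a comment belongs to, or the parser must buffer or discard such comments.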

>(v) the concatenation operation is simple for human readers to carry
>out, so readability is not hindered (indeed, in many cases it is

I guess that is a matter of opinion.  Personally, I find that the insertion of " _ " (or similar) into the middle of a piece of text makes it harder for me to read.  If you also put a CIF comment or three in the middle there then it gets worse (though the *meaning* of the text may then be clarified).  To be sure, breaking up a complex regex at strategic points can make it easier to read, but a regex is text only in the most technical sense.  For such specific cases, I prefer an alternative solution.

Furthermore, some of the supposed readability enhancement, especially in line-splitting scenarios, depends on formatting that is not guaranteed to be conserved by CIF processors.  Consider this:


_tag "<line1 ...>"
   _ "<line2 ...>"

A CIF-to-CIF processor is reasonably likely to do one of these things to it:

_tag "<line1 ...><line2 ...>"

_tag "<line1 ...>" _ "<line2 ...>"

_tag "<line1 ...>"
_ "<line2 ...>"

and so on. In other words, it may collapse the concatenation altogether, undo the line folding, or disrupt the relative alignment of the pieces. Once that happens, the readability enhancement is reduced or lost.

>I believe these considerations outweigh the following objections:
>(i) added complexity
>(ii) no longer a plain tag-value format (Nick's objection)
>(iii) adding operators to data format files (not sure why this is exactly bad)

And these?
(iv) most or all of the perceived benefits can be realized without changing CIF (by instead defining new or altered CIF data types)
(v) alternative solutions can support most of the proposed use cases even better than a concatenation operator would
(vi) the impact of the proposed feature is not well understood (but is probably greater than some here believe)
(vii) the proposed feature would constitute another incompatibility with CIF1.
(viii) some of the benefits to be realized are fragile, in the sense that they rely on formatting conventions that are not themselves part of CIF.

>So, unless somebody objects soon I think we can declare that such an
>operator will be included in the spec.

I don't exactly object, yet, but I'm nearing that point.  The more I think about it, the more I prefer an alternative solution.


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


ddlm-group mailing list

International Union of Crystallography
