Discussion List Archives


Re: [ddlm-group] String concatenation operator in CIF2. .

I am in agreement with Herbert. Alternate solutions, such as defining new data types or new 'common semantic features', would
require lengthy discussions and would probably result in far greater changes to CIF than this basic string concatenation mechanism.
For example, I may just wish to interject a comment within a publication-oriented data item. This can't be done at present because
the comment will automatically become part of the item's value. (Though 'comments' are inevitably transient, they can be very useful
for 'housekeeping' and 'instruction' purposes.)  Alternate solutions in this case might involve the definition of a markup convention,
but this convention might not be respected, or might be illegal if the dictionary definition of the item specified a different markup for that item.
In short, an alternate solution to allow comments 'within the scope' of data values would be far from trivial (and indeed would probably be far less
robust than making use of the string concatenation mechanism).
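
The comment use case above can be sketched in a few lines of Python. This is only an illustration of the proposal under discussion, not part of any CIF specification: the token pattern, the function name join_concatenated and the restriction to single-line, single-quoted strings are all assumptions made for the sketch. A lone whitespace-delimited '_' joins the value before it to the value after it, and a comment placed in between is simply skipped.

```python
import re

# Hypothetical token pattern for a simplified CIF-like value stream:
# a '#' comment running to end of line, a single-quoted string on one
# line, or any other whitespace-delimited word (including a lone '_').
TOKEN = re.compile(r"#[^\n]*|'[^'\n]*'|\S+")


def join_concatenated(text):
    """Return the values in `text`, merging values linked by a lone '_'."""
    values = []
    pending_join = False
    for tok in TOKEN.findall(text):
        if tok.startswith("#"):
            continue  # comments never become part of a value
        if tok == "_":
            if not values or pending_join:
                raise ValueError("'_' must follow a value")
            pending_join = True
            continue
        # Strip the delimiters from a complete single-quoted string.
        quoted = len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'"
        value = tok[1:-1] if quoted else tok
        if pending_join:
            values[-1] += value
            pending_join = False
        else:
            values.append(value)
    if pending_join:
        raise ValueError("dangling '_' at end of input")
    return values


example = """
'The structure was solved by direct methods. ' _
# housekeeping: confirm the software version with the authors
'Refinement used full-matrix least squares.'
"""
print(join_concatenated(example))
```

With this input the comment does not appear in the result, and the two quoted pieces come back as a single value.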



From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Tuesday, 19 October, 2010 12:17:01
Subject: Re: [ddlm-group] String concatenation operator in CIF2. .

Dear James,

  Now we are down to matters of taste about alternate changes: an easy-to-implement but less powerful concatenation operator versus harder-to-implement but arbitrarily powerful, incompletely specified new data types. In view of the past history of this group in rejecting even the existing imgCIF MIME-based binary data type, the rejection of the existing line-folding protocol, the two years needed to decide on which combinations of brackets we were going to use, and the amount of time to deal with the less troublesome encoding issues, to me it seems unwise to try to deal with a new data type right now to solve the very real problems of the lack of human readability of CIF2 in dealing with long strings (regexes, PDB sequences, etc).  Now I do understand that some people, indeed most users, rarely or never read CIF dictionaries or mmCIF PDB entries "by eye," but some of us have to.  The simple concatenation operator solves this very real problem for dictionary developers and those who might like to be able to manually read a PDB mmCIF entry.

CIF is not a religion.  It is a tool.  I urge this group and will urge COMCIFS to be pragmatic and adopt Simon's underscore as a concatenation operator.  I like it so much that, unless somebody comes up with a much better alternative or a truly practical reason why it is bad to do so, I will code it as an option on CBFlib and cif2cbf for my own use and use by any interested friends, so I can at least be able to read the files I have to work with.  Don't worry, I won't call those files CIFs.  I'll call them something else, but for my work I suspect I'll be using that format in preference to a purer dialect of CIF that does not support a concatenation operator.
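
To make the readability point concrete, here is a minimal Python sketch of the writing side: it folds one long string into short single-quoted pieces linked by a trailing whitespace-isolated underscore. The function name fold_with_underscore, the width parameter and the refusal to handle apostrophes or embedded newlines are assumptions of this sketch, not anything defined by CIF2, CBFlib or cif2cbf.

```python
def fold_with_underscore(value, width=40):
    """Split `value` into single-quoted chunks linked by a trailing ' _'."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if "'" in value or "\n" in value:
        # Quoting rules are deliberately out of scope for this sketch.
        raise ValueError("only single-line values without apostrophes are handled")
    # Fixed-width slices; an empty value still yields one empty chunk.
    chunks = [value[i:i + width] for i in range(0, len(value), width)] or [""]
    # Every chunk except the last is followed by the joining underscore.
    lines = [f"'{chunk}' _" for chunk in chunks[:-1]]
    lines.append(f"'{chunks[-1]}'")
    return "\n".join(lines)


print(fold_with_underscore("ABCDEFGHIJ", width=4))
```

Because each piece stays inside its own quotes, leading and trailing spaces in a chunk survive the fold, which matters for regular expressions and sequence data.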


Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769


On Tue, 19 Oct 2010, James Hester wrote:

> I agree that *if* we are to have a concatenation operator then a whitespace
> delimited underscore appears to be the best choice.  John B's email
> contained a list of 8 objections to the *concept* of a concatenation
> operator which I have reproduced below.  I find the most substantial
> objections to be (iv) and (v).  Perhaps you would like to address these
> particular objections?
> (James's list)
> (i) added complexity
> (ii) no longer a plain tag-value format (Nick's objection)
> (iii) adding operators to data format files (not sure why this is exactly
> bad)
> (to which John B added:)
> (iv) most or all of the perceived benefits can be realized without changing
> CIF (by instead defining new or altered CIF data types)
> (v) alternative solutions can support most of the proposed use cases even
> better than a concatenation operator would
> (vi) the impact of the proposed feature is not well understood (but is
> probably greater than some here believe)
> (vii) the proposed feature would constitute another incompatibility with
> CIF1.
> (viii) some of the benefits to be realized are fragile, in the sense that
> they rely on formatting conventions that are not themselves part of CIF.
> On Tue, Oct 19, 2010 at 1:35 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>      Dear Colleagues,
>        I may not have completely understood this discussion.  To me
>      much of the reasoning seems to be objecting to introducing the
>      concatenation operator because it changes the meaning of some
>      possible CIF files, not files that anybody has ever seen, but
>      possible ones.  Simon's proposal of using the underscore really
>      does not conflict with any existing CIF because it cannot be
>      a data value (it begins with an underscore) but also cannot be
>      a tag name, because we have implicitly treated the underscore in
>      a tag name as a delimiter rather than part of the name and have
>      not accepted empty names.
>        The only real remaining objection to the underscore seems to
>      be that it does not look as nice as some of the more classical
>      concatenation operators: +, //, etc.  While I would prefer one
>      of the classical operators, there really is nothing wrong with
>      the underscore and it really does make it much easier to deal
>      with the transition to CIF2 both for regular expressions and for
>      folded lines.
>        I ask for a formal vote by COMCIFS on the use of the
>      white-space isolated underscore as a concatenation operator.  To
>      me it seems to be a very useful addition to CIF2 and at worst a
>      harmless addition.
>        Regards,
>          Herbert
>      At 11:21 AM -0500 10/14/10, Bollinger, John C wrote:
>      >On Thursday, October 14, 2010 7:48 AM, James Hester wrote:
>      >
>      >>There are three separate issues: (1) do we want a string concatenation
>      >>operator? (2) If so, what is the grammar for this operator (3) what
>      >>character(s) will be used for this operator?
>      >
>      >[...]
>      >
> >>Regarding the particular characters to use to represent concatenation:
> >>'+' is a poor choice given its possible uses as a datavalue in its own
> >>right, and I find that '_' is a little unintuitive and unnecessarily
> >>overloads underscore. Note that there is no reason to limit ourselves
> >>to a single character as we expect this operator to be used very
> >>sparingly: we can use a digraph or trigraph if we so desire,
> >>especially if it makes the meaning clearer to a naive user.
> >
> >If a concatenation operator is adopted, then I agree with Simon that
> >it should be something that is not legal as a whitespace-delimited
> >data value.  (Ideally, not in CIF1, either.)  I would prefer that it
> >not be legal even as the leading character(s) of a
> >whitespace-delimited data value.  Without departing from the allowed
> >ASCII characters, the lone underscore is the only viable string I
> >see that meets all those criteria.
> >
> >I don't so much mind '//' as a concatenation operator, except that
> >it must then be made a reserved word, and maybe even a reserved
> >prefix.  At least such a reservation is less likely to be a problem
> >than reservation of '+' would be, but I don't see that choice as
> >much more intuitive to the average user than '_'.
> >
> >
> >Regards,
> >
> >John
> >--
> >John C. Bollinger, Ph.D.
> >Department of Structural Biology
> >St. Jude Children's Research Hospital
> >
> >_______________________________________________
> >ddlm-group mailing list
> >ddlm-group@iucr.org
> >http://scripts.iucr.org/mailman/listinfo/ddlm-group
> --
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
