Discussion List Archives

Re: [ddlm-group] A modest addition to the DDLm spec. .. .

Dear Nick,

   You are lucky -- I have to read and understand those regular
expressions quite often, especially when it turns out that
one of them isn't doing what it is supposed to do.

  In this case, at the top level, the "+" would be operating on
string literals, so it _can_ be handled in the lexical scanner.
Certainly, lower down, at the dREL level it has to be handled
in the parser.
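
A minimal sketch of what "handled in the lexical scanner" could look like, in Python. The quoting rule here is deliberately simplified (plain "..." or '...' with no embedded delimiter or newline) and is not the CIF2 grammar; the name scan_string_value is hypothetical.

```python
import re

# Assumed, simplified token rule: "..." or '...' with no embedded copy of
# the delimiter and no newline. Real CIF2 quoting has more cases.
_STRING = r'"[^"\n]*"' + r"|'[^'\n]*'"
# One literal, followed by any number of "+ literal" continuations.
# \s also matches newlines, so a value may be folded across lines.
_CONCAT = re.compile(rf"(?:{_STRING})(?:\s*\+\s*(?:{_STRING}))*")
_PIECE = re.compile(_STRING)


def scan_string_value(text, pos=0):
    """Scan one string value at text[pos:], joining '+'-separated literals.

    Returns (value, end_offset). Raises ValueError if no literal starts at pos.
    """
    match = _CONCAT.match(text, pos)
    if match is None:
        raise ValueError(f"no string literal at offset {pos}")
    # Strip the delimiters from each literal and join the contents.
    pieces = [p.group(0)[1:-1] for p in _PIECE.finditer(match.group(0))]
    return "".join(pieces), match.end()
```

Because the scanner returns a single joined value, the parser above it never sees the "+" at all; only an expression language such as dREL, where the operands are not known until run time, needs it in the grammar.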

   Personally, I use, and teach my students to use, concatenated
C string literals.  They are very, very useful, cost nothing
(in C they are handled by the preprocessor) and make code
much more readable.
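
The same idea can be sketched in Python, which, like C, joins adjacent string literals before the program runs. The pattern is the folded date-time regex quoted further down in this thread; the names SINGLE_LINE, FOLDED and is_datetime are illustrative only.

```python
import re

# The date-time pattern written as one long literal...
SINGLE_LINE = r"[0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?([+-][0-5][0-9]:[0-5][0-9]))?"

# ...and as adjacent literals: date part, optional time part, zone offset.
# Python joins these when it compiles the module, so there is no run-time cost.
FOLDED = (
    r"[0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]"
    r"((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?"
    r"([+-][0-5][0-9]:[0-5][0-9]))?"
)


def is_datetime(text):
    """Return True if the whole of text matches the folded pattern."""
    return re.fullmatch(FOLDED, text) is not None
```

The two constants compare equal, so the folding changes only how the pattern reads on the page, not what it matches.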

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 1 Oct 2010, Nick Spadaccini wrote:

> I don't think anything here is any more readable than anything else. But
> then I am assuming no-one is actually *reading* this stuff. I have
> applications that do all this for me.
>
> In writing this stuff for a dictionary, it will always be hard to do
> syntactically when these regexes are so long. But then again you are only
> going to do it once for each data item that needs it.
>
> I don't know what great advantages of this approach Simon is
> alluding to, but remember CIF files are a syntax to define values. The
> overloading of the + operator for programming languages is used for the
> *construction* of a string value, invariably at runtime when the components
> aren't known until runtime.
>
> Hence this all makes sense in dREL (which supports it) but not in a
> declaration of a string literal as is done in CIF data files.
>
> I certainly do not teach my Java students to declare a string literal as the
> concatenation of several string literals.
>
>
> On 1/10/10 5:04 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> Try working with some of the longer regexes for a while and you may
>> come to appreciate having something like the plus or the line-folding
>> backslash to allow you to present what is really one very long
>> single-line character string with lots of funny stuff in it
>> over a series of lines, broken up with some whitespace that is
>> not part of the string.  To take one which does not involve quote
>> marks:
>>
>>
>> "[0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?([+-][0-5][0-9]:[0-5][0-9]))?"
>>
>> is much harder to read and understand than
>>
>>
>> "[0-9]?[0-9]?[0-9][0-9]-[0-9]?[0-9]-[0-9]?[0-9]"+
>> "((T[0-2][0-9](:[0-5][0-9](:[0-5][0-9](.[0-9]+)?)?)?)?"+
>> "([+-][0-5][0-9]:[0-5][0-9]))?"
>>
>> which I normally do with the CIF1 line-folding protocol; the
>> treble quotes add nothing of value. To return to quotes,
>>
>> """[][ \n\t()_,.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]*"""
>>
>> is not as clear to me as
>>
>> "[][ \n\t()_,.;:" + '"&<>/\{}' + "'`~!@#$%?+=*"+
>> "A-Za-z0-9|^-]*"
>>
>> which helps to organize the data for me and calls out the
>> troublesome case.
>>
>> However, the question is not one of your taste or mine or Nick's, nor
>> even whether the feature is useful to everybody, but whether
>> it is useful to some reasonable number of people and
>> whether having it causes some kind of problem for other people.
>>
>> I assume we all agree that CIF is intended to be a useful tool to get
>> work involved with crystallography done.  The string concatenation
>> operator is one of the most generally useful operators, and there is no
>> sound reason not to support it top to bottom, from the top level CIF
>> data down to the dREL language itself.  It is in dREL right now
>> (see 7.1.2), so we are going to have to handle it down there.
>> (Indeed, dREL has 2 string concatenations, one with blank for
>> string literals and one with "+" for string objects).
>>
>> Why is it such a big issue to also handle the "+" at the top level?
>>
>>
>>
>>
>>
>> On Thu, 30 Sep 2010, Bollinger, John C wrote:
>>
>>>
>>> On Thursday, September 30, 2010 12:23 PM, Herbert J. Bernstein wrote:
>>>
>>>> It reduces the incompatibility with CIF1 introduced by the change
>>>> in string quoting syntax, allowing the resulting CIF2 CIFs to
>>>> be much closer to their CIF1 originals,
>>>
>>> I don't buy that one.  Which is closer to 'O'Donnell said, "Pshaw"'?
>>>
>>> '''O'Donnell said, "Pshaw"'''
>>>
>>> or
>>>
>>> "O'Donnell said, " + '"Pshaw"'
>>>
>>> For me, it's the former, and that becomes much more the case the more
>>> concatenations are involved.  Consider, for example, how the concatenation
>>> approach would look for this slight variation: 'O'Donnell said, "P'shaw"'.
>>>
>>>> fills that gap
>>>> created by not dealing with elides for line folding in
>>>> a simpler way,
>>>
>>> Are you saying that it provides a line-folding approach superior to the one
>>> already used with CIF1?  I'll have to think about that.  Does your argument
>>> apply in general, or only for particular cases such as regex?
>>>
>>> Is a line-folding mechanism that is incompatible with CIF1 even relevant?  I
>>> have always thought that the most important reason for having a line-folding
>>> protocol at all was for compatibility with CIF readers that implement the old
>>> 80-character line limit.  An alternative that is incompatible with CIF1 is
>>> useless for that purpose.
>>>
>>> Any way around, I'm hesitant to promote line folding from a semantic
>>> consideration to a syntactic one.
>>>
>>>> and conforms to well-established practice in
>>>> multiple programming languages.
>>>
>>> That is not germane as far as I am concerned.  CIF is not a programming
>>> language, and its audience (as opposed to the audience of the specification)
>>> contains many non-programmers.
>>>
>>>>  C manages to deal with this
>>>> using the blank as the concatenation operator at the C preprocessor
>>>> level, so we should be able to handle it at the lexical level.
>>>
>>> Certainly we *can* handle it.  And doing so will make our code a little bit
>>> more complex, and a little bit more difficult to maintain.
>>>
>>> Also, this proposal would either make CIF diverge (further?) from STAR, or
>>> would require STAR to adopt the same change.  If we want the latter then we
>>> cannot settle the question here.
>>>
>>> I continue to reserve judgment, but right now it looks like more down side
>>> than up side to me.  Furthermore, it still seems that this could be addressed
>>> as well or better at the DDL and/or dictionary level, but perhaps that would
>>> have impacts that I do not presently appreciate.
>>>
>>>
>>> Regards,
>>>
>>> John
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>>
>>>
>>>
>>>
>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
>
>
>
