Re: [ddlm-group] Use of elides in strings

My major concern in anything we do is to preserve
the functionality of the practices that the IUCr is following in
journal publications and that the PDB is following. Inasmuch as they seem
able to cope with no elide in CIF 1.1, the remaining question is whether
they will be negatively impacted by the change in string termination
without any elide.  If they can use CIF 2 with these changes, my
objections are purely academic and irrelevant.  -- Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 25 Nov 2009, James Hester wrote:

> Herbert: I have the dubious advantage of not having participated in
> all those CIF1.0/1.1 discussions, so only have the spec as written
> down to rely on.
>
> Anyway, how do you feel about abandoning any specification of elides
> in CIF2 syntax, as suggested by Nick?
>
> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear James,
>>
>>  I started to write:
>>  "No, in CIF 1.1, none of the terminal quote marks, including the \n; are
>> effective unless followed by whitespace (\n, space, tab, or end of file).
>> This is a well-established, and very tricky part of the CIF spec going back
>> to 1990.  That is why Nick had to explicitly specify that a terminal quote
>> mark would be effective no matter what it was followed by."
>>
>>  But the grammar currently on the IUCr web site is _not_ the one that I
>> recall COMCIFs discussing and approving.  It now explicitly removes
>> the requirement for terminal white space in the special case of
>> the \n; text field terminator.  I don't recall when that change was adopted,
>> but it appears that you are right under the current spec
>> about the example I chose.  Inasmuch as there is a lot of working code
>> that enforces and uses the original whitespace handling and uses it
>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>> something to adapt to this change for CIFtbx 4.
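
A minimal sketch, in Python, of the two terminator rules for quote-delimited
strings described above. The name scan_quoted and the require_whitespace flag
are hypothetical and chosen only for illustration; this is not code from
CIFtbx or any other CIF library, and it treats only space, tab and newline
(or end of input) as whitespace.

```python
WHITESPACE = " \t\n"


def scan_quoted(src, require_whitespace):
    """Split src, which starts with an opening ' or ", into (value, rest).

    require_whitespace=True: a closing quote only counts when it is followed
    by whitespace or end of input (the rule described for CIF 1.1 above).
    require_whitespace=False: the first matching quote always terminates.
    """
    quote = src[0]
    if quote not in ("'", '"'):
        raise ValueError("not a quote-delimited string")
    pos = 1
    while True:
        hit = src.find(quote, pos)
        if hit == -1:
            raise ValueError("unterminated string")
        after = hit + 1
        followed_by_ws = after == len(src) or src[after] in WHITESPACE
        if followed_by_ws or not require_whitespace:
            return src[1:hit], src[after:]
        pos = after  # embedded quote, keep scanning


strict = scan_quoted("'ab'cd' next", require_whitespace=True)
relaxed = scan_quoted("'ab'cd' next", require_whitespace=False)
print(strict)
print(relaxed)
```

With require_whitespace=True the embedded quote stays part of the value; with
False the string ends at the first matching quote and the remainder is left
for the lexer to deal with.
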
>>
>>  I guess we are just going to have yet another few dialects of CIF.
>>
>>  Regards,
>>    Herbert
>>
>> On Wed, 25 Nov 2009, James Hester wrote:
>>
>>> To be precise, we are not 'referring all elides to the application'
>>> because no elides are recognised by the lexer under Nick's latest
>>> suggestion, so there are no elides to refer to the application.
>>>
>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>> of the second line would terminate the string, and so whitespace
>>> should then appear as the second character on the second line, rather
>>> than reverse solidus.
>>>
>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>> The only problem with referring all elides to the application is that
>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>> effective, the line folding protocol develops a slight gap.  The
>>>> case is as follows
>>>>
>>>> ;\
>>>> ;\
>>>> ;
>>>>
>>>> is a valid single text field in CIF 1.1, which, when handled with the
>>>> line folding protocol, translates to the equivalent of ';' because the
>>>> embedded ;\ is not a valid text terminator.  If we require that
>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>> or "\n;\n" or "\n;\t", that problem would be fixed.
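
A minimal sketch, in Python, of the case above. The names scan_text_field
and unfold are hypothetical, and unfold is a simplified reading of the
line-folding protocol (a first line consisting only of a backslash switches
folding on, and a trailing backslash joins a line to the next one). It is an
illustration under those assumptions, not the behaviour of any particular
CIF parser.

```python
WHITESPACE = " \t\n"


def scan_text_field(src, require_whitespace):
    """Split src, which starts with the opening ';', into (raw value, rest)."""
    if not src.startswith(";"):
        raise ValueError("a text field opens with ';' at the start of a line")
    pos = 1
    while True:
        hit = src.find("\n;", pos)
        if hit == -1:
            raise ValueError("unterminated text field")
        after = hit + 2
        followed_by_ws = after == len(src) or src[after] in WHITESPACE
        if followed_by_ws or not require_whitespace:
            return src[1:hit], src[after:]
        pos = hit + 1  # not a terminator under this rule, keep scanning


def unfold(raw):
    """Apply the simplified line-folding rules to a raw text-field value."""
    lines = raw.split("\n")
    if lines[0].rstrip(" \t") != "\\":
        return raw  # folding was not requested
    out, current, pending = [], "", False
    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            current += stripped[:-1]  # drop the backslash, join with next line
            pending = True
        else:
            out.append(current + line)
            current, pending = "", False
    if pending:
        out.append(current)
    return "\n".join(out)


field = ";\\\n;\\\n;"  # the three-line example: ;\  ;\  ;
strict_raw, strict_rest = scan_text_field(field, require_whitespace=True)
relaxed_raw, relaxed_rest = scan_text_field(field, require_whitespace=False)
print(repr(unfold(strict_raw)), repr(strict_rest))
print(repr(unfold(relaxed_raw)), repr(relaxed_rest))
```

Under the whitespace-required rule the whole example is one text field that
unfolds to a single semicolon. Under the relaxed rule the field ends at the
second line and a backslash is left immediately after the terminator, which
is the situation the stricter terminators proposed above are meant to avoid.
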
>>>>
>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>
>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>
>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini <nick@csse.uwa.edu.au>
>>>>> wrote:
>>>>>>
>>>>>> It appears to me that we have spent far too long on a syntactic issue
>>>>>> which can be avoided 99.9999% of the time. Quite simply, given the 5 ways
>>>>>> to delimit strings, it is next to impossible to get a situation where you
>>>>>> cannot choose one of those to make the problem go away.
>>>>>>
>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>
>>>>>> "ab'cd"
>>>>>> 'ab"cd'
>>>>>> ;ab'"cd
>>>>>> ;
>>>>>>
>>>>>> But now we additionally have """ and ''' to choose from, making it even
>>>>>> easier.
>>>>>>
>>>>>> So I propose, in line with James' position, that there is NO eliding of
>>>>>> terminator characters at the CIF2 syntax level. ALL elides in the string
>>>>>> are assumed to be user-specific encoding (say TeX, IUCr \greek) which can
>>>>>> be resolved at the dictionary level.
>>>>>>
>>>>>> This necessarily means NO terminator character can appear in a string
>>>>>> delimited by the same terminator character. You will need to choose a
>>>>>> different terminator character. That is
>>>>>>
>>>>>> No " in "strings"
>>>>>> No ' in 'strings'
>>>>>> No """ in """strings""" (but separable individual and doublet " are allowed)
>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>>>>>>
>>>>>> EVERYTHING in the string is returned as raw (except the initiating and
>>>>>> terminating character).
>>>>>>
>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>> string is when you want to include ' " """ ''' and \n; in the one string.
>>>>>> The likelihood of that is almost zero, unless you want to include a CIF
>>>>>> within a CIF (a silly thing to do IMHO). In that case the contents can be
>>>>>> encoded in a dictionary-driven way. I suggest it be declared as a BASE64
>>>>>> type and then all the syntactic ambiguity disappears.
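
A minimal sketch, in Python, of choosing among the five delimiters under the
no-elide proposal above. The names terminates_cleanly, delimit and
base64_fallback are hypothetical. The sketch assumes that the single-character
quote forms cannot hold a newline and that a value is acceptable whenever the
first occurrence of the terminator, scanning left to right, is the real one.
How a BASE64-typed item would actually be declared is left to the dictionary
and is not shown.

```python
import base64

QUOTE_DELIMITERS = ['"', "'", '"""', "'''"]


def terminates_cleanly(value, terminator):
    """True if a raw left-to-right scan would stop at the real terminator."""
    return (value + terminator).find(terminator) == len(value)


def delimit(value):
    """Return value wrapped in the first workable delimiter, or None."""
    for d in QUOTE_DELIMITERS:
        if len(d) == 1 and "\n" in value:
            continue  # assumption: one-character quotes are single-line only
        if terminates_cleanly(value, d):
            return d + value + d
    if terminates_cleanly(value, "\n;"):
        # Text field: the opening ';' must sit at the start of a line.
        return ";" + value + "\n;"
    return None  # every delimiter clashes with the contents


def base64_fallback(value):
    """Encode a value that no delimiter can hold (assumes UTF-8 text)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


for sample in ["ab'cd", 'ab"cd', "ab'\"cd", "' \" \"\"\" ''' \n;"]:
    wrapped = delimit(sample)
    print(repr(wrapped if wrapped is not None else base64_fallback(sample)))
```

Nothing inside the value is escaped or altered. The only decision is which
delimiter to use, and the fallback is reached only when the value contains
every terminator at once.
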
>>>>>>
>>>>>> Problem solved! No need to elide because of CIF2 syntax rules; all elides
>>>>>> are user driven, and contents are returned raw.
>>>>>>
>>>>>> As for Herb's comment in a recent email about line-folding, the same
>>>>>> holds. That is NOT a lexer issue and it has nothing to do with the
>>>>>> parser; everything is read literally and returned raw, and what to do
>>>>>> with it is left to the downstream application.
>>>>>>
>>>>>> Straw vote - No elides of terminator strings as described above - Nick
>>>>>>
>>>>>>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>