
Re: [ddlm-group] Use of elides in strings

To: James Hester <[email protected]>
Subject: Re: [ddlm-group] Use of elides in strings
From: John Westbrook <[email protected]>
Date: Tue, 24 Nov 2009 21:33:28 -0500
Cc: Group finalising DDLm and associated dictionaries <[email protected]>
In-Reply-To: <[email protected]>
Organization: RCSB Protein Data Bank
References: <[email protected]> <C731AC95.125CB%[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]><[email protected]>

Hi James,

My preference is to avoid elides in the syntax for the purpose of escaping
terminators in strings, deferring interpretation to the application.

I do not understand all of the issues related to line folding, which I
believe is an issue for Brian and Simon.

John


James Hester wrote:
> Thanks for the quick reply over Thanksgiving, John.  I take from your
> message that the PDB does not need any elide mechanism to be defined
> in the CIF2 syntax.  Would you therefore be prepared to vote in favour
> of not defining any elides, or would you prefer to abstain?
> 
> Votes so far:
> 
> No elides: James, Nick, Herbert if the IUCr + PDB say it is OK
> Elides:?
> 
> Unknown: John, Joe, David B., Brian, Simon
> 
> On Wed, Nov 25, 2009 at 12:03 PM, John Westbrook
> <[email protected]> wrote:
>> I confess that I am having difficulty keeping up with all aspects
>> of this discussion.   Following Herb's suggestion I will try to
>> summarize the quoting issues from the PDB perspective.
>>
>> 1. As there are multiple ways of quoting a string our tools and files
>> surround embedded quotes with quotes of the opposite sense or with
>> semicolons in the mixed case.   I think that this point has been
>> covered a number of times now and I believe that Nick has suggested
>> that all reasonable cases can be handled by using this approach.
>>
>> 2. I too was not aware that the original definition of terminators
>> had changed and did not include either a leading or trailing
>> whitespace.  Certainly this must still be the case for single
>> and double quotes.  I cannot recall ever seeing an example
>> where the terminator \n; was followed by a whitespace character,
>> but about half of the codes that I am familiar with would
>> fall over on \n;next_token.
>>
>> 3. Line folding has never been an issue for PDB nor has line length.
>>
>> Regards,
>>
>> John
>>
>>
>> Herbert J. Bernstein wrote:
>>> My major concern about anything we do is to be able to preserve
>>> the functionality of the practices that the IUCr is following in
>>> journal publications and the PDB is following. Inasmuch as they seem
>>> able to cope with no elide in CIF 1.1, the remaining question is whether
>>> they will be negatively impacted by the change in string termination
>>> without any elide.  If they can use CIF 2 with these changes, my
>>> objections are purely academic and irrelevant.  -- Herbert
>>>
>>> =====================================================
>>>  Herbert J. Bernstein, Professor of Computer Science
>>>    Dowling College, Kramer Science Center, KSC 121
>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                  +1-631-244-3035
>>>                  [email protected]
>>> =====================================================
>>>
>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>
>>>> Herbert: I have the dubious advantage of not having participated in
>>>> all those CIF1.0/1.1 discussions, so only have the spec as written
>>>> down to rely on.
>>>>
>>>> Anyway, how do you feel about abandoning any specification of elides
>>>> in CIF2 syntax, as suggested by Nick?
>>>>
>>>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>>>> <[email protected]> wrote:
>>>>> Dear James,
>>>>>
>>>>>  I started to write:
>>>>>  "No, in CIF 1.1, none of the terminal quote marks, including the \n;
>>>>> are effective unless followed by whitespace (\n, space, tab, or end of
>>>>> file).  This is a well-established, and very tricky part of the CIF spec
>>>>> going back to 1990.  That is why Nick had to explicitly specify that a
>>>>> terminal quote mark would be effective no matter what it was followed by."
>>>>>
>>>>>  But the grammar currently on the IUCr web site is _not_ the one that I
>>>>> recall COMCIFs discussing and approving.  It now explicitly removes
>>>>> the requirement for terminal white space in the special case of
>>>>> the \n; text field terminator.  I don't recall when that change was
>>>>> adopted, but it appears that you are right under the current spec
>>>>> about the example I chose.  Inasmuch as there is a lot of working code
>>>>> that enforces and uses the original whitespace handling and uses it
>>>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>>>> something to adapt to this change for CIFtbx 4.
>>>>>
>>>>>  I guess we are just going to have yet another few dialects of CIF.
>>>>>
>>>>>  Regards,
>>>>>    Herbert
>>>>>
>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>
>>>>>> To be precise, we are not 'referring all elides to the application'
>>>>>> because no elides are recognised by the lexer under Nick's latest
>>>>>> suggestion, so there are no elides to refer to the application.
>>>>>>
>>>>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>>>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>>>>> of the second line would terminate the string, and so whitespace
>>>>>> should then appear as the second character on the second line, rather
>>>>>> than reverse solidus.
>>>>>>
>>>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>>>> <[email protected]> wrote:
>>>>>>> The only problem with referring all elides to the application is that
>>>>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>>>>> effective, the line folding protocol develops a slight gap.  The
>>>>>>> case is as follows
>>>>>>>
>>>>>>> ;\
>>>>>>> ;\
>>>>>>> ;
>>>>>>>
>>>>>>> Is a valid single text field in CIF 1.1, which when handled with the
>>>>>>> line folding protocol translates to the equivalent of ';' because the
>>>>>>> embedded ;\ is not a valid text terminator.  If we require that
>>>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>>>> or "\n;\n" or "\n;\t" that problem would be fixed.
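The gap described in the quoted paragraph above can be reproduced with a
small Python sketch. It is an illustration only, not code from CIFtbx or any
other CIF library: the names scan_text_field and unfold are hypothetical, the
scanner assumes its input starts at the opening semicolon of a text field,
and the unfolding is a simplified reading of the CIF 1.1 line-folding
protocol (a first line consisting of a lone backslash switches folding on,
and a trailing backslash joins a line to the next one).

```python
def scan_text_field(src, require_ws):
    """Split src (starting at the opening ';') into (raw_content, rest).

    require_ws=True  : "\n;" only terminates if followed by whitespace or EOF
    require_ws=False : any "\n;" terminates the field
    """
    if not src.startswith(";"):
        raise ValueError("text field must start with ';'")
    pos = 1
    while True:
        idx = src.find("\n;", pos)
        if idx == -1:
            raise ValueError("unterminated text field")
        following = src[idx + 2:idx + 3]  # "" means end of input
        if not require_ws or following in ("", " ", "\t", "\n"):
            return src[1:idx], src[idx + 2:]
        pos = idx + 1  # this "\n;" is ordinary content; keep looking


def unfold(raw):
    """Apply a simplified line-folding protocol to raw text field content."""
    lines = raw.split("\n")
    if lines[0].rstrip(" \t") != "\\":
        return raw  # folding not signalled on the first line
    out, buf, pending = [], "", False
    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            buf += stripped[:-1]  # drop the backslash, join with next line
            pending = True
        else:
            out.append(buf + line)
            buf, pending = "", False
    if pending:
        out.append(buf)
    return "\n".join(out)


# The three-line example from the quoted message.
SRC = ";\\\n;\\\n;\n"

strict_raw, strict_rest = scan_text_field(SRC, require_ws=True)
loose_raw, loose_rest = scan_text_field(SRC, require_ws=False)
print(repr(unfold(strict_raw)), repr(strict_rest))
print(repr(loose_raw), repr(loose_rest))
```

With require_ws=True the whole input is one text field that unfolds to a
single semicolon. With require_ws=False the field ends at the second line,
leaving a lone backslash as content and an unparsed backslash and semicolon
line behind.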
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>>>
>>>>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>>>>
>>>>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
>>>>>>>> <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> It appears to me that we have spent far too long on a syntactic issue
>>>>>>>>> which can be avoided 99.9999% of the time. Quite simply given the 5
>>>>>>>>> ways to delimit strings, it is next to impossible to get a situation
>>>>>>>>> where you cannot choose one of those to make the problem go away.
>>>>>>>>>
>>>>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>>>>
>>>>>>>>> "ab'cd"
>>>>>>>>> 'ab"cd'
>>>>>>>>> ;ab'"cd
>>>>>>>>> ;
>>>>>>>>>
>>>>>>>>> But now we additionally have """ and ''' to choose from, making it
>>>>>>>>> even easier.
>>>>>>>>>
>>>>>>>>> So I propose in line with James' position there is NO eliding of
>>>>>>>>> terminator character at the CIF2 syntax level. ALL elides in the
>>>>>>>>> string are assumed to be user specific encoding (say TeX, IUCr \greek)
>>>>>>>>> which can be resolved at the dictionary level.
>>>>>>>>>
>>>>>>>>> This necessarily means NO terminator character can appear in a string
>>>>>>>>> delimited by the same terminator character. You will need to choose a
>>>>>>>>> different terminator character. That is
>>>>>>>>>
>>>>>>>>> No " in "strings"
>>>>>>>>> No ' in 'strings'
>>>>>>>>> No """ in """strings""" (but separable individual and doublet " are allowed)
>>>>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>>>>>>>>>
>>>>>>>>> EVERYTHING in the string is returned as raw (except the initiating and
>>>>>>>>> terminating character).
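A writer following the quoted "no elides" rule only has to pick a delimiter
that does not occur in the value. The Python sketch below shows one way that
choice might look; it is not taken from any PDB or IUCr tool. The function
name delimit is hypothetical, and two details are assumptions rather than
part of the proposal: single-character quotes are not used for values that
contain a newline, and a triple-quoted form is skipped when the value ends
with the same quote character, to avoid an ambiguous closing run.

```python
def delimit(value):
    """Return value wrapped in a delimiter that does not occur inside it."""
    # Single-character quotes: assumed here to be single-line only.
    if "\n" not in value:
        for quote in ('"', "'"):
            if quote not in value:
                return quote + value + quote
    # Triple quotes: the triplet must not occur, and the value must not end
    # with the quote character (conservative assumption, see lead-in).
    for quote in ('"""', "'''"):
        if quote not in value and not value.endswith(quote[0]):
            return quote + value + quote
    # Semicolon-delimited text field: "\n;" opens and closes the value.
    if "\n;" not in value:
        return "\n;" + value + "\n;"
    raise ValueError("value contains every terminator; encode it instead")


for sample in ("ab'cd", 'ab"cd', "ab'\"cd"):
    print(delimit(sample))
```

The first two samples come out in the opposite-sense quotes, as in the RCSB
examples quoted earlier, and the mixed case falls through to a triple-quoted
form. Only a value containing all five terminators raises the error.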
>>>>>>>>>
>>>>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>>>>> string is when you want to include ' " """ ''' and \n; in the one
>>>>>>>>> string. The likelihood of that is almost zero, unless you may want to
>>>>>>>>> include a CIF within a CIF (a silly thing to do IMHO). In that case the
>>>>>>>>> contents can be encoded in a dictionary driven way. I suggest it be
>>>>>>>>> declared as a BASE64 type and then all the syntactic ambiguity
>>>>>>>>> disappears.
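To illustrate why a BASE64 encoding removes the ambiguity: the standard
Base64 alphabet contains only letters, digits, "+", "/" and "=", so the
encoded value cannot contain any of the terminators listed in the quoted
paragraph. The Python sketch below uses the standard library's base64 module.
The helper names and the embedded CIF fragment are made up for the example,
and how a dictionary would actually declare such a type is not shown here.

```python
import base64


def encode_embedded(text):
    """Encode arbitrary text as a single-line Base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_embedded(payload):
    """Recover the original text from its Base64 form."""
    return base64.b64decode(payload).decode("utf-8")


# A made-up embedded CIF fragment containing every terminator at once.
inner = "data_inner\n_note\n;ab'\"cd '''x''' \"\"\"y\"\"\"\n;\n"

payload = encode_embedded(inner)
print(payload)
print(decode_embedded(payload) == inner)
```

The encoded payload can then be written with any of the ordinary delimiters,
and the application decodes it after the parser has returned the raw string.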
>>>>>>>>>
>>>>>>>>> Problem solved! No need to elide because of CIF2 syntax rules; all
>>>>>>>>> elides are user driven, contents are returned raw.
>>>>>>>>>
>>>>>>>>> As for Herb's comment in a recent email (what about line-folding?),
>>>>>>>>> the same holds. That is NOT a lexer issue and it has nothing to do
>>>>>>>>> with the parser; everything is read literally and returned raw, and
>>>>>>>>> what to do with it is promulgated to the downstream application.
>>>>>>>>>
>>>>>>>>> Straw vote - No elides of terminator strings as described above - Nick
>>>>>>>>>
>>>>>>>>>
>>>>
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> [email protected]
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>
> 
> 
> 

