
Re: [ddlm-group] Use of elides in strings


I confess that I am having difficulty keeping up with all aspects
of this discussion.   Following Herb's suggestion I will try to
summarize the quoting issues from the PDB perspective.

1. As there are multiple ways of quoting a string, our tools and files
surround embedded quotes with quotes of the opposite sense, or with
semicolons in the mixed case.   I think that this point has been
covered a number of times now and I believe that Nick has suggested
that all reasonable cases can be handled by using this approach.

2. I too was not aware that the original definition of terminators
had changed and did not include either leading or trailing
whitespace.  Certainly this must still be the case for single
and double quotes.  I cannot recall ever seeing an example
where the terminator \n; was followed by anything other than a
whitespace character, but about half of the codes that I am
familiar with would fall over on \n;next_token.

3. Line folding has never been an issue for PDB nor has line length.
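
To make points 1 and 2 concrete, here is a minimal Python sketch. It is
illustrative only and is not taken from any PDB tool: the function names
quote_value and text_field_end and the require_whitespace flag are
hypothetical. The first function picks a delimiter of the opposite sense
and falls back to a semicolon text field in the mixed case. The second
shows how a scanner that insists on whitespace after \n; differs from one
that accepts \n; on its own.

```python
WHITESPACE = " \t\r\n"


def quote_value(value: str) -> str:
    """Wrap value in a delimiter that does not occur inside it.

    Hypothetical helper: an assumption about the approach in point 1,
    not code from any existing CIF library.
    """
    if "\n;" in value:
        # A semicolon text field cannot hold its own terminator.
        raise ValueError("value contains a newline-semicolon sequence")
    has_single = "'" in value
    has_double = '"' in value
    if "\n" in value or (has_single and has_double):
        # Mixed case: semicolon text field. The opening ';' must be the
        # first character of a line in the output file.
        return ";" + value + "\n;"
    if has_double:
        return "'" + value + "'"
    return '"' + value + '"'


def text_field_end(body: str, require_whitespace: bool) -> int:
    """Return the index of the newline that starts the terminator, or -1.

    body is the text that follows the opening ';' of a text field.
    With require_whitespace=True, a newline-semicolon only terminates
    the field when followed by whitespace or by the end of the input.
    """
    pos = body.find("\n;")
    while pos != -1:
        following = body[pos + 2 : pos + 3]  # empty string at end of input
        if not require_whitespace or following == "" or following in WHITESPACE:
            return pos
        pos = body.find("\n;", pos + 1)
    return -1
```

For the input "text\n;next_token", the whitespace-requiring scanner finds
no terminator at all, while the relaxed scanner ends the field at the
newline and leaves next_token as the following token. That is the
difference that would trip up codes written to the original rule.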

Regards,

John


Herbert J. Bernstein wrote:
> My major concern about anything we do is to be able to preserve
> the functionality of the practices that the IUCr is following in
> journal publications and the PDB is following. Inasmuch as they seem 
> able to cope with no elide in CIF 1.1, the remaining question is whether
> they will be negatively impacted by the change in string termination
> without any elide.  If they can use CIF 2 with these changes, my
> objections are purely academic and irrelevant.  -- Herbert
> 
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
> 
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
> 
> On Wed, 25 Nov 2009, James Hester wrote:
> 
>> Herbert: I have the dubious advantage of not having participated in
>> all those CIF1.0/1.1 discussions, so only have the spec as written
>> down to rely on.
>>
>> Anyway, how do you feel about abandoning any specification of elides
>> in CIF2 syntax, as suggested by Nick?
>>
>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>> Dear James,
>>>
>>>  I started to write:
>>>  "No, in CIF 1.1, none of the terminal quote marks, including the \n;
>>> are effective unless followed by whitespace (\n, space, tab, or end of
>>> file).  This is a well-established, and very tricky, part of the CIF
>>> spec going back to 1990.  That is why Nick had to explicitly specify
>>> that a terminal quote mark would be effective no matter what it was
>>> followed by."
>>>
>>>  But the grammar currently on the IUCr web site is _not_ the one that I
>>> recall COMCIFs discussing and approving.  It now explicitly removes
>>> the requirement for terminal white space in the special case of
>>> the \n; text field terminator.  I don't recall when that change was
>>> adopted, but it appears that you are right under the current spec
>>> about the example I chose.  Inasmuch as there is a lot of working code
>>> that enforces and uses the original whitespace handling and uses it
>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>> something to adapt to this change for CIFtbx 4.
>>>
>>>  I guess we are just going to have yet another few dialects of CIF.
>>>
>>>  Regards,
>>>    Herbert
>>>
>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>
>>>> To be precise, we are not 'referring all elides to the application'
>>>> because no elides are recognised by the lexer under Nick's latest
>>>> suggestion, so there are no elides to refer to the application.
>>>>
>>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>>> of the second line would terminate the string, and so whitespace
>>>> should then appear as the second character on the second line, rather
>>>> than reverse solidus.
>>>>
>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>
>>>>> The only problem with referring all elides to the application is that
>>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>>> effective, the line folding protocol develops a slight gap.  The
>>>>> case is as follows:
>>>>>
>>>>> ;\
>>>>> ;\
>>>>> ;
>>>>>
>>>>> is a valid single text field in CIF 1.1, which when handled with the
>>>>> line folding protocol translates to the equivalent of ';' because the
>>>>> embedded ;\ is not a valid text terminator.  If we require that
>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>> or "\n;\n" or "\n;\t" that problem would be fixed.
>>>>>
>>>>>
>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>
>>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>>
>>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
>>>>>> <nick@csse.uwa.edu.au> wrote:
>>>>>>>
>>>>>>> It appears to me that we have spent far too long on a syntactic issue
>>>>>>> which can be avoided 99.9999% of the time. Quite simply, given the 5
>>>>>>> ways to delimit strings, it is next to impossible to get a situation
>>>>>>> where you cannot choose one of those to make the problem go away.
>>>>>>>
>>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>>
>>>>>>> "ab'cd"
>>>>>>> 'ab"cd'
>>>>>>> ;ab'"cd
>>>>>>> ;
>>>>>>>
>>>>>>> But now we additionally have """ and ''' to choose from, making it
>>>>>>> even easier.
>>>>>>>
>>>>>>> So I propose, in line with James' position, that there is NO eliding
>>>>>>> of terminator characters at the CIF2 syntax level. ALL elides in the
>>>>>>> string are assumed to be user-specific encoding (say TeX, IUCr \greek)
>>>>>>> which can be resolved at the dictionary level.
>>>>>>>
>>>>>>> This necessarily means NO terminator character can appear in a string
>>>>>>> delimited by the same terminator character. You will need to choose a
>>>>>>> different terminator character. That is:
>>>>>>>
>>>>>>> No " in "strings"
>>>>>>> No ' in 'strings'
>>>>>>> No """ in """strings""" (but separable individual and doublet " are allowed)
>>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>>>>>>>
>>>>>>> EVERYTHING in the string is returned as raw (except the initiating
>>>>>>> and terminating character).
>>>>>>>
>>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>>> string is when you want to include ' " """ ''' and \n; in the one
>>>>>>> string. The likelihood of that is almost zero, unless you want to
>>>>>>> include a CIF within a CIF (a silly thing to do IMHO). In that case the
>>>>>>> contents can be encoded in a dictionary driven way. I suggest it be
>>>>>>> declared as a BASE64 type and then all the syntactic ambiguity
>>>>>>> disappears.
>>>>>>>
>>>>>>> Problem solved! No need to elide because of CIF2 syntax rules; all
>>>>>>> elides are user driven, and contents are returned raw.
>>>>>>>
>>>>>>> As for Herb's comment in a recent email asking about line-folding,
>>>>>>> the same holds. That is NOT a lexer issue and it has nothing to do
>>>>>>> with the parser; everything is read literally and returned raw, and
>>>>>>> what to do with it is left to the downstream application.
>>>>>>>
>>>>>>> Straw vote - No elides of terminator strings as described above - Nick
>>>>>>>
>>>>>>>
>> -- 
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
