
Re: [ddlm-group] Use of elides in strings

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Use of elides in strings
From: James Hester <[email protected]>
Date: Wed, 25 Nov 2009 11:44:40 +1100
In-Reply-To: <[email protected]>
References: <[email protected]><C731AC95.125CB%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>

Would John and Brian and/or Simon please comment on this?

On Wed, Nov 25, 2009 at 11:21 AM, Herbert J. Bernstein
<[email protected]> wrote:
> My major concern about anything we do is to be able to preserve
> the functionality of the practices that the IUCr is following in
> journal publications and the PDB is following. Inasmuch as they seem able to
> cope with no elide in CIF 1.1, the remaining question is whether
> they will be negatively impacted by the change in string termination
> without any elide.  If they can use CIF 2 with these changes, my
> objections are purely academic and irrelevant.  -- Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 [email protected]
> =====================================================
>
> On Wed, 25 Nov 2009, James Hester wrote:
>
>> Herbert: I have the dubious advantage of not having participated in
>> all those CIF1.0/1.1 discussions, so only have the spec as written
>> down to rely on.
>>
>> Anyway, how do you feel about abandoning any specification of elides
>> in CIF2 syntax, as suggested by Nick?
>>
>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>> <[email protected]> wrote:
>>>
>>> Dear James,
>>>
>>>  I started to write:
>>>  "No, in CIF 1.1, none of the terminal quote marks, including the \n;, are
>>> effective unless followed by whitespace (\n, space, tab, or end of file).
>>> This is a well-established, and very tricky, part of the CIF spec going
>>> back to 1990.  That is why Nick had to explicitly specify that a terminal
>>> quote mark would be effective no matter what it was followed by."
>>>
>>>  But the grammar currently on the IUCr web site is _not_ the one that I
>>> recall COMCIFs discussing and approving.  It now explicitly removes
>>> the requirement for terminal white space in the special case of
>>> the \n; text field terminator.  I don't recall when that change was
>>> adopted, but it appears that you are right under the current spec
>>> about the example I chose.  Inasmuch as there is a lot of working code
>>> that enforces the original whitespace handling and uses it
>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>> something to adapt to this change for CIFtbx 4.
>>>
>>>  I guess we are just going to have yet another few dialects of CIF.
>>>
>>>  Regards,
>>>    Herbert
>>>
>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>
>>>> To be precise, we are not 'referring all elides to the application'
>>>> because no elides are recognised by the lexer under Nick's latest
>>>> suggestion, so there are no elides to refer to the application.
>>>>
>>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>>> of the second line would terminate the string, and so whitespace
>>>> should then appear as the second character on the second line, rather
>>>> than reverse solidus.
>>>>
>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>> <[email protected]> wrote:
>>>>>
>>>>> The only problem with referring all elides to the application is that
>>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>>> effective, the line folding protocol develops a slight gap.  The
>>>>> case is as follows:
>>>>>
>>>>> ;\
>>>>> ;\
>>>>> ;
>>>>>
>>>>> This is a valid single text field in CIF 1.1, which when handled with the
>>>>> line folding protocol translates to the equivalent of ';' because the
>>>>> embedded ;\ is not a valid text terminator.  If we require that
>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>> or "\n;\n" or "\n;\t", that problem would be fixed.
>>>>>
>>>>>
>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>
>>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>>
>>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> It appears to me that we have spent far too long on a syntactic issue
>>>>>>> which can be avoided 99.9999% of the time. Quite simply, given the 5
>>>>>>> ways to delimit strings, it is next to impossible to get a situation
>>>>>>> where you cannot choose one of those to make the problem go away.
>>>>>>>
>>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>>
>>>>>>> "ab'cd"
>>>>>>> 'ab"cd'
>>>>>>> ;ab'"cd
>>>>>>> ;
>>>>>>>
>>>>>>> But now we additionally have """ and ''' to choose from, making it
>>>>>>> even easier.
>>>>>>>
>>>>>>> So I propose, in line with James' position, that there is NO eliding
>>>>>>> of terminator characters at the CIF2 syntax level. ALL elides in the
>>>>>>> string are assumed to be user specific encoding (say TeX, IUCr \greek)
>>>>>>> which can be resolved at the dictionary level.
>>>>>>>
>>>>>>> This necessarily means NO terminator character can appear in a string
>>>>>>> delimited by the same terminator character. You will need to choose a
>>>>>>> different terminator character. That is
>>>>>>>
>>>>>>> No " in "strings"
>>>>>>> No ' in 'strings'
>>>>>>> No """ in """strings""" (but separable individual and doublet " are allowed)
>>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>>>>>>>
>>>>>>> EVERYTHING in the string is returned as raw (except the initiating
>>>>>>> and terminating character).
>>>>>>>
>>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>>> string is when you want to include ' " """ ''' and \n; in the one
>>>>>>> string. The likelihood of that is almost zero, unless you want to
>>>>>>> include a CIF within a CIF (a silly thing to do IMHO). In that case the
>>>>>>> contents can be encoded in a dictionary driven way. I suggest it be
>>>>>>> declared as a BASE64 type and then all the syntactic ambiguity
>>>>>>> disappears.
>>>>>>>
>>>>>>> Problem solved! There is no need to elide because of CIF2 syntax rules:
>>>>>>> all elides are user driven, and contents are returned raw.
>>>>>>>
>>>>>>> As for Herb's comment in a recent email about line-folding, the
>>>>>>> same holds. That is NOT a lexer issue and it has nothing to do with
>>>>>>> the parser: everything is read literally and returned raw, and what to
>>>>>>> do with it is left to the downstream application.
>>>>>>>
>>>>>>> Straw vote - No elides of terminator strings as described above -
>>>>>>> Nick
>>>>>>>
>>>>>>>
>>
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> [email protected]
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>
>
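
As an illustration of the no-elide rule in Nick's proposal quoted above, a writer only has to pick a delimiter whose terminator does not occur in the value. The following is a minimal sketch, not part of any CIF library: the function name choose_delimiter is hypothetical, and it assumes that ' and " strings are single-line and that a triple-quoted value may not end in its own quote character (one reading of "separable").

```python
def choose_delimiter(value):
    """Return (open, close) delimiters that can hold value verbatim, or None.

    Hypothetical helper illustrating the proposed CIF2 rule that a string
    may not contain its own terminator, since nothing is elided.
    """
    single_line = "\n" not in value  # assumption: ' and " strings are one line
    if single_line and "'" not in value:
        return ("'", "'")
    if single_line and '"' not in value:
        return ('"', '"')
    for triple in ("'''", '"""'):
        # assumption: a value ending in the quote character would merge
        # with the closing triple, so it is rejected here
        if triple not in value and not value.endswith(triple[0]):
            return (triple, triple)
    # semicolon text field: opened by ';' at line start, closed by "\n;"
    if "\n;" not in value:
        return ("\n;", "\n;")
    return None  # would need a dictionary-level encoding such as BASE64


print(choose_delimiter("ab'cd"))           # double quotes suffice
print(choose_delimiter("a'''b\"\"\"c\n;d"))  # no delimiter fits
```

Only a value containing every terminator at once falls through to None, which is the case Nick suggests handling at the dictionary level.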


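The disagreement quoted above over Herbert's three-line example comes down to whether "\n;" must be followed by whitespace to end a text field. The sketch below runs both readings on that example. The names read_text_field and unfold are hypothetical, and unfold is a simplified version of the line-folding protocol (it ignores trailing whitespace after the backslash).

```python
def read_text_field(text, require_whitespace):
    """Split text (starting at the opening ';') into (raw content, remainder).

    require_whitespace=True: "\n;" only terminates when followed by
    space, tab, newline or end of file (Herbert's recollection).
    require_whitespace=False: any "\n;" terminates (James's reading).
    """
    if not text.startswith(";"):
        raise ValueError("text field must start with ';'")
    pos = 1
    while True:
        end = text.find("\n;", pos)
        if end == -1:
            raise ValueError("unterminated text field")
        after = text[end + 2:end + 3]  # "" when at end of file
        if not require_whitespace or after in ("", " ", "\t", "\n"):
            return text[1:end], text[end + 2:]
        pos = end + 1


def unfold(raw):
    """Simplified line unfolding: a leading backslash-newline marks a folded
    field, and a backslash at the end of a line joins it to the next."""
    if not raw.startswith("\\\n"):
        return raw
    body = (raw[2:] + "\n").replace("\\\n", "")
    return body[:-1] if body.endswith("\n") else body


example = ";\\\n;\\\n;\n"  # the three lines  ;\  ;\  ;

raw, rest = read_text_field(example, require_whitespace=True)
print(repr(unfold(raw)), repr(rest))

raw, rest = read_text_field(example, require_whitespace=False)
print(repr(raw), repr(rest))
```

With the whitespace requirement the whole example is one field that unfolds to a single semicolon. Without it the field ends at the second line, and the remainder starts with a reverse solidus, which is the syntax error James describes.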

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
