Re: [ddlm-group] Use of elides in strings

Dear Herbert

Is this the same as Nick's conclusion back in THREAD3 -
basically returning all reverse solidi?
If so, it seems to me this discussion is converging towards two options:

1) The reverse solidus can escape a delimiter when within a value delimited by that delimiter;
the parser will respect this and return the value including this reverse solidus as well as all other reverse solidi.

2) The reverse solidus can escape a delimiter when within a value delimited by that delimiter;
the parser will respect this and remove this instance of the reverse solidus, returning the value with all other reverse solidi intact.

I prefer option 2) because once the parser has removed the delimiters from the value, the significance of the
escaping reverse solidus is lost (e.g. a client may request the value of _foo and the parser returns AB\"C, but if the client
didn't already know that the item was written as _foo "AB\"C", it won't know that the intended value was AB"C).
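
To make the two options concrete, here is a minimal Python sketch of what each
would hand back to the client. The function names are hypothetical, and the sketch
assumes the parser has already located the body between the delimiters; it is an
illustration of the two behaviours, not text from any CIF2 draft.

```python
def option1_keep(body: str, delim: str) -> str:
    """Option 1: return the body unchanged, escaping reverse solidi included."""
    return body


def option2_strip(body: str, delim: str) -> str:
    """Option 2: drop only a reverse solidus that escapes the delimiter."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and body.startswith(delim, i + 1):
            # Escaping reverse solidus: skip it, keep the delimiter itself.
            out.append(delim)
            i += 1 + len(delim)
        else:
            # Any other character, including other reverse solidi, is kept.
            out.append(body[i])
            i += 1
    return "".join(out)


raw = 'AB\\"C\\n'  # body of _foo "AB\"C\n" exactly as written in the file
print(option1_keep(raw, '"'))   # AB\"C\n
print(option2_strip(raw, '"'))  # AB"C\n
```

Under option 1 the client cannot tell whether the reverse solidus before the quote
was data or an escape; under option 2 only the escaping instance disappears.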

Another minor question: I assume by
"do _not_ honor the special lexical meaning of that
reverse solidus"
you are referring to its use in line wrapping?

Cheers

Simon



From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Saturday, 21 November, 2009 0:32:06
Subject: Re: [ddlm-group] Use of elides in strings

Dear Colleagues,

  I am in favor of the following handling of the reverse solidus
in strings quoted by \n;, ", ', """ and ''':

  Scan the string left to right for a reverse solidus.  If a reverse
solidus is encountered, examine the next character. If it is a reverse
solidus, or the initial character of the terminal quote sequence
for that string, do _not_ honor the special lexical meaning of that
reverse solidus or terminal quoting sequence character, continuing
the scan until you encounter the unescaped terminal quoting sequence,
at which point you end the scan of the quoted string.

  You then take all the characters you encountered, including all
the instances of the reverse solidus, but not including the initial
quoting sequence and not including the terminal quoting sequence,
and deliver that to the calling application, which may or may not
decide to do something further with those same instances of the
reverse solidus, depending on the dictionary or other higher-level
control.
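
  The scan described above might be sketched in Python roughly as follows.
The function name, its signature and the error handling are assumptions made
for illustration only; the caller is assumed to pass the index just past the
initial quoting sequence.

```python
from typing import Tuple


def scan_quoted(text: str, start: int, terminator: str) -> Tuple[str, int]:
    """Scan text from start (just past the initial quoting sequence).

    Returns (raw_value, index_after_terminator). Every reverse solidus
    encountered is kept in the returned value.
    """
    i = start
    while i < len(text):
        if (text[i] == "\\" and i + 1 < len(text)
                and text[i + 1] in ("\\", terminator[0])):
            # Escaped reverse solidus or escaped first terminator character:
            # neither keeps its special lexical meaning, so step over both.
            i += 2
        elif text.startswith(terminator, i):
            # Unescaped terminal quoting sequence: the scan ends here.
            return text[start:i], i + len(terminator)
        else:
            i += 1
    raise ValueError("unterminated quoted string")


line = '_foo "AB\\"C" _bar'
value, end = scan_quoted(line, line.index('"') + 1, '"')
print(value)  # AB\"C
```

  The value is delivered with its reverse solidus intact, leaving any further
interpretation to the calling application.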

  I for one would put the special handling of reverse-solidus-initiated
Unicode under control of the dictionary types, but,
if there is strong feeling for dropping that down to the lexical
scanner before consulting the dictionary, I don't think it would
do any harm.

  The only serious problem case I can see in this approach is
the current IUCr special character handling of \' in a '-quoted
string and \" in a "-quoted string, but I suspect most of the
time those are in \n;-quoted strings, so I don't think this
will be a serious problem.

  This is just my opinion of something straightforward to code and
use.  There are lots of equally useful variations on the same theme.

  Regards,
    Herbert


=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Sat, 21 Nov 2009, James Hester wrote:

> First in reply to Joe: I believe that when Nick refers to the 'reading
> and writing application' he indeed has in mind the CIF parser/CIF
> writer layer, so I would guess that he agrees with your opinion as
> well.  The issue is that we do not present an opaque storage format,
> unlike SQL or HDF; it is pretty easy to create and manipulate CIFs
> with text tools, so we need to cater to this method of interfacing to
> CIFs as well.
>
> In reply to Herbert: your suggestion implies that we abandon any
> *lexical* meaning for <elide><terminator>.  Or are you suggesting that
> an application reads the dataname, then looks up the dictionary to
> decide if it should continue to input the string when it sees
> <elide><terminator>?  So we have dictionary-driven parsing?
>
> I can't work out from your previous email whether you are now in
> support of abandoning elision as well as supporting treating all
> strings as raw.  Please clarify...
>
> On Sat, Nov 21, 2009 at 6:44 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear Colleagues,
>>
>>   There is a difference between what are useful utilities to have in
>> an API in support of CIF2 and what is formally part of the base CIF2.
>> I am all in favor of utilities to apply and unapply the various
>> uses for the reverse solidus -- one for cleaning up python-style
>> use, one to handle the IUCr special characters, one for line folding,
>> etc., but I don't think that means we have to make one of those
>> particular uses formally part of the base CIF2.
>>
>>   Regards,
>>    Herbert
>>
>> On Fri, 20 Nov 2009, Joe Krahn wrote:
>>
>>> Unlike others here, I feel that a proper text archive library should be
>>> able to take any string from the calling application, and return that
>>> exact same string when reading it back in. It is the job of the archive
>>> format to avoid delimiter problems. An application should be able to
>>> store and retrieve strings without such worries, and interface to an SQL
>>> database the same as it would interface to CIF. All commonly used
>>> database libraries work this way. Why should CIF continue to take an
>>> archaic approach?
>>>
>>> I essentially agree with the design below, except that the library
>>> should handle insertion and removal of the reverse solidus for the
>>> limited cases where it is required.
>>>
>>> If it is the client application's responsibility to deal with reverse
>>> solidus escape sequences, then the description below doesn't make sense.
>>> In that case, the reverse solidus never has any special meaning to CIF2.
>>> Instead, CIF2 simply disallows certain character sequences. A client
>>> application can use whatever it wants to encode/decode the disallowed
>>> character sequences.
>>>
>>> The advantage of having well-defined escape sequences at the I/O library
>>> level is that updates to the format do not require updates to client
>>> applications. A CIF client application should be able to send a string
>>> to the CIF library, and not have to know in advance what CIF revision is
>>> in use, or whether the string is semicolon block quoted or triple
>>> quoted. By requiring the client to escape invalid sequences, the client
>>> will have to escape strings differently, i.e. triple quote is OK within
>>> semi-colon quotes, and a leading semicolon is OK within triple quotes,
>>> but not the other way around.
>>>
>>> Joe Krahn
>>>
>>>
>>> Nick Spadaccini wrote:
>>>>
>>>> SUMMARISING.
>>>>
>>>> (a) The contents of delimited strings are returned as raw, with the token
>>>> delimiters removed.
>>>> (b) Where a delimiter character is to be part of the string, that character
>>>> must be preceded by a reverse solidus when written out to the file. When
>>>> read, any reverse solidus preceding a terminating character is deleted.
>>>> (c) It is the responsibility of the writing and reading application to
>>>> insert and remove the reverse solidus preceding the terminating character.
>>>> (d) Otherwise the presence of a reverse solidus in the string has no
>>>> meaning.
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
