Discussion List Archives

Re: [ddlm-group] Elide close quotes by doubling?

Nick Spadaccini wrote:
> 
> 
> On 5/12/09 5:59 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
> 
>> OK, I agree that this can be revisited later. If CIF1-embedded quotes
>> are invalid, then adding a repeat-quote elide will not interfere with
>> any other syntax, so it does not have to be a concern in the initial
>> CIF2 spec.
> 
> Actually it will interfere lexically in CIF2. You are suggesting "" to be
> the same as an embedded ", however the first subsequent " after the
> initialising ", terminates the delimited string. Hence "" won't work.
No; CIF2 requires white space after the 2nd instance of the quote 
character, so an embedded "" is just a syntax error. (Unless the 
requirement for white space after a close quote has been dropped??)
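
To illustrate the point above, here is a minimal sketch of a reader for
quoted values. It models the rule only as described in this thread (a close
quote must be followed by white space or end of input); it is not taken from
the CIF2 specification text, and the name read_quoted_strict is hypothetical.

```python
def read_quoted_strict(text, start=0):
    """Read a quoted value beginning at text[start].

    Assumed rule (from this thread, not the spec text): the first matching
    quote closes the string, and that closing quote must be followed by
    white space or end of input. Returns (value, index_after_close).
    """
    quote = text[start]
    if quote not in ('"', "'"):
        raise ValueError("not at an opening quote")
    close = text.find(quote, start + 1)
    if close == -1:
        raise SyntaxError("unterminated quoted string")
    after = close + 1
    # A close quote followed by anything but white space is rejected.
    if after < len(text) and not text[after].isspace():
        raise SyntaxError("close quote not followed by white space")
    return text[start + 1:close], after
```

Under this rule read_quoted_strict('"abc" rest') returns ('abc', 5), while
the input "ab""cd" raises SyntaxError at the doubled quote. That is the sense
in which a doubled quote is currently unused syntax.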

> 
> But on a wider issue you are simply revisiting the elide discussion of last
> week(s). There is no great difference between a "" vs \", the reason we
> couldn't ultimately agree is because of our inability to separate user
> defined elides and CIF2 generated lexical elides. A CIF2 syntax issue that
> is not a user issue should be handled by the parser for the user. Others see
> it as a user issue for them to insert and delete "lexical" elides.
> 
> James and I see the latter approach as necessarily ambiguous and gave
> several examples of how it would be ambiguous.
Using \" is more complicated than "", because it makes escaping of \ 
characters necessary. Do you only escape \ in a literal \", or 
everywhere? What happens with \ in line folding? Using "" avoids these 
issues.
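
As a rough sketch of why doubling is simpler, assuming the same "white space
after a close quote" rule as above: the writer only has to double the quote
character, and the reader never has to look at \ at all. The function names
quote_doubling and read_doubling are hypothetical, for illustration only.

```python
def quote_doubling(value, quote='"'):
    """Write value as a quoted string, doubling each embedded quote."""
    return quote + value.replace(quote, quote * 2) + quote


def read_doubling(text, start=0):
    """Read a quoted value in which a doubled quote means one literal quote.

    Backslashes are ordinary characters here and pass through unchanged.
    Returns (value, index_after_close).
    """
    quote = text[start]
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch != quote:
            out.append(ch)
            i += 1
        elif i + 1 < len(text) and text[i + 1] == quote:
            out.append(quote)  # doubled quote -> one literal quote
            i += 2
        elif i + 1 == len(text) or text[i + 1].isspace():
            return "".join(out), i + 1  # genuine close quote
        else:
            raise SyntaxError("close quote not followed by white space")
    raise SyntaxError("unterminated quoted string")
```

For example, quote_doubling('say "hi"') gives "say ""hi""", and reading that
back returns the original value. A value containing \ round-trips without any
extra rule, which is the advantage over \" claimed above.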

In either case, the serious ambiguities come from the approach in which "the 
unescaped string should be passed to the application". That complication 
arises because some people want to handle escaping at the level of the 
dictionary code. If they do, their dictionary-level code should be written 
so that the API client code cannot tell the difference: it should behave as 
if the escaping had been handled by the parser. In other words, the 
dictionary code can process the escapes, but cannot override the hard-wired 
CIF2 definitions for a given quotation type. Or it can, but the result will 
not be standard-conforming CIF2.

Thanks,
Joe

> 
> From there we decided that in the end it would be much easier if users used
> alternative delimiters.
> 
>> As for Simon's comment on avoiding the need for an elide, maybe he is
>> including the possibility of containing anything within semicolon quotes
>> if it is indented by at least one space to elide internal semicolon
>> quotes. I would rather see a formal elide mechanism, but that is a good
>> workaround.
>>
>> Joe
>>
>>
>> Herbert J. Bernstein wrote:
>>> Just for the record, we did _not_ agree "that there were enough
>>> alternative data value delimiters to avoid the use of any eliding
>>> mechanism". What we agreed to was to stop arguing about how to
>>> use a reverse solidus and to take all strings as is.  I am certain we
>>> will eventually need some mechanisms to:
>>>
>>>    deal with long lines
>>>    deal with quoting of arbitrary text
>>>
>>> But neither issue is worth holding up the use of methods as quickly
>>> as possible.  We need to get something out that will allow dictionaries
>>> to get written using methods and out into use.  The current CIF 2
>>> specification is adequate to allow dictionaries to get written, and
>>> to deal with a large subset of what is needed in data files.  I hope
>>> we will continue this discussion _after_ getting CIF 2 with DDLm
>>> out and in use to see what is appropriate to extend the  useful range
>>> of data files.
>>>
>>> =====================================================
>>>   Herbert J. Bernstein, Professor of Computer Science
>>>     Dowling College, Kramer Science Center, KSC 121
>>>          Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                   +1-631-244-3035
>>>                   yaya@dowling.edu
>>> =====================================================
>>>
>>> On Fri, 4 Dec 2009, SIMON WESTRIP wrote:
>>>
>>>> Judging by the difficulties we had that eventually led to agreeing
>>>> that there were enough alternative data value delimiters to avoid
>>>> the use of any eliding mechanism (and thus returning all values as raw),
>>>> I suspect that arguing for a different eliding mechanism will also be
>>>> fruitless?
>>>>
>>>> I understand your view (Joe) about CSV, but we have to respect the
>>>> legacy that is CIF, which is why we have a variety of delimiters.
>>>> Otherwise, it could probably be argued that only one type of delimiter is
>>>> necessary
>>>> (say """")...
>>>>
>>>> Cheers
>>>> Simon
>>>>
>>>>
>>>> ____________________________________________________________________________
>>>> From: Joe Krahn <krahn@niehs.nih.gov>
>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>>>> Sent: Friday, 4 December, 2009 18:03:03
>>>> Subject: [ddlm-group] Elide close quotes by doubling?
>>>>
>>>> The reverse solidus (aka backslash) elide was dropped because it really
>>>> does not work well to elide only the close quote. Now that close quotes
>>>> are invalid when not followed by white space, it provides the
>>>> opportunity to elide close quotes by a repeated close-quote sequence,
>>>> similar to Fortran and CSV format. It is free of most of the
>>>> repercussions of defining reverse-solidus as an escape character, and is
>>>> only making use of a character sequence that would otherwise just be a
>>>> syntax error.
>>>>
>>>> The caveat is that it could misinterpret valid CIF1 values. However, at
>>>> least RCSB has done a good job of avoiding embedded quotes by picking
>>>> alternate quoting types.
>>>>
>>>> There are workarounds for embedded quotes, even for CIF-within-CIF, so
>>>> elides are not essential. However, I think this should be easy to
>>>> implement, and free of the hassles generated by backslash escapes.
>>>>
>>>> Thanks,
>>>> Joe Krahn
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>>>
> 
> cheers
> 
> Nick
> 
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
> 
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
> 
> CRICOS Provider Code: 00126G
> 
> e: Nick.Spadaccini@uwa.edu.au
> 
> 
> 
> 
> 

