Discussion List Archives


Re: [ddlm-group] Technical issues with Proposal P.

On Tuesday, February 22, 2011 4:55 PM, James Hester wrote:

>I am trying to focus relentlessly on a particular and very real
>technical issue.  I repeat that I am not concerned about the
>transformation from surface syntax to a sequence of characters.  I
>accept that that is well-defined and unambiguous for all proposals on
>the table.  If you think that IDLE can resolve this problem, you
>haven't understood my question.
>My question relates to the next step: how does the CIF application
>downstream from the parser interpret this sequence of characters?
>Under all previous incarnations of CIF, it was safe to assume that no
>artefacts of syntactical representation were left in the string, so
>the string had purely domain-specific meaning.  However, with the
>introduction of raw strings, <backslash><delimiter> will escape the
>delimiter, but the <backslash> is required to remain in the string.

I'm good this far.

>So the downstream application must decide between artefacts of the
>syntactical representation (<backslash><delimiter>) that have remained
>in raw strings, and domain-specific character sequences

And this is where the disconnect occurs.  I hold, and I interpret Herbert and Simon also to hold, that it is incorrect to characterize the backslash in the parsed data value as an artifact: it is rather an intended member of the string's character data.  Backslashes in Python raw strings serve simultaneously as elides and as character data.  If a lexical-level eliding backslash is not intended to be part of an application-level data value, then raw string syntax is not suitable for expressing that value.

This is an odd and, I think, confusing feature that I am not eager to add to CIF, but I don't think it creates any technical ambiguity.
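
For concreteness, here is a minimal sketch of the Python raw-string behavior referred to above.  The variable names are only illustrative.  It shows that the backslash both protects the delimiter and stays in the resulting value, whereas an ordinary (non-raw) literal consumes it:

```python
# Raw literal: the backslash keeps the following quote from closing the
# literal, and the backslash itself remains part of the string's value.
raw_value = r"ends with e\""

# Ordinary literal: the backslash is consumed as an escape, so only the
# quote character reaches the string's value.
cooked_value = "ends with e\""

# The last two characters of the raw value are backslash, then quote.
raw_tail = raw_value[-2:]

# The cooked value is one character shorter: it has no backslash.
length_difference = len(raw_value) - len(cooked_value)
```

In other words, a value that is meant to end in a bare quote with no backslash cannot be written this way in raw syntax; the author has to choose a different string form.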

>  Here those examples are again (remember
>this is the character sequence after syntactic processing):
> <start> I have no idea what the last characters of this string are\"<finish>
> <start> Does this string have two\""" or three internal quotes?<finish>
>Assume the domain-specific meaning of <backslash><quote> when found in
>a datavalue is to accent the letter preceding the <backslash>.
>Does the first string finish with a double quote, or with an accented e?

The domain-specific meaning is that it ends with an accented e.

>Does the second string contain an accented o, followed by two double
>quotes, or a letter o followed by three quotes?

The domain-specific meaning is that it contains an accented o, followed by two quotes.
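
The two readings above can be sketched in Python as follows.  This is only an illustration under stated assumptions: the function name apply_accents is hypothetical, and the accent is rendered as a diaeresis purely for the sake of the example, since the quoted question says only that <backslash><quote> accents the preceding letter.  The point is that the application applies its domain rule to every <backslash><quote> in the parsed value, without asking whether the backslash also served as an elide at the lexical level:

```python
import re
import unicodedata

# Assumption for illustration: the "accent" is a combining diaeresis.
COMBINING_DIAERESIS = "\u0308"


def apply_accents(value: str) -> str:
    """Accent each letter that is followed by <backslash><quote>."""
    accented = re.sub(
        r'([A-Za-z])\\"',
        lambda m: m.group(1) + COMBINING_DIAERESIS,
        value,
    )
    # Compose letter + combining mark into a single code point where possible.
    return unicodedata.normalize("NFC", accented)


# The two parsed character sequences from the quoted message.
first = r' I have no idea what the last characters of this string are\"'
second = r' Does this string have two\""" or three internal quotes?'

first_meaning = apply_accents(first)    # ends with an accented e
second_meaning = apply_accents(second)  # accented o, then two quotes
```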


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


ddlm-group mailing list

International Union of Crystallography
