
Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D

I have been quiet on this issue because my bias toward supporting Python
semantics has not been popular or productive in prior DDLm/CIF2 discussions.
I would extend Herb's argument to the whole of this enterprise and emphasize
my view that meaningful adoption of DDLm/CIF2 will require embracing
and leveraging existing technologies as much as possible.


On 1/7/11 7:52 AM, Herbert J. Bernstein wrote:
> As noted in my prior message, I disagree. I find it
> counter-intuitive and unproductive to adopt something
> that looks very much like the python treble quoted
> string but which follows confusingly different rules.
> Remember -- for most of the community, the entire
> CIF2 approach to quoting is something new and different.
> It does not agree with the well-established CIF1 quoting
> rules. By giving them the python treble quoted strings
> we are giving them a way to simply and easily carry any
> and all strings and text fields forward from CIF1 to CIF2
> without having to seriously rework them. Sure, we could
> come up with some other set of rules for treble quoted
> strings, but by following the python rules we will
> greatly reduce the chances of misinterpretations in
> the marginal cases, and give ourselves an independent
> check on our new parsers -- all the existing python
> parsers.
> I believe that Ralf is right.
> Regards,
> Herbert
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
> On Fri, 7 Jan 2011, SIMON WESTRIP wrote:
>> Dear All
>> My initial reaction to the adoption of the python mechanism for
>> triple-quoted strings was that it is counter-intuitive in a CIF
>> context - i.e. you might expect the base syntax of ''' and """
>> delimited strings to be the same as that of the other delimited
>> strings, which in CIF1 and the proposed CIF2 is closer to python's
>> 'raw' strings.
>> However, I am in favour of revisiting the issue to address the
>> restrictions of the current set of delimiters, and believe that
>> there may indeed be an answer amongst James's proposals, which
>> could be agreed upon quite swiftly, both respecting the legacy of
>> CIF1 and rectifying its shortcomings in this respect.
>> I will follow up on this when I have considered James's proposals
>> in more detail.
>> I'd rather the group spent a little more time on this than just
>> 'dumping' a bit of python syntax into CIF.
>> Cheers
>> Simon
>> ____________________________________________________________________________
>> From: James Hester <jamesrhester@gmail.com>
>> To: ddlm-group <ddlm-group@iucr.org>
>> Sent: Friday, 7 January, 2011 4:46:10
>> Subject: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D
>> Dear DDLm group members,
>> Most of you will be aware that the CIF2 standard has been approved by
>> COMCIFS, with one dissenting vote.  I propose to revisit the point
>> raised by Ralf in his dissenting vote, in order to see if we can't
>> improve this aspect of the standard.  The particular problem
>> identified by Ralf, and this problem exists to a more limited extent
>> with CIF1 as well, is that there is no mechanism to elide instances of
>> the string delimiter sequence, meaning that certain pathological
>> strings cannot be included in a CIF2 file.  A further issue is that
>> CIF writing programs have to run through a long series of checks when
>> determining how to delimit any given string. I propose that we revisit
>> this problem, with the restriction proposed by Ralf that we consider
>> only triple quote/triple apostrophe delimited strings.
>> To get us back up to speed on this issue, you will recall some salient
>> points from previous discussions, which taken together led to our
>> failure to make any progress:
>> (1) CIF files are often edited in text editors.  Working with CIF text
>> in a text editor should not produce unexpected behaviour for a typical
>> workflow.
>> (2) CIF text may include LaTeX or other marked-up text, which will be
>> cumbersome to insert in the file if it contains many instances of
>> elide characters (see point (1))
>> (3) IUCr "markup" for Greek letters uses backslash to introduce the
>> special character combination
>> (4) Any characters that function as elides must be removed from the
>> string at parse time to avoid ambiguity in interpretation when
>> returned to the calling application
>> If we limit ourselves to triple quote/apostrophe delimited strings, as
>> Ralf proposes, then we can construct an elide scheme that is invisible
>> to the lexer, by simply breaking the trigraph appropriately.  I
>> propose the following general scheme, where <delimiter> refers to one
>> delimiter character, so the full string delimiter would be
>> <delimiter><delimiter><delimiter>:
>> Proposal C:
>> When reconstructing the datavalue from an input triple-<delimiter>
>> delimited string, the following simple transformation is performed:
>> all occurrences of <delimiter><elide> are replaced by <delimiter>.
>> My comments on this scheme are as follows:
>> (0) When preparing a string for output, any occurrences of
>> <delimiter><elide> *must* be replaced by <delimiter><elide><elide>;
>> <delimiter> only needs to be elided when necessary to break up triple
>> <delimiter> sequences in the source string, and when the final
>> character of a string is <delimiter>
>> (1) It is invisible to the lexer, which will correctly find the string
>> terminator characters without knowledge of the <elide> character used.
>> (2) With appropriate choice of <elide>, there is a low likelihood of
>> ever encountering a string where transformation needs to be performed,
>> which means transforming the string is necessary only where three or
>> more delimiter characters are present in a row, or the string
>> concludes with a delimiter character.
>> (3) The <elide> is a post-elide, by which I mean it elides the
>> preceding character, not the next character.  This is preferable to
>> cover the case of an input string finishing with the <delimiter>
>> character, in which case some non-<delimiter> character must appear
>> after it to ensure the lexer does not consider the final <delimiter>
>> character in the string as the first character of the terminating
>> <delimiter><delimiter><delimiter> sequence.
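
A minimal Python sketch of Proposal C as described above may help make the
two directions concrete. The function names decode_c and encode_c are
hypothetical, and the writer shown is only one possible strategy: it inserts
an <elide> in just the three situations listed in comment (0), which is an
interpretation of the proposal rather than part of it.

```python
def decode_c(body: str, delimiter: str, elide: str) -> str:
    """Reader side: replace every <delimiter><elide> by <delimiter>."""
    # str.replace scans left to right without overlapping matches, so
    # <delimiter><elide><elide> comes back as <delimiter><elide>.
    return body.replace(delimiter + elide, delimiter)


def encode_c(value: str, delimiter: str, elide: str) -> str:
    """Writer side: produce a body with no triple-<delimiter> run and no
    trailing <delimiter>, such that decode_c() returns the original value."""
    if len(delimiter) != 1 or len(elide) != 1 or delimiter == elide:
        raise ValueError("delimiter and elide must be distinct single characters")
    out = []
    run = 0  # consecutive unelided delimiters at the end of the output
    for i, ch in enumerate(value):
        out.append(ch)
        if ch != delimiter:
            run = 0
            continue
        run += 1
        nxt = value[i + 1] if i + 1 < len(value) else None
        if (
            nxt == elide                        # literal <delimiter><elide> in the value
            or nxt is None                      # value ends with <delimiter>
            or (run == 2 and nxt == delimiter)  # a third <delimiter> would follow
        ):
            out.append(elide)
            run = 0
    return "".join(out)


if __name__ == "__main__":
    original = "''' AAABBB ''' CCCDDD '"
    body = encode_c(original, "'", "$")
    print(body)
    print(decode_c(body, "'", "$") == original)
```

Because the writer only ever adds an <elide> directly after a <delimiter>,
the lexer can still locate the closing <delimiter><delimiter><delimiter>
without knowing which <elide> character is in use.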
>> Finally, consider a general proposal D:
>> Elided triple-<delimiter> strings are delimited by
>> <char><delimiter><delimiter><delimiter>...<delimiter><delimiter><delimiter>.
>> The initial <char> defines the character to use to post-elide the
>> contents of the string as per proposal C. <char> would initially be
>> any non-alphanumeric ASCII character, with the set expanded in the
>> future to include Unicode characters once most applications were
>> Unicode-aware.
>> Examples (LHS is string as written in CIF file, RHS is actual
>> datavalue inside angle brackets)
>> &""" Bleg blah blah ""&"  and so forth "&"""
>>     < Bleg blah blah """  and so forth ">
>> $'''''$' AAABBB ''$' CCCDDD '$'''
>>     <''' AAABBB ''' CCCDDD '>
>> This allows the string writer to choose the elide character to
>> minimise <delimiter><elide> occurrences in the source text.  Note that
>> the need to choose and prepend a character to the string minimizes the
>> likelihood that somebody will do a naive cut and paste.
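
As a rough illustration of Proposal D, the sketch below reads one complete
token of the form <char><delimiter><delimiter><delimiter>...
<delimiter><delimiter><delimiter> and returns the datavalue. The function
name parse_proposal_d is hypothetical, and two checks are assumptions not
stated in the proposal: that <delimiter> is an apostrophe or a double quote,
and that <char> differs from <delimiter>.

```python
def parse_proposal_d(token: str) -> str:
    """Return the datavalue of a Proposal D string such as &\"\"\"...\"\"\"."""
    if len(token) < 7:
        raise ValueError("too short for <char> plus two triple delimiters")
    elide, delimiter = token[0], token[1]
    if delimiter not in ("'", '"') or token[1:4] != delimiter * 3:
        raise ValueError("missing opening triple delimiter")
    # Proposal D: <char> is initially any non-alphanumeric ASCII character.
    # Requiring it to differ from the delimiter is an assumption made here.
    if not elide.isascii() or elide.isalnum() or elide == delimiter:
        raise ValueError("unsuitable elide character")
    rest = token[4:]
    # The lexer stops at the first triple delimiter; it never needs to
    # know which elide character was chosen.
    end = rest.find(delimiter * 3)
    if end == -1 or end != len(rest) - 3:
        raise ValueError("triple delimiter missing or not at end of token")
    # Post-elide transformation from Proposal C.
    return rest[:end].replace(delimiter + elide, delimiter)


if __name__ == "__main__":
    print(parse_proposal_d('&""" Bleg blah blah ""&"  and so forth "&"""'))
    print(parse_proposal_d("$'''''$' AAABBB ''$' CCCDDD '$'''"))
```

The two calls use the example strings given above; whitespace inside the
delimiters is preserved exactly as written.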
>> An even more general proposal would prepend a character to the string
>> to indicate pre-elide (as per Proposal A in a separate email) or
>> append a character to indicate post-elide.  I don't propose to
>> consider this.
>> Again, please indicate your views on including any of these proposals
>> in the CIF standard.
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group

   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320