Re: [ddlm-group] Use of elides in strings

OK, my rewritten voting proposal appears to be an abject failure.  Let
me restate question 1 as clearly as possible:

1.  Should CIF2 allow elision of terminator characters?  In other
words, should we make it possible to include <quote> as a normal
character in a <quote> delimited string?

Herbert:  It's difficult to understand how to rephrase things if it is
not clear where exactly the problem lies.

Joe: good point about the double backslash.  Consider it added to proposal (a).

Before we discuss (2) precisely, can we agree to use the following
abstract model and terminology for CIF2 file parsing and dictionary
application?  If not, please indicate your alternative.

1. A CIF lexer separates a CIF file into tokens according to the CIF2
syntax specification only; that is, this process cannot be altered by
DDL directives.

2. A CIF parser accepts the tokens from the lexer.  CIF parsers can be
modelled as performing at least the following actions with these
tokens:
  (i) assigning a datavalue to a dataname;
  (ii) grouping looped datanames into a set;
  (iii) assigning looped datavalues to the appropriate dataname and packet;
  (iv) editing datavalues according to the syntax specification, if
this has not been performed in the lexer (e.g. stripping enclosing
quotes, removing elides).

3. DDL dictionaries operate on and refer to the datavalues and
datanames returned by the CIF parser after (2).  They have no ability
to influence the lexing process, or the parsing actions listed above
(in particular the datavalue editing).

4. The 'string value' or 'value' of a token is that value returned by
the parser in (2).  In particular, this is the value that:
  (i) may be checked against regular expressions in the dictionary;
  (ii) is accessed by dREL expressions;
  (iii) is returned by dREL expressions;
  (iv) is referred to in dictionary descriptive text;
  (v) may be passed to client routines for further editing;
  (vi) may be passed to external applications
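
To make the model above concrete, here is a minimal Python sketch of
the three stages.  It is only an illustration under stated
assumptions: the token pattern is drastically simplified, loops
(actions 2(ii) and 2(iii)) are omitted, the reverse solidus is assumed
to be the elide character, and the names lex, edit_datavalue, parse,
dictionary_check and the _demo.* datanames are all hypothetical.

```python
import re

# Hypothetical, heavily simplified token pattern: a dataname, a
# <quote>-delimited string in which <reverse solidus><any char> does
# not terminate the string, or a bare word.
TOKEN = re.compile(r"""
    (?P<name>_\S+)
  | (?P<quoted>'(?:\\.|[^'\\])*')
  | (?P<bare>\S+)
""", re.VERBOSE)

def lex(text):
    """Step 1: split the text into (kind, raw) tokens by syntax only."""
    for m in TOKEN.finditer(text):
        yield m.lastgroup, m.group()

def edit_datavalue(kind, raw):
    """Step 2(iv): strip enclosing quotes and remove elides."""
    if kind != "quoted":
        return raw
    body = raw[1:-1]
    # Assumption: a reverse solidus before a quote or before another
    # reverse solidus is an elide and is dropped.
    return re.sub(r"\\(['\\])", r"\1", body)

def parse(tokens):
    """Step 2(i): assign each datavalue to the preceding dataname."""
    values = {}
    name = None
    for kind, raw in tokens:
        if kind == "name":
            name = raw
        elif name is not None:
            values[name] = edit_datavalue(kind, raw)
            name = None
    return values

def dictionary_check(values, patterns):
    """Steps 3 and 4(i): the dictionary sees only the parser's values."""
    return {n: re.fullmatch(patterns[n], v) is not None
            for n, v in values.items() if n in patterns}

cif = r"_demo.name 'O\'Neill' _demo.count 12"
values = parse(lex(cif))
checks = dictionary_check(values, {"_demo.count": r"[0-9]+"})
```

Note that dictionary_check never sees the raw token text, which is the
point of (3): nothing at the dictionary level can change what lex or
edit_datavalue do.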

[Side note: in other words, the parser returns the CIF "infoset" and
the dictionaries refer to the CIF "infoset", but we haven't been
talking in those terms, so I've been more explicit.]

So my voting question (2) is: should the 'string value' of a token
referred to in (4) include the eliding characters?
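
The practical difference between the two answers to (2) can be
sketched as follows.  This is an illustration only: string_value and
its keep_elides flag are hypothetical, the reverse solidus is assumed
to be the elide character, and the regular expression stands in for
whatever a dictionary might specify.

```python
import re

def string_value(raw, keep_elides):
    """Return the 'string value' of a <quote>-delimited raw token."""
    body = raw[1:-1]          # strip the enclosing quotes
    if keep_elides:
        return body           # answer "yes": elides stay in the value
    # answer "no": drop the reverse solidus before a quote or backslash
    return re.sub(r"\\(['\\])", r"\1", body)

raw = r"'O\'Neill'"
with_elides = string_value(raw, keep_elides=True)
without_elides = string_value(raw, keep_elides=False)

# A hypothetical dictionary regular expression for a surname.
pattern = r"[A-Za-z']+"
ok_with = re.fullmatch(pattern, with_elides) is not None
ok_without = re.fullmatch(pattern, without_elides) is not None
```

With the elide retained, the value is eight characters long and fails
the pattern because of the reverse solidus; with the elide removed, it
is seven characters long and passes.  Every consumer listed in (4)
would see the same difference.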


On Tue, Nov 24, 2009 at 10:57 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
> A few points to consider:
>
> James Hester wrote:
> ...
>> 2. Character(s) used to indicate elision should be part of the string value
> This does not specify where the elision character should be stripped. It
> could be done by the parser or the dictionary-level code. The rule only
> refers to the final string for the final output text, right?
>
>>
>> Now for the specifics:
>>
>> 3.  Which of the following elision proposals do you support (more than one OK)?
>>
>>   Proposal (a) (intended to correspond to Nick's)
>>    (i) A character which would otherwise be interpreted as a delimiter
>> is elided by immediately preceding it with a reverse solidus.
>>   (ii) Otherwise a reverse solidus in the string has no special
>> lexical significance.
>>
>>   Proposal (b)
>>    (i) The combinations <reverse solidus><quote> or a <reverse
>> solidus><double quote> always signify <quote> and <double quote>
>> respectively, regardless of the delimiter used in a particular string.
>>    (ii) The combinations in (i) elide the <quote> or <double quote>
>> character where that character would otherwise terminate the string
>>    (iii) Apart from (i) and (ii), the reverse solidus has no special
>> significance
>>    (iv) If not used as the string delimiter, <quote> or <double quote>
>> when not preceded by <reverse solidus> represent themselves.
>
> In both forms <reverse solidus><reverse solidus> should also be defined
> in order to allow a literal string that ends in <reverse solidus>. For
> example, a single <reverse solidus> character has to be written as "\\",
> to avoid eliding the close quote.
>
> Joe Krahn
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
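
For comparison, here is a minimal Python sketch of how proposals (a)
and (b) would decode the same raw token, with Joe's <reverse
solidus><reverse solidus> rule added to both.  It assumes the lexer
has already isolated the delimited token; the function name decode and
its arguments are hypothetical.

```python
import re

def decode(raw, proposal):
    """Remove elides from a delimited raw token under proposal 'a' or 'b'."""
    delim, body = raw[0], raw[1:-1]
    if proposal == "a":
        # (a): only the delimiter actually in use can be elided.
        elidable = delim + "\\"
    else:
        # (b): <quote> and <double quote> can always be elided.
        elidable = "'\"\\"
    # In both cases a doubled reverse solidus stands for a single one.
    pattern = r"\\([" + re.escape(elidable) + "])"
    return re.sub(pattern, r"\1", body)

# A <double quote>-delimited token containing <reverse solidus><quote>.
raw = r'''"say \'hi\'"'''
under_a = decode(raw, "a")   # reverse solidus kept: not the delimiter
under_b = decode(raw, "b")   # reverse solidus removed

# Joe's example: a single reverse solidus written as "\\".
single = decode(r'"\\"', "a")
```

The two proposals agree whenever the elided character is the delimiter
in use, and differ only for the other quote character, as in the
example token above.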



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148