Discussion List Archives

Re: [ddlm-group] Use of elides in strings

I am having difficulty keeping up with the discussion - just as everything seems to be falling into place, we are suddenly off in some new direction.

Like Joe, I am not close to the elide problem, which is why I have not weighed in earlier or voted.  If James' call for a vote is still active, I vote for no elides, joining James, Nick and Herbert.

David B

James Hester wrote:
Thanks for the quick reply over Thanksgiving, John.  I take from your
message that the PDB does not need any elide mechanism to be defined
in the CIF2 syntax.  Would you therefore be prepared to vote in favour
of not defining any elides, or would you prefer to abstain?

Votes so far:

No elides: James, Nick, Herbert if the IUCr + PDB say it is OK
Elides:?

Unknown: John, Joe, David B., Brian, Simon

On Wed, Nov 25, 2009 at 12:03 PM, John Westbrook
<jwest@pdb-mail.rutgers.edu> wrote:
I confess that I am having difficulty keeping up with all aspects
of this discussion.   Following Herb's suggestion I will try to
summarize the quoting issues from the PDB perspective.

1. As there are multiple ways of quoting a string our tools and files
surround embedded quotes with quotes of the opposite sense or with
semicolons in the mixed case.   I think that this point has been
covered a number of times now and I believe that Nick has suggested
that all reasonable cases can be handled by using this approach.

2. I too was not aware that the original definition of terminators
had changed and did not include either a leading or trailing
whitespace.  Certainly this must still be the case for single
and double quotes.  I cannot recall ever seeing an example
where the terminator \n; was followed by a whitespace character,
but about half of the codes that I am familiar with would
fall over on \n;next_token.

3. Line folding has never been an issue for PDB nor has line length.

Regards,

John


Herbert J. Bernstein wrote:
My major concern about anything we do is to be able to preserve
the functionality of the practices that the IUCr is following in
journal publications and the PDB is following. Inasmuch as they seem
able to cope with no elide in CIF 1.1, the remaining question is whether
they will be negatively impacted by the change in string termination
without any elide.  If they can use CIF 2 with these changes, my
objections are purely academic and irrelevant.  -- Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Wed, 25 Nov 2009, James Hester wrote:

Herbert: I have the dubious advantage of not having participated in
all those CIF1.0/1.1 discussions, so only have the spec as written
down to rely on.

Anyway, how do you feel about abandoning any specification of elides
in CIF2 syntax, as suggested by Nick?

On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
Dear James,

 I started to write:
 "No, in CIF 1.1, none of the terminal quote marks, including the \n;,
are effective unless followed by whitespace (\n, space, tab, or end of
file).  This is a well-established, and very tricky, part of the CIF
spec going back to 1990.  That is why Nick had to explicitly specify
that a terminal quote mark would be effective no matter what it was
followed by."

 But the grammar currently on the IUCr web site is _not_ the one that I
recall COMCIFs discussing and approving.  It now explicitly removes
the requirement for terminal white space in the special case of
the \n; text field terminator.  I don't recall when that change was
adopted, but it appears that you are right under the current spec
about the example I chose.  Inasmuch as there is a lot of working code
that enforces and uses the original whitespace handling and uses it
in line-folding, I will not revise CIFtbx 3, but I will try to do
something to adapt to this change for CIFtbx 4.

 I guess we are just going to have yet another few dialects of CIF.

 Regards,
   Herbert

On Wed, 25 Nov 2009, James Hester wrote:

To be precise, we are not 'referring all elides to the application'
because no elides are recognised by the lexer under Nick's latest
suggestion, so there are no elides to refer to the application.

My understanding of CIF1.1 syntax suggests that the string you provide
would produce a syntax error in CIF1.1, as the semicolon at the start
of the second line would terminate the string, and so whitespace
should then appear as the second character on the second line, rather
than reverse solidus.

On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
The only problem with referring all elides to the application is that
with the removal of the requirement of a blank after a \n; for it to be
effective, the line folding protocol develops a slight gap.  The
case is as follows:

;\
;\
;

is a valid single text field in CIF 1.1, which when handled with the
line folding protocol translates to the equivalent of ';' because the
embedded ;\ is not a valid text terminator.  If we require that
a text field that begins with "\n;\\" must be terminated by "\n; "
or "\n;\n" or "\n;\t", that problem would be fixed.
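The difference between the two terminator rules can be made concrete with a
small sketch. It is not taken from CIFtbx or any other real parser: the
function names split_text_field and unfold are hypothetical, the input is a
list of lines with the text field assumed to start on the first one, and the
line-folding handling is a simplified reading of the CIF 1.1 protocol (a lone
backslash on the opening line switches folding on, and a trailing backslash
joins a line to the next).

```python
def split_text_field(lines, require_whitespace):
    """Split off a text field that starts at lines[0].

    Returns (content_lines, remainder_of_terminator_line).
    require_whitespace=True models the older rule (a line-initial ';'
    only terminates when followed by space, tab, or end of line);
    False models the rule where any line-initial ';' terminates.
    """
    if not lines or not lines[0].startswith(";"):
        raise ValueError("text field must start with ';' in column one")
    content = [lines[0][1:]]
    for line in lines[1:]:
        if line.startswith(";"):
            after = line[1:2]  # "" means end of line or end of file
            if not require_whitespace or after in ("", " ", "\t"):
                return content, line[1:]
        content.append(line)
    raise ValueError("unterminated text field")


def unfold(content):
    """Apply simplified line folding to the raw content lines."""
    if not content or content[0].rstrip() != "\\":
        return "\n".join(content)  # folding was not requested
    out, pieces = [], []
    for line in content[1:]:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            pieces.append(stripped[:-1])  # folded: continue on next line
        else:
            pieces.append(line)
            out.append("".join(pieces))
            pieces = []
    if pieces:
        out.append("".join(pieces))
    return "\n".join(out)


# The three-line example from the message above.
EXAMPLE = [";\\", ";\\", ";"]
```

With require_whitespace=True the whole of EXAMPLE is one text field and
unfold() gives ';'. With require_whitespace=False the field ends at the
second line, leaving a stray backslash after the terminator.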


On Wed, 25 Nov 2009, James Hester wrote:

I wholeheartedly agree with Nick's suggestion.

On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
<nick@csse.uwa.edu.au> wrote:
It appears to me that we have spent far too long on a syntactic issue
which can be avoided 99.9999% of the time. Quite simply, given the 5
ways to delimit strings, it is next to impossible to get a situation
where you cannot choose one of those to make the problem go away.

I think the RCSB systematically avoid it by choosing

"ab'cd"
'ab"cd'
;ab'"cd
;

But now we additionally have """ and ''' to choose from, making it
even easier.

So I propose, in line with James' position, that there is NO eliding of
terminator characters at the CIF2 syntax level. ALL elides in the string
are assumed to be user-specific encoding (say TeX, IUCr \greek) which
can be resolved at the dictionary level.

This necessarily means NO terminator character can appear in a string
delimited by the same terminator character. You will need to choose a
different terminator character. That is

No " in "strings"
No ' in 'strings'
No """ in """strings""" (but separable individual and doublet " are allowed)
No ''' in '''strings''' (but separable individual and doublet ' are allowed)

EVERYTHING in the string is returned as raw (except the initiating and
terminating character).
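As an illustration of how a CIF writer might follow these rules, here is a
minimal sketch. The function name choose_delimiter is hypothetical and the
order of preference is arbitrary. Two details are assumptions that go beyond
what is stated above: the single-character quotes are taken to be limited to
one line, and a value ending in a quote character is not placed inside the
matching triple quote, since the closing delimiter would otherwise be read
one character early.

```python
def choose_delimiter(value):
    """Return value wrapped in a delimiter it does not contain, or None.

    No eliding is applied: the value is written raw, so the chosen
    delimiter must not occur anywhere inside it.
    """
    single_line = "\n" not in value  # assumption: ' and " are one-line only
    for quote in ("'", '"'):
        if single_line and quote not in value:
            return quote + value + quote
    for triple in ("'''", '"""'):
        # Assumption: a trailing quote character would run into the
        # closing delimiter, so such values are skipped here.
        if triple not in value and not value.endswith(triple[0]):
            return triple + value + triple
    if "\n;" not in value:
        return "\n;" + value + "\n;"  # semicolon-delimited text field
    return None  # every delimiter clashes; encode by other means


print(choose_delimiter("ab'cd"))
print(choose_delimiter('ab"cd'))
```

A return value of None corresponds to the rare case where the value contains
every terminator at once.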

The only time you will not be able to encode anything in a delimited
string is when you want to include ' " """ ''' and \n; in the one
string. The likelihood of that is almost zero, unless you want to
include a CIF within a CIF (a silly thing to do IMHO). In that case the
contents can be encoded in a dictionary-driven way. I suggest it be
declared as a BASE64 type and then all the syntactic ambiguity
disappears.
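A brief sketch of why Base64 removes the ambiguity: the standard Base64
alphabet consists of letters, digits, '+', '/' and '=', so the encoded text
can never contain a quote or a line-initial semicolon. The helper names
below are hypothetical, UTF-8 is assumed for the embedded text, and how a
dictionary would actually declare such a type is not specified in this
thread.

```python
import base64


def encode_embedded(text):
    """Encode arbitrary text (e.g. a whole CIF) as one Base64 token."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_embedded(payload):
    """Recover the original text from the Base64 token."""
    return base64.b64decode(payload).decode("utf-8")


# An embedded CIF fragment that uses several delimiters at once.
inner = "data_inner\n_a 'x'\n_b \"y\"\n_c\n;some text\n;\n"
payload = encode_embedded(inner)
assert decode_embedded(payload) == inner
```

Because the payload contains no delimiter characters, any of the five
delimiters can safely enclose it.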

Problem solved! No need to elide because of CIF2 syntax rules; all
elides are user driven, and contents are returned raw.

As for Herb's comment in a recent email asking about line-folding, the
same holds. That is NOT a lexer issue and it has nothing to do with the
parser; everything is read literally and returned raw, and what to do
with it is left to the downstream application.

Straw vote - No elides of terminator strings as described above - Nick



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group




  


begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard

