Discussion List Archives

Re: [ddlm-group] Space as a list item separator

Dear all

One point I read in David's comments is that there are no legacy issues with respect to lists, associative arrays, etc.
Does anyone disagree? Obviously it makes life easier when considering lists etc. if the 'legacy' word doesn't rear its head.

From: David Brown <idbrown@mcmaster.ca>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Monday, 30 November, 2009 19:56:30
Subject: Re: [ddlm-group] Space as a list item separator

Please forgive me, everyone, but what is all this CIF1.5 about?

Why do we need it?

If a DDLm application is presented with a CIF data file written using a DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why can't we continue to use CIF1.1 since this works just fine for these files?  Why do we need CIF1.5?

CIF data files written using DDL1 and DDL2 dictionaries do not contain lists and arrays because lists and arrays were not invented when these files were written, and any data files written with these dictionaries in the future (and there may be many of them) will still use the CIF1.1 syntax.  There is no danger of arrays slipping into these data files unnoticed because they are not defined (and never will be) in DDL1 and DDL2 dictionaries (CIF1.1 does not allow it.)

Of course our DDLm application (if we ever get it off the ground) will need to be able to read data files written with CIF1.1 syntax because we are required to ensure that this application can read in any existing CIF data file.  It will also need to be able to read files written in CIF2 syntax because CIF2 will be needed for reading in the DDLm dictionaries (the only dictionaries that contain dREL) and the CIF2 data files (which may, unlike the CIF1.1 data files, also contain arrays and lists).

As I pointed out earlier (and it seems to have come as something of a shock or epiphany to some), the DDLm dictionaries include very nice lists of aliases that contain every data name that was ever used for a given item.  The data names in this alias list are, of course, quoted data values within the DDLm dictionary, and some contain characters that CIF2 would not recognize in a data name, but that is fine because they appear only in data values, and quoted data values no less.

When confronted with a data file written in CIF1.1, our hypothetical application would switch on its CIF1.1 lexer to read in the CIF1 data file, and pass the results into a preparser which would match the data name in the CIF1.1 data file with an alias name in the DDLm dictionary, and immediately substitute the DDLm data name for the original DDL1 or DDL2 data name.  Now all the problems with the old data names have disappeared.  The preparser might have to make other changes to the data value (I am not sure that there are any, perhaps adding delimiters to all strings so they could be stripped away by the parser?).  At this point you have a fully compliant CIF2-DDLm data set, which you can dREL to your heart's content.  In particular, if dREL calls for an array, the item associated with that array will contain a dREL method for assembling the array from the individual data items that were originally stored in the input CIF and are now stored under a DDLm defined name.  The only thing that would be difficult to do would be to reconstruct a DDL1 or DDL2 compliant data output file, but even this could be done if it was thought necessary.
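
A minimal sketch of the renaming step described above, assuming the CIF1.1 lexer has already produced a mapping from data names to values and that the alias lists from the DDLm dictionary have been collected into a lookup table. The names `ALIASES` and `rename_items`, and every data name in the table, are hypothetical and exist only for illustration.

```python
# Hypothetical alias table: legacy DDL1/DDL2 data name -> DDLm data name.
# A real table would be built from the alias lists in a DDLm dictionary.
ALIASES = {
    "_legacy_item[1]": "_category.item_1",
    "_legacy_length": "_category.length",
}


def rename_items(items, aliases):
    """Return a copy of `items` keyed by DDLm names.

    `items` maps data names (as read by the CIF1.1 lexer) to values.
    Names with no alias entry are passed through unchanged.
    """
    return {aliases.get(name, name): value for name, value in items.items()}


legacy = {"_legacy_item[1]": "1.5", "_category.already_new": "x"}
print(rename_items(legacy, ALIASES))
# {'_category.item_1': '1.5', '_category.already_new': 'x'}
```

Because the legacy name is only ever a dictionary key here, characters such as square brackets in the old name never reach the CIF2 side.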

Please let's not make this exercise more confusing than necessary. 

You guys need to get on with defining what you want in CIF2.  CIF1 can then look after itself using the existing tools together with the aliases for renaming the items.

David

Herbert J. Bernstein wrote:
Dear Colleagues,

  Instead of looking at the minimally disruptive approach as a modification to CIF 2, in order to in fact be minimally disruptive, I would suggest looking at CIF 1.5 in terms of what would need to be changed in CIF 1.1 in order to support DDLm.

  I think the following will do it:

  For data values, only, recognize three new initial string delimiters in addition to the existing single quote ("'"), double quote ("\"") and newline-semicolon ("\n;"):

  left brace ("{")
  left square bracket ("[")

Unless these are encountered in a left to right scan at a point at which the first character of a data value is expected, the parse remains the same as for CIF 1.1.

Once the left brace or left square bracket is encountered, then whatever the formally agreed rules for the CIF2 parse are would apply until the balancing terminal right brace or right square bracket.  It is only the top level terminal right brace or right square bracket that would be required to be followed by whitespace.
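
The rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name `compound_value_end` is hypothetical, and because the CIF2 rules inside the brackets were not yet agreed, the sketch simply counts nesting and does not treat quoted strings specially.

```python
OPENERS = {"{": "}", "[": "]"}


def compound_value_end(text, start):
    """Scan a brace or bracket value that begins at text[start].

    Returns the index just past the balancing close, or None when
    text[start] is not '{' or '[' (the CIF 1.1 parse then applies).
    Simplification: brackets inside quoted strings are miscounted.
    """
    if start >= len(text) or text[start] not in OPENERS:
        return None
    expected = []  # stack of closing characters still owed
    for i in range(start, len(text)):
        ch = text[i]
        if ch in OPENERS:
            expected.append(OPENERS[ch])
        elif ch in "}]":
            if expected.pop() != ch:
                raise ValueError(f"mismatched {ch!r} at offset {i}")
            if not expected:
                end = i + 1
                # Only the top-level close must be followed by whitespace.
                if end < len(text) and not text[end].isspace():
                    raise ValueError("top-level close needs trailing whitespace")
                return end
    raise ValueError("unterminated brace or bracket value")


print(compound_value_end("[1 [2 3]] next", 0))  # 9
print(compound_value_end("abc[1]", 0))          # None
```

A value such as `abc[1]` is untouched because the bracket is not the first character of the value.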

The new dictionaries would _not_ be written in CIF 1.5, only in full CIF 2, but parsers would be expected to process any CIF not clearly self-identifying as a CIF 2 file as a CIF 1.5 file.  This means that the only major use of CIF 2 constructs in CIF 1.5 would be to allow users to provide list, matrix and vector data values.

This also means, for example, as per David's suggestion, that the only way a tag with embedded square brackets or embedded braces would be handled in a new dictionary would be as an alias, but the formality of CIF 1.5 would give applications a clean way to make use of those aliases in parsing data files.

If we follow this approach, then we would be honoring the published commitment to be able to keep essentially all existing data files unchanged, and still be able to handle them with DDLm.  The only exception would be data files that happen to include data values that begin with '{' or '[', which would now have to be quoted. I do not believe that there are many such cases, and I believe that there would be acceptance of the need to add such quoting if encountered.

To summarize:

  Development of CIF 2 with DDLm support would continue and be used for
new dictionaries; and

  Development of CIF 1.5 to serve as a bridge between CIF 1.1 and DDLm would start, primarily giving users the ability to provide list, matrix and vector data values, to allow for a smooth transition to wider use of DDLm and CIF 2.

Regards,
  Herbert


=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Sun, 29 Nov 2009, SIMON WESTRIP wrote:

Yes, that summarizes the differences. Unfortunately, the single-byte
non-delimited strings have to be separated by
white space in this approach, which is perhaps counter-intuitive and might
have some legacy issues?

____________________________________________________________________________
From: James Hester <jamesrhester@gmail.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Sunday, 29 November, 2009 3:45:18
Subject: Re: [ddlm-group] Space as a list item separator

Hi Simon: I'm trying to read between the lines here as to how the syntax we
have been discussing diverges from what you have described, and have come up
with the following list:

1. Presumably the []{} characters must be surrounded by whitespace in your
version
2. We have restricted the character sets of the non-delimited strings and
tags more than strictly necessary.
3. Comma might be included in the single-byte non-delimited string list

Are there any other differences that you would identify?

On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP
<simonwestrip@btinternet.com> wrote:
      Dear all

      I was chatting with the man who 'writes the cheques' yesterday
      about some of the
      changes he might expect with CIF2, and based on this I feel I
      ought to at least have
      a go at exploring a 'minimally disruptive' approach, so at the
      risk of being shouted at,
      here goes at a slightly different way of looking at CIF:

      CIF contains a list of strings separated by whitespace.

      A string can be nondelimited or delimited.

      Nondelimited strings have a restricted character set (minimally
      whitespace is excluded)

      A nondelimited string cannot start with any of the delimiters
      (obviously)

      Nondelimited strings can have special meaning governing what
      follows them:

          reserved words, e.g. loop_

          tags, e.g. data_ , _foo

          single-byte nondelimited strings, e.g. [ ] { } :

      All other strings are treated as raw data values


      There, at least I can say I tried :-)

      Cheers

      Simon
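
A minimal sketch of the token model in Simon's message above, assuming nondelimited strings only (delimited strings, which may contain whitespace, are not handled). The names `classify` and `tokenize` and the category labels are hypothetical, and only `loop_` is listed as a reserved word because it is the only one the message names.

```python
RESERVED = {"loop_"}
STRUCTURAL = {"[", "]", "{", "}", ":"}


def classify(token):
    """Assign one whitespace-separated string to a category."""
    if token.lower() in RESERVED:
        return "reserved"
    if token.lower().startswith("data_") or token.startswith("_"):
        return "tag"
    if token in STRUCTURAL:
        return "structural"
    # Everything else is treated as a raw data value.
    return "value"


def tokenize(text):
    """Split on whitespace and label each string."""
    return [(classify(tok), tok) for tok in text.split()]


for kind, tok in tokenize("data_x _foo [ 1 2 ]"):
    print(kind, tok)
```

In this model `[1 2]` written without spaces would be a single raw value, which is the difference James lists as point 1 in his reply.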

____________________________________________________________________________
From: SIMON WESTRIP <simonwestrip@btinternet.com>
To: Group finalising DDLm and associated dictionaries
<ddlm-group@iucr.org>
Sent: Saturday, 28 November, 2009 10:01:38

Subject: Re: [ddlm-group] Space as a list item separator

I had been under the assumption that the separation of list items by a
comma was 'set in stone'
(and was one reason for dropping the CIF1 syntax of requiring space
after data values),
but if it's up for negotiation I would opt for using the space as a
separator as elsewhere in the CIF,
partly because then a list can essentially be treated much like a
single-item loop - i.e. same basic parsing
of <value><space><value><space>...

Cheers

Simon

____________________________________________________________________________
From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries
<ddlm-group@iucr.org>
Cc: Nick.Spadaccini@uwa.edu.au
Sent: Friday, 27 November, 2009 11:43:10
Subject: Re: [ddlm-group] Space as a list item separator

Dear Colleagues,

  I have no objection to accepting either comma or whitespace
as a valid separator in a list.  I can't object -- I have been
coding to that standard since 1997, and now would only have to
remove the message generated for the case of the space.  We already
accept multiple glyphs as valid separators at all levels:

  whitespace itself is one of several character sequences in rather
complex combinations:  any number of blanks, tabs, newlines and
comments.
The comma itself is handled in a complex way.  We accept (or should
accept) any whitespace before and after a comma as valid, as in
{a,b} versus {a , b }.  Adding the option of leaving out the comma
itself and just having the whitespace as the separator makes just
as much sense.

  I see nothing to be gained by now forbidding the comma.  The meaning
of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new
(and I think more sensible and realistic) approach, {a . b .} or
{a ? b ?}.

  The blank reads particularly well in dealing with vectors and
matrices. The comma reads well when dealing with strings.

  I think we would do best with both as valid alternatives (no error,
no warning for either one).
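
A minimal sketch of this lenient reading, assuming a flat list of bare items (no nesting, no quoted strings). The function name `split_items` is hypothetical, and filling an empty slot with '.' rather than '?' is an arbitrary choice between the two readings given above.

```python
import re


def split_items(body):
    """Split the text between the braces into items.

    A comma with optional surrounding whitespace, or whitespace alone,
    separates items. An empty slot next to a comma is read as '.'.
    """
    body = body.strip()
    if not body:
        return []
    items = []
    for chunk in re.split(r"\s*,\s*", body):
        # Inside a comma-free chunk, whitespace is the separator.
        parts = chunk.split()
        items.extend(parts if parts else ["."])
    return items


print(split_items("a,,b,"))    # ['a', '.', 'b', '.']
print(split_items("a . b ."))  # ['a', '.', 'b', '.']
print(split_items("a , b "))   # ['a', 'b']
```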

  Regards,
    Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Fri, 27 Nov 2009, SIMON WESTRIP wrote:

> At first glance, you're considering using space instead of commas as list
> separators?
> which is not so far away from the CIF1 requirement of space following a
> delimiter?
>
> But I'm only on my first cup of coffee this morning :-)
>
> ____________________________________________________________________________
> From: Nick Spadaccini <nick@csse.uwa.edu.au>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Friday, 27 November, 2009 7:46:44
> Subject: Re: [ddlm-group] Space as a list item separator
>
>
>
>
> On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>
> > See comments below:
> >
> > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> >> Timely email, come in just after the one I sent.
> >>
> >> My position is if we specify the syntax then we encourage its correct use but
> >> acknowledge that there may be cases where one might be able to recover
> >> intent. But I wouldn't encourage those cases.
> >
> > Absolutely, which is why I would like to elevate space-separated list items to
> > be correct syntax rather than 'wrong but intent is clear' syntax.
> >>
> >> You could say that token separators in lists are a or b or c, but that just
> >> adds a level of complexity for very little gain. The choice of comma makes it
> >> seamless to translate from the raw CIF data straight into most language
> >> specific data declarations. The only language I know that accepts one or the
> >> other or both is MatLab.
> >
> > Re ease of translation: you speak as if a viable approach to a CIF data file
> > is to take whole text chunks and throw them at some language interpreter,
> > without doing your own parse.  Quite apart from being a rather unlikely
> > approach, this is impossible, as without parsing you won't know where the list
> > finishes.  If you do do your own parse, you can populate your datastructures
> > directly during the parse, and what list separator was originally used in the
> > data file is completely irrelevant.
> >
> > Re complexity: not sure how you are planning to deal with whitespace in the
> > formal grammar, but consider the following, where I have assumed that each
> > token 'eats up' the following whitespace.
> >
> > <dataitem> = <dataname><whitespace>+<datavalue>
> > <datavalue> = {<list>|<string>}<whitespace>+
> > <listdatavalue> = {<list>|<string>}<whitespace>*
> > <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
> >
> > If we make comma or whitespace possible separators, the last production
> > becomes:
> > <list> =  '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
> >
> > This looks like no extra complexity, and from a user's point of view
> > whitespace as an alternative separator is simple to understand and consistent
> > with space as a token separator used everywhere else in CIF.  Anyway, if
> > reduction of grammar complexity is your goal, you can just completely exclude
> > commas as list separators!
>
> Why not? Make them spaces only, and you become consistent across the board.
> I have to think about the possibility of pathological cases where spaces
> won't work. I can't think of any at the moment.
>
> >
> > Some questions about how commas behave:
> > 1: is a trailing comma e.g. [1,2,3,4,] a syntax error?
> > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
>
> I would say yes to syntax error. I can easily determine there may need to be
> an additional list value, but can't determine what.
>
> > Note the above productions assume that the answer to both is yes.
> >
> >>
> >> What big advantage to a language is there to specify you can use a comma or
> >> whitespace as a token separator? Will you be happy with the first person who
> >> interprets this as being ok
> >>
> >> loop_
> >>   _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
> >>
> > Not sure what you are getting at here: I am proposing the following:
> >
> > _nicelist      [1 2 3 4 5 6 7]
> >
> > being the same as
> >
> > _nicelist      [1,2,3,4,5,6,7]
> >
> >  Don't see how this relates to loops.
>
> The point was, once you say a space and comma are equivalent token
> separators then will it be an interpretation that they are always so even in
> loops? My example was not a list, just 7 values that were separated by
> commas not spaces.
>
> >
> > James.
> > ------
> >>
> >> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com> wrote:
> >>
> >>> Dear All: looking over the list I posted previously of items left to
> >>> resolve, I see only one serious one outstanding: whether or not to allow
> >>> space as a separator between list items.  Nick has stated:
> >>>
> >>> " I will propose it has to be a comma, but make the coercion rule that space
> >>> separated values in a list-type object be coerced into comma separated
> >>> values. That is, read spaces as you want, but don't encourage them."
> >>>
> >>> I would like to counter-propose, as Joe did originally, that whitespace be
> >>> elevated to equal status with comma as a valid list separator.  I see no
> >>> downside to this.  Would anyone else like to speak to this issue before we
> >>> vote?  In particular, I would be interested to hear why Nick doesn't want to
> >>> encourage spaces.
> >>
> >> cheers
> >>
> >> Nick
> >>
> >> --------------------------------
> >> Associate Professor N. Spadaccini, PhD
> >> School of Computer Science & Software Engineering
> >>
> >> The University of Western Australia    t: +61 (0)8 6488 3452
> >> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> >> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> >> MBDP  M002
> >>
> >> CRICOS Provider Code: 00126G
> >>
> >> e: Nick.Spadaccini@uwa.edu.au
> >>
> >>
> >>
> >> _______________________________________________
> >> ddlm-group mailing list
> >> ddlm-group@iucr.org
> >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>
> >
> >
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA  w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>
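
The list production that James proposes in the quoted exchange above, with comma or whitespace accepted as the separator and both a trailing comma and two commas in a row treated as syntax errors, can be sketched as a small recursive-descent parser. This is an illustration only: the function names are hypothetical, and list items are limited to bare strings and nested lists (no quoted strings).

```python
def skip_ws(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def parse_value(text, pos):
    """Parse a nested list or a bare string; return (value, next_pos)."""
    if text[pos] == "[":
        return parse_list(text, pos)
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in ",[]":
        end += 1
    if end == pos:
        raise ValueError(f"expected a value at offset {pos}")
    return text[pos:end], end


def parse_list(text, pos=0):
    """Parse '[' items ']' starting at text[pos]; return (items, next_pos)."""
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at offset {pos}")
    pos = skip_ws(text, pos + 1)
    items = []
    if pos < len(text) and text[pos] == "]":
        return items, pos + 1
    while True:
        if pos >= len(text):
            raise ValueError("unterminated list")
        value, pos = parse_value(text, pos)
        items.append(value)
        after = skip_ws(text, pos)
        if after >= len(text):
            raise ValueError("unterminated list")
        if text[after] == "]":
            return items, after + 1
        if text[after] == ",":
            # Comma separator; whitespace on either side is allowed.
            pos = skip_ws(text, after + 1)
        elif after > pos:
            # Whitespace alone is also a separator.
            pos = after
        else:
            raise ValueError(f"missing separator at offset {pos}")


print(parse_list("[1 2 3 4 5 6 7]")[0] == parse_list("[1,2,3,4,5,6,7]")[0])  # True
```

Since the parser builds the item list directly, the separator used in the file does not survive into the data structure, which is the point made in the quoted reply about ease of translation.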

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group




--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148




