Discussion List Archives


Re: [ddlm-group] Space as a list item separator

I think this is certainly worth exploring. Part of the description I gave of a 'CIF'
in terms of strings and separators came from 'day dreaming' about a
'self-defining CIF format' - obviously an impossible dream - but if we could come up with
something that allowed an almost seamless progression from one format to the next...


From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Sunday, 29 November, 2009 14:35:51
Subject: Re: [ddlm-group] Space as a list item separator

Dear Colleagues,

  Instead of treating the minimally disruptive approach as a modification to CIF 2, I would suggest that, to be truly minimally disruptive, we look at CIF 1.5 in terms of what would need to be changed in CIF 1.1 in order to support DDLm.

  I think the following will do it:

  For data values only, recognize two new initial string delimiters in addition to the existing single quote ("'"), double quote ("\"") and newline-semicolon ("\n;"):

  left brace ("{")
  left square bracket ("[")

Unless one of these is encountered in a left-to-right scan at a point at which the first character of a data value is expected, the parse remains the same as for CIF 1.1.

Once the left brace or left square bracket is encountered, the formally agreed rules for the CIF2 parse, whatever they turn out to be, would apply until the balancing terminal right brace or right square bracket.  Only the top-level terminal right brace or right square bracket would be required to be followed by whitespace.
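
As a rough illustration of the scan described above, here is a minimal sketch in Python. It is not part of any agreed CIF specification: the function name scan_bracketed_value is hypothetical, end of input is assumed to be an acceptable terminator, and quoted strings inside the brackets are deliberately ignored (a real parser would apply the CIF2 rules there).

```python
OPENERS = {"{": "}", "[": "]"}


def scan_bracketed_value(text, start):
    """Return the index just past the bracket that balances text[start].

    Assumption: brackets inside quoted strings are not treated specially.
    """
    if text[start] not in OPENERS:
        raise ValueError("not a bracketed data value")
    stack = []  # closing brackets still owed, innermost last
    for i in range(start, len(text)):
        ch = text[i]
        if ch in OPENERS:
            stack.append(OPENERS[ch])
        elif ch in "}]":
            if stack.pop() != ch:
                raise ValueError(f"mismatched {ch!r} at offset {i}")
            if not stack:
                end = i + 1
                # Only the top-level close must be followed by whitespace.
                if end < len(text) and not text[end].isspace():
                    raise ValueError("top-level close needs trailing whitespace")
                return end
    raise ValueError("unterminated bracketed value")
```

For example, scanning "_m [[1 0] [0 1]] _next 5" from offset 3 stops just after the outer "]", while "[1 2]x" is rejected because the top-level close is not followed by whitespace.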

The new dictionaries would _not_ be written in CIF 1.5, only in full CIF 2, but parsers would be expected to process any CIF not clearly self-identifying as a CIF 2 file as a CIF 1.5 file.  This means that the only major use of CIF 2 constructs in CIF 1.5 would be to allow users to provide list, matrix and vector data values.

This also means, for example, as per David's suggestion, that the only way a tag with embedded square brackets or embedded braces would be handled in a new dictionary would be as an alias, but the formality of CIF 1.5 would give applications a clean way to make use of those aliases in parsing data files.

If we follow this approach, then we would be honoring the published commitment to be able to keep essentially all existing data files unchanged, and still be able to handle them with DDLm.  The only exception would be data files that happen to include data values that begin with '{' or '[', which would now have to be quoted. I do not believe that there are many such cases, and I believe that there would be acceptance of the need to add such quoting if encountered.
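
The quoting fix for such legacy values could be as small as the following sketch. The helper name quote_for_cif15 is hypothetical. It assumes the input was a legal unquoted CIF 1.1 value (so it contains no whitespace) and relies on the CIF 1.1 rule, as I understand it, that a quoted string ends only at a quote character followed by whitespace.

```python
def quote_for_cif15(value):
    """Quote a whitespace-free CIF 1.1 value that would otherwise open a list.

    Assumption: `value` has no embedded whitespace, so wrapping it in
    single quotes cannot terminate the string early.
    """
    if value[:1] in ("{", "["):
        return "'" + value + "'"
    return value
```

A value such as [110] would be rewritten as '[110]'; everything else passes through unchanged.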

To summarize:

  Development of CIF 2 with DDLm support would continue and be used for
new dictionaries; and

  Development of CIF 1.5 would start, to serve as a bridge between CIF 1.1 and DDLm, primarily giving users the ability to provide list, matrix and vector data values, and so allowing a smooth transition to wider use of DDLm and CIF 2.

Regards,
  Herbert


=====================================================
Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Sun, 29 Nov 2009, SIMON WESTRIP wrote:

> Yes that summarizes the differences. Unfortunately, the single-byte
> non-delimited strings have to be separated by
> white space in this approach, which is perhaps counter-intuitive and might
> have some legacy issues?
>
> ____________________________________________________________________________
> From: James Hester <jamesrhester@gmail.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Sunday, 29 November, 2009 3:45:18
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Hi Simon: I'm trying to read between the lines here as to how the syntax we
> have been discussing diverges from what you have described, and have come up
> with the following list:
>
> 1. Presumably the []{} characters must be surrounded by whitespace in your
> version
> 2. We have restricted the character sets of the non-delimited strings and
> tags more than strictly necessary.
> 3. Comma might be included in the single-byte non-delimited string list
>
> Are there any other differences that you would identify?
>
> On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP
> <simonwestrip@btinternet.com> wrote:
>      Dear all
>
>      I was chatting with the man who 'writes the cheques' yesterday
>      about some of the
>      changes he might expect with CIF2, and based on this I feel I
>      ought to at least have
>      a go at exploring a 'minimally disruptive' approach, so at the
>      risk of being shouted at,
>      here goes at a slightly different way of looking at CIF:
>
>      CIF contains a list of strings separated by whitespace.
>
>      A string can be nondelimited or delimited.
>
>      Nondelimited strings have a restricted character set (minimally
>      whitespace is excluded)
>
>      A nondelimited string cannot start with any of the delimiters
>      (obviously)
>
>      Nondelimited strings can have special meaning governing what
>      follows them:
>
>          reserved words, e.g. loop_
>
>          tags, e.g. data_ , _foo
>
>          single-byte nondelimited strings, e.g. [ ] { } :
>
>      All other strings are treated as raw data values
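
The model above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the names classify and classify_cif are hypothetical, delimited (quoted) strings are not handled at all, and only the examples given in the message (loop_, data_, _foo, and the single-byte strings) are recognised.

```python
SINGLE_BYTE = ("[", "]", "{", "}", ":")


def classify(token):
    """Label one nondelimited string according to the model in the message."""
    lowered = token.lower()
    if lowered == "loop_":
        return "reserved"
    if lowered.startswith("data_") or token.startswith("_"):
        return "tag"
    if token in SINGLE_BYTE:
        return "structural"
    return "value"


def classify_cif(text):
    """Split on whitespace and label each string (no quoted strings supported)."""
    return [(tok, classify(tok)) for tok in text.split()]
```

Note that the single-byte strings are only recognised here when surrounded by whitespace, so "[ 1 2 ]" works but "[1 2]" would be read as the two values "[1" and "2]".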
>
>
>      There, least I can say I tried :-)
>
>      Cheers
>
>      Simon
>
> ____________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries
> <ddlm-group@iucr.org>
> Sent: Saturday, 28 November, 2009 10:01:38
>
> Subject: Re: [ddlm-group] Space as a list item separator
>
> I had been under the assumption that the separation of list items by a
> comma was 'set in stone'
> (and was one reason for dropping the CIF1 syntax of requiring space
> after data values),
> but if it's up for negotiation I would opt for using the space as a
> separator as elsewhere in the CIF,
> partly because then a list can essentially be treated much like a
> single-item loop - i.e. same basic parsing
> of <value><space><value><space>...
>
> Cheers
>
> Simon
>
> ____________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries
> <ddlm-group@iucr.org>
> Cc: Nick.Spadaccini@uwa.edu.au
> Sent: Friday, 27 November, 2009 11:43:10
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Dear Colleagues,
>
>   I have no objection to accepting either comma or whitespace
> as a valid separator in a list.  I can't object -- I have been
> coding to that standard since 1997, and now would only have to
> remove the message generated for the case of the space.  We already
> accept multiple glyphs as valid separators at all levels:
>
>   whitespace itself is one of several character sequences in rather
> complex combinations:  any number of blanks, tabs, newlines and
> comments.
> The comma itself is handled in a complex way.  We accept (or should
> accept) any whitespace before and after a comma as valid, as in
> {a,b} versus {a , b }.  Adding the option of leaving out the comma
> itself and just having the whitespace as the separator makes just
> as much sense.
>
>   I see nothing to be gained by now forbidding the comma.  The meaning
> of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new
> (and I think more sensible and realistic) approach, {a . b .} or
> {a ? b ?}.
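
The equivalence described above can be sketched as follows. This is a hypothetical helper (split_list_items), not code from any existing CIF library: it takes the text between the braces, assumes items contain no quotes or nested brackets, and fills each empty slot with "." (the message treats "." and "?" as equally valid fillers).

```python
import re


def split_list_items(body):
    """Split a flat list body on commas and/or whitespace.

    An empty slot between or after commas becomes '.'.
    Assumption: no quoted strings or nested lists in `body`.
    """
    tokens = re.findall(r",|[^\s,]+", body)
    items = []
    have_value = False  # a value has been read since the last comma
    for tok in tokens:
        if tok == ",":
            if not have_value:
                items.append(".")  # nothing between two separators
            have_value = False
        else:
            items.append(tok)
            have_value = True
    if tokens and tokens[-1] == ",":
        items.append(".")  # trailing comma implies one more empty item
    return items
```

With this reading, "a,,b,", "a,.,b,." and "a . b ." all produce the same four items, and "a , b " is the same as "a,b".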
>
>   The blank reads particularly well in dealing with vectors and
> matrices. The comma reads well when dealing with strings.
>
>   I think we would do best with both as valid alternatives (no error,
> no warning for either one).
>
>   Regards,
>     Herbert
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>
> > At first glance, you're considering using space instead of commas as list
> > separators?
> > which is not so far away from the CIF1 requirement of space following a
> > delimiter?
> >
> > But I'm only on my first cup of coffee this morning :-)
> >
> > ___________________________________________________________________________
> > From: Nick Spadaccini <nick@csse.uwa.edu.au>
> > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> > Sent: Friday, 27 November, 2009 7:46:44
> > Subject: Re: [ddlm-group] Space as a list item separator
> >
> >
> >
> >
> > On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
> >
> > > See comments below:
> > >
> > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> > >> Timely email, came in just after the one I sent.
> > >>
> > >> My position is if we specify the syntax then we encourage its correct use but
> > >> acknowledge that there may be cases where one might be able to recover
> > >> intent. But I wouldn't encourage those cases.
> > >
> > > Absolutely, which is why I would like to elevate space-separated list items to
> > > be correct syntax rather than 'wrong but intent is clear' syntax.
> > >>
> > >> You could say that token separators in lists are a or b or c, but that just
> > >> adds a level of complexity for very little gain. The choice of comma makes it
> > >> seamless to translate from the raw CIF data straight into most
> > >> language-specific data declarations. The only language I know that accepts
> > >> one or the other or both is MatLab.
> > >
> > > Re ease of translation: you speak as if a viable approach to a CIF data file
> > > is to take whole text chunks and throw them at some language interpreter,
> > > without doing your own parse.  Quite apart from being a rather unlikely
> > > approach, this is impossible, as without parsing you won't know where the list
> > > finishes.  If you do do your own parse, you can populate your datastructures
> > > directly during the parse, and what list separator was originally used in the
> > > data file is completely irrelevant.
> > >
> > > Re complexity: not sure how you are planning to deal with whitespace in the
> > > formal grammar, but consider the following, where I have assumed that each
> > > token 'eats up' the following whitespace.
> > >
> > > <dataitem> = <dataname><whitespace>+<datavalue>
> > > <datavalue> = {<list>|<string>}<whitespace>+
> > > <listdatavalue> = {<list>|<string>}<whitespace>*
> > > <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
> > >
> > > If we make comma or whitespace possible separators, the last production
> > > becomes:
> > > <list> =  '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
> > >
> > > This looks like no extra complexity, and from a user's point of view
> > > whitespace as an alternative separator is simple to understand and consistent
> > > with space as a token separator used everywhere else in CIF.  Anyway, if
> > > reduction of grammar complexity is your goal, you can just completely exclude
> > > commas as list separators!
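
A small recursive-descent sketch of the "comma or whitespace" list production, for illustration only. The names tokenize and parse_list are hypothetical, values are returned as plain strings, quoted strings are not handled, and whitespace is assumed to be allowed on either side of a comma. Trailing commas and doubled commas are rejected, matching the assumption stated for the productions.

```python
import re

TOKEN = re.compile(r"(\s*)([\[\],]|[^\s\[\],]+)")


def tokenize(text):
    """Yield (token, preceded_by_whitespace) pairs for brackets, commas, strings."""
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        out.append((m.group(2), bool(m.group(1))))
        pos = m.end()
    return out


def parse_list(text):
    """Parse one bracketed list; nested lists become nested Python lists."""
    tokens = tokenize(text)
    value, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected text after list")
    return value


def _parse(tokens, pos):
    if pos >= len(tokens) or tokens[pos][0] != "[":
        raise ValueError("expected '['")
    pos += 1
    items, after_comma = [], False
    while pos < len(tokens):
        tok, spaced = tokens[pos]
        if tok == "]":
            if after_comma:
                raise ValueError("trailing comma")
            return items, pos + 1
        if tok == ",":
            if after_comma or not items:
                raise ValueError("empty list item")
            after_comma, pos = True, pos + 1
            continue
        # Consecutive values need a comma or whitespace between them.
        if items and not after_comma and not spaced:
            raise ValueError("missing separator")
        if tok == "[":
            value, pos = _parse(tokens, pos)
        else:
            value, pos = tok, pos + 1
        items.append(value)
        after_comma = False
    raise ValueError("unterminated list")
```

Under these assumptions "[1 2 3 4 5 6 7]" and "[1,2,3,4,5,6,7]" parse to the same seven items, while "[1,2,3,4,]" and "[1,2,3,,4]" are syntax errors.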
> >
> > Why not? Make them spaces only, and you become consistent across the board.
> > I have to think about the possibility of pathological cases where spaces
> > won't work. I can't think of any at the moment.
> >
> > >
> > > Some questions about how commas behave:
> > > 1: is a trailing comma e.g. [1,2,3,4,] a syntax error?
> > > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
> >
> > I would say yes to syntax error. I can easily determine that there may need to be
> > an additional list value, but can't determine what.
> >
> > > Note the above productions assume that the answer to both is yes.
> > >
> > >>
> > >> What big advantage to a language is there to specify you can use a comma or
> > >> whitespace as a token separator? Will you be happy with the first person who
> > >> interprets this as being ok
> > >>
> > >> loop_
> > >>   _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
> > >>
> > > Not sure what you are getting at here: I am proposing the following:
> > >
> > > _nicelist      [1 2 3 4 5 6 7]
> > >
> > > being the same as
> > >
> > > _nicelist      [1,2,3,4,5,6,7]
> > >
> > >  Don't see how this relates to loops.
> >
> > The point was, once you say a space and comma are equivalent token
> > separators then will it be an interpretation that they are always so even in
> > loops? My example was not a list, just 7 values that were separated by
> > commas not spaces.
> >
> > >
> > > James.
> > > ------
> > >>
> > >> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com> wrote:
> > >>
> > >>> Dear All: looking over the list I posted previously of items left to
> > >>> resolve, I see only one serious one outstanding: whether or not to allow
> > >>> space as a separator between list items.  Nick has stated:
> > >>>
> > >>> " I will propose it has to be a comma, but make the coercion rule that space
> > >>> separated values in a list-type object be coerced into comma separated
> > >>> values. That is, read spaces as you want, but don't encourage them."
> > >>>
> > >>> I would like to counter-propose, as Joe did originally, that whitespace be
> > >>> elevated to equal status with comma as a valid list separator.  I see no
> > >>> downside to this.  Would anyone else like to speak to this issue before we
> > >>> vote?  In particular, I would be interested to hear why Nick doesn't want to
> > >>> encourage spaces.
> > >>
> > >> cheers
> > >>
> > >> Nick
> > >>
> > >> --------------------------------
> > >> Associate Professor N. Spadaccini, PhD
> > >> School of Computer Science & Software Engineering
> > >>
> > >> The University of Western Australia    t: +61 (0)8 6488 3452
> > >> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> > >> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> > >> MBDP  M002
> > >>
> > >> CRICOS Provider Code: 00126G
> > >>
> > >> e: Nick.Spadaccini@uwa.edu.au
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> ddlm-group mailing list
> > >> ddlm-group@iucr.org
> > >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> > >>
> > >
> > >
> >
> > cheers
> >
> > Nick
> >
> > --------------------------------
> > Associate Professor N. Spadaccini, PhD
> > School of Computer Science & Software Engineering
> >
> > The University of Western Australia    t: +61 (0)8 6488 3452
> > 35 Stirling Highway                    f: +61 (0)8 6488 1089
> > CRAWLEY, Perth,  WA  6009 AUSTRALIA  w3: www.csse.uwa.edu.au/~nick
> > MBDP  M002
> >
> > CRICOS Provider Code: 00126G
> >
> > e: Nick.Spadaccini@uwa.edu.au
> >
> >
> >
> >
> >
> >
>
>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
>
