
Re: [ddlm-group] Space as a list item separator

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Space as a list item separator
From: "Herbert J. Bernstein" <[email protected]>
Date: Mon, 30 Nov 2009 16:01:44 -0500 (EST)

The problem is more a matter of legacy people and legacy experimental
practices than legacy data sets.  These are legacies I think we should
retain and respect.  These legacy people doing things with legacy
practices do very new and exciting science, for which CIF 2 will,
hopefully, be a useful tool if we make it relatively easy for them
to integrate CIF 2 into their workflows.

CIF 1.5 will help some of them to do that.

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Mon, 30 Nov 2009, SIMON WESTRIP wrote:

> Dear all
> 
> One point I read in David's comments is that there are no legacy issues with
> respect to lists, associative arrays etc.
> Does anyone disagree? Obviously it makes life easier when considering lists
> etc if the 'legacy' word doesn't rear its head.
> 
> ____________________________________________________________________________
> From: David Brown <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Monday, 30 November, 2009 19:56:30
> Subject: Re: [ddlm-group] Space as a list item separator
> 
> Please forgive me, everyone, but what is all this CIF1.5 about?
> 
> Why do we need it?
> 
> If a DDLm application is presented with a CIF data file written using a
> DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why can't we
> continue to use CIF1.1 since this works just fine for these files?  Why do
> we need CIF1.5?
> 
> CIF data files written using DDL1 and DDL2 dictionaries do not contain lists
> and arrays because lists and arrays were not invented when these files were
> written, and any data files written with these dictionaries in the future
> (and there may be many of them) will still use the CIF1.1 syntax.  There is
> no danger of arrays slipping into these data files unnoticed because they
> are not defined (and never will be) in DDL1 and DDL2 dictionaries (CIF1.1
> does not allow it.)
> 
> Of course our DDLm application (if we ever get it off the ground) will need
> to be able to read data files written with CIF1.1 syntax because we are
> required to ensure that this application can read in any existing CIF data
> file.  It will also need to be able to read files written in CIF2 syntax
> because CIF2 will be needed for reading in the DDLm dictionaries (the only
> dictionaries that contain dREL) and the CIF2 data files (which may, unlike
> the CIF1.1 data files, also contain arrays and lists).
> 
> As I pointed out earlier (and it seems to have come as something of a shock
> or epiphany to some), the DDLm dictionaries include very nice lists of
> aliases that contain every data name that was ever used for a given item. 
> The data names in this alias list are, of course, quoted data values within
> the DDLm dictionary, and some contain characters that CIF2 would not
> recognize in a data name, but that is fine because they appear only in data
> values, and quoted data values no less.
> 
> When confronted with a datafile written in CIF1.1, our hypothetical
> application would switch on its CIF1.1 lexer to read in the CIF1 data file,
> and pass the results into a preparser which would match the data name in the
> CIF1.1 data file with an alias name in the DDLm dictionary, and immediately
> substitute the DDLm data name for the original DDL1 or DDL2 data name.  Now
> all the problems with the old data names have disappeared.  The preparser
> might have to make other changes to the data value (I am not sure that there
> are any, perhaps adding delimiters to all strings so they could be stripped
> away by the parser?).  At this point you have a fully compliant CIF2-DDLm
> data set, which you can dREL to your heart's content.  In particular, if
> dREL calls for an array, the item associated with that array will contain a
> dREL method for assembling the array from the individual data items that
> were originally stored in the input CIF and are now stored under a DDLm
> defined name.  The only thing that would be difficult to do would be to
> reconstruct a DDL1 or DDL2 compliant data output file, but even this could
> be done if it was thought necessary.
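The renaming step described above can be pictured with a minimal sketch. It assumes the CIF1.1 lexer has already produced a mapping of data names to values, and that an alias table has been extracted from the DDLm dictionary. The names `ALIASES` and `rename_tags` and the table entries are hypothetical, chosen only for illustration.

```python
# Hypothetical alias table. A real application would build this from the
# alias lists in a DDLm dictionary; these two entries are illustrative only.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_atom_site_label": "_atom_site.label",
}


def rename_tags(cif1_items, aliases=ALIASES):
    """Replace legacy data names with DDLm names; keep unknown names as-is."""
    renamed = {}
    for name, value in cif1_items.items():
        # CIF data names are case-insensitive, so look them up in lower case.
        renamed[aliases.get(name.lower(), name)] = value
    return renamed
```

For example, `rename_tags({"_cell_length_a": "5.43"})` would return the same value stored under `_cell.length_a`, while a name with no alias passes through untouched.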
> 
> Please let's not make this exercise more confusing than necessary. 
> 
> You guys need to get on with defining what you want in CIF2.  CIF1 can then
> look after itself using the existing tools together with the aliases for
> renaming the items.
> 
> David
> 
> Herbert J. Bernstein wrote:
>       Dear Colleagues,
>
>         Instead of looking at the minimally disruptive approach as a
>       modification to CIF 2, in order to in fact be minimally
>       disruptive, I would suggest looking at CIF 1.5 in terms of what
>       would need to be changed in CIF 1.1 in order to support DDLm.
>
>         I think the following will do it:
>
>         For data values, only, recognize three new initial string
>       delimiters in addition to the existing single quote ("'"),
>       double quote ("\"") and newline-semicolon ("\n;"):
>
>         left brace ("{")
>         left square bracket ("[")
>
>       Unless these are encountered in a left to right scan at a point
>       at which the first character of a data value is expected, the
>       parse remains the same as for CIF 1.1.
>
>       Once the left brace or left square bracket is encountered, then
>       whatever the formally agreed rules for the CIF2 parse are would
>       apply until the balancing terminal right brace or right square
>       bracket.  It is only the top level terminal right brace or right
>       square bracket that would be required to be followed by
>       whitespace.
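As a rough illustration of the rule above, the following sketch finds the end of a bracketed data value by scanning for the balancing top-level close. It is not the agreed CIF2 parse, which is still under discussion. The names `PAIRS` and `scan_bracketed` are hypothetical, and the handling of quotes (brackets inside simple single- or double-quoted strings are ignored) is a simplifying assumption.

```python
PAIRS = {"[": "]", "{": "}"}


def scan_bracketed(text, start):
    """Return the index just past the bracket that balances text[start]."""
    if text[start] not in PAIRS:
        raise ValueError("data value does not start with '[' or '{'")
    stack = []    # closing brackets still expected, innermost last
    quote = None  # the active quote character, if inside a quoted string
    for i in range(start, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in "]}":
            if ch != stack.pop():
                raise ValueError("mismatched bracket at offset %d" % i)
            if not stack:
                return i + 1  # top-level close reached
    raise ValueError("unterminated bracketed value")
```

A caller would then check that the character at the returned index is whitespace (or end of input), since only the top-level terminator is required to be followed by whitespace.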
>
>       The new dictionaries would _not_ be written in CIF 1.5, only in
>       full CIF 2, but parsers would be expected to process any CIF not
>       clearly self-identifying as a CIF 2 file as a CIF 1.5 file. 
>       This means that the only major use of CIF 2 constructs in CIF
>       1.5 would be to allow users to provide list, matrix and vector
>       data values.
>
>       This also means, for example, as per David's suggestion, that
>       the only way a tag with embedded square brackets or embedded
>       braces would be handled in a new dictionary would be as an
>       alias, but the formality of CIF 1.5 would give applications a
>       clean way to make use of those aliases in parsing data files.
>
>       If we follow this approach, then we would be honoring the
>       published commitment to be able to keep essentially all existing
>       data files unchanged, and still be able to handle them with
>       DDLm.  The only exception would be data files that happen to
>       include data values that begin with '{' or '[', which would now
>       have to be quoted. I do not believe that there are many such
>       cases, and I believe that there would be acceptance of the need
>       to add such quoting if encountered.
>
>       To summarize:
>
>         Development of CIF 2 with DDLm support would continue and be
>       used for new dictionaries; and
>
>         Development of CIF 1.5 to serve as a bridge between CIF 1.1
>       and DDLm would start, primarily giving users the ability to
>       provide list, matrix and vector data values, to allow for a
>       smooth transition to wider use of DDLm and CIF 2.
>
>       Regards,
>         Herbert
> 
>
>       =====================================================
>        Herbert J. Bernstein, Professor of Computer Science
>          Dowling College, Kramer Science Center, KSC 121
>               Idle Hour Blvd, Oakdale, NY, 11769
>
>                        +1-631-244-3035
>                        [email protected]
>       =====================================================
>
>       On Sun, 29 Nov 2009, SIMON WESTRIP wrote:
>
>             Yes that summarizes the differences. Unfortunately,
>             the single-byte
>             non-delimited strings have to be separated by
>             white space in this approach, which is perhaps
>             counter-intuitive and might
>             have some legacy issues?
> 
> ____________________________________________________________________________
>             From: James Hester <[email protected]>
>             To: Group finalising DDLm and associated
>             dictionaries <[email protected]>
>             Sent: Sunday, 29 November, 2009 3:45:18
>             Subject: Re: [ddlm-group] Space as a list item
>             separator
>
>             Hi Simon: I'm trying to read between the lines here
>             as to how the syntax we
>             have been discussing diverges from what you have
>             described, and have come up
>             with the following list:
>
>             1. Presumably the []{} characters must be surrounded
>             by whitespace in your
>             version
>             2. We have restricted the character sets of the
>             non-delimited strings and
>             tags more than strictly necessary.
>             3. Comma might be included in the single-byte
>             non-delimited string list
>
>             Are there any other differences that you would
>             identify?
>
>             On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP
>             <[email protected]> wrote:
>                   Dear all
>
>                   I was chatting with the man who 'writes the
>             cheques' yesterday
>                   about some of the
>                   changes he might expect with CIF2, and based
>             on this I feel I
>                   ought to at least have
>                   a go at exploring a 'minimally disruptive'
>             approach, so at the
>                   risk of being shouted at,
>                   here goes at a slightly different way of
>             looking at CIF:
>
>                   CIF contains a list of strings separated by
>             whitespace.
>
>                   A string can be nondelimited or delimited.
>
>                   Nondelimited strings have a restricted
>             character set (minimally
>                   whitespace is excluded)
>
>                   A nondelimited string cannot start with any of
>             the delimiters
>                   (obviously)
>
>                   Nondelimited strings can have special meaning
>             governing what
>                   follows them:
>
>                       reserved words, e.g. loop_
>
>                       tags, e.g. data_ , _foo
>
>                       single-byte nondelimited strings, e.g. [ ]
>             { } :
>
>                   All other strings are treated as raw data
>             values
> 
>
>                   There, at least I can say I tried :-)
>
>                   Cheers
>
>                   Simon
> 
> ____________________________________________________________________________
>             From: SIMON WESTRIP <[email protected]>
>             To: Group finalising DDLm and associated
>             dictionaries
>             <[email protected]>
>             Sent: Saturday, 28 November, 2009 10:01:38
>
>             Subject: Re: [ddlm-group] Space as a list item
>             separator
>
>             I had been under the assumption that the separation
>             of list items by a
>             comma was 'set in stone'
>             (and was one reason for dropping the CIF1 syntax of
>             requiring space
>             after data values),
>             but if it's up for negotiation I would opt for using
>             the space as a
>             separator as elsewhere in the CIF,
>             partly because then a list can essentially be
>             treated much like a
>             single-item loop - i.e. same basic parsing
>             of <value><space><value><space>...
>
>             Cheers
>
>             Simon
> 
> ____________________________________________________________________________
>             From: Herbert J. Bernstein
>             <[email protected]>
>             To: Group finalising DDLm and associated
>             dictionaries
>             <[email protected]>
>             Cc: [email protected]
>             Sent: Friday, 27 November, 2009 11:43:10
>             Subject: Re: [ddlm-group] Space as a list item
>             separator
>
>             Dear Colleagues,
>
>               I have no objection to accepting either comma or
>             whitespace
>             as a valid separator in a list.  I can't object -- I
>             have been
>             coding to that standard since 1997, and now would
>             only have to
>             remove the message generated for the case of the
>             space.  We already
>             accept multiple glyphs as valid separators at all
>             levels:
>
>               whitespace itself is one of several character
>             sequences in rather
>             complex combinations:  any number of blanks, tabs,
>             newlines and
>             comments.
>             The comma itself is handled in a complex way.  We
>             accept (or should
>             accept) any whitespace before and after a comma as
>             valid, as in
>             {a,b} versus {a , b }.  Adding the option of leaving
>             out the comma
>             itself and just having the whitespace as the
>             separator makes just
>             as much sense.
>
>               I see nothing to be gained by now forbidding the
>             comma.  The meaning
>             of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or,
>             under this new
>             (and, I think, more sensible and realistic) approach,
>             {a . b .} or {a ?
>             b ?}.
>
>               The blank reads particularly well in dealing with
>             vectors and
>             matrices. The comma reads well when dealing with
>             strings.
>
>               I think we would do best with both as valid
>             alternatives (no error,
>             no warning for either one).
>
>               Regards,
>                 Herbert
>             =====================================================
>             Herbert J. Bernstein, Professor of Computer Science
>               Dowling College, Kramer Science Center, KSC 121
>                     Idle Hour Blvd, Oakdale, NY, 11769
>
>                             +1-631-244-3035
>                             [email protected]
>             =====================================================
>
>             On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>
>             > At first glance, you're considering using space
>             instead of commas as
>             list
>             > separators?
>             > which is not so far away from the CIF1 requirement
>             of space
>             following a
>             > delimiter?
>             >
>             > But I'm only on my first cup of coffee this
>             morning :-)
>             >
>             >____________________________________________________________________________
>             > From: Nick Spadaccini <[email protected]>
>             > To: Group finalising DDLm and associated
>             dictionaries
>             <[email protected]>
>             > Sent: Friday, 27 November, 2009 7:46:44
>             > Subject: Re: [ddlm-group] Space as a list item
>             separator
>             >
>             >
>             >
>             >
>             > On 27/11/09 2:32 PM, "James Hester"
>             <[email protected]> wrote:
>             >
>             > > See comments below:
>             > >
>             > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini
>             <[email protected]>
>             > wrote:
>             > >> Timely email, come in just after the one I
>             sent.
>             > >>
>             > >> My position is if we specify the syntax then we
>             encourage its
>             correct use
>             > but
>             > >> acknowledge that there may be cases where one
>             might be able to
>             recover
>             > >> intent. But I wouldn't encourage those cases.
>             > >
>             > > Absolutely, which is why I would like to elevate
>             space-separated
>             list
>             > items to
>             > > be correct syntax rather than 'wrong but intent
>             is clear' syntax.
>             > >>
>             > >> You could say that token separator in lists are
>             a or b or c, but
>             that
>             > just
>             > >> adds a level of complexity for very little
>             gain. The choice of
>             comma
>             > makes it
>             > >> seamless to translate from the raw CIF data
>             straight in to most
>             language
>             > >> specific data declaration. The only language I
>             know that accepts
>             one or
>             > the
>             > >> other or both is MatLab.
>             > >
>             > > Re ease of translation: you speak as if a viable
>             approach to a CIF
>             data
>             > file
>             > > is to take whole text chunks and throw them at
>             some language
>             interpreter,
>             > > without doing your own parse.  Quite apart from
>             being a rather
>             unlikely
>             > > approach, this is impossible, as without parsing
>             you won't know
>             where the
>             > list
>             > > finishes.  If you do do your own parse, you can
>             populate your
>             > datastructures
>             > > directly during the parse, and what list
>             separator was originally
>             used in
>             > the
>             > > data file is completely irrelevant.
>             > >
>             > > Re complexity: not sure how you are planning to
>             deal with
>             whitespace in
>             > the
>             > > formal grammar, but consider the following,
>             where I have assumed
>             that each
>             > > token 'eats up' the following whitespace.
>             > >
>             > > <dataitem> = <dataname><whitespace>+<datavalue>
>             > > <datavalue> = {<list>|<string>}<whitespace>+
>             > > <listdatavalue> = {<list>|<string>}<whitespace>*
>             > > <list> = '[' <whitespace>* {<listdatavalue>
>             > > {<comma><whitespace>*<listdatavalue>}*}* ']'
>             > >
>             > > If we make comma or whitespace possible
>             separators, the last
>             production
>             > > becomes:
>             > > <list> =  '[' <whitespace>* {<listdatavalue>
>             {<comma or
>             > > whitespace><listdatavalue>}*}* ']'
>             > >
>             > > This looks like no extra complexity, and from a
>             user's point of
>             view
>             > > whitespace as an alternative separator is simple
>             to understand and
>             > consistent
>             > > with space as a token separator used everywhere
>             else in CIF. 
>             Anyway, if
>             > > reduction of grammar complexity is your goal,
>             you can just
>             completely
>             > exclude
>             > > commas as list separators!
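To make the amended production concrete, here is a minimal sketch of a list parser that accepts either a comma or whitespace between items. It assumes, as the productions do, that a trailing comma and two commas in a row are syntax errors. The names `TOKEN` and `parse_list` are hypothetical, values are kept as plain strings, quoted strings are not handled, and the tokenizer does not insist on a separator between adjacent brackets.

```python
import re

# A token is a bracket, a comma, or a run of other non-whitespace characters.
TOKEN = re.compile(r"\s*([\[\],]|[^\s\[\],]+)")


def parse_list(text):
    """Parse '[a, b c [d e]]' into nested Python lists of strings."""
    tokens = TOKEN.findall(text.strip())
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        items = []
        after_comma = False  # True when a value must follow
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "]":
                if after_comma:
                    raise ValueError("trailing comma")
                pos += 1
                return items
            if tok == ",":
                if after_comma or not items:
                    raise ValueError("comma without a preceding value")
                after_comma = True
                pos += 1
                continue
            if tok == "[":
                items.append(parse())  # nested list
            else:
                items.append(tok)
                pos += 1
            after_comma = False
        raise ValueError("unterminated list")

    result = parse()
    if pos != len(tokens):
        raise ValueError("unexpected text after list")
    return result
```

Under these assumptions `parse_list("[1 2 3]")` and `parse_list("[1,2,3]")` give the same result, which is the point of the proposal: once parsed, the separator used in the file no longer matters.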
>             >
>             > Why not? Make them spaces only, and you become
>             consistent across the
>             board.
>             > I have to think about the possibility of
>             pathological cases where
>             spaces
>             > won't work. I can't think of any at the moment.
>             >
>             > >
>             > > Some questions about how commas behave:
>             > > 1: is a trailing comma e.g. [1,2,3,4,] a syntax
>             error?
>             > > 2. are two commas in a row a syntax error? E.g.
>             [1,2,3,,4]
>             >
>             > I would say yes to syntax error. I can easily
>             determine they may need
>             to be
>             > an additional list value, but can't determine
>             what.
>             >
>             > > Note the above productions assume that the
>             answer to both is yes.
>             > >
>             > >>
>             > >> What big advantage to a language is there to
>             specify you can use
>             a comma
>             > or
>             > >> whitespace as a token separator? Will you be
>             happy with the first
>             person
>             > who
>             > >> interprets this as being ok
>             > >>
>             > >> loop_
>             > >>   _severalvalues 1,2,3,4,5,6,7 # these being
>             the 7 values of
>             > severalvalues
>             > >>
>             > > Not sure what you are getting at here: I am
>             proposing the
>             following:
>             > >
>             > > _nicelist      [1 2 3 4 5 6 7]
>             > >
>             > > being the same as
>             > >
>             > > _nicelist      [1,2,3,4,5,6,7]
>             > >
>             > >  Don't see how this relates to loops.
>             >
>             > The point was, once you say a space and comma are
>             equivalent token
>             > separators then will it be an interpretation that
>             they are always so
>             even in
>             > loops? My example was not a list, just 7 values
>             that were separated
>             by
>             > commas not spaces.
>             >
>             > >
>             > > James.
>             > > ------
>             > >>
>             > >> On 27/11/09 11:41 AM, "James Hester"
>             <[email protected]> wrote:
>             > >>
>             > >>> Dear All: looking over the list I posted
>             previously of items
>             left to
>             > >>> resolve, I see only one serious one
>             outstanding: whether or not
>             to allow
>             > >>> space as a separator between list items.  Nick
>             has stated:
>             > >>>
>             > >>> " I will propose it has to be a comma, but
>             make the coercion
>             rule that
>             > space
>             > >>> separated values in a list-type object be
>             coerced into comma
>             separated
>             > >>> values. That is, read spaces as you want, but
>             don't encourage
>             them."
>             > >>>
>             > >>> I would like to counter-propose, as Joe did
>             originally, that
>             whitespace
>             > be
>             > >>> elevated to equal status with comma as a valid
>             list separator. 
>             I see no
>             > >>> downside to this.  Would anyone else like to
>             speak to this issue
>             before
>             > we
>             > >>> vote?  In particular, I would be interested to
>             hear why Nick
>             doesn't
>             > want to
>             > >>> encourage spaces.
>             > >>
>             > >> cheers
>             > >>
>             > >> Nick
>             > >>
>             > >> --------------------------------
>             > >> Associate Professor N. Spadaccini, PhD
>             > >> School of Computer Science & Software
>             Engineering
>             > >>
>             > >> The University of Western Australia    t: +61
>             (0)8 6488 3452
>             > >> 35 Stirling Highway                    f: +61
>             (0)8 6488 1089
>             > >> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3:
>             www.csse.uwa.edu.au/~nick
>             > >> <http://www.csse.uwa.edu.au/%7Enick>
>             > >> MBDP  M002
>             > >>
>             > >> CRICOS Provider Code: 00126G
>             > >>
>             > >> e: [email protected]
>             > >>
>             > >>
>             > >>
>             > >> _______________________________________________
>             > >> ddlm-group mailing list
>             > >> [email protected]
>             > >>
>             http://scripts.iucr.org/mailman/listinfo/ddlm-group
>             > >>
>             > >
>             > >
>             >
>             > cheers
>             >
>             > Nick
>             >
>             > --------------------------------
>             > Associate Professor N. Spadaccini, PhD
>             > School of Computer Science & Software Engineering
>             >
>             > The University of Western Australia    t: +61 (0)8
>             6488 3452
>             > 35 Stirling Highway                    f: +61 (0)8
>             6488 1089
>             > CRAWLEY, Perth,  WA  6009 AUSTRALIA  w3:
>             www.csse.uwa.edu.au/~nick
>             > MBDP  M002
>             >
>             > CRICOS Provider Code: 00126G
>             >
>             > e: [email protected]
>             >
>             >
>             >
>             >
>             > _______________________________________________
>             > ddlm-group mailing list
>             > [email protected]
>             >
>             http://scripts.iucr.org/mailman/listinfo/ddlm-group
>             >
>             >
>
>             _______________________________________________
>             ddlm-group mailing list
>             [email protected]
>             http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 
> 
>
>             --
>             T +61 (02) 9717 9907
>             F +61 (02) 9717 3145
>             M +61 (04) 0249 4148
> 
> 
>
>     ____________________________________________________________________
> 
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 
> 
> 
>

_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group
