Re: [ddlm-group] Space as a list item separator
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] Space as a list item separator
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Mon, 30 Nov 2009 16:01:44 -0500 (EST)
- In-Reply-To: <[email protected]>
- References: <C735A4E4.12669%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>
The problem is more a matter of legacy people and legacy experimental
practices than legacy data sets. These are legacies I think we should
retain and respect. These legacy people doing things with legacy
practices do very new and exciting science, for which CIF 2 will,
hopefully, be a useful tool if we make it relatively easy for them
to integrate CIF 2 into their workflows.
CIF 1.5 will help some of them to do that.
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Mon, 30 Nov 2009, SIMON WESTRIP wrote:
> Dear all
>
> One point I read in David's comments is that there are no legacy issues with
> respect to lists, associative arrays etc.
> Does anyone disagree? Obviously it makes life easier when considering lists
> etc if the 'legacy' word doesn't rear its head.
>
> ____________________________________________________________________________
> From: David Brown <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Monday, 30 November, 2009 19:56:30
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Please forgive me, everyone, but what is all this CIF1.5 about?
>
> Why do we need it?
>
> If a DDLm application is presented with a CIF data file written using a
> DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why can't we
> continue to use CIF1.1 since this works just fine for these files? Why do
> we need CIF1.5?
>
> CIF data files written using DDL1 and DDL2 dictionaries do not contain lists
> and arrays because lists and arrays were not invented when these files were
> written, and any data files written with these dictionaries in the future
> (and there may be many of them) will still use the CIF1.1 syntax. There is
> no danger of arrays slipping into these data files unnoticed because they
> are not defined (and never will be) in DDL1 and DDL2 dictionaries (CIF1.1
> does not allow it.)
>
> Of course our DDLm application (if we ever get it off the ground) will need
> to be able to read data files written with CIF1.1 syntax because we are
> required to ensure that this application can read in any existing CIF data
> file. It will also need to be able to read files written in CIF2 syntax
> because CIF2 will be needed for reading in the DDLm dictionaries (the only
> dictionaries that contain dREL) and the CIF2 data files (which may, unlike
> the CIF1.1 data files, also contain arrays and lists).
>
> As I pointed out earlier (and it seems to have come as something of a shock
> or epiphany to some), the DDLm dictionaries include very nice lists of
> aliases that contain every data name that was ever used for a given item.
> The data names in this alias list are, of course, quoted data values within
> the DDLm dictionary, and some contain characters that CIF2 would not
> recognize in a data name, but that is fine because they appear only in data
> values, and quoted data values no less.
>
> When confronted with a datafile written in CIF1.1, our hypothetical
> application would switch on its CIF1.1 lexer to read in the CIF1 data file,
> and pass the results into a preparser which would match the data name in the
> CIF1.1 data file with an alias name in the DDLm dictionary, and immediately
> substitute the DDLm data name for the original DDL1 or DDL2 data name. Now
> all the problems with the old data names have disappeared. The preparser
> might have to make other changes to the data value (I am not sure that there
> are any, perhaps adding delimiters to all strings so they could be stripped
> away by the parser?). At this point you have a fully compliant CIF2-DDLm
> data set, which you can dREL to your heart's content. In particular, if
> dREL calls for an array, the item associated with that array will contain a
> dREL method for assembling the array from the individual data items that
> were originally stored in the input CIF and are now stored under a DDLm
> defined name. The only thing that would be difficult to do would be to
> reconstruct a DDL1 or DDL2 compliant data output file, but even this could
> be done if it was thought necessary.
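
[Editorial sketch] The alias substitution step described above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: the alias table is a hand-written toy rather than one read from a real DDLm dictionary, the function names build_alias_index and rename_items are hypothetical, and data names are compared case-insensitively.

```python
# Hypothetical alias table: canonical DDLm name -> older names it replaces.
# A real table would be read from the alias lists in a DDLm dictionary.
TOY_ALIASES = {
    "_cell.length_a": ["_cell_length_a"],
    "_atom_site.label": ["_atom_site_label"],
}


def build_alias_index(aliases):
    """Map every known name (lower-cased) to its canonical DDLm name."""
    index = {}
    for canonical, old_names in aliases.items():
        index[canonical.lower()] = canonical
        for old in old_names:
            index[old.lower()] = canonical
    return index


def rename_items(items, index):
    """Return a copy of items keyed by DDLm names; unknown names are kept."""
    return {index.get(name.lower(), name): value for name, value in items.items()}


# Items as they might come out of a CIF1.1 lexer (values left as raw strings).
cif1_items = {"_cell_length_a": "5.431", "_Atom_Site_Label": "Si1", "_local_note": "x"}
print(rename_items(cif1_items, build_alias_index(TOY_ALIASES)))
```

Names with no alias entry are passed through unchanged here; whether a real preparser should instead reject them is a separate design decision not settled in this thread.
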
>
> Please let's not make this exercise more confusing than necessary.
>
> You guys need to get on with defining what you want in CIF2. CIF1 can then
> look after itself using the existing tools together with the aliases for
> renaming the items.
>
> David
>
> Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> Instead of looking at the minimally disruptive approach as a
> modification to CIF 2, in order to in fact be minimally
> disruptive, I would suggest looking at CIF 1.5 in terms of what
> would need to be changed in CIF 1.1 in order to support DDLm.
>
> I think the following will do it:
>
> For data values, only, recognize three new initial string
> delimiters in addition to the existing single quote ("'"),
> double quote ("\"") and newline-semicolon ("\n;"):
>
> left brace ("{")
> left square bracket ("[")
>
> Unless these are encountered in a left to right scan at a point
> at which the first character of a data value is expected, the
> parse remains the same as for CIF 1.1.
>
> Once the left brace or left square bracket is encountered, then
> whatever the formally agreed rules for the CIF2 parse are would
> apply until the balancing terminal right brace or right square
> bracket. It is only the top level terminal right brace or right
> square bracket that would be required to be followed by
> whitespace.
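
[Editorial sketch] The rule above (switch parsing mode only when a data value starts with a bracket, then read to the balancing close, which must be followed by whitespace) might look roughly like this. It is an assumption-laden sketch, not the agreed CIF2 parse: the function name scan_bracketed_value is hypothetical, quoted strings are skipped with a simplified rule (a quote ends at the next identical quote character), and end of input is accepted in place of trailing whitespace.

```python
OPEN_TO_CLOSE = {"[": "]", "{": "}"}


def scan_bracketed_value(text, start):
    """Return the index just past the bracket that balances text[start]."""
    if text[start] not in OPEN_TO_CLOSE:
        raise ValueError("not at an opening bracket")
    stack = []  # closing brackets we still expect, innermost last
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            # Simplified quoting: skip to the next identical quote character
            # so that brackets inside the string are not counted.
            end = text.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated quoted string")
            i = end
        elif ch in OPEN_TO_CLOSE:
            stack.append(OPEN_TO_CLOSE[ch])
        elif ch in "]}":
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched bracket")
            if not stack:
                end = i + 1
                # Only the top-level close needs whitespace after it.
                if end < len(text) and not text[end].isspace():
                    raise ValueError("top-level close must be followed by whitespace")
                return end
        i += 1
    raise ValueError("unbalanced bracket")


sample = "[1 [2 3] {a:'x]'}] next"
end = scan_bracketed_value(sample, 0)
print(sample[:end])
```

A caller would invoke this only when the first character of an expected data value is "[" or "{"; everywhere else the CIF 1.1 rules would continue to apply unchanged.
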
>
> The new dictionaries would _not_ be written in CIF 1.5, only in
> full CIF 2, but parsers would be expected to process any CIF not
> clearly self-identifying as a CIF 2 file as a CIF 1.5 file.
> This means that the only major use of CIF 2 constructs in CIF
> 1.5 would be to allow users to provide list, matrix and vector
> data values.
>
> This also means, for example, as per David's suggestion, that
> the only way a tag with embedded square brackets or embedded
> braces would be handled in a new dictionary would be as an
> alias, but the formality of CIF 1.5 would give applications a
> clean way to make use of those aliases in parsing data files.
>
> If we follow this approach, then we would be honoring the
> published commitment to be able to keep essentially all existing
> data files unchanged, and still be able to handle them with
> DDLm. The only exception would be data files that happen to
> include data values that begin with '{' or '[', which would now
> have to be quoted. I do not believe that there are many such
> cases, and I believe that there would be acceptance of the need
> to add such quoting if encountered.
>
> To summarize:
>
> Development of CIF 2 with DDLm support would continue and be
> used for new dictionaries; and
>
> Development of CIF 1.5 to serve as a bridge between CIF 1.1
> and DDLm would start, primarily giving users the ability to
> provide list, matrix and vector data values, to allow for a
> smooth transition to wider use of DDLm and CIF 2.
>
> Regards,
> Herbert
>
>
>
> On Sun, 29 Nov 2009, SIMON WESTRIP wrote:
>
> Yes that summarizes the differences. Unfortunately, the single-byte
> non-delimited strings have to be separated by white space in this
> approach, which is perhaps counter-intuitive and might have some
> legacy issues?
>
> ____________________________________________________________________________
> From: James Hester <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Sunday, 29 November, 2009 3:45:18
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Hi Simon: I'm trying to read between the lines here as to how the syntax we
> have been discussing diverges from what you have described, and have come up
> with the following list:
>
> 1. Presumably the []{} characters must be surrounded by whitespace in your
>    version
> 2. We have restricted the character sets of the non-delimited strings and
>    tags more than strictly necessary.
> 3. Comma might be included in the single-byte non-delimited string list
>
> Are there any other differences that you would identify?
>
> On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP <[email protected]> wrote:
> Dear all
>
> I was chatting with the man who 'writes the cheques' yesterday about some
> of the changes he might expect with CIF2, and based on this I feel I ought
> to at least have a go at exploring a 'minimally disruptive' approach, so at
> the risk of being shouted at, here goes at a slightly different way of
> looking at CIF:
>
> CIF contains a list of strings separated by whitespace.
>
> A string can be nondelimited or delimited.
>
> Nondelimited strings have a restricted character set (minimally
> whitespace is excluded)
>
> A nondelimited string cannot start with any of the delimiters (obviously)
>
> Nondelimited strings can have special meaning governing what follows them:
>
> reserved words, e.g. loop_
>
> tags, e.g. data_ , _foo
>
> single-byte nondelimited strings, e.g. [ ] { } :
>
> All other strings are treated as raw data values
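
[Editorial sketch] The "list of whitespace-separated strings" view above can be sketched as a token classifier. This is only an illustration under assumptions: the names classify and tokenize are hypothetical, the reserved-word set is an illustrative subset, and delimited (quoted or semicolon-bounded) strings are deliberately out of scope, since a plain whitespace split cannot handle them.

```python
RESERVED_WORDS = {"loop_", "stop_", "global_"}  # illustrative subset only
SINGLE_BYTE = {"[", "]", "{", "}", ":"}


def classify(token):
    """Label one nondelimited, whitespace-separated token."""
    lowered = token.lower()
    if lowered in RESERVED_WORDS:
        return "reserved"
    if lowered.startswith("data_") or token.startswith("_"):
        return "tag"
    if token in SINGLE_BYTE:
        return "single-byte"
    return "value"


def tokenize(text):
    """Split on whitespace and pair each token with its label."""
    return [(classify(tok), tok) for tok in text.split()]


for kind, tok in tokenize("data_demo loop_ _foo [ 1 2 ] bar"):
    print(kind, tok)
```

Note that under this model "[1 2]" without the surrounding spaces would be one ordinary value, which is the whitespace requirement discussed in the replies.
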
>
>
> There, at least I can say I tried :-)
>
> Cheers
>
> Simon
>
> ____________________________________________________________________________
> From: SIMON WESTRIP <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Saturday, 28 November, 2009 10:01:38
> Subject: Re: [ddlm-group] Space as a list item separator
>
> I had been under the assumption that the separation of list items by a
> comma was 'set in stone' (and was one reason for dropping the CIF1 syntax
> of requiring space after data values), but if it's up for negotiation I
> would opt for using the space as a separator as elsewhere in the CIF,
> partly because then a list can essentially be treated much like a
> single-item loop - i.e. same basic parsing of
> <value><space><value><space>...
>
> Cheers
>
> Simon
>
> ____________________________________________________________________________
> From: Herbert J. Bernstein <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Cc: [email protected]
> Sent: Friday, 27 November, 2009 11:43:10
> Subject: Re: [ddlm-group] Space as a list item separator
>
> Dear Colleagues,
>
> I have no objection to accepting either comma or whitespace
> as a valid separator in a list. I can't object -- I have been
> coding to that standard since 1997, and now would only have to
> remove the message generated for the case of the space. We already
> accept multiple glyphs as valid separators at all levels:
>
> whitespace itself is one of several character sequences in rather
> complex combinations: any number of blanks, tabs, newlines and
> comments.
> The comma itself is handled in a complex way. We accept (or should
> accept) any whitespace before and after a comma as valid, as in
> {a,b} versus {a , b }. Adding the option of leaving out the comma
> itself and just having the whitespace as the separator makes just
> as much sense.
>
> I see nothing to be gained by now forbidding the comma. The meaning
> of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or, under this new
> (and I think more sensible and realistic) approach, {a . b .} or
> {a ? b ?}.
>
> The blank reads particularly well in dealing with vectors and
> matrices. The comma reads well when dealing with strings.
>
> I think we would do best with both as valid alternatives (no error,
> no warning for either one).
>
> Regards,
> Herbert
>
> On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>
> > At first glance, you're considering using space instead of commas as list
> > separators?
> > which is not so far away from the CIF1 requirement of space following a
> > delimiter?
> >
> > But I'm only on my first cup of coffee this morning :-)
> >
> >___________________________________________________________________________
> > From: Nick Spadaccini <[email protected]>
> > To: Group finalising DDLm and associated dictionaries <[email protected]>
> > Sent: Friday, 27 November, 2009 7:46:44
> > Subject: Re: [ddlm-group] Space as a list item separator
> >
> >
> >
> >
> > On 27/11/09 2:32 PM, "James Hester" <[email protected]> wrote:
> >
> > > See comments below:
> > >
> > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <[email protected]> wrote:
> > >> Timely email, come in just after the one I sent.
> > >>
> > >> My position is if we specify the syntax then we encourage its correct use
> > >> but acknowledge that there may be cases where one might be able to recover
> > >> intent. But I wouldn't encourage those cases.
> > >
> > > Absolutely, which is why I would like to elevate space-separated list
> > > items to be correct syntax rather than 'wrong but intent is clear' syntax.
> > >>
> > >> You could say that token separators in lists are a or b or c, but that just
> > >> adds a level of complexity for very little gain. The choice of comma makes
> > >> it seamless to translate from the raw CIF data straight in to most language
> > >> specific data declaration. The only language I know that accepts one or the
> > >> other or both is MatLab.
> > >
> > > Re ease of translation: you speak as if a viable approach to a CIF data
> > > file is to take whole text chunks and throw them at some language
> > > interpreter, without doing your own parse. Quite apart from being a rather
> > > unlikely approach, this is impossible, as without parsing you won't know
> > > where the list finishes. If you do do your own parse, you can populate your
> > > datastructures directly during the parse, and what list separator was
> > > originally used in the data file is completely irrelevant.
> > >
> > > Re complexity: not sure how you are planning to deal with whitespace in
> > > the formal grammar, but consider the following, where I have assumed that
> > > each token 'eats up' the following whitespace.
> > >
> > > <dataitem> = <dataname><whitespace>+<datavalue>
> > > <datavalue> = {<list>|<string>}<whitespace>+
> > > <listdatavalue> = {<list>|<string>}<whitespace>*
> > > <list> = '[' <whitespace>* {<listdatavalue> {<comma><whitespace>*<listdatavalue>}*}* ']'
> > >
> > > If we make comma or whitespace possible separators, the last production
> > > becomes:
> > > <list> = '[' <whitespace>* {<listdatavalue> {<comma or whitespace><listdatavalue>}*}* ']'
> > >
> > > This looks like no extra complexity, and from a user's point of view
> > > whitespace as an alternative separator is simple to understand and
> > > consistent with space as a token separator used everywhere else in CIF.
> > > Anyway, if reduction of grammar complexity is your goal, you can just
> > > completely exclude commas as list separators!
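
[Editorial sketch] The "comma or whitespace" production above can be sketched for the simplest case. This is a minimal illustration under assumptions, not a CIF2 parser: it handles only a flat list of nondelimited values (no nested lists, no quoted strings), the name parse_flat_list is hypothetical, and it follows the productions' assumption that a trailing or doubled comma is a syntax error.

```python
import re

# A separator is either a comma with optional whitespace around it,
# or a run of whitespace on its own.
_SEPARATOR = re.compile(r"\s*,\s*|\s+")


def parse_flat_list(text):
    """Parse '[a b c]' or '[a,b,c]' into a list of raw string values."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a list in square brackets")
    body = text[1:-1].strip()
    if not body:
        return []
    items = _SEPARATOR.split(body)
    # An empty item can only come from a leading, trailing or doubled comma.
    if any(item == "" for item in items):
        raise ValueError("empty list item (leading, trailing or doubled comma)")
    return items


print(parse_flat_list("[1 2 3 4 5 6 7]") == parse_flat_list("[1,2,3,4,5,6,7]"))
```

Because the values are collected during the parse, the separator that appeared in the file leaves no trace in the resulting data structure.
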
> >
> > Why not? Make them spaces only, and you become consistent across the board.
> > I have to think about the possibility of pathological cases where spaces
> > won't work. I can't think of any at the moment.
> >
> > >
> > > Some questions about how commas behave:
> > > 1. is a trailing comma e.g. [1,2,3,4,] a syntax error?
> > > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
> >
> > I would say yes to syntax error. I can easily determine there may need to
> > be an additional list value, but can't determine what.
> >
> > > Note the above productions assume that the answer to both is yes.
> > >
> > >>
> > >> What big advantage to a language is there to specify you can use a comma
> > >> or whitespace as a token separator? Will you be happy with the first person
> > >> who interprets this as being ok
> > >>
> > >> loop_
> > >> _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
> > >>
> > > Not sure what you are getting at here: I am proposing the following:
> > >
> > > _nicelist [1 2 3 4 5 6 7]
> > >
> > > being the same as
> > >
> > > _nicelist [1,2,3,4,5,6,7]
> > >
> > > Don't see how this relates to loops.
> >
> > The point was, once you say a space and comma are equivalent token
> > separators then will it be an interpretation that they are always so even
> > in loops? My example was not a list, just 7 values that were separated by
> > commas not spaces.
> >
> > >
> > > James.
> > > ------
> > >>
> > >> On 27/11/09 11:41 AM, "James Hester" <[email protected]> wrote:
> > >>
> > >>> Dear All: looking over the list I posted previously of items left to
> > >>> resolve, I see only one serious one outstanding: whether or not to allow
> > >>> space as a separator between list items. Nick has stated:
> > >>>
> > >>> " I will propose it has to be a comma, but make the coercion rule that
> > >>> space separated values in a list-type object be coerced into comma
> > >>> separated values. That is, read spaces as you want, but don't encourage
> > >>> them."
> > >>>
> > >>> I would like to counter-propose, as Joe did originally, that whitespace
> > >>> be elevated to equal status with comma as a valid list separator. I see
> > >>> no downside to this. Would anyone else like to speak to this issue before
> > >>> we vote? In particular, I would be interested to hear why Nick doesn't
> > >>> want to encourage spaces.
> > >>
> > >> cheers
> > >>
> > >> Nick
> > >>
> > >> --------------------------------
> > >> Associate Professor N. Spadaccini, PhD
> > >> School of Computer Science & Software Engineering
> > >>
> > >> The University of Western Australia t: +61 (0)8 6488 3452
> > >> 35 Stirling Highway f: +61 (0)8 6488 1089
> > >> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
> > >> MBDP M002
> > >>
> > >> CRICOS Provider Code: 00126G
> > >>
> > >> e: [email protected]
> > >>
> > >> _______________________________________________
> > >> ddlm-group mailing list
> > >> [email protected]
> > >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> > >>
> > >
> > >
> >
> > cheers
> >
> > Nick
> >
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148

