Discussion List Archives

Re: [ddlm-group] Space as a list item separator

The problem is more a matter of legacy people and legacy experimental
practices than legacy data sets.  These are legacies I think we should
retain and respect.  These legacy people doing things with legacy
practices do very new and exciting science, for which CIF 2 will,
hopefully, be a useful tool, if we make it relatively easy for them
to integrate CIF 2 into their work flows.

CIF 1.5 will help some of them to do that.

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Mon, 30 Nov 2009, SIMON WESTRIP wrote:

> Dear all
> 
> One point I read in David's comments is that there are no legacy issues with
> respect to lists, associative arrays etc.
> Does anyone disagree? Obviously it makes life easier when considering lists
> etc. if the 'legacy' word doesn't rear its head.
> 
> ____________________________________________________________________________
> From: David Brown <idbrown@mcmaster.ca>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Monday, 30 November, 2009 19:56:30
> Subject: Re: [ddlm-group] Space as a list item separator
> 
> Please forgive me, everyone, but what is all this CIF1.5 about?
> 
> Why do we need it?
> 
> If a DDLm application is presented with a CIF data file written using a
> DDL1 or DDL2 dictionary, which I assume uses CIF1.1 syntax, why can't we
> continue to use CIF1.1 since this works just fine for these files?  Why do
> we need CIF1.5?
> 
> CIF data files written using DDL1 and DDL2 dictionaries do not contain lists
> and arrays because lists and arrays were not invented when these files were
> written, and any data files written with these dictionaries in the future
> (and there may be many of them) will still use the CIF1.1 syntax.  There is
> no danger of arrays slipping into these data files unnoticed because they
> are not defined (and never will be) in DDL1 and DDL2 dictionaries (CIF1.1
> does not allow it.)
> 
> Of course our DDLm application (if we ever get it off the ground) will need
> to be able to read data files written with CIF1.1 syntax because we are
> required to ensure that this application can read in any existing CIF data
> file.  It will also need to be able to read files written in CIF2 syntax
> because CIF2 will be needed for reading in the DDLm dictionaries (the only
> dictionaries that contain dREL) and the CIF2 data files (which may, unlike
> the CIF1.1 data files, also contain arrays and lists).
> 
> As I pointed out earlier (and it seems to have come as something of a shock
> or epiphany to some), the DDLm dictionaries include very nice lists of
> aliases that contain every data name that was ever used for a given item. 
> The data names in this alias list are, of course, quoted data values within
> the DDLm dictionary, and some contain characters that CIF2 would not
> recognize in a data name, but that is fine because they appear only in data
> values, and quoted data values no less.
> 
> When confronted with a datafile written in CIF1.1, our hypothetical
> application would switch on its CIF1.1 lexer to read in the CIF1 data file,
> and pass the results into a preparser which would match the data name in the
> CIF1.1 data file with an alias name in the DDLm dictionary, and immediately
> substitute the DDLm data name for the original DDL1 or DDL2 data name.  Now
> all the problems with the old data names have disappeared.  The preparser
> might have to make other changes to the data value (I am not sure that there
> are any, perhaps adding delimiters to all strings so they could be stripped
> away by the parser?).  At this point you have a fully compliant CIF2-DDLm
> data set, which you can dREL to your heart's content.  In particular, if
> dREL calls for an array, the item associated with that array will contain a
> dREL method for assembling the array from the individual data items that
> were originally stored in the input CIF and are now stored under a DDLm
> defined name.  The only thing that would be difficult to do would be to
> reconstruct a DDL1 or DDL2 compliant data output file, but even this could
> be done if it was thought necessary.
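The renaming step David describes can be sketched as a single pass over the (data name, value) pairs produced by a CIF 1.1 lexer. This is a minimal sketch under stated assumptions: the lexer output format, the function name `substitute_aliases` and the two-entry alias table are hypothetical, standing in for the alias lists harvested from the DDLm dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table (legacy DDL1/DDL2 name -> DDLm name). A real
# preparser would build this from the alias lists in the DDLm dictionaries.
ALIASES: Dict[str, str] = {
    "_cell_length_a": "_cell.length_a",
    "_cell_angle_alpha": "_cell.angle_alpha",
}


def substitute_aliases(
    items: List[Tuple[str, str]], aliases: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Rename legacy data names to DDLm names; values pass through untouched.

    CIF data names are case-insensitive, so matching ignores case. Names
    with no alias entry are kept as they are.
    """
    lookup = {old.lower(): new for old, new in aliases.items()}
    return [(lookup.get(name.lower(), name), value) for name, value in items]


# Example: pairs as a (hypothetical) CIF 1.1 lexer might emit them.
legacy = [("_cell_length_a", "5.4307(2)"), ("_CELL_ANGLE_ALPHA", "90"), ("_local_note", "x")]
renamed = substitute_aliases(legacy, ALIASES)
```

Only the names change here; assembling arrays from the renamed items is left to the dREL methods, as David suggests.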
> 
> Please let's not make this exercise more confusing than necessary. 
> 
> You guys need to get on with defining what you want in CIF2.  CIF1 can then
> look after itself using the existing tools together with the aliases for
> renaming the items.
> 
> David
> 
> Herbert J. Bernstein wrote:
>       Dear Colleagues,
>
>         Instead of looking at the minimally disruptive approach as a
>       modification to CIF 2, in order to in fact be minimally
>       disruptive, I would suggest looking at CIF 1.5 in terms of what
>       would need to be changed in CIF 1.1 in order to support DDLm.
>
>         I think the following will do it:
>
>         For data values, only, recognize two new initial string
>       delimiters in addition to the existing single quote ("'"),
>       double quote ("\"") and newline-semicolon ("\n;"):
>
>         left brace ("{")
>         left square bracket ("[")
>
>       Unless these are encountered in a left to right scan at a point
>       at which the first character of a data value is expected, the
>       parse remains the same as for CIF 1.1.
>
>       Once the left brace or left square bracket is encountered, then
>       whatever the formally agreed rules for the CIF2 parse are would
>       apply until the balancing terminal right brace or right square
>       bracket.  It is only the top level terminal right brace or right
>       square bracket that would be required to be followed by
>       whitespace.
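Herbert's rule is local: the scanner changes behaviour only when `[` or `{` is the first character of a data value. A minimal sketch, assuming the CIF2 rules inside brackets reduce to "a quoted string is skipped to its matching quote" (the formal CIF2 rules were not yet agreed) and leaving semicolon text fields out; `read_value` is a hypothetical name.

```python
from typing import Tuple

# Closing bracket expected for each opening bracket.
CLOSERS = {"[": "]", "{": "}"}


def _space_or_end(text: str, i: int) -> bool:
    return i >= len(text) or text[i].isspace()


def read_value(text: str, pos: int) -> Tuple[str, int]:
    """Read one data value starting at text[pos] under the proposed CIF 1.5 rule.

    Returns (raw value text, index just past the value). Semicolon text
    fields are not handled in this sketch.
    """
    first = text[pos]
    if first in CLOSERS:
        # New in CIF 1.5: a bracketed value runs to its balancing close.
        expected = [CLOSERS[first]]
        i = pos + 1
        while i < len(text):
            c = text[i]
            if c in ("'", '"'):
                # Assumed CIF2-style rule: a quoted string inside brackets
                # ends at the next matching quote; brackets in it are ignored.
                end = text.find(c, i + 1)
                if end < 0:
                    raise SyntaxError("unterminated quoted string inside brackets")
                i = end + 1
                continue
            if c in CLOSERS:
                expected.append(CLOSERS[c])
            elif c in "]}":
                if c != expected.pop():
                    raise SyntaxError(f"mismatched {c!r} at position {i}")
                if not expected:
                    # Only the top-level terminator must be followed by whitespace.
                    if not _space_or_end(text, i + 1):
                        raise SyntaxError(f"whitespace required after {c!r} at position {i}")
                    return text[pos : i + 1], i + 1
            i += 1
        raise SyntaxError("unbalanced brackets")
    if first in ("'", '"'):
        # Unchanged CIF 1.1 rule: the closing quote must be followed by whitespace.
        i = pos + 1
        while i < len(text):
            if text[i] == first and _space_or_end(text, i + 1):
                return text[pos : i + 1], i + 1
            i += 1
        raise SyntaxError("unterminated quoted string")
    # Unchanged CIF 1.1 rule: a bare value runs to the next whitespace,
    # so '[' or '{' later in the value has no special meaning.
    i = pos
    while i < len(text) and not text[i].isspace():
        i += 1
    return text[pos:i], i
```

Values that do not start with a bracket, such as `abc[1]`, are read exactly as CIF 1.1 would read them.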
>
>       The new dictionaries would _not_ be written in CIF 1.5, only in
>       full CIF 2, but parsers would be expected to process any CIF not
>       clearly self-identifying as a CIF 2 file as a CIF 1.5 file. 
>       This means that the only major use of CIF 2 constructs in CIF
>       1.5 would be to allow users to provide list, matrix and vector
>       data values.
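The dispatch rule above ("anything not clearly self-identifying as CIF 2 is read as CIF 1.5") needs only a look at the first line of the file. A minimal sketch; the marker `#\#CIF_2.0` is an assumption here (the thread does not fix one), and `choose_syntax` is a hypothetical name.

```python
# Assumed self-identification marker for CIF 2 files. The thread had not
# fixed one yet; this is the form later adopted for CIF 2.0.
CIF2_MAGIC = "#\\#CIF_2.0"


def choose_syntax(text: str) -> str:
    """Return "CIF2" only if the file clearly identifies itself as CIF 2;
    any other file is read as CIF 1.5 (CIF 1.1 plus bracketed values)."""
    # Ignore a leading byte-order mark; the marker must be the first token.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0]
    tokens = first_line.split()
    if tokens and tokens[0] == CIF2_MAGIC:
        return "CIF2"
    return "CIF1.5"
```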
>
>       This also means, for example, as per David's suggestion, that
>       the only way a tag with embedded square brackets or embedded
>       braces would be handled in a new dictionary would be as an
>       alias, but the formality of CIF 1.5 would give applications a
>       clean way to make use of those aliases in parsing data files.
>
>       If we follow this approach, then we would be honoring the
>       published commitment to be able to keep essentially all existing
>       data files unchanged, and still be able to handle them with
>       DDLm.  The only exception would be data files that happen to
>       include data values that begin with '{' or '[', which would now
>       have to be quoted. I do not believe that there are many such
>       cases, and I believe that there would be acceptance of the need
>       to add such quoting if encountered.
>
>       To summarize:
>
>         Development of CIF 2 with DDLm support would continue and be
>       used for new dictionaries; and
>
>         Development of CIF 1.5, to serve as a bridge between CIF 1.1
>       and DDLm, would start, primarily to give users the ability to
>       provide list, matrix and vector data values, and so to allow a
>       smooth transition to wider use of DDLm and CIF 2.
>
>       Regards,
>         Herbert
> 
>
>       On Sun, 29 Nov 2009, SIMON WESTRIP wrote:
>
>             Yes that summarizes the differences. Unfortunately, the
>             single-byte non-delimited strings have to be separated by
>             white space in this approach, which is perhaps
>             counter-intuitive and might have some legacy issues?
> 
> ___________________________________________________________________________
>             From: James Hester <jamesrhester@gmail.com>
>             To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>             Sent: Sunday, 29 November, 2009 3:45:18
>             Subject: Re: [ddlm-group] Space as a list item separator
>
>             Hi Simon: I'm trying to read between the lines here as to how
>             the syntax we have been discussing diverges from what you have
>             described, and have come up with the following list:
>
>             1. Presumably the []{} characters must be surrounded by
>                whitespace in your version
>             2. We have restricted the character sets of the non-delimited
>                strings and tags more than strictly necessary.
>             3. Comma might be included in the single-byte non-delimited
>                string list
>
>             Are there any other differences that you would identify?
>
>             On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP
>             <simonwestrip@btinternet.com> wrote:
>                   Dear all
>
>                   I was chatting with the man who 'writes the cheques'
>                   yesterday about some of the changes he might expect with
>                   CIF2, and based on this I feel I ought to at least have a
>                   go at exploring a 'minimally disruptive' approach, so at
>                   the risk of being shouted at, here goes at a slightly
>                   different way of looking at CIF:
>
>                   CIF contains a list of strings separated by whitespace.
>
>                   A string can be nondelimited or delimited.
>
>                   Nondelimited strings have a restricted character set
>                   (minimally, whitespace is excluded)
>
>                   A nondelimited string cannot start with any of the
>                   delimiters (obviously)
>
>                   Nondelimited strings can have special meaning governing
>                   what follows them:
>
>                       reserved words, e.g. loop_
>
>                       tags, e.g. data_ , _foo
>
>                       single-byte nondelimited strings, e.g. [ ] { } :
>
>                   All other strings are treated as raw data values
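Simon's model can be sketched as a tokenizer that splits on whitespace and then classifies each nondelimited string. A minimal sketch, assuming CIF 1.1 quoting rules for delimited strings and an assumed reserved-word set (Simon names only `loop_`); comments and semicolon text fields are not handled, and `tokenize` is a hypothetical name.

```python
from typing import List, Tuple

# Assumed reserved words and single-byte strings; Simon's message lists
# loop_ and [ ] { } : as examples.
RESERVED = {"loop_", "global_", "stop_"}
SINGLE_BYTE = {"[", "]", "{", "}", ":"}


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split CIF text into (kind, string) pairs under Simon's model.

    Comments and semicolon text fields are not handled in this sketch.
    """
    out: List[Tuple[str, str]] = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
            continue
        start = i
        if text[i] in "'\"":
            # Delimited string, CIF 1.1 style: the closing quote must precede whitespace.
            quote = text[i]
            i += 1
            while i < n and not (text[i] == quote and (i + 1 == n or text[i + 1].isspace())):
                i += 1
            if i >= n:
                raise SyntaxError("unterminated quoted string")
            i += 1
            out.append(("delimited", text[start + 1 : i - 1]))
            continue
        # Nondelimited string: runs to the next whitespace.
        while i < n and not text[i].isspace():
            i += 1
        word = text[start:i]
        if word in SINGLE_BYTE:
            kind = "single-byte"
        elif word.lower() in RESERVED:
            kind = "reserved"
        elif word.startswith("_") or word.lower().startswith(("data_", "save_")):
            kind = "tag"
        else:
            kind = "value"
        out.append((kind, word))
    return out
```

With this model `[1 2]` tokenizes as the two ordinary values `[1` and `2]`, which is the counter-intuitive consequence Simon flags: the single-byte strings only work when surrounded by whitespace.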
> 
>
>                   There, at least I can say I tried :-)
>
>                   Cheers
>
>                   Simon
> 
> ___________________________________________________________________________
>             From: SIMON WESTRIP <simonwestrip@btinternet.com>
>             To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>             Sent: Saturday, 28 November, 2009 10:01:38
>             Subject: Re: [ddlm-group] Space as a list item separator
>
>             I had been under the assumption that the separation of list
>             items by a comma was 'set in stone' (and was one reason for
>             dropping the CIF1 syntax of requiring space after data values),
>             but if it's up for negotiation I would opt for using the space
>             as a separator as elsewhere in the CIF, partly because then a
>             list can essentially be treated much like a single-item loop,
>             i.e. the same basic parsing of <value><space><value><space>...
>
>             Cheers
>
>             Simon
> 
> ___________________________________________________________________________
>             From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>             To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>             Cc: Nick.Spadaccini@uwa.edu.au
>             Sent: Friday, 27 November, 2009 11:43:10
>             Subject: Re: [ddlm-group] Space as a list item separator
>
>             Dear Colleagues,
>
>               I have no objection to accepting either comma or whitespace
>             as a valid separator in a list.  I can't object -- I have been
>             coding to that standard since 1997, and now would only have to
>             remove the message generated for the case of the space.  We
>             already accept multiple glyphs as valid separators at all
>             levels:
>
>               whitespace itself is any of several character sequences in
>             rather complex combinations:  any number of blanks, tabs,
>             newlines and comments.
>             The comma itself is handled in a complex way.  We accept (or
>             should accept) any whitespace before and after a comma as
>             valid, as in {a,b} versus {a , b }.  Adding the option of
>             leaving out the comma itself and just having the whitespace as
>             the separator makes just as much sense.
>
>               I see nothing to be gained by now forbidding the comma.  The
>             meaning of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or,
>             under this new (and I think more sensible and realistic)
>             approach, {a . b .} or {a ? b ?}.
>
>               The blank reads particularly well in dealing with vectors and
>             matrices. The comma reads well when dealing with strings.
>
>               I think we would do best with both as valid alternatives (no
>             error, no warning for either one).
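Herbert's equivalences can be checked with a small splitter for the body of a flat list. A minimal sketch, assuming bare items only (no quoted strings or nested lists) and that each empty comma-delimited slot maps to a single placeholder; Herbert allows either `.` or `?`, so the placeholder is a parameter. `split_list_body` is a hypothetical name.

```python
import re
from typing import List

# A token is either a comma or a run of characters that are neither
# whitespace nor commas.
_TOKEN = re.compile(r",|[^\s,]+")


def split_list_body(body: str, empty: str = ".") -> List[str]:
    """Split the inside of a flat list, accepting comma and/or whitespace.

    Following Herbert's reading, a comma-delimited slot with nothing in it
    (as in {a,,b,}) stands for the placeholder value `empty`.
    """
    items: List[str] = []
    slot_filled = False  # has the current comma-delimited slot received a value?
    saw_comma = False
    for tok in _TOKEN.findall(body):
        if tok == ",":
            if not slot_filled:
                items.append(empty)
            slot_filled = False
            saw_comma = True
        else:
            items.append(tok)
            slot_filled = True
    # A trailing comma leaves an empty final slot.
    if saw_comma and not slot_filled:
        items.append(empty)
    return items
```

This lenient reading is a design choice; the stricter alternative discussed further down the thread treats `[1,2,3,4,]` and `[1,2,3,,4]` as syntax errors instead.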
>
>               Regards,
>                 Herbert
>
>             On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>
>             > At first glance, you're considering using space instead of
>             > commas as list separators? which is not so far away from the
>             > CIF1 requirement of space following a delimiter?
>             >
>             > But I'm only on my first cup of coffee this morning :-)
>             >
>             > ______________________________________________________________________
>             > From: Nick Spadaccini <nick@csse.uwa.edu.au>
>             > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>             > Sent: Friday, 27 November, 2009 7:46:44
>             > Subject: Re: [ddlm-group] Space as a list item separator
>             >
>             > On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com> wrote:
>             >
>             > > See comments below:
>             > >
>             > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini <nick@csse.uwa.edu.au>
>             > > wrote:
>             > >> Timely email, came in just after the one I sent.
>             > >>
>             > >> My position is if we specify the syntax then we encourage its correct
>             > >> use but acknowledge that there may be cases where one might be able to
>             > >> recover intent. But I wouldn't encourage those cases.
>             > >
>             > > Absolutely, which is why I would like to elevate space-separated list
>             > > items to be correct syntax rather than 'wrong but intent is clear' syntax.
>             > >>
>             > >> You could say that token separators in lists are a or b or c, but that
>             > >> just adds a level of complexity for very little gain. The choice of comma
>             > >> makes it seamless to translate from the raw CIF data straight into most
>             > >> language-specific data declarations. The only language I know that
>             > >> accepts one or the other or both is MatLab.
>             > >
>             > > Re ease of translation: you speak as if a viable approach to a CIF data
>             > > file is to take whole text chunks and throw them at some language
>             > > interpreter, without doing your own parse.  Quite apart from being a
>             > > rather unlikely approach, this is impossible, as without parsing you
>             > > won't know where the list finishes.  If you do do your own parse, you can
>             > > populate your data structures directly during the parse, and what list
>             > > separator was originally used in the data file is completely irrelevant.
>             > >
>             > > Re complexity: not sure how you are planning to deal with whitespace in
>             > > the formal grammar, but consider the following, where I have assumed that
>             > > each token 'eats up' the following whitespace.
>             > >
>             > > <dataitem> = <dataname><whitespace>+<datavalue>
>             > > <datavalue> = {<list>|<string>}<whitespace>+
>             > > <listdatavalue> = {<list>|<string>}<whitespace>*
>             > > <list> = '[' <whitespace>* {<listdatavalue>
>             > >          {<comma><whitespace>*<listdatavalue>}*}* ']'
>             > >
>             > > If we make comma or whitespace possible separators, the last production
>             > > becomes:
>             > > <list> = '[' <whitespace>* {<listdatavalue>
>             > >          {<comma or whitespace><listdatavalue>}*}* ']'
>             > >
>             > > This looks like no extra complexity, and from a user's point of view
>             > > whitespace as an alternative separator is simple to understand and
>             > > consistent with space as a token separator used everywhere else in CIF.
>             > > Anyway, if reduction of grammar complexity is your goal, you can just
>             > > completely exclude commas as list separators!
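James's productions, with the comma-or-whitespace separator and with trailing and doubled commas treated as errors (as his productions assume), map directly onto a recursive-descent parser. A minimal sketch, assuming list items are either bare strings (no whitespace, brackets or commas) or nested lists; quoted strings and tables are left out, and `ListParser` and `parse_cif_list` are hypothetical names.

```python
from typing import List, Union

Value = Union[str, list]


class ListParser:
    """Recursive descent over: '[' ws* [item {sep item}*] ws* ']'
    where sep is a comma (optionally surrounded by whitespace) or whitespace."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _skip_ws(self) -> bool:
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.pos > start  # True if any whitespace was consumed

    def parse_list(self) -> List[Value]:
        if self._peek() != "[":
            raise SyntaxError(f"expected '[' at position {self.pos}")
        self.pos += 1
        self._skip_ws()
        items: List[Value] = []
        if self._peek() == "]":
            self.pos += 1
            return items
        while True:
            items.append(self._parse_item())
            had_ws = self._skip_ws()
            nxt = self._peek()
            if nxt == "]":
                self.pos += 1
                return items
            if nxt == "":
                raise SyntaxError("unterminated list")
            if nxt == ",":
                self.pos += 1
                self._skip_ws()
                if self._peek() in (",", "]"):
                    # Doubled or trailing comma: a syntax error.
                    raise SyntaxError(f"missing list item at position {self.pos}")
            elif not had_ws:
                raise SyntaxError(f"expected separator at position {self.pos}")

    def _parse_item(self) -> Value:
        if self._peek() == "[":
            return self.parse_list()
        start = self.pos
        while (
            self.pos < len(self.text)
            and not self.text[self.pos].isspace()
            and self.text[self.pos] not in "[],"
        ):
            self.pos += 1
        if self.pos == start:
            raise SyntaxError(f"expected a list item at position {self.pos}")
        return self.text[start : self.pos]


def parse_cif_list(text: str) -> List[Value]:
    parser = ListParser(text.strip())
    result = parser.parse_list()
    if parser.pos != len(parser.text):
        raise SyntaxError("unexpected text after the list")
    return result
```

Once parsed, `[1 2 3 4 5 6 7]` and `[1,2,3,4,5,6,7]` yield the same Python list, which illustrates James's point that the separator used in the file is irrelevant after the parse.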
>             >
>             > Why not? Make them spaces only, and you become consistent across
>             > the board. I have to think about the possibility of pathological
>             > cases where spaces won't work. I can't think of any at the moment.
>             >
>             > >
>             > > Some questions about how commas behave:
>             > > 1. Is a trailing comma, e.g. [1,2,3,4,], a syntax error?
>             > > 2. Are two commas in a row a syntax error? E.g. [1,2,3,,4]
>             >
>             > I would say yes to syntax error. I can easily determine there may
>             > need to be an additional list value, but can't determine what.
>             >
>             > > Note the above productions assume that the answer to both is yes.
>             > >
>             > >>
>             > >> What big advantage to a language is there to specify you can use a
>             > >> comma or whitespace as a token separator? Will you be happy with the
>             > >> first person who interprets this as being ok
>             > >>
>             > >> loop_
>             > >>   _severalvalues 1,2,3,4,5,6,7 # these being the 7 values of severalvalues
>             > >>
>             > > Not sure what you are getting at here: I am proposing the following:
>             > >
>             > > _nicelist      [1 2 3 4 5 6 7]
>             > >
>             > > being the same as
>             > >
>             > > _nicelist      [1,2,3,4,5,6,7]
>             > >
>             > >  Don't see how this relates to loops.
>             >
>             > The point was, once you say a space and comma are equivalent token
>             > separators, will it be an interpretation that they are always so, even
>             > in loops? My example was not a list, just 7 values that were separated
>             > by commas, not spaces.
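Nick's concern can be made concrete: if the comma/whitespace equivalence is confined to the inside of brackets, the CIF 1.1 token rule still governs loops, and his example yields one value, not seven. A minimal sketch, assuming bare values only (no quoted strings, text fields or lists); `read_simple_loop` is a hypothetical name.

```python
from typing import Dict, List


def read_simple_loop(text: str) -> Dict[str, List[str]]:
    """Read a loop_ whose values are all bare (undelimited, non-list) strings.

    Outside brackets only whitespace separates tokens, so a comma stays part
    of the value it appears in.
    """
    tokens: List[str] = []
    for line in text.splitlines():
        for tok in line.split():
            if tok.startswith("#"):
                break  # the rest of the line is a comment
            tokens.append(tok)
    if not tokens or tokens[0].lower() != "loop_":
        raise SyntaxError("expected loop_")
    names: List[str] = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    values = tokens[i:]
    if not names or len(values) % len(names):
        raise SyntaxError("value count is not a multiple of the number of data names")
    # Values cycle through the data names, one packet per row.
    return {name: values[k :: len(names)] for k, name in enumerate(names)}
```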
>             >
>             > >
>             > > James.
>             > > ------
>             > >>
>             > >> On 27/11/09 11:41 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>             > >>
>             > >>> Dear All: looking over the list I posted previously of items left
>             > >>> to resolve, I see only one serious one outstanding: whether or not to
>             > >>> allow space as a separator between list items.  Nick has stated:
>             > >>>
>             > >>> "I will propose it has to be a comma, but make the coercion rule
>             > >>> that space separated values in a list-type object be coerced into
>             > >>> comma separated values. That is, read spaces as you want, but don't
>             > >>> encourage them."
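Nick's coercion rule ("read spaces as you want") amounts to rewriting a list into its comma-separated form on input. A minimal sketch, assuming bare items and nested lists only, with no quoted strings; it normalises separators but does not validate the list (doubled or trailing commas pass through unchanged). `coerce_to_commas` is a hypothetical name.

```python
import re

# Tokens: brackets, commas, or runs of characters that are none of those
# and not whitespace. Whitespace itself is dropped.
_TOK = re.compile(r"\[|\]|,|[^\s\[\],]+")


def coerce_to_commas(text: str) -> str:
    """Rewrite a list so that every pair of adjacent items is comma-separated.

    Whitespace-only separators become commas; existing commas are kept;
    nesting is preserved.
    """
    out = []
    prev_is_item = False  # did the previous token end an item (a value or ']')?
    for tok in _TOK.findall(text):
        if tok == ",":
            out.append(",")
            prev_is_item = False
        elif tok == "]":
            out.append("]")
            prev_is_item = True
        else:  # '[' or a bare value starts a new item
            if prev_is_item:
                out.append(",")
            out.append(tok)
            prev_is_item = tok != "["
    return "".join(out)
```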
>             > >>>
>             > >>> I would like to counter-propose, as Joe did originally, that
>             > >>> whitespace be elevated to equal status with comma as a valid list
>             > >>> separator.  I see no downside to this.  Would anyone else like to
>             > >>> speak to this issue before we vote?  In particular, I would be
>             > >>> interested to hear why Nick doesn't want to encourage spaces.
>             > >>
>             > >> cheers
>             > >>
>             > >> Nick
>             > >>
>             > >> --------------------------------
>             > >> Associate Professor N. Spadaccini, PhD
>             > >> School of Computer Science & Software Engineering
>             > >>
>             > >> The University of Western Australia    t: +61 (0)8 6488 3452
>             > >> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>             > >> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>             > >> MBDP  M002
>             > >>
>             > >> CRICOS Provider Code: 00126G
>             > >>
>             > >> e: Nick.Spadaccini@uwa.edu.au
>             > >>
>             > >> _______________________________________________
>             > >> ddlm-group mailing list
>             > >> ddlm-group@iucr.org
>             > >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>             > >>
>             > >
>             >
>             > cheers
>             >
>             > Nick
>             >
> 
> 
> 
>
>             --
>             T +61 (02) 9717 9907
>             F +61 (02) 9717 3145
>             M +61 (04) 0249 4148
> 
> 
>
>     ____________________________________________________________________
> 
> 
> 
> 
> 
>

International Union of Crystallography
