Discussion List Archives

Re: [ddlm-group] Space as a list item separator

Sorry, something got lost in the prior message.  It should have
read:

> Dear Colleagues,
>
>  Back to the question of commas.  If you accept the desirability of
> having a CIF 1.5, commas in lists become very useful.  Someone with
> a CIF 1.1 editor will be able to prepare a CIF 1.5 file for many
> useful cases by doing all lists with commas and no embedded blanks
> as long as they can make their lists fit on single lines.  In CIF 1.1
>
> [[1,2,3],[4,5,6],[7,8,9]]
>
> is a valid value for a tag, but
>
> [[1 2 3] [4 5 6] [7 8 9]]
>
> is not.
>
> Having the option of commas in lists will help to smooth the
> transition for at least some people.
>
> Regards,
>  Herbert
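
The difference between the two forms above comes down to whitespace tokenization. As a rough illustration only, assuming a simplified tokenizer that splits on blanks and ignores quoting, comments, and any restrictions on leading characters, the comma form survives as a single token while the blank-separated form breaks apart. The function name `split_on_whitespace` is hypothetical and not taken from any CIF library.

```python
def split_on_whitespace(line):
    """Split a line into whitespace-separated tokens.

    Simplified on purpose: no quoting, no comments, no text fields.
    """
    return line.split()


comma_form = "[[1,2,3],[4,5,6],[7,8,9]]"
blank_form = "[[1 2 3] [4 5 6] [7 8 9]]"

# The comma form contains no blanks, so it stays in one piece.
print(len(split_on_whitespace(comma_form)))  # 1
# The blank form is cut at every blank, giving fragments such as '[[1' and '3]'.
print(len(split_on_whitespace(blank_form)))  # 9
```

An editor that only understands whitespace-separated values would therefore carry the comma form through unchanged, which is the transition benefit described above.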

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Mon, 30 Nov 2009, Herbert J. Bernstein wrote:

> Dear Colleagues,
>
>  Back to the question of commas.  If you accept the desirability of
> having a CIF 1.5, commas in lists become very useful.  Someone with
> a CIF 1.1 editor will be able to prepare a CIF 1.5 file for many
> useful cases by doing all lists with commas and no embedded blanks
> as long as they can make their lists fit on single lines.  In CIF 1.1
>
> [[1,2,3],[4,5,6],[7,8,9]]
>
> is a valid value for a tag, but
>
> [[1 2 3] [4 5 6] [7 8 9]]
>
> Having the option of commas in lists will help to smooth the
> transition for at least some people.
>
> Regards,
>  Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Mon, 30 Nov 2009, SIMON WESTRIP wrote:
>
>> OK - one last bash at explaining my recent description of base CIF syntax
>> and the motivation
>> behind it.
>> 
>> I'll start with the motivation:
>> 
>> (1) reduce the restriction on the character set of non-delimited strings.
>> 
>> In CIF2 the character set of nondelimited strings has been restricted to
>> disallow e.g. ' in O1' because this can lead to ambiguity in e.g. lists.
>> However, lists require a separator (say it's a comma for the sake of
>> argument), so
>> O1' can be included in a list, e.g. [O1',O1',O1'].
>> The important point here is that we require separators
>> between tokens, so these separators have significance in parsing and
>> effectively terminate a nondelimited string.
>> Obviously, the separators cannot be part of a nondelimited string, which is
>> why
>> I specified a single separator in my recent description.
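
Point (1) can be sketched in code. The following is a minimal illustration, not a CIF parser: it assumes a flat (non-nested) list, a single separator character, no whitespace handling, and that a quote character opens a delimited string only when it is the first character of an item. The name `split_flat_list` is hypothetical.

```python
def split_flat_list(text, sep=","):
    """Split a flat list such as [O1',O1'] into its items.

    Illustrative assumptions: no nesting, one separator character,
    and a quote is a delimiter only at the start of an item.
    """
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed list")
    body = text[1:-1]
    items = []
    i = 0
    while i < len(body):
        if body[i] in "'\"":
            # Delimited string: runs to the matching closing quote.
            close = body.find(body[i], i + 1)
            if close == -1:
                raise ValueError("unterminated delimited string")
            items.append(body[i + 1:close])
            i = close + 1
        else:
            # Non-delimited string: runs to the next separator, so a
            # quote that is not the first character is ordinary data.
            end = body.find(sep, i)
            if end == -1:
                end = len(body)
            items.append(body[i:end])
            i = end
        # After an item, expect the separator or the end of the list.
        if i < len(body):
            if body[i] != sep:
                raise ValueError(f"expected {sep!r} at position {i}")
            i += 1
    return items


print(split_flat_list("[O1',O1',O1']"))
print(split_flat_list("['a,b',O1']"))
```

Under these assumptions the separator alone ends a non-delimited string, so `O1'` is read whole, while a separator inside a delimited string is kept as data.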
>> 
>> (2) reduce the base syntax as far as possible to something that is readily
>> parsable
>> by both machine and human, and can be seen as set-in-stone so that we don't
>> have the same problems when going from CIF2 to CIF3 that we have in going
>> from CIF1 to
>> CIF2.
>> 
>> I think we are all agreed that one of the aims in defining CIF2 is to 
>> define
>> something
>> that will be the base for all future CIF versions, so there's nothing new
>> here.
>> 
>> My particular approach to the description is irrelevant in many respects,
>> as CIF2 will be defined unambiguously.
>> However, it was an attempt to reconcile current CIF1 with CIF2 - e.g. using
>> the concept of
>> separators rather than delimiters that effectively include the separator in
>> their definition, and
>> describing everything in terms of delimited and nondelimited strings.
>> 
>> Actually, I will not elaborate on my description further as the main point
>> in this message is given
>> in (1) above. I've been trying to find examples that break my assertion 
>> that
>> a delimiter can be
>> contained in a nondelimited string as long as it's not the first character -
>> perhaps
>> someone can put me out of my misery?
>> 
>> Cheers
>> 
>> Simon
>> 
>> 
>> 
>> 
>> ____________________________________________________________________________
>> From: SIMON WESTRIP <simonwestrip@btinternet.com>
>> To: Nick.Spadaccini@uwa.edu.au; Group finalising DDLm and associated
>> dictionaries <ddlm-group@iucr.org>
>> Sent: Monday, 30 November, 2009 9:32:07
>> Subject: Re: [ddlm-group] Space as a list item separator
>> 
>> Yes I agree - my wording "dropping the CIF1 syntax of requiring space after
>> data values" was simply careless here.
>> 
>> 
>> 
>> ____________________________________________________________________________
>> From: Nick Spadaccini <nick@csse.uwa.edu.au>
>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>> Sent: Monday, 30 November, 2009 6:22:00
>> Subject: Re: [ddlm-group] Space as a list item separator
>> 
>> James has already elaborated on this but for the record we have dropped
>> that "a delimiter character and one whitespace" is MANDATED to be the
>> token delimiter.
>> We still require a space as a token separator, it is not elevated to being
>> part of the delimiter. If not there as a separator we have ways of
>> recovering with coercion rules. Clearly a whitespace is necessary to
>> separate non-delimited strings because they have no delimiting character.
>> 
>> This more consistent approach led to grammar rules that were the same
>> whether tokens were inside the new compound data types or not.
>> 
>> The previous discussions on this list elaborate on these points.
>> 
>> 
>> On 28/11/09 6:01 PM, "SIMON WESTRIP" <simonwestrip@btinternet.com> wrote:
>>
>>       I had been under the assumption that the separation of list
>>       items by a comma was 'set in stone'
>>       (and was one reason for dropping the CIF1 syntax of requiring
>>       space after data values),
>>       but if it's up for negotiation I would opt for using the space as
>>       a separator as elsewhere in the CIF,
>>       partly because then a list can essentially be treated much like
>>       a single-item loop - i.e. same basic parsing
>>       of <value><space><value><space>...
>>
>>       Cheers
>>
>>       Simon
>> 
>> ____________________________________________________________________________
>>       From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>       To: Group finalising DDLm and associated dictionaries
>>       <ddlm-group@iucr.org>
>>       Cc: Nick.Spadaccini@uwa.edu.au
>>       Sent: Friday, 27 November, 2009 11:43:10
>>       Subject: Re: [ddlm-group] Space as a list item separator
>>
>>       Dear Colleagues,
>>
>>          I have no objection to accepting either comma or whitespace
>>       as a valid separator in a list.  I can't object -- I have been
>>       coding to that standard since 1997, and now would only have to
>>       remove the message generated for the case of the space.  We
>>       already
>>       accept multiple glyphs as valid separators at all levels:
>>
>>         whitespace itself is one of several character sequences in
>>       rather
>>       complex combinations:  any number of blanks, tabs, newlines and
>>       comments.
>>       The comma itself is handled in a complex way.  We accept (or
>>       should accept) any whitespace before and after a comma as valid,
>>       as in
>>       {a,b} versus {a , b }.  Adding the option of leaving out the
>>       comma
>>       itself and just having the whitespace as the separator makes just
>>       as much sense.
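
The separator rule described in the paragraph above (a comma with optional whitespace around it, or whitespace alone, where whitespace may be blanks, tabs, newlines, or comments) can be written as one regular expression. This is a sketch under stated assumptions: items are bare tokens with no quoting, and a `#` always starts a comment that runs to the end of the line. The names `WS`, `SEPARATOR`, and `split_items` are hypothetical.

```python
import re

# One unit of whitespace: a blank, tab, or newline, or a '#' comment
# running to the end of the line (assumed; no quoting is considered).
WS = r"(?:[ \t\r\n]|#[^\n]*)"

# A separator is a comma with optional whitespace on either side,
# or a run of whitespace with no comma at all.
SEPARATOR = re.compile(rf"{WS}*,{WS}*|{WS}+")


def split_items(body):
    """Split the text between the brackets into bare items."""
    stripped = re.sub(rf"^{WS}+|{WS}+$", "", body)
    return SEPARATOR.split(stripped) if stripped else []


print(split_items("a,b"))
print(split_items("a , b "))
print(split_items("1 2\t3\n4"))
```

With this pattern, `a,b` and `a , b ` yield the same two items, and dropping the comma entirely changes nothing about the result.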
>>
>>         I see nothing to be gained by now forbidding the comma.  The
>>       meaning of {a,,b,} is the same as {a,.,b,.} or {a,?,b,?} or,
>>       under this new (and I think more sensible and realistic
>>       approach) {a . b .} or {a ? b ?}.
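
One way to read the equivalence stated above is that an empty slot between commas stands for a placeholder value. The sketch below illustrates only that reading; it assumes comma-separated bare items, and the choice of `.` rather than `?` as the default placeholder is arbitrary. The name `fill_empty_slots` is hypothetical. Other contributors to this thread would treat the same input as a syntax error instead.

```python
def fill_empty_slots(body, placeholder="."):
    """Split on commas and replace empty slots with a placeholder."""
    return [item.strip() or placeholder for item in body.split(",")]


# Both spellings produce the same four items under this reading.
print(fill_empty_slots("a,,b,"))
print(fill_empty_slots("a,.,b,."))
```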
>>
>>         The blank reads particularly well in dealing with vectors and
>>       matrices. The comma reads well when dealing with strings.
>>
>>         I think we would do best with both as valid alternatives (no
>>       error, no warning for either one).
>>
>>         Regards,
>>           Herbert
>>       =====================================================
>>        Herbert J. Bernstein, Professor of Computer Science
>>          Dowling College, Kramer Science Center, KSC 121
>>               Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                        +1-631-244-3035
>>                        yaya@dowling.edu
>>       =====================================================
>>
>>       On Fri, 27 Nov 2009, SIMON WESTRIP wrote:
>>
>>       > At first glance, you're considering using space instead of
>>       commas as list
>>       > separators?
>>       > which is not so far away from the CIF1 requirement of space
>>       following a
>>       > delimiter?
>>       >
>>       > But I'm only on my first cup of coffee this morning :-)
>>       >
>>       >___________________________________________________________________________
>>       _
>>       > From: Nick Spadaccini <nick@csse.uwa.edu.au>
>>       > To: Group finalising DDLm and associated dictionaries
>>       <ddlm-group@iucr.org>
>>       > Sent: Friday, 27 November, 2009 7:46:44
>>       > Subject: Re: [ddlm-group] Space as a list item separator
>>       >
>>       >
>>       >
>>       >
>>       > On 27/11/09 2:32 PM, "James Hester" <jamesrhester@gmail.com>
>>       wrote:
>>       >
>>       > > See comments below:
>>       > >
>>       > > On Fri, Nov 27, 2009 at 3:09 PM, Nick Spadaccini
>>       <nick@csse.uwa.edu.au>
>>       > wrote:
>>       > >> Timely email, come in just after the one I sent.
>>       > >>
>>       > >> My position is if we specify the syntax then we encourage
>>       its correct use
>>       > but
>>       > >> acknowledge that there may be cases where one might be able
>>       to recover
>>       > >> intent. But I wouldn't encourage those cases.
>>       > >
>>       > > Absolutely, which is why I would like to elevate
>>       space-separated list
>>       > items to
>>       > > be correct syntax rather than 'wrong but intent is clear'
>>       syntax.
>>       > >>
>>       > >> You could say that token separator in lists are a or b or
>>       c, but that
>>       > just
>>       > >> adds a level of complexity for very little gain. The choice
>>       of comma
>>       > makes it
>>       > >> seamless to translate from the raw CIF data straight in to
>>       most language
>>       > >> specific data declaration. The only language I know that
>>       accepts one or
>>       > the
>>       > >> other or both is MatLab.
>>       > >
>>       > > Re ease of translation: you speak as if a viable approach to
>>       a CIF data
>>       > file
>>       > > is to take whole text chunks and throw them at some language
>>       interpreter,
>>       > > without doing your own parse.  Quite apart from being a
>>       rather unlikely
>>       > > approach, this is impossible, as without parsing you won't
>>       know where the
>>       > list
>>       > > finishes.  If you do do your own parse, you can populate
>>       your
>>       > datastructures
>>       > > directly during the parse, and what list separator was
>>       originally used in
>>       > the
>>       > > data file is completely irrelevant.
>>       > >
>>       > > Re complexity: not sure how you are planning to deal with
>>       whitespace in
>>       > the
>>       > > formal grammar, but consider the following, where I have
>>       assumed that each
>>       > > token 'eats up' the following whitespace.
>>       > >
>>       > > <dataitem> = <dataname><whitespace>+<datavalue>
>>       > > <datavalue> = {<list>|<string>}<whitespace>+
>>       > > <listdatavalue> = {<list>|<string>}<whitespace>*
>>       > > <list> = '[' <whitespace>* {<listdatavalue>
>>       > > {<comma><whitespace>*<listdatavalue>}*}* ']'
>>       > >
>>       > > If we make comma or whitespace possible separators, the last
>>       production
>>       > > becomes:
>>       > > <list> =  '[' <whitespace>* {<listdatavalue> {<comma or
>>       > > whitespace><listdatavalue>}*}* ']'
>>       > >
>>       > > This looks like no extra complexity, and from a user's point
>>       of view
>>       > > whitespace as an alternative separator is simple to
>>       understand and
>>       > consistent
>>       > > with space as a token separator used everywhere else in CIF.
>>        Anyway, if
>>       > > reduction of grammar complexity is your goal, you can just
>>       completely
>>       > exclude
>>       > > commas as list separators!
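
The revised `<list>` production above, with comma or whitespace as the separator, can be sketched as a small recursive-descent parser. This is an illustration under assumptions, not a reference implementation: values are bare strings or nested lists, quoting and comments are ignored, and a comma with no value after it is rejected. The function names `skip_ws`, `parse_value`, and `parse_list` are hypothetical.

```python
def skip_ws(text, pos):
    """Advance past any whitespace starting at pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def parse_value(text, pos):
    """Parse a nested list or a bare string; return (value, next pos)."""
    if pos < len(text) and text[pos] == "[":
        return parse_list(text, pos)
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in ",[]":
        end += 1
    if end == pos:
        raise ValueError(f"expected a value at position {pos}")
    return text[pos:end], end


def parse_list(text, pos=0):
    """Parse '[' items ']' where items are separated by comma or whitespace."""
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos = skip_ws(text, pos + 1)
    items = []
    while pos < len(text) and text[pos] != "]":
        value, pos = parse_value(text, pos)
        items.append(value)
        after_ws = skip_ws(text, pos)
        had_ws = after_ws > pos
        pos = after_ws
        if pos < len(text) and text[pos] == ",":
            # A comma must be followed by another value.
            pos = skip_ws(text, pos + 1)
            if pos >= len(text) or text[pos] in ",]":
                raise ValueError(f"comma without a following value at position {pos}")
        elif pos < len(text) and text[pos] != "]" and not had_ws:
            raise ValueError(f"missing separator at position {pos}")
    if pos >= len(text):
        raise ValueError("unterminated list")
    return items, pos + 1


print(parse_list("[1 2 3 4 5 6 7]")[0] == parse_list("[1,2,3,4,5,6,7]")[0])
print(parse_list("[[1 2 3] [4 5 6] [7 8 9]]")[0])
```

Because the data structure is filled in during the parse, the result is the same whichever separator the file used, and the separator branch costs only a few extra lines compared with a comma-only version.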
>>       >
>>       > Why not? Make them spaces only, and you become consistent
>>       across the board.
>>       > I have to think about the possibility of pathological cases
>>       where spaces
>>       > won't work. I can't think of any at the moment.
>>       >
>>       > >
>>       > > Some questions about how commas behave:
>>       > > 1: is a trailing comma e.g. [1,2,3,4,] a syntax error?
>>       > > 2. are two commas in a row a syntax error? E.g. [1,2,3,,4]
>>       >
>>       > I would say yes to syntax error. I can easily determine they
>>       may need to be
>>       > an additional list value, but can't determine what.
>>       >
>>       > > Note the above productions assume that the answer to both is
>>       yes.
>>       > >
>>       > >>
>>       > >> What big advantage to a language is there to specify you
>>       can use a comma
>>       > or
>>       > >> whitespace as a token separator? Will you be happy with the
>>       first person
>>       > who
>>       > >> interprets this as being ok
>>       > >>
>>       > >> loop_
>>       > >>   _severalvalues 1,2,3,4,5,6,7 # these being the 7 values
>>       of
>>       > severalvalues
>>       > >>
>>       > > Not sure what you are getting at here: I am proposing the
>>       following:
>>       > >
>>       > > _nicelist      [1 2 3 4 5 6 7]
>>       > >
>>       > > being the same as
>>       > >
>>       > > _nicelist      [1,2,3,4,5,6,7]
>>       > >
>>       > >  Don't see how this relates to loops.
>>       >
>>       > The point was, once you say a space and comma are equivalent
>>       token
>>       > separators then will it be an interpretation that they are
>>       always so even in
>>       > loops? My example was not a list, just 7 values that were
>>       separated by
>>       > commas not spaces.
>>       >
>>       > >
>>       > > James.
>>       > > ------
>>       > >>
>>       > >> On 27/11/09 11:41 AM, "James Hester"
>>       <jamesrhester@gmail.com> wrote:
>>       > >>
>>       > >>> Dear All: looking over the list I posted previously of
>>       items left to
>>       > >>> resolve, I see only one serious one outstanding: whether
>>       or not to allow
>>       > >>> space as a separator between list items.  Nick has stated:
>>       > >>>
>>       > >>> " I will propose it has to be a comma, but make the
>>       coercion rule that
>>       > space
>>       > >>> separated values in a list-type object be coerced into
>>       comma separated
>>       > >>> values. That is, read spaces as you want, but don't
>>       encourage them."
>>       > >>>
>>       > >>> I would like to counter-propose, as Joe did originally,
>>       that whitespace
>>       > be
>>       > >>> elevated to equal status with comma as a valid list
>>       separator.  I see no
>>       > >>> downside to this.  Would anyone else like to speak to this
>>       issue before
>>       > we
>>       > >>> vote?  In particular, I would be interested to hear why
>>       Nick doesn't
>>       > want to
>>       > >>> encourage spaces.
>>       > >>
>>       > >> cheers
>>       > >>
>>       > >> Nick
>>       > >>
>>       > >> --------------------------------
>>       > >> Associate Professor N. Spadaccini, PhD
>>       > >> School of Computer Science & Software Engineering
>>       > >>
>>       > >> The University of Western Australia    t: +61 (0)8 6488
>>       3452
>>       > >> 35 Stirling Highway                    f: +61 (0)8 6488
>>       1089
>>       > >> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3:
>>       www.csse.uwa.edu.au/~nick
>>       > >> MBDP  M002
>>       > >>
>>       > >> CRICOS Provider Code: 00126G
>>       > >>
>>       > >> e: Nick.Spadaccini@uwa.edu.au
>>       > >>
>>       > >>
>>       > >>
>>       > >> _______________________________________________
>>       > >> ddlm-group mailing list
>>       > >> ddlm-group@iucr.org
>>       > >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>       > >>
>>       > >
>>       > >
>>       >
>>       > cheers
>>       >
>>       > Nick
>>       >
>>       >
>>       >
>>
>> 
>> 
>> cheers
>> 
>> Nick
>> 
>
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

International Union of Crystallography