
Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
From: "Herbert J. Bernstein" <[email protected]>
Date: Wed, 30 Sep 2009 08:58:46 -0400 (EDT)
In-Reply-To: <[email protected]>
References: <C6E123F5.11EB6%[email protected]><[email protected]><[email protected]>

Dear Colleagues,

   Bottom line -- what is proposed is a very different language that will
use a significantly different lexer and parser from the one used for DDL1
and DDL2 CIFs, guaranteeing to leave us with multiple dialects for a very
long time.  I think that is a shame -- rather than DDLm consolidating
DDL1 and DDL2 and adding useful new features, we are simply going to end
up with DDL1, DDL2 and DDL3 as three distinct dialects.

   I think this is unwise.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Wed, 30 Sep 2009, James Hester wrote:

> I am currently connected to the world via a slow dialup connection, so I will
> tend towards fewer, wordier communications.
>
> There are two issues here which we can treat separately. The
> first is the restriction of the character set for non-delimited
> strings, to which I have seen no objections so far.  Can we therefore
> take the expression given by Nick as agreed?  For reference it was:
>
> non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*
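A minimal sketch of how the expression above could be checked in Python. It assumes the hyphen in each character class is meant as a literal character, so it is escaped here; the names NON_DS and is_non_delimited are illustrative only and do not come from the proposal.

```python
import re

# Character classes transcribed from the quoted expression.
# Assumption: "-" is a literal hyphen, so it is escaped to avoid being
# read as a range between "/" and "(".
FIRST_CHAR = r"[A-Za-z0-9./\-()+?]"
LATER_CHARS = r"[A-Za-z0-9_./\-()+?]"
NON_DS = re.compile(FIRST_CHAR + LATER_CHARS + "*")


def is_non_delimited(token: str) -> bool:
    """Return True if the whole token matches the non-DS expression."""
    return NON_DS.fullmatch(token) is not None
```

Under this reading, a value such as 1.234(5) is accepted, while anything beginning with an underscore or containing a quote character is rejected and would need delimiters.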
>
> There remains then the treatment of whitespace.  Following Nick's
> visit, I have had some time to ponder this topic and have shifted my
> position somewhat. I am not overly swayed by the assertion that
> computer language parsers never use whitespace as a delimiter, so
> neither should we. A CIF file is different from a computer language
> source file.  By and large, computer language source files are
> created, edited and maintained by humans, who will generally do
> whatever they can to improve readability, including using whitespace
> to delimit words when appropriate.  There is no reason beyond
> enforcing readability to use whitespace as a delimiter (NB Python's
> use of indentation as semantically meaningful). CIF files, on the
> other hand, are almost always computer-generated and computer-read,
> and so unless whitespace is required by the standard it will tend to
> disappear.  This erodes CIF readability, one of the pleasant features
> of CIF when compared with other data formats.  Therefore, while I
> sympathise with the urge to simplify the BNF description, I believe
> the complexity introduced by whitespace treatment is the price we pay
> for enforcing readability. So I would prefer that all items in a CIF
> file are separated by whitespace, where I view a bracket expression as
> a single item.
>
> That said, we need to disallow a string's delimiter character inside the
> delimited string, even when it is not followed by whitespace. This would
> simplify parsing,
> editing in delimiter-aware editors, and importation of CIF loops into
> other software (e.g. spreadsheet software often understands double and
> single quote delimited strings, and whitespace as a delimiter). It
> also simplifies treatment of delimited strings inside bracket
> structures, where one might expect that a comma or close bracket could
> follow immediately after a string closing delimiter.
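To illustrate the simplification, here is a rough sketch of reading a quoted string when the delimiter may never appear inside it: the first matching quote always closes the string, whatever follows. The function name read_delimited is hypothetical, and the sketch handles only single and double quotes, not triple-quoted strings.

```python
from typing import Tuple


def read_delimited(text: str, pos: int) -> Tuple[str, int]:
    """Read a quoted string that opens at text[pos].

    Returns the string contents and the index just past the closing quote.
    Assumes the delimiter character cannot occur inside the string.
    """
    quote = text[pos]
    if quote not in ("'", '"'):
        raise ValueError("expected a quote character at position %d" % pos)
    # The next occurrence of the same quote ends the string; there is no
    # need to look ahead for whitespace.
    end = text.find(quote, pos + 1)
    if end == -1:
        raise ValueError("unterminated string starting at position %d" % pos)
    return text[pos + 1:end], end + 1
```

For example, with the text ['a','b'], read_delimited(text, 1) yields the value a and leaves the position on the comma, so a bracket-structure parser can continue without any whitespace check.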
>
> A concern for backwards compatibility has been expressed.  There are four
> different types of compatibility issues that I can see:
>
> 1. Ability of legacy software to read new-style (CIF 1.2) CIF files
> 2. Ability of legacy software to write new-style CIF files
> 3. Need for remediation of old-style CIFs
> 4. Upgrade burden on software writers
>
> Regarding reading: as soon as a triple quote or bracket construct
> appears in a CIF file, legacy software will not parse the CIF
> correctly.  I would suggest that it is therefore pointless to worry
> about incompatibilities in the details of string-handling also
> breaking the parse.  Quite the opposite, if we are going to break
> compatibility, we might as well do it all at once so that the
> programmer only has to edit their code once.
>
> Regarding writing: I believe that a policy decision has been made not
> to redefine existing datanames to use bracket constructs.  Therefore,
> current CIF software for outputting CIFs falls into three categories:
> (a) software with conservative string handling - all non-numeric data
>    delimited by quotes, even if not necessary under CIF 1.1
> (b) software which puts the "#CIF1.1" magic comment at the top of its files,
>    but outputs strings that might not be correct under CIF 1.2
> (c) software with no "#CIF1.1" magic comment and incorrect CIF 1.2 string
>    handling.
>
> I would suggest that only type (c) is of concern, and that these files are
> easily caught and "#CIF1.1" added to the top.
>
> Need for remediation: as Nick has said, this simply means putting a
> "#CIF1.1" string sequence at the top of every file that doesn't have
> one.
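The remediation step described above might be sketched as follows. The magic string is spelled exactly as quoted in this message, which is an assumption about its final form; the function name add_magic_comment is hypothetical, and the check looks only for that exact string at the very start of the text.

```python
# Spelling taken from the message above; the string actually required by
# the standard may differ, so it is kept as a parameter.
MAGIC = "#CIF1.1"


def add_magic_comment(cif_text: str, magic: str = MAGIC) -> str:
    """Return cif_text with the magic comment as its first line.

    Text that already begins with the magic comment is returned unchanged.
    """
    if cif_text.startswith(magic):
        return cif_text
    return magic + "\n" + cif_text
```

Applying this to every archived file would leave already-labelled files untouched and label the rest, which is the whole of the remediation being proposed.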
>
> Upgrade burden: I think this is where we have to tread carefully, as a
> large part of the success of CIF1.2 will depend on the provision of
> programs that support it. For this reason, Nick's proposal to minimise
> the number of string productions is welcome as it translates into
> reduced work for the programmer.  Disallowing a delimiter character
> inside its own string, even when it is not followed by whitespace, also
> simplifies things in a small way for the programmer.
>
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>