Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Wed, 30 Sep 2009 08:58:46 -0400 (EDT)
- In-Reply-To: <[email protected]>
- References: <C6E123F5.11EB6%[email protected]><[email protected]><[email protected]>
Dear Colleagues,
Bottom line -- what is proposed is a very different language that will
use a significantly different lexer and parser from the one used for DDL1
and DDL2 CIFs, guaranteeing to leave us with multiple dialects for a very
long time. I think that is a shame -- rather than DDLm consolidating
DDL1 and DDL2 and adding useful new features, we are simply going to end
up with DDL1, DDL2 and DDL3 as three distinct dialects.
I think this is unwise.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Wed, 30 Sep 2009, James Hester wrote:
> I am currently connected to the world via a slow dialup connection, so I will
> tend toward fewer, wordier communications.
>
> There are two issues here which we can treat separately. The
> first is the restriction of the character set for non-delimited
> strings, to which I have seen no objections so far. Can we therefore
> take the expression given by Nick as agreed? For reference it was:
>
> non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*
>
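The quoted non-DS expression can be tried out directly. The following is a minimal Python sketch, assuming the expression is meant as an ordinary regular expression over whole tokens; the names `NON_DS` and `is_non_delimited` are hypothetical, and the hyphen has been moved to the end of each character class so that it is read as a literal character rather than as a range.

```python
import re

# Assumed regex reading of the quoted expression. The "-" is placed last in
# each class so that it is a literal hyphen, not a range operator.
NON_DS = re.compile(r"[A-Za-z0-9./()+?-][A-Za-z0-9_./()+?-]*")


def is_non_delimited(token: str) -> bool:
    """Return True if the whole token fits the proposed non-DS alphabet."""
    return NON_DS.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["1.54(2)", "-0.5", "a_b", "_cell_length_a", "it's"]:
        print(repr(token), is_non_delimited(token))
```

Under this reading an underscore is allowed anywhere except the first position, so a dataname such as `_cell_length_a` can never be mistaken for a non-delimited value.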
> There remains then the treatment of whitespace. Following Nick's
> visit, I have had some time to ponder this topic and have shifted my
> position somewhat. I am not overly swayed by the assertion that
> computer language parsers never use whitespace as a delimiter, so
> neither should we. A CIF file is different from a computer language
> source file. By and large, computer language source files are
> created, edited and maintained by humans, who will generally do
> whatever they can to improve readability, including using whitespace
> to delimit words when appropriate. There is no reason beyond
> enforcing readability to use whitespace as a delimiter (NB Python's
> use of indentation as semantically meaningful). CIF files, on the
> other hand, are almost always computer-generated and computer-read,
> and so unless whitespace is required by the standard it will tend to
> disappear. This erodes CIF readability, one of the pleasant features
> of CIF when compared with other data formats. Therefore, while I
> sympathise with the urge to simplify the BNF description, I believe
> the complexity introduced by whitespace treatment is the price we pay
> for enforcing readability. So I would prefer that all items in a CIF
> file are separated by whitespace, where I view a bracket expression as
> a single item.
>
> That said, we need to disallow delimiters inside delimited strings,
> even if not followed by whitespace. This would simplify parsing,
> editing in delimiter-aware editors, and importation of CIF loops into
> other software (e.g. spreadsheet software often understands double and
> single quote delimited strings, and whitespace as a delimiter). It
> also simplifies treatment of delimited strings inside bracket
> structures, where one might expect that a comma or close bracket could
> follow immediately after a string closing delimiter.
>
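The difference between the two quoting rules can be sketched with two regular expressions. This is an illustration under stated assumptions, not a full CIF lexer: `LEGACY_SQ` models the older rule as described in this thread (a quote closes the string only when followed by whitespace or end of input), `STRICT_SQ` models the proposal (no delimiter inside the string at all), and all names are hypothetical. Only single-quoted strings on one line are handled.

```python
import re

# Older rule, as assumed here: an embedded quote is allowed when the next
# character is not whitespace; the closing quote must be followed by
# whitespace or end of input.
LEGACY_SQ = re.compile(r"'((?:[^'\n]|'(?=\S))*)'(?=\s|$)")

# Proposed rule: the delimiter may not appear inside the string, so the
# first quote after the opening one always closes it.
STRICT_SQ = re.compile(r"'([^'\n]*)'")


def read_legacy(token: str):
    """Return the string body under the older rule, or None if invalid."""
    m = LEGACY_SQ.fullmatch(token)
    return m.group(1) if m else None


def read_strict(token: str):
    """Return the string body under the proposed rule, or None if invalid."""
    m = STRICT_SQ.fullmatch(token)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(read_legacy("'a dog's life'"))      # accepted under the older rule
    print(read_strict("'a dog's life'"))      # rejected under the proposal
    print(STRICT_SQ.findall("['a','b']"))     # strings inside a bracket list
    print(LEGACY_SQ.findall("['a','b']"))     # older rule finds no strings
```

The last two calls show the bracket case: with the strict rule a comma or close bracket may follow the closing quote directly, whereas the older rule never sees a quote followed by whitespace and so finds no complete string.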
> A concern for backwards compatibility has been expressed. There are four
> different types of compatibility issues that I can see:
>
> 1. Ability of legacy software to read new-style (CIF 1.2) CIF files
> 2. Ability of legacy software to write new-style CIF files
> 3. Need for remediation of old-style CIFs
> 4. Upgrade burden on software writers
>
> Regarding reading: as soon as a triple quote or bracket construct
> appears in a CIF file, legacy software will not parse the CIF
> correctly. I would suggest that it is therefore pointless to worry
> about incompatibilities in the details of string-handling also
> breaking the parse. Quite the opposite: if we are going to break
> compatibility, we might as well do it all at once so that the
> programmer only has to edit their code once.
>
> Regarding writing: I believe that a policy decision has been made not
> to redefine existing datanames to use bracket constructs. Therefore,
> current CIF software for outputting CIFs falls into three categories:
> (a) software with conservative string handling - all non-numeric data
> delimited by quotes, even if not necessary under CIF 1.1
> (b) software which puts the "#CIF1.1" magic comment at the top of its files,
> but outputs strings that might not be correct under CIF 1.2
> (c) software with no "#CIF1.1" magic comment and incorrect CIF 1.2 string
> handling.
>
> I would suggest that only type (c) is of concern, and that these files are
> easily caught and "#CIF1.1" added to the top.
>
> Need for remediation: as Nick has said, this simply means putting a
> "#CIF1.1" string sequence at the top of every file that doesn't have
> one.
>
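The remediation step described above amounts to a one-line prepend. A minimal sketch, assuming the magic comment is exactly the "#CIF1.1" string used in this message and that a file counts as already marked only when its text begins with that string; the function name `add_magic_comment` and the `magic` parameter are hypothetical, and the exact header text required by the relevant specification should be passed in as `magic`.

```python
def add_magic_comment(text: str, magic: str = "#CIF1.1") -> str:
    """Prepend the magic comment line unless the text already starts with it."""
    if text.startswith(magic):
        return text  # already marked, leave the file untouched
    return magic + "\n" + text


if __name__ == "__main__":
    old_style = "data_example\n_cell_length_a 5.43\n"
    print(add_magic_comment(old_style))
```

Applying the function twice gives the same result as applying it once, so it can be run safely over a whole archive.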
> Upgrade burden: I think this is where we have to tread carefully, as a
> large part of the success of CIF1.2 will depend on the provision of
> programs that support it. For this reason, Nick's proposal to minimise
> the number of string productions is welcome as it translates into
> reduced work for the programmer. Disallowing delimiters inside
> delimited strings, even when not followed by whitespace, also
> simplifies things in a small way for the programmer.
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>