Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <firstname.lastname@example.org>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: "Herbert J. Bernstein" <email@example.com>
- Date: Wed, 30 Sep 2009 08:58:46 -0400 (EDT)
- In-Reply-To: <firstname.lastname@example.org>
- References: <C6E123F5.11EB6email@example.com><20090924063136.D23301@epsilon.pair.com><firstname.lastname@example.org>
Dear Colleagues,

Bottom line -- what is proposed is a very different language that will
use a significantly different lexer and parser from the one used for
DDL1 and DDL2 CIFs, guaranteeing to leave us with multiple dialects for
a very long time. I think that is a shame -- rather than DDLm
consolidating DDL1 and DDL2 and adding useful new features, we are
simply going to end up with DDL1, DDL2 and DDL3 as three distinct
dialects. I think this is unwise.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
email@example.com
=====================================================

On Wed, 30 Sep 2009, James Hester wrote:

> I am currently connected to the world via a slow dialup connection,
> so I will tend to fewer, more wordy communications.
>
> There are two issues here which we can treat separately. The
> first is the restriction of the character set for non-delimited
> strings, to which I have seen no objections so far. Can we therefore
> take the expression given by Nick as agreed? For reference it was:
>
> non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*
>
> There remains then the treatment of whitespace. Following Nick's
> visit, I have had some time to ponder this topic and have shifted my
> position somewhat. I am not overly swayed by the assertion that
> computer language parsers never use whitespace as a delimiter, so
> neither should we. A CIF file is different from a computer language
> source file. By and large, computer language source files are
> created, edited and maintained by humans, who will generally do
> whatever they can to improve readability, including using whitespace
> to delimit words when appropriate. There is no reason beyond
> enforcing readability to use whitespace as a delimiter (NB Python's
> use of indentation as semantically meaningful). CIF files, on the
> other hand, are almost always computer-generated and computer-read,
> and so unless whitespace is required by the standard it will tend to
> disappear. This erodes CIF readability, one of the pleasant features
> of CIF when compared with other data formats. Therefore, while I
> sympathise with the urge to simplify the BNF description, I believe
> the complexity introduced by whitespace treatment is the price we pay
> for enforcing readability. So I would prefer that all items in a CIF
> file are separated by whitespace, where I view a bracket expression as
> a single item.
>
> That said, we need to disallow delimiters inside delimited strings,
> even if not followed by whitespace. This would simplify parsing,
> editing in delimiter-aware editors, and importation of CIF loops into
> other software (e.g. spreadsheet software often understands double and
> single quote delimited strings, and whitespace as a delimiter). It
> also simplifies treatment of delimited strings inside bracket
> structures, where one might expect that a comma or close bracket could
> follow immediately after a string closing delimiter.
>
> A concern for backwards compatibility has been expressed. There are
> four different types of compatibility issues that I can see:
>
> 1. Ability of legacy software to read new-style (CIF 1.2) CIF files
> 2. Ability of legacy software to write new-style CIF files
> 3. Need for remediation of old-style CIFs.
> 4. Upgrade burden on software writers
>
> Regarding reading: as soon as a triple quote or bracket construct
> appears in a CIF file, legacy software will not parse the CIF
> correctly. I would suggest that it is therefore pointless to worry
> about incompatibilities in the details of string-handling also
> breaking the parse. Quite the opposite, if we are going to break
> compatibility, we might as well do it all at once so that the
> programmer only has to edit their code once.
>
> Regarding writing: I believe that a policy decision has been made not
> to redefine existing datanames to use bracket constructs. Therefore,
> current CIF software for outputting CIFs falls into three categories:
> (a) software with conservative string handling - all non-numeric data
>     delimited by quotes, even if not necessary under CIF 1.1
> (b) software which puts the "#CIF1.1" magic comment at the top of its
>     files, but outputs strings that might not be correct under CIF 1.2
> (c) software with no "#CIF1.1" magic comment and incorrect CIF 1.2
>     string handling.
>
> I would suggest that only type (c) is of concern, and that these files
> are easily caught and "#CIF1.1" added to the top.
>
> Need for remediation: as Nick has said, this simply means putting a
> "#CIF1.1" string sequence at the top of every file that doesn't have
> one.
>
> Upgrade burden: I think this is where we have to tread carefully, as a
> large part of the success of CIF1.2 will depend on the provision of
> programs that support it. For this reason, Nick's proposal to minimise
> the number of string productions is welcome as it translates into
> reduced work for the programmer. Removing use of delimiters
> internally if not followed by whitespace also simplifies things in a
> small way for the programmer.
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> firstname.lastname@example.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
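
For readers following the thread, the two mechanical points in the quoted
message (the proposed non-DS alphabet and the "#CIF1.1" remediation step)
can be sketched in Python roughly as below. This is only an illustration
under stated assumptions, not code from any CIF library: the function
names are hypothetical, the hyphen in the character classes is escaped
because a literal "/-(" would read as a reversed range in Python's re
module, and the magic string is copied verbatim from the message, so its
exact spelling should be checked against the CIF specification.

```python
import re

# Proposed non-delimited-string alphabet, transcribed from the message.
# Assumption: the hyphen is meant as a literal character, so it is escaped.
# Note that the first character class has no underscore.
NON_DS = re.compile(r"[A-Za-z0-9./\-()+?][A-Za-z0-9_./\-()+?]*")

# Magic comment exactly as written in the message (spelling not verified
# against the standard).
MAGIC = "#CIF1.1"


def is_non_delimited(token: str) -> bool:
    """Return True if the whole token fits the proposed non-DS pattern."""
    return NON_DS.fullmatch(token) is not None


def ensure_magic_comment(text: str) -> str:
    """Prepend the magic comment if the first line does not start with it."""
    first_line = text.split("\n", 1)[0]
    if first_line.startswith(MAGIC):
        return text
    return MAGIC + "\n" + text
```

Under this reading, a value such as 1.54(2) would pass as a non-delimited
string, while anything containing a quote or whitespace, or starting with
an underscore, would need delimiters.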