Discussion List Archives

Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

On Tue, Mar 1, 2011 at 4:12 AM, James Hester <jamesrhester@gmail.com> wrote:
Dear COMCIFS members:

The DDLm group is currently engaging in developing an elide mechanism
for the CIF2 standard.  Our deliberations have reached something of an
impasse due to disagreement around the use of triple quotes as a
string delimiter.  Python is a popular programming language that also
uses triple quotes to delimit strings. One side of the discussion
considers that use of triple quotes as a string delimiter means that
all escape sequences recognised by Python should also be recognised by
CIF, in order to avoid confusion and improve consistency with
mainstream (i.e. Python) practice.  The other side of the discussion
sees little benefit to CIF from including the additional ten or so
escape sequences and advocates leaving them out of the CIF2 standard,
instead adopting the minimal number of escape sequences to allow

I have been through a lot of similar stuff in creating CML and want to emphasize that this is a difficult problem and not one that can be tackled in small responses to problems as they arise. I see the following aspects:
* quoting/escaping. All languages suffer from this and there is no escape from infinite regress. (I have Java code where I had to escape twice and I end up with four concatenated backslashes. XML uses a special construct (CDATA) to escape XML within XML.) The triple quotes have the same potential for recursion.
* The inclusion of Python starts to turn CIF from a data markup language into a declarative/functional language (the ultimate example is LISP and its modern derivatives). By including executable Python you either commit to a wide range of constructs or have to draw an arbitrary borderline where the power of the language is reduced.
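
To make the regress in the first point concrete, here is a minimal Python sketch. The helper names `escape` and `nest` are hypothetical and the escaping rule is a generic backslash convention, not the CIF2 proposal or any real library's API. It shows that each extra layer of quoting doubles the backslashes, and that a triple-quoted string still needs an escape to contain its own delimiter.

```python
def escape(text: str) -> str:
    """Apply one level of escaping: protect backslashes first, then double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def nest(text: str, levels: int) -> str:
    """Escape `text` once per quoting layer it has to pass through."""
    for _ in range(levels):
        text = escape(text)
    return text


# A single backslash becomes two after one layer and four after two layers.
once = nest("\\", 1)
twice = nest("\\", 2)
print(len(once), len(twice))

# Triple quotes do not end the regress: to embed the delimiter itself,
# at least one of its quote characters has to be escaped.
embedded = """a \""" b"""
print(embedded)
```

The same doubling applies to any delimiter scheme in which the escape character can itself appear in the data.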

In creating a language there is usually a spectrum between:
* easy to implement - often verbose, but precise. Authors usually hate it.
* easy to write - minimal markup, but difficult to process. This relies on having a large and flexible toolchain. It is possible to write specs that are unimplementable. XML had this problem and reduced the power of the language to solve it.

By creating its own language, CIF has implicitly had to create a considerable toolchain for parsing, validation and semantics. If the community can support the increased demand for tools, fine. Otherwise I suspect we should work with pragmatic approaches, some of which are theoretically broken and some of which may require specific knowledge from authors.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
