
Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
Subject: Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
From: Peter Murray-Rust <[email protected]>
Date: Fri, 4 Mar 2011 08:25:10 +0000
In-Reply-To: <[email protected]>
References: <[email protected]><[email protected]>

I add some comments arising from my own experience with XML/CML, which may be useful. I don't think I am a full member of COMCIFS, so feel free to ignore any or all of them. I comment after the relevant paragraphs.

On Fri, Mar 4, 2011 at 6:03 AM, James Hester <[email protected]> wrote:

> 1. A feature should only be added to CIF syntax if all of the
> following are satisfied:
>
> (i) implementation or use of equivalent behaviour at dictionary level
> is either significantly more cumbersome or not possible;
> (ii) the feature provides significant new functionality that is widely
> applicable to most scientific domains
> (iii) reliable transfer and archiving of data is not compromised
> (iv) there is no simpler way of achieving the desired behaviour

I would add:
* a feature should only be added if it has been shown that it can be implemented with "reasonable ease" ("rough consensus and running code").

> Example 2: Unicode support in CIF2. This is broadly useful, given the
> international nature of science and range of symbols used in
> scientific papers. It could have been implemented in dictionaries
> using ASCII escapes, but this would have been cumbersome to use, so it
> satisfies Principle 1. We have adopted Unicode (rather than creating
> our own international character set) and copied the XML character
> ranges (Principle 2).

I found the original ASCII escapes difficult and tedious to use for some code points, and would urge full Unicode support (with numeric values).
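To illustrate the contrast, here is a minimal Python sketch. The escape table is a small, hand-picked subset modelled on CIF 1.1-style backslash escapes (verify it against the CIF specification before relying on it), and the helper names `decode_cif1_escapes` and `code_points` are hypothetical, not part of any CIF library. The point is that every escaped character needs its own table entry, whereas with full Unicode the character, or its numeric code point, can be written directly.

```python
# Hypothetical, illustrative SUBSET of CIF 1.1-style ASCII escapes.
# Not the complete table; check the CIF specification for the real list.
CIF1_ESCAPES = {
    r"\%A": "\u00c5",  # capital A with ring above
    r"\a": "\u03b1",   # Greek small alpha
    r"\b": "\u03b2",   # Greek small beta
    r"\g": "\u03b3",   # Greek small gamma
    r"\m": "\u03bc",   # Greek small mu
    r"\q": "\u03b8",   # Greek small theta
}


def decode_cif1_escapes(text: str) -> str:
    """Replace known escape sequences with Unicode characters, left to right.

    Backslash sequences not in the table are copied through unchanged.
    """
    # Try longer escapes first so a multi-character escape wins over a shorter one.
    keys = sorted(CIF1_ESCAPES, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(CIF1_ESCAPES[key])
                i += len(key)
                break
        else:
            # No escape matched at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def code_points(text: str) -> list[str]:
    """Return each character's numeric Unicode value in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

For example, `decode_cif1_escapes(r"5.2 \%A, \a-helix")` gives "5.2 Å, α-helix", and `code_points("Å")` gives `['U+00C5']`.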

> Example 3: Space-separated lists in CIF2. Lists, especially matrices,
> are important in science and cumbersome (but possible) to implement in
> dictionaries, so lists satisfy Principle 1. Using space separators
> is probably less mainstream than using commas; if we had chosen to
> use both we would definitely have satisfied rule 2. I think rule 2
> would argue that we should allow both space and comma, but principle
> 1(iv) would argue for choosing one or the other.

We use whitespace-separated strings (i.e. including newline, tab, etc.) by default in CML for numeric arrays and matrices. It works well. However, for lists of general strings, dates, etc., we allow the author to choose a delimiter which they know is not present in the strings.
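A minimal Python sketch of the two conventions just described: numeric arrays split on any run of whitespace, and general strings split on a delimiter chosen by the author. The function names are hypothetical; this is not CML's or CIF2's actual API.

```python
def parse_numeric_array(text: str) -> list[float]:
    """Split a numeric array on any run of whitespace (spaces, tabs, newlines)."""
    # str.split() with no argument treats every whitespace run as one separator
    # and ignores leading and trailing whitespace.
    return [float(token) for token in text.split()]


def parse_string_array(text: str, delimiter: str) -> list[str]:
    """Split general strings on an author-chosen delimiter.

    The author is assumed to have picked a delimiter that never occurs
    inside any of the strings; this sketch does not check that.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    return text.split(delimiter)


# A 2x2 matrix written across two lines with mixed spaces and a tab.
matrix = parse_numeric_array("1.0  0.0\n0.0\t1.0")
# The strings contain both commas and spaces, so "|" is chosen instead.
fields = parse_string_array("Fri, 4 Mar 2011|Cambridge, UK", "|")
```

Here `matrix` is `[1.0, 0.0, 0.0, 1.0]` and `fields` is `['Fri, 4 Mar 2011', 'Cambridge, UK']`. Neither whitespace nor a comma would have worked as the separator for `fields`, which is why the choice is left to the author.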

Some locales (e.g. DE) use a comma as the decimal separator, and this is often applied automatically by the operating system. Thus 1.23,3.45 could be emitted as 1,23,3,45. It is possible, but tedious, to refactor code so that it always uses a period as the decimal point.
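The ambiguity can be shown with a short Python sketch. Rather than calling `locale.setlocale` (whose result depends on which locales are installed), the hypothetical helper `decimal_comma` simply simulates decimal-comma output. Python's f-string formatting ignores the process locale except for the `n` presentation type, so `period_point` always writes a period.

```python
from collections.abc import Callable


def period_point(x: float) -> str:
    # Locale-independent: "f" formatting always uses "." as the decimal point.
    return f"{x:.2f}"


def decimal_comma(x: float) -> str:
    # Hypothetical stand-in for a decimal-comma locale such as DE,
    # simulated so the example does not need that locale installed.
    return f"{x:.2f}".replace(".", ",")


def emit_list(values: list[float], fmt: Callable[[float], str], sep: str) -> str:
    """Format each value with fmt and join the results with sep."""
    return sep.join(fmt(v) for v in values)


values = [1.23, 3.45]
print(emit_list(values, period_point, ","))   # 1.23,3.45
print(emit_list(values, decimal_comma, ","))  # 1,23,3,45
# Splitting the decimal-comma output on "," yields four tokens, not two.
print(emit_list(values, decimal_comma, ",").split(","))  # ['1', '23', '3', '45']
# With a space separator the two values at least remain separable.
print(emit_list(values, decimal_comma, " ").split())  # ['1,23', '3,45']
```

Writing numbers through a locale-independent path such as `period_point` avoids the problem at the source.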

I would also support the use of dictionaries for extending human and machine semantics.

P.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
