Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
- From: James Hester <jamesrhester@gmail.com>
- Date: Fri, 4 Mar 2011 17:03:29 +1100
- In-Reply-To: <AANLkTikfLNd6mQB9hB9haGek_52ceO3GjXrtAR5tbsnj@mail.gmail.com>
- References: <AANLkTikfLNd6mQB9hB9haGek_52ceO3GjXrtAR5tbsnj@mail.gmail.com>
Here is a set of three principles that I think are worth discussing here on COMCIFS, and probably should have been discussed before we embarked on CIF2.

===========
Principles guiding development of CIF syntax
-----------------------------------------------------------------

Preamble: The CIF syntax describes a human-readable, syntactic container for scientific data. CIF syntax aims to be as simple as possible. The domain dictionaries are the primary location of semantic information in the Crystallographic Information Framework.

1. A feature should only be added to CIF syntax if all of the following are satisfied:
(i) implementation or use of equivalent behaviour at dictionary level is either significantly more cumbersome or not possible;
(ii) the feature provides significant new functionality that is widely applicable to most scientific domains;
(iii) reliable transfer and archiving of data is not compromised;
(iv) there is no simpler way of achieving the desired behaviour.

2. As long as the requirements in (1) are satisfied, the CIF framework should:
(i) behave in a way that is consistent with common usage;
(ii) align with pre-existing standards where those standards provide the required behaviour. CIF1 can be considered a pre-existing standard for CIF2 in this context.

3. Non-technical issues should be dealt with in non-technical arenas.

(End)
============

Justifications for these principles:

Preamble
"CIF aims to have as simple a syntax as possible": this is desirable for two reasons: human readability, and maximising the flexibility of the data model with which the dictionary definition languages will work. The syntax makes relatively few assumptions about the most appropriate way to describe scientific data, meaning that the DDL language has a broad scope for creating data structures.

Principle 1
If we wish to have a simple syntax, we need to avoid complicating it if at all possible, without excluding new features which are generally useful and significantly more efficiently implemented in syntax.

Principle 2
We should not make CIF less accessible than necessary, and should not make more work for ourselves and others if other standards already meet our needs.

Principle 3
We have a syntax standard (the technical arena). We also have mailing lists, committees, documentation, Wikipedia, journal policy and various other avenues for disseminating information and countering misconceptions. Where principles (1) and (2) conflict, long-term maintenance of a standard that meets the goals in the preamble requires that principle (1) should be the priority and therefore that other avenues should be used to address the non-technical issues. For example, concerns about use of delimiters being inconsistent with the use in some other domain could be addressed by explicit notes in documentation or comments in CIF template files, depending on the persons likely to be confused and the expected magnitude of the problem.

Examples:

Example 1: The idiosyncratic characteristic of CIF1 files that quote delimiters could appear within strings delimited by the same quotes, provided they were not followed by whitespace. This "feature" provided marginal extra functionality compared to the simpler rule of no delimiters in a string, so fails principles 1(ii) and 1(iv); and is inconsistent with mainstream usage, rule 2(i). It has been removed from CIF2.

Example 2: Unicode support in CIF2. This is broadly useful, given the international nature of science and range of symbols used in scientific papers. It could have been implemented in dictionaries using ASCII escapes, but this would have been cumbersome to use, so it satisfies Principle 1.
We have adopted Unicode (rather than created our own international character set) and copied the XML character ranges (Principle 2).

Example 3: Space-separated lists in CIF2. Lists, especially matrices, are important in science and cumbersome to implement in dictionaries (but possible) so lists satisfy principle 1. Using space separators is probably less mainstream than using commas - if we had chosen to use both we would have definitely satisfied rule 2. I think rule 2 would argue that we should allow both space and comma, but principle 1(iv) would argue choosing one or the other.

Example 4: Triple-quoted strings in CIF2. In the current draft these provide no new functionality beyond the ability to quote semicolon-delimited strings, so should probably be rejected unless new functionality can be added. Such new functionality would be the ability to quote arbitrary strings (this may be exaggerating the "significant" in principle 1(ii)). In keeping with principle 2(i), the eliding mechanism should be <backslash><delimiter> as this is the most widespread approach and not markedly more complex than the current proposal of using <backslash><eol>. In keeping with principle 1(i) and 1(ii), no other escape sequences should be defined as they are easily definable at dictionary level (if needed) and do not provide behaviour that is generally needed. In keeping with principle (3), if there are concerns relating to user acceptance or user confusion, they should be addressed in documentation and by providing reference software (for example).

James.

On Tue, Mar 1, 2011 at 3:12 PM, James Hester <jamesrhester@gmail.com> wrote:
> Dear COMCIFS members:
>
> The DDLm group is currently engaging in developing an elide mechanism
> for the CIF2 standard. Our deliberations have reached something of an
> impasse due to disagreement around the use of triple quotes as a
> string delimiter. Python is a popular programming language that also
> uses triple quotes to delimit strings.
> One side of the discussion
> considers that use of triple quotes as a string delimiter means that
> all escape sequences recognised by Python should also be recognised by
> CIF, in order to avoid confusion and improve consistency with
> mainstream (i.e. Python) practice. The other side of the discussion
> sees little benefit to CIF from including the additional ten or so
> escape sequences and advocates leaving them out of the CIF2 standard,
> instead adopting the minimal number of escape sequences to allow
> eliding.
>
> We would like COMCIFS participants to provide some input as to the
> appropriate policy to be followed in this situation: should we seek
> maximum consistency with other usage of identical syntactical
> constructs, despite the imposition of unnecessary technical baggage?
> Or should we produce a standard as simple and streamlined as possible,
> despite the potential for confusion and unorthodox behaviour?
>
> Details of discussions so far can be found at
> http://www.iucr.org/__data/iucr/lists/ddlm-group/
>
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
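
To make Examples 1 and 4 above concrete, here is a rough Python sketch of the three quoting behaviours under discussion. It is only an illustration, not text from any CIF specification or reference parser. The function names are hypothetical, line-length and end-of-line handling are left out, and the third function assumes that <delimiter> in <backslash><delimiter> means the full three-character delimiter, which the message above does not spell out.

```python
from typing import Tuple


def read_quoted_cif1(text: str, start: int = 0) -> Tuple[str, int]:
    """Example 1, CIF1 behaviour: a quote closes the string only when it
    is followed by whitespace or by the end of the input."""
    quote = text[start]
    if quote not in "'\"":
        raise ValueError("expected a quote character")
    i = start + 1
    while i < len(text):
        at_end = i + 1 == len(text)
        if text[i] == quote and (at_end or text[i + 1].isspace()):
            return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")


def read_quoted_simple(text: str, start: int = 0) -> Tuple[str, int]:
    """Example 1, simpler rule: the first matching quote closes the
    string, so the delimiter can never appear inside it."""
    quote = text[start]
    if quote not in "'\"":
        raise ValueError("expected a quote character")
    end = text.find(quote, start + 1)
    if end == -1:
        raise ValueError("unterminated quoted string")
    return text[start + 1:end], end + 1


def read_triple_quoted(text: str, start: int = 0) -> Tuple[str, int]:
    """Example 4, proposed elide: <backslash><delimiter> yields a literal
    delimiter. ASSUMPTION: the delimiter is the whole triple quote, and
    no other escape sequence is recognised."""
    delim = text[start:start + 3]
    if delim not in ("'''", '"""'):
        raise ValueError("expected a triple-quote delimiter")
    out = []
    i = start + 3
    while i < len(text):
        if text.startswith("\\" + delim, i):
            out.append(delim)          # elided delimiter kept literally
            i += 1 + len(delim)
        elif text.startswith(delim, i):
            return "".join(out), i + len(delim)
        else:
            out.append(text[i])        # includes lone backslashes
            i += 1
    raise ValueError("unterminated triple-quoted string")
```

Each function returns the string contents together with the position just past the closing delimiter. Given the input 'a dog's life' followed by a space, read_quoted_cif1 returns a dog's life, whereas read_quoted_simple stops at a dog; that difference is the "marginal extra functionality" weighed in Example 1.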