Re: [ddlm-group] Moving on to DDLm
- To: ddlm-group <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Moving on to DDLm
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Mon, 21 Mar 2011 09:09:37 -0500
- Accept-Language: en-US
- In-Reply-To: <AANLkTimsqafZpWEdP=nrDYajTeKFD5D2sGZQSivfKjaJ@mail.gmail.com>
- References: <AANLkTimsqafZpWEdP=nrDYajTeKFD5D2sGZQSivfKjaJ@mail.gmail.com>
On Monday, March 21, 2011 12:57 AM, James Hester wrote:
JH>It is apparent to me that we are not as close as I had hoped to
JH>finalising CIF2 syntax. I believe that the remaining issues revolve
JH>largely around basic CIF2 semantics, and are limited to:
JH>
JH>(1) Choice of elide mechanism for triple-quoted strings
JH>(2) Inclusion of python-style backslash sequences in triple-quoted strings
JH>(3) Meaning of "1.23" vs 1.23
JH>(4) Meaning of <period>, <question mark> and quoted versions thereof
JH>
JH>While all of these issues need to be resolved, they are not critical
JH>to the operation of DDLm or CIF2 in the sense that failsafe strategies
JH>exist to avoid the issues. I would therefore like to propose that we
JH>adopt the following strategy:
JH>
JH>(1) Resolve that *only* the above-listed items remain under discussion
JH>as far as CIF2 syntax and basic semantics are concerned;
JH>(2) Vote to adopt DDLm
JH>(3) Vote to adopt dREL
JH>(4) Finalise the remaining syntax/semantic issues

To the extent that resolution of James's list of remaining CIF issues does not require opening any broader questions (and I don't foresee that it would), I think leaving only those aspects of universal CIF syntax and semantics on the table is reasonable.

I have devoted some thought over the weekend to the design-principle discussion, and I should like to offer it to you now, while it is still fresh. I shall afterward remain silent on the topic at least until COMCIFS is prepared to take it back up. My apologies for the length:

On Friday, March 18, 2011 5:20 AM, Herbert J. Bernstein wrote:
HJB>No, I do not see a problem with separate syntax and semantics documents any more than I see a problem with separate productions in a grammar.
HJB>I do see a problem with _considering_ the design or impact of either syntax or semantics in isolation from each other.
HJB>I firmly believe that a purely "bottom-up", syntax-first design done in isolation from a "top-down" semantics design, or a top-down, semantics-first design done in isolation from the syntax design, is inefficient and likely to walk us into dead ends. I derive this view from decades of software-engineering literature on failed approaches to software design, and from the continuing success of "the Scandinavian method", or "participatory design", in which work on internal design is intertwined with design of externals.
HJB>
HJB>As for the requested example -- I already gave one -- the design of the numeric types in CIF 1.0 and CIF 1.1, in which the equivalence classes of numbers (i.e. that 13.45 and 1.345E1 are the "same" number) are simply an assumed semantic feature intimately coupled to the syntax. To give another, much more subtle, equivalence-class issue: the equivalence of "abc" and abc and 'abc', but the inequivalences of "123" and 123 and of "." and "?" from . and ?, are semantic issues intimately coupled to the syntax. The original design document for CIF was a semantics document with a bit of syntax intertwined. DDL came later, and the pure syntax and semantics documents came long after the intertwined approach, _after_ everybody had a clear view of the interaction of CIF1 syntax and semantics. I, for one, do not yet have a clear understanding of those interactions for CIF2 and DDLm.

Herbert's points are well made. I agree that the design of numeric types is a soft spot in the CIF 1.x specifications, and that it relies on a close relationship between syntax and semantics. The special data values . and ? depend even more on such a relationship. Herbert is right that CIF 2.0 syntax cannot be designed in isolation from CIF 2.0 semantics, and that these issues in particular should be addressed.
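The coupling Herbert describes can be made concrete with a small sketch. The function `interpret` below is hypothetical and illustrative only; it is not taken from any CIF specification or library, and it encodes just the reading described in the quoted text (item (3) on James's list is precisely where that reading is still under discussion). It assigns a kind and a value to a single whitespace-delimited token: unquoted numbers compare by value, any quoted token becomes a plain string, and unquoted . and ? are special markers.

```python
import re

# Hypothetical CIF-style number: optional sign, mantissa, optional exponent,
# optional standard-uncertainty suffix such as "(3)". Illustrative assumption only.
NUMBER = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$')


def interpret(token: str):
    """Return a (kind, value) pair for one token under the reading sketched above."""
    # Assumption: quoting always yields a plain string, so "123" is not 123
    # and "." is not the special value . (the contested points in the thread).
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return ("string", token[1:-1])
    # Unquoted . and ? are special markers rather than strings.
    if token == ".":
        return ("inapplicable", None)
    if token == "?":
        return ("unknown", None)
    m = NUMBER.match(token)
    if m:
        # Drop any standard-uncertainty suffix before converting to a float.
        suffix = m.group(3)
        text = token if suffix is None else token[: -len(suffix)]
        return ("number", float(text))
    # Any other unquoted token is an ordinary string, equivalent to its quoted forms.
    return ("string", token)


print(interpret("13.45") == interpret("1.345E1"))   # numeric equivalence class
print(interpret("abc") == interpret('"abc"') == interpret("'abc'"))
print(interpret('"123"') == interpret("123"))       # quoted vs unquoted number
print(interpret('"."') == interpret("."))           # quoted vs unquoted period
```

Under these assumptions the first two comparisons hold and the last two do not. Every one of those outcomes is decided by a semantic rule attached to a syntactic form, which is the kind of interaction Herbert argues must be settled together with the syntax.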
The discussion has drifted rather far afield from the original and pressing question, however, which was "[With respect to triple-quote syntax,] should we seek maximum consistency with other usage of identical syntactical constructs, despite the imposition of unnecessary technical baggage? Or should we produce a standard as simple and streamlined as possible, despite the potential for confusion and unorthodox behaviour?" I would be happy for COMCIFS to issue broader guidance, as has been suggested, but I hope that decision will not be unduly delayed by a detour into minutiae such as the division and interplay between CIF 2.0 syntax and semantics.

In pursuit of broad rather than narrow guidance, therefore, I suggest a change in the terms of discussion. Rather than syntax vs. semantics, it may be more useful to partition CIF into 'base' CIF 2.0, which all CIF 2.0 processors are expected to accept and interpret equivalently, and 'domain-level' CIF, encompassing those aspects of CIF semantics and convention that are defined via the dictionary system. The base contains CIF syntax and the common semantics, whereas domain-level CIF adds ontology, constraints, controlled vocabulary, etc. The key distinction between these layers is, of course, which features "all CIF 2.0 processors are expected to accept and handle equivalently." It is fitting that this dovetails with some of the technical arguments about the triple-quote syntax. Base CIF is, I think, equivalent to "the common syntax and semantics of the CIF language" in Herbert's latest proposed principles.
On that basis, I offer this re-couching of the proposed design principles:

==============================================
Principles guiding development of Base CIF 2.0
----------------------------------------------------------------------

Preamble

CIF is a framework for exchanging and archiving scientific data, featuring a human-readable, machine-parseable electronic format designed to serve as an exchange and archive medium. "Base" CIF comprises the definitions and constraints that underlie CIF and apply to all CIF files; those aspects defining the CIF file format are documented in the CIF Syntax specification and the CIF Common Semantic Features specification. Base CIF aims to remain as simple as possible by delegating considerations such as ontology, vocabulary, data relationships, and complex and rich data types to domain dictionaries and the DDL formalisms by which those dictionaries are defined. In the following, the phrase 'domain level' refers to such documents (though it is not anticipated that DDLs will be domain-specific). Definitions and constraints at domain level apply to particular CIF files only as declared by those files or as required by a particular CIF processor in a particular context.

Principles

The design of base CIF 2.0 is guided by these principles:

1. A feature should be added to or changed in base CIF only if all of the following are satisfied:
 (i) implementation of the desired behaviour by changes at the domain level is not feasible, or else such changes, while feasible, would significantly reduce human readability;
 (ii) the change provides significant new functionality that is widely applicable to most scientific domains;
 (iii) reliable transfer and archiving of data is not compromised;
 (iv) there is no simpler way of achieving the desired behaviour; and
 (v) it has been shown possible to implement the change at a cost commensurate with its benefits, as demonstrated in part by rough consensus and running code.

2. As long as the requirements in (1) are satisfied, base CIF should:
 (i) behave in a way that is consistent with common usage; and
 (ii) align with pre-existing standards where those standards provide the required behaviour. CIF 1.1 can be considered a pre-existing standard for CIF 2.0 in this context.

3. Non-technical issues should be dealt with in non-technical arenas.

4. Draft changes to base CIF will be made available on the IUCr website for public comment for a period of at least 6 weeks, following which COMCIFS voting members, after consideration of any objections raised, can vote to accept the change. A change will be accepted if 3/4 of COMCIFS voting members approve it.
===============

Best Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group