Discussion List Archives

Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.

Dear Colleagues,

   My apologies for saying more.  I am responding only to the direct
question asked of me by James, and will try to avoid further comment
after this.

> I will close by asking Herbert: Do you see the approach of having
> separate syntax and common semantics documents as being undesirable?
> If so, why?  If you assert that syntax and semantics are inseparable,
> can you give an example from CIF1.1 to make your point?

No, I do not see a problem with separate syntax and semantics documents
any more than I see a problem with separate productions in a grammar.
I do see a problem with _considering_ the design or impact of either
syntax or semantics in isolation from each other.  I firmly believe
that the result of a purely "bottom-up" syntax-first design in
isolation from a "top-down" semantics design, or of a top-down
semantics-first design in isolation from a syntax design, is inefficient
and likely to walk us into dead ends.  I derive this view from
decades of software engineering literature on failed approaches
to software design, and from the continuing success of "the Scandinavian
method" or "participatory design", in which work on internal design
is intertwined with design of externals.

As for the requested example -- I already gave one -- the design of
the numeric types in CIF 1.0 and CIF 1.1, in which the equivalence
classes of numbers (i.e. that 13.45 and 1.345E1 are the "same"
number) are simply an assumed semantic feature intimately coupled
to the syntax.  To give another, much more subtle equivalence class
issue, the equivalence of "abc" and abc and 'abc', but the
inequivalence of "123" and 123, and of "." and "?" from . and ?,
are semantic issues intimately coupled to the syntax.  The original
design document for CIF was a semantics document with
a bit of syntax intertwined.  DDL came later, and the pure
syntax and semantics documents came long after the intertwined approach,
_after_ everybody had a clear view of the interaction of CIF1 syntax
and semantics.  I, for one, do not yet have a clear understanding
of those interactions for CIF2 and DDLm.
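
To make the equivalence classes above concrete, here is a minimal
Python sketch of how a reader might map a raw CIF1-style token to
a value.  It is only an illustration under stated assumptions: the
function name interpret and the kind labels "char", "numb",
"inapplicable" and "unknown" are hypothetical, and the sketch ignores
standard uncertainties such as 1.23(4), multi-line text fields and
any dictionary-supplied typing.

```python
from decimal import Decimal, InvalidOperation


def interpret(token):
    """Map one raw token to a (kind, value) pair (hypothetical helper)."""
    # Assumption: a token delimited by matching quotes is always
    # character data, whatever the quotes contain.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return ("char", token[1:-1])
    # Unquoted . and ? are the special "inapplicable" and "unknown" values.
    if token == ".":
        return ("inapplicable", None)
    if token == "?":
        return ("unknown", None)
    # An unquoted token that parses as a finite number is numeric;
    # Decimal compares by numeric value, so 13.45 and 1.345E1 coincide.
    try:
        value = Decimal(token)
    except InvalidOperation:
        return ("char", token)
    if not value.is_finite():
        return ("char", token)
    return ("numb", value)


# The quoting is pure syntax for abc, but it changes the meaning of 123.
print(interpret('"abc"') == interpret("abc") == interpret("'abc'"))
print(interpret('"123"') == interpret("123"))
print(interpret("13.45") == interpret("1.345E1"))
print(interpret('"."') == interpret("."))
```

Under these assumptions the decision to treat quotes as transparent
for abc but significant for 123, . and ? is made in the same few lines
that recognise the quotes, which is the coupling described above.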

Regards,
   Herbert
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 18 Mar 2011, James Hester wrote:

> I will be somewhat terse in my reply due to time constraints.  I will
> not address the many ideas raised in Herbert's revised preamble unless
> we can get to some agreement on the principles.
>
> What I have agreed with is that there is an in-principle need for a
> "common semantics" document that is logically separate from the syntax
> document.  A key issue I have with Herbert's completely revised
> formulation below is that it seeks to bundle "common semantics" into
> the syntax document.  I have deliberately separated out semantics from
> syntax for clarity of specification and discussion.  Putting them back
> together in the guidelines simply complicates our task for no reason
> that I can discern.
>
> Note that having separate specifications for syntax and common
> semantics does not mean that programmers are required to separate
> syntax and semantics when writing applications, or that educational
> materials are required to do this.
>
> A further issue with Herbert's new formulation is that the role of the
> DDL is either ignored or conflated with "common semantics".  While
> some DDL changes may require changes to core syntax, there is a vast
> amount of semantics that can be defined within the confines of a DDL,
> with no need to appeal to "common semantics" or changes in syntax.
> Therefore the new principle 1(b) below needs, at the very least, to be
> redrafted to include the possibility of using DDL-based mechanisms to
> implement the required semantics - most obviously, by adding a new
> type in com_val.dic in the case of DDLm.
>
> I will close by asking Herbert: Do you see the approach of having
> separate syntax and common semantics documents as being undesirable?
> If so, why?  If you assert that syntax and semantics are inseparable,
> can you give an example from CIF1.1 to make your point?
>
> On Wed, Mar 16, 2011 at 11:35 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear James,
>>
>>  I am glad we are getting closer, but now please consider what
>> you have written, and what it really means in practical terms:
>>
>>> Preamble: The CIF syntax describes a human-readable, syntactic
>>> container for scientific data.
>>
>> The word syntactic is misplaced here, and the "human-readable"
>> constraint was lost years ago with the creation of mmCIF.  As
>> we have just agreed, the semantics is an important part of the language,
>
> Human-readability has in no way been lost.  In the context of
> syntax, what I mean by human-readability refers to:
> (i) use of space-separated tokens (like words in sentences);
> (ii) access to file contents using generic text-editing tools;
> (iii) ability for a human reader to immediately understand their
> location within a CIF datastructure (i.e. in a loop, reading a value,
> reading a dataname etc.)
>
> In the context of syntax, human-readability does *not* refer to the
> contents of datanames or datavalues.  That would be a semantic
> concern, and as Herbert points out, some datavalues are not
> intelligible to the human reader.
>
> We have never disagreed that semantics is important to CIF.  For
> anyone to think otherwise would be bizarre.  What we recently agreed
> was that there was an in-principle need for a 'common semantic
> features' document.
>
>> Also, in practice, one of the most important contributions
>> of CIF to our science has been the controlled vocabulary it has
>> provided, independent of the form of expression:  tag-value, XML,
>> HDF5, etc.  In addition, for the PDB, the important issue is _not_
>> the human readability, but the preservation of all the essential
>> information of a scientific experiment, and, if you glance through
>> some Acta C entries, you will see that even for small molecules,
>> the days of human readable CIFs are far behind us.   When we make a change,
>> we need to bear all of that in mind.
>
> Yes, controlled vocabulary is an important contribution.  Another
> important contribution is the entire ontological framework. I do not
> see that the guidelines as I've presented them impact negatively on
> those areas. Furthermore, I maintain that those human-readable aspects
> that I have listed above are worth preserving.  Note that I do not
> even wish to exclude the possibility of an alternative representation
> of CIF datastructures in binary form in future developments, but that
> would necessarily be a different syntax to which the present guiding
> principles are not intended to apply.
>
>> I would recommend starting with a clearer expression of what CIF is:
>>
>> ===================
>> CIF is a language for the management of scientific data.  It combines
>> a controlled vocabulary with a simple, human-readable form of expression
>> (the CIF syntax) backed by rules clarifying the meaning of the language
>> (the CIF semantics).  The overarching goal of CIF is to ensure that
>> the data of the relevant domains can be generated, transformed, transmitted
>> and archived in ways that facilitate doing the science
>> involved in ways that both serve the individual scientific domains and
>> ensure that different domains can share information reliably.
>> =================
>
> This is not a clearer expression of what CIF is.  It is not a language
> in the accepted sense of the word.  It fulfills some functions
> necessary for the management of scientific data; it does not manage
> the data.  "Facilitation of science" is far too vague a term to be
> useful, as different scientists will legitimately claim different
> features "facilitate" their science.
>
> (Original preamble)
>>> CIF syntax aims to be as simple as
>>> possible.  The domain dictionaries are the primary location of
>>> semantic information in the Crystallographic Information Framework.
>>> In the following, the phrase 'dictionary level' refers either to the
>>> domain
>>> dictionaries, the DDL language in which the domain dictionaries are
>>> written, or the CIF2 common semantic features specification which
>>> imposes minimum requirements on the semantics specified by dictionaries
>>> and DDLs.
>>
>> Given the much-modified goals, this next paragraph becomes an
>> inappropriate strait jacket, misallocating responsibilities.  I
>> would suggest we return to what the real practice has been:
>>
>> ============
>> The CIF language tries for an appropriate balance between simplicity
>> and sufficient expressive ability to meet the needs of the scientific
>> domains involved, and changes to the existing syntax and common semantics
>> should only be made for good reason.  If it is possible to make a
>> needed change by simply defining a new term in the controlled vocabulary,
>> in one of the domain dictionaries, then that option should be considered
>> first, especially because the controlled vocabulary is used in
>> other forms of expression, such as XML and HDF5.  This is what we
>> will call a change "at the dictionary level".  However, there are
>> times, e.g. with the introduction of a new dictionary definition
>> language, when changes are needed in the common syntax and semantics
>> that apply to all domains.
>> ================
>
>>>
>>> 1. A feature should only be added to CIF syntax if all of the
>>> following are satisfied:
>>>
>>> (i) Implementation of the desired behavior by
>>> changes at the dictionary level rather than to CIF syntax
>>> is not feasible, or else such changes, while feasible, would
>>> significantly reduce human readability;
>>
>> Then I would suggest the following version of this guideline, recognizing
>> the current division of labor.
>>
>> ==========
>>
>> 1. a. A feature should only be added to or changed in the common syntax
>> and semantics of the CIF language if implementation of the desired behavior
>> by changes in the controlled vocabulary at the dictionary level is not
>> feasible, or such changes, while feasible, would make it significantly more
>> difficult for either people or software systems to work with the data
>> effectively than when done by a change in the vocabulary; and
>>
>> 1. b. A feature should only be added by changes in the common syntax
>> of the CIF language if implementation of the desired behavior by
>> changes in the common semantics is not feasible, or such changes, while
>> feasible, would make it significantly more difficult for either people or
>> software systems to work with the data effectively than when done
>> by a change in the syntax.
>>
>> ==============
>>
>> -- Herbert
>>
>> On Wed, 16 Mar 2011, James Hester wrote:
>>
>>> Hi Herbert,
>>>
>>> I agree that there is an in-principle need for a common semantic
>>> features document, and I thank you for directing our attention to this
>>> issue.  John B has suggested a 'base semantics' document to accompany
>>> the 'base syntax' document.  This seems like a workable approach to
>>> me, and we would call the 'base semantics' document 'common semantic
>>> features' in keeping with CIF1.  I would further suggest we hold off
>>> on developing the 'common semantic features' document until we have
>>> finished the syntax.
>>>
>>> Below find a redrafted version of the Preamble and point 1(i) to make
>>> the existence of the common semantic features document clear.
>>>
>>> ===========================================================
>>> Principles guiding development of CIF syntax
>>> -----------------------------------------------------------------
>>>
>>> Preamble: The CIF syntax describes a human-readable, syntactic
>>> container for scientific data.  CIF syntax aims to be as simple as
>>> possible.  The domain dictionaries are the primary location of
>>> semantic information in the Crystallographic Information Framework.
>>> In the following, the phrase 'dictionary level' refers either to the
>>> domain
>>> dictionaries, the DDL language in which the domain dictionaries are
>>> written, or the CIF2 common semantic features specification which
>>> imposes minimum requirements on the semantics specified by dictionaries
>>> and DDLs.
>>>
>>> 1. A feature should only be added to CIF syntax if all of the
>>> following are satisfied:
>>>
>>> (i) Implementation of the desired behavior by
>>> changes at the dictionary level rather than to CIF syntax
>>> is not feasible, or else such changes, while feasible, would
>>> significantly reduce human readability;
>>>
>>> (end of changes)
>>>
>>> On Wed, Mar 16, 2011 at 11:44 AM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>> Dear James,
>>>>
>>>>  I am not objecting to Brian's document.  I think we should keep
>>>> as much of it as possible for CIF2.  The only problem is that it
>>>> is a "semantic" document and your policy according to you and
>>>> John B. seems to want to relegate all semantic issues to the
>>>> dictionaries.  It is that relegation to which I am objecting.
>>>> Most features consist of both syntactic and semantic components,
>>>> and I find it much less confusing to deal with a feature in
>>>> its entirety than to deal with just the syntax.
>>>>
>>>>  Until this discussion, I had thought the intent of the dictionaries
>>>> was to deal with the tag definitions particular to certain domains
>>>> and that both the syntax and semantics of CIF were a global concern.
>>>> I find the relegation of the semantics of CIF2 to the dictionaries
>>>> surprising and recommend against it.  I want to keep Brian's
>>>> document a global document.
>>>>
>>>>  Regards,
>>>>    Herbert
>>>>
>>>> On Wed, 16 Mar 2011, James Hester wrote:
>>>>
>>>>> Dear Herbert,
>>>>>
>>>>> Please explain why you think that the latest version of the guiding
>>>>> principles is at variance with the 'Common Semantic Features' document
>>>>> and approach.  For example, what would prevent us from adopting a
>>>>> similar CSF document for CIF2?  It would help if you quoted particular
>>>>> points from the guidelines in your reply.
>>>>>
>>>>> James.
>>>>>
>>>>> On Wed, Mar 16, 2011 at 3:59 AM, Herbert J. Bernstein
>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>>   I would suggest that people review Brian's excellent common
>>>>>> semantic features document for CIF 1.1.  I think keeping those
>>>>>> sorts of semantic decisions coupled to the syntax decisions for
>>>>>> CIF has worked well, and I do not think the sharp departure
>>>>>> now proposed for handling CIF2 will work as well for the
>>>>>> reasons I stated previously.  It ain't broke.  Why are
>>>>>> we fixing it?  New features involve a mix of syntax and
>>>>>> semantics depending on the feature.  I believe we should
>>>>>> be focusing on features rather than the bin within which
>>>>>> they fit for presentation purposes.
>>>>>>
>>>>>>   Regards,
>>>>>>      Herbert
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> T +61 (02) 9717 9907
>>>>> F +61 (02) 9717 3145
>>>>> M +61 (04) 0249 4148
>>>>> _______________________________________________
>>>>> comcifs mailing list
>>>>> comcifs@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/comcifs