Discussion List Archives


RE: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains...

Dear Colleagues,

   I have more than had my say, and will not comment further, other
than to reserve the right to vote against the policy as currently
framed for the reasons previously stated.


At 11:56 AM -0500 3/16/11, Bollinger, John C wrote:
>On Wednesday, March 16, 2011 7:35 AM, Herbert J. Bernstein wrote:
>>    I am glad we are getting closer, but now please consider what 
>>you have written, and what it really means in practical terms:
>>>  Preamble: The CIF syntax describes a human-readable, syntactic
>>>  container for scientific data.
>>The word syntactic is misplaced here, and the "human-readable"
>>constraint was lost years ago with the creation of mmCIF.  As we
>>have just agreed, the semantics is an important part of the
>>language.  Also, in practice, one of the most important
>>contributions of CIF to our science has been the controlled
>>vocabulary it has provided, independent of the form of
>>expression:  tag-value, XML, HDF5, etc.  In addition, for the PDB,
>>the important issue is _not_ the human readability, but
>>the preservation of all the essential information of a scientific
>>experiment, and, if you glance through some Acta C entries, you
>>will see that even for small molecules,
>>the days of human readable CIFs are far behind us.   When we
>>make a change, we need to bear all of that in mind.
>Inasmuch as human readability has proven too subjective a criterion 
>to guide the DDLm group on such technical details as it has lately 
>considered, I am happy for that criterion to be rejected.  At the 
>same time, I disagree that human readability as a desired and often 
>achieved result is a lost cause for CIF.  Some people can and do 
>successfully edit CIF by hand, which is possible only because of 
>CIF's human readability.  Maintaining that capability was held by 
>Herbert, me, and others as an important general principle directing 
>some of the CIF 2 work performed by the DDLm group.  If that 
>principle is rejected outright then much of the work to date on CIF 
>2.0 details will have been influenced by a false premise.  In 
>particular, the DDLm group's compromise recommendation about 
>character encoding makes less sense if human readability is not an 
>important factor.
>Herbert writes:
>>I would recommend starting with a clearer expression of what CIF is:
>>CIF is a language for the management of scientific data.  It
>>combines a controlled vocabulary with a simple, human-readable
>>form of expression (the CIF syntax) backed by rules clarifying the
>>meaning of the language (the CIF semantics).  The overarching goal
>>of CIF is to ensure that the data of the relevant domains can be
>>generated, transformed, transmitted and archived in ways that
>>facilitate doing the science involved in ways that both serve the
>>individual scientific domains and ensure that different domains can 
>>share information reliably.
>>=================
>I take that as a proposed insertion at the beginning of the 
>preamble.  It describes the overall CIF system, from data model 
>through syntax and common semantics, up to and including 
>dictionaries.  Although the document's intended audience (COMCIFS 
>and the DDLm group) already has a firm grasp of that information, it 
>would not be harmful to include it.
>Herbert quotes and comments on James's latest text:
>>>  CIF syntax aims to be as simple as
>>>  possible.  The domain dictionaries are the primary location of
>>>  semantic information in the Crystallographic Information Framework.
>>>  In the following, the phrase 'dictionary level' refers either to the
>>>  domain dictionaries, the DDL language in which the domain dictionaries
>>>  are written, or the CIF2 common semantic features specification which
>>>  imposes minimum requirements on the semantics specified by
>>>  dictionaries and DDLs.
>>Given those much-modified goals, this next paragraph becomes an
>>inappropriate strait jacket, misallocating responsibilities.
>James's paragraph is not at odds with Herbert's description of the 
>broader CIF picture.  It is simply a policy statement about the part 
>of that picture on which our present work is focused (i.e. CIF 
>syntax), followed by some definitions of terms.  As part of the 
>definitions, it acknowledges the presence and global scope of a 
>separate common semantic features specification.  Whether the policy 
>statement ("CIF *syntax* aims to be as simple as possible" (emphasis 
>added)) appropriately allocates responsibilities is the crux of the 
>current debate.  A COMCIFS decision will be required here to settle 
>the question at this level, and there will perhaps be a similar 
>policy decision to be made about the common semantic features.
>Herbert continues,
>>   I would suggest we return to what the real practice has been:
>>The CIF language tries for an appropriate balance between 
>>simplicity and sufficient expressive ability to meet the needs of 
>>the scientific domains involved, and changes to the existing syntax 
>>and common semantics should only be made for good reason.  If it is 
>>possible to make a needed change by simply defining a new term in 
>>the controlled vocabulary, in one of the domain dictionaries, then 
>>that option should be considered first, especially because the
>>controlled vocabulary is used in other forms of expression, such as 
>>XML and HDF5.  This is what we will call a change "at the 
>>dictionary level".  However, there are times, e.g. with the 
>>introduction of a new dictionary definition language, when changes
>>are needed in the common syntax and semantics that apply to all 
>Even if Herbert's characterization of the historic scope of 
>dictionary-level changes were accepted, that would not imply that 
>continuing such a policy must be the best choice.  However, such a 
>limited characterization ("defining a new term in the controlled 
>vocabulary") does not seem to capture historic practice and intent. 
>For example, DDL2 provides support for much finer-grained data types 
>than CIF 1.1 natively provides, and mmCIF indeed defines such data 
>types, independent of any particular data name.  Consider also the 
>CIF 1.1 "Common Semantic Features" document:
>Paragraph 25, speaking about the character markup conventions used 
>in CIF 1, says "The specification is silent on which fields should 
>be interpreted according to these markup conventions, but the 
>published examples suggest that they may be used in any character 
>field in a CIF data file except as prohibited by a dictionary 
>directive. It is intended that the next CIF version specification 
>shall formally declare where such markup may be used."  Thus, 
>documented CIF 1.1 principles allow dictionaries to control which 
>markup conventions apply to the values of defined items.
>Paragraph 37 says "If it is necessary to convey more complex 
>typographic information than is permitted by these special character 
>codes and conventions, the entire text field should be of a richer 
>content type allowing detailed typographic markup."  Thus CIF 1.1 
>supposes that special semantic rules may be defined -- presumably in 
>a dictionary -- for the values of certain items.
>Overall, it looks like the real practice with CIF 1.1 has indeed 
>been to favor simplicity and stability of the syntax and, to a 
>lesser degree, of the common semantics, delegating considerable 
>control to the dictionary system.  COMCIFS has no obligation to 
>continue that policy, and I urge you to decide based on a 
>consideration of policy goals.  Maintaining consistent policy is 
>only one possible goal among several non-exclusive ones.  Among 
>other possible goals are easing the transition to CIF 2.0, some 
>particular desired degree of backwards compatibility, encouraging 
>development of CIF 2.0 software, and maintaining CIF 2.0's 
>generality and domain-independence.
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>Email Disclaimer:  www.stjude.org/emaildisclaimer

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

