Re: [ddlm-group] CIF2 semantics

Dear John,

   I have been discussing the meaning of a string that looks like a
number in the absence of a dictionary saying that that string should
be treated as something other than a number.  When we were writing
the CIF 1.1 spec we did not intend to change the well-established
CIF rule that such a string should be treated as a number.  If
you have found an error in the CIF 1.1 spec which leads to a reading
saying something else, then you should propose a correction to the
CIF 1.1 spec to conform to the well-established CIF rule, just
as I have proposed fixing the error in the CIF 1.1 spec to require
the terminal whitespace at the end of a text field.

   I don't see any problem to be solved in the example you gave.
If the 2 CIFs are intended to be processed without a dictionary,
they are equivalent.  If one or the other is intended to be
processed with a dictionary, they may or may not be equivalent
depending on what the dictionaries say.  How is that a problem?
If I don't want any ambiguity, I should quote the strings and
they will be treated as different strings even without a dictionary.
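
   To make those three cases concrete, here is a minimal sketch in
Python.  The names read_value, NUMERIC_FORM and declared_type are
illustrative assumptions only, the pattern is a simplification of the
real CIF numeric syntax, and none of this is taken from the CIF 1.1
spec or from any existing CIF software.

```python
import re

# Assumption: a simplified pattern for an unquoted value of numeric form,
# with an optional standard uncertainty in parentheses, e.g. 1.23(4).
NUMERIC_FORM = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?")


def read_value(token, declared_type=None):
    """Interpret one raw token exactly as it was written in the CIF.

    declared_type is a hypothetical stand-in for what a dictionary might
    say about the data name: "numb", "char", or None for no dictionary.
    """
    quoted = len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]
    if quoted:
        # A quoted value is always a string, dictionary or not.
        return token[1:-1]
    if declared_type == "char":
        # The dictionary overrides the lexical form: keep the characters.
        return token
    if NUMERIC_FORM.fullmatch(token):
        # No dictionary (or "numb"): a numeric-looking value is a number.
        # The standard uncertainty, if any, is dropped in this sketch.
        return float(re.sub(r"\(\d+\)$", "", token))
    return token


# Without a dictionary the two example values are the same number.
print(read_value("0.01") == read_value("1e-2"))
# Quoted, they are different strings even without a dictionary.
print(read_value("'0.01'") == read_value("'1e-2'"))
# With a dictionary declaring the item char, they differ as well.
print(read_value("0.01", "char") == read_value("1e-2", "char"))
```

In this sketch only the unquoted, dictionary-free reading makes the
two values equal; quoting removes the dependence on the dictionary.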

   Maybe I am missing something in what you are saying.

   This group started with a simple practical charge to get DDLm
dictionaries in use.  We seem to have drifted very far from that
highly desirable goal.  Instead we seem to be focused on creating
a new and different CIF data format.  Why are we doing that?
What problem are we solving?

   I am used to the idea that you start with a user external functional 
spec, freeze that, and then design to that spec.  We seem to be in
the software engineering failure mode called specification creep.
Please, let us stop and start over.


  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


On Wed, 10 Aug 2011, Bollinger, John C wrote:

> Dear Herbert,
> On Wednesday, August 10, 2011 5:11 AM, you wrote:
>> I don't understand how John's suggestion would work in practice.
>> The most important thing is, I don't understand what problem is
>> being solved.
> During this discussion you have sometimes remarked (I paraphrase) that 
> an unquoted data value having numeric form is of type 'numb'.  The 
> problem being solved is that you are wrong.  The solution I propose is 
> to adjust the CIF2 specifications so that you are right (for CIF2).
> In particular, your claim is wrong with respect to CIF 1.1 because 
> dictionary definitions are specified to override any data type implicit 
> in a value's lexical form.  Thus the value 1 is *not* of type 'numb' if 
> an in-scope dictionary definition declares otherwise.  But that 
> highlights an extra facet of the problem: the type of numeric-form data 
> values is in fact _indeterminate_ outside any particular processing 
> context, and CIF 1.1 actually requires that non-determinism to be 
> resolved inconsistently.  Thus, it is impossible to give a general answer 
> to whether the following two CIFs mean the same thing:
> #\#CIF_1.1
> data_example
> _char_or_numb 0.01
> # End of CIF
> #\#CIF_1.1
> data_example
> _char_or_numb 1e-2
> # End of CIF
> In fact, the question would have no general answer even if the CIFs 
> explicitly expressed compliance with a dictionary defining 
> _char_or_numb, because processors are not required to consider that 
> definition.
> As a practical matter, this problem is the source of user surprise when 
> a program such as cif2cif reformats values that the CIF author did not 
> intend to be interpreted as numbers.  My proposed solution does not 
> require different behavior from cif2cif; instead it requires different 
> expectations from CIF authors.  In particular, authors must expect that 
> values that look like numbers will be treated as numbers.
> There does remain the issue of how to consistently handle the case where 
> a provided value is a number, but the processor intends to honor an item 
> definition requiring a string value.  CIF's historical legacy demands 
> that the value be provided to applications as a string (rather than, for 
> example, a validation error being raised).  CIF 1.1 dictates that that 
> string in fact be the particular sequence of characters with which the 
> number was expressed.  CIF 1.1's requirement yields consistency, but if 
> there is any meaningful distinction between numb and char, then the 
> original character sequence is not inherent in numb values.  CIF 1.1's 
> prescription is therefore incompatible with determinate data typing.
> For that case I propose instead to decouple values' lexical typing from 
> their dictionary-defined semantic typing.  Instead of relying on 
> numbers' extrinsic lexical form for a string representation, I propose 
> to use a consistent, yet-to-be-determined form, dependent only on 
> numbers' intrinsic characteristics.  That would admittedly introduce 
> differences between certain data values that a dictionary-driven CIF2 
> processor would provide to an application vs. those that a similar CIF1 
> processor would provide, but only where your advice to quote 
> numeric-appearing 'char' values is not followed.
>> This really does remind me of the sterile negative prescriptions
>> for Fortran in the 1980's and early 90's until the focus changed
>> from rewriting the language to extending the language.
>> CIF works.  Adding to it can be very useful, but adding new rules
>> that make it difficult for existing data and software to be used
>> can outweigh the utility of additions.  It is a matter of balance.
> CIF does *not* presently work in this regard.  Existing CIF software
> works consistently only for CIFs that follow additional construction
> rules beyond those in the CIF specifications.  The CIF 1.1
> specifications actually require this inconsistency, so only the
> combination of CIF 1.1 + additional rules works.  I am proposing,
> therefore, to add those additional rules to CIF2, so that we can
> indeed say that CIF works.  This would affect existing data only
> insomuch as they do not already follow the rules needed to work
> consistently with CIF 1.1 applications.
>> Right now, I don't see reasonable balance with things much too
>> skewed toward rewriting CIF and not enough consideration for
>> continuity of existing uses.
> You can view my suggestion as being to return to something closer to
> the original CIF data typing rules, before dictionaries were
> introduced.  Given that existing uses seem to continue to include
> applications from that era, and given that it has no effect on CIFs
> that already are written for consistent CIF 1.1 processing, I think
> my proposal promotes maintaining continuity of uses.
> John