Discussion List Archives


Re: [ddlm-group] CIF2 semantics

Dear Herbert,

On Wednesday, August 10, 2011 5:11 AM, you wrote:

>I don't understand how John's suggestion would work in practice.

>The most important thing is, I don't understand what problem is
>being solved.


During this discussion you have sometimes remarked (I paraphrase) that an unquoted data value having numeric form is of type 'numb'.  The problem being solved is that you are wrong.  The solution I propose is to adjust the CIF2 specifications so that you are right (for CIF2).

In particular, your claim is wrong with respect to CIF 1.1 because dictionary definitions are specified to override any data type implicit in a value's lexical form.  Thus the value 1 is *not* of type 'numb' if an in-scope dictionary definition declares otherwise.  But that highlights an extra facet of the problem: the type of numeric-form data values is in fact _indeterminate_ outside any particular processing context, and CIF 1.1 actually requires this non-determinism to be resolved inconsistently.  Thus, it is impossible to give a general answer to whether the following two CIFs mean the same thing:

#\#CIF_1.1
data_example
_char_or_numb 0.01
# End of CIF

#\#CIF_1.1
data_example
_char_or_numb 1e-2
# End of CIF

In fact, the question would have no general answer even if the CIFs explicitly expressed compliance with a dictionary defining _char_or_numb, because processors are not required to consider that definition.
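To make the indeterminacy concrete, here is a minimal Python sketch of two processors reading the value of _char_or_numb from the two CIFs above. The helper `interpret` and its `dict_type` parameter are hypothetical illustrations, not part of any CIF library: one processor honours a definition declaring the item 'char', the other consults no definition and goes by lexical form. `Decimal` is used so that the numeric comparison is exact.

```python
from decimal import Decimal
from typing import Optional

def interpret(raw: str, dict_type: Optional[str]) -> object:
    """Hypothetical interpretation of an unquoted, numeric-form CIF 1.1 value.

    dict_type is the type from an in-scope dictionary definition
    ('char' or 'numb'), or None if the processor consults no definition.
    """
    if dict_type == "char":
        # The definition overrides the lexical form: keep the exact text.
        return raw
    # No definition consulted (or 'numb'): treat the value as a number.
    return Decimal(raw)

a, b = "0.01", "1e-2"
print(interpret(a, None) == interpret(b, None))      # True: same number
print(interpret(a, "char") == interpret(b, "char"))  # False: different strings
```

The same pair of files is "equivalent" to one processor and "different" to the other, and CIF 1.1 sanctions both behaviours.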

As a practical matter, this problem is the source of user surprise when a program such as cif2cif reformats values that the CIF author did not intend to be interpreted as numbers.  My proposed solution does not require different behavior from cif2cif; instead it requires different expectations from CIF authors.  In particular, authors must expect that values that look like numbers will be treated as numbers.
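As a rough illustration of what "values that look like numbers will be treated as numbers" could mean for a processor, the sketch below classifies a single value token purely by its lexical form. The regular expression is a simplified approximation of the CIF 1.1 numeric form (optional sign, digits with an optional decimal point, optional exponent, optional parenthesised su), and the rule that quoted values are always 'char' is an assumption about the proposal, not a quotation of any specification. The helper name `lexical_type` is hypothetical, and the CIF placeholders '.' and '?' are not given special treatment here.

```python
import re

# Simplified approximation of the CIF 1.1 numeric form, e.g. 1, -2.5, .5, 1e-2, 1.234(5).
NUMERIC_FORM = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?(?:\(\d+\))?")

def lexical_type(token: str) -> str:
    """Return 'numb' or 'char' for one CIF value token, by lexical form alone.

    Assumption: quoted tokens are never numbers. The placeholders '.' and '?'
    are out of scope for this sketch and simply come back as 'char'.
    """
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return "char"  # quoting is how an author opts out of numeric treatment
    if NUMERIC_FORM.fullmatch(token):
        return "numb"
    return "char"

for tok in ("0.01", "1e-2", "1.234(5)", "'0.01'", "C2H5"):
    print(tok, lexical_type(tok))
```

Under such a rule the CIF author, not the processing context, decides the type: an unquoted 0.01 is a number everywhere, and '0.01' is a string everywhere.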

There does remain the issue of how to consistently handle the case where a provided value is a number, but the processor intends to honor an item definition requiring a string value.  CIF's historical legacy demands that the value be provided to applications as a string (rather than, for example, a validation error being raised).  CIF 1.1 dictates that that string in fact be the particular sequence of characters with which the number was expressed.  CIF 1.1's requirement yields consistency, but if there is any meaningful distinction between numb and char, then the original character sequence is not inherent in numb values.  CIF 1.1's prescription is therefore incompatible with determinate data typing.

For that case I propose instead to decouple values' lexical typing from their dictionary-defined semantic typing.  Instead of relying on numbers' extrinsic lexical form for a string representation, I propose to use a consistent, yet-to-be-determined form, dependent only on numbers' intrinsic characteristics.  That would admittedly introduce differences between certain data values that a dictionary-driven CIF2 processor would provide to an application vs. those that a similar CIF1 processor would provide, but only where your advice to quote numeric-appearing 'char' values is not followed.
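The contrast between the two string conversions can be sketched in Python. `cif11_char_value` mirrors the CIF 1.1 rule described above (return the original characters). `canonical_char_value` is only one hypothetical choice of a form that depends solely on the number's value; the actual form is, as stated, yet to be determined. This sketch strips trailing zeros and avoids exponent notation. It deliberately rejects values carrying an su, because the su refers to the last written digit and normalising the digits would change its meaning.

```python
import re
from decimal import Decimal

# Plain numbers only (no su), e.g. 0.01, 1e-2, -3., .5
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

def cif11_char_value(raw: str) -> str:
    """CIF 1.1 behaviour: the string is exactly the characters as written."""
    return raw

def canonical_char_value(raw: str) -> str:
    """Hypothetical canonical string depending only on the numeric value."""
    if not NUMBER.fullmatch(raw):
        # Rejects non-numbers and values with an su such as 1.234(5).
        raise ValueError(f"not a plain number: {raw!r}")
    value = Decimal(raw)
    if value.is_zero():
        return "0"  # fold 0.0, -0, 0e5, ... into one spelling
    # normalize() drops trailing zeros; 'f' formatting avoids exponent notation.
    return format(value.normalize(), "f")

for raw in ("0.01", "1e-2", "1.50", "1E+2"):
    print(raw, cif11_char_value(raw), canonical_char_value(raw))
```

With this choice both 0.01 and 1e-2 yield the string 0.01, so the two example CIFs above would supply the same 'char' value to an application, whereas the CIF 1.1 rule supplies two different strings.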


>  This really does remind me of the sterile negative prescriptions
>for Fortran in the 1980's and early 90's until the focus changed
>from rewriting the language to extending the language.
>CIF works.  Adding to it can be very useful, but adding new rules
>that make it difficult for existing data and software to be used
>can outweigh the utility of additions.  It is a matter of balance.


CIF does *not* presently work in this regard.  Existing CIF software works consistently only for CIFs that follow additional construction rules beyond those in the CIF specifications.  The CIF 1.1 specifications actually require this inconsistency, so only the combination of CIF 1.1 + additional rules works.  I am proposing, therefore, to add those additional rules to CIF2, so that we can indeed say that CIF works.  This would affect existing data only insofar as they do not already follow the rules needed to work consistently with CIF 1.1 applications.


>Right now, I don't see reasonable balance with things much too
>skewed toward rewriting CIF and not enough consideration for
>continuity of existing uses.


You can view my suggestion as being to return to something closer to the original CIF data typing rules, before dictionaries were introduced.  Given that existing uses seem to continue to include applications from that era, and given that it has no effect on CIFs that already are written for consistent CIF 1.1 processing, I think my proposal promotes maintaining continuity of uses.


John




Email Disclaimer:  www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

International Union of Crystallography
