Re: [ddlm-group] CIF2 semantics
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF2 semantics
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 10 Aug 2011 15:04:45 -0400 (EDT)
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA543C16565B4F@11.stjude.org>
- References: <CAM+dB2eL5jrEFBcmGpDe6RTvpv4qfmxXa722XXzaS_zgCjsxKw@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA543C16565B24@11.stjude.org><a06240803ca54b9a20900@149.72.36.242><CAM+dB2eT83aTPYc_Dg2aQAsp9VoWTpBA79RPLne61LFWfcFEZQ@mail.gmail.com><a06240800ca562eee659f@192.168.2.101><CAM+dB2cskNxHZ3mDeJ0uFLG7KbHba7hj=+=mUiqczdj6ivVb7g@mail.gmail.com><alpine.BSF.2.00.1107280520300.57714@epsilon.pair.com><CAM+dB2cQLX7OGoLkMAQm3iuamNYAp7WJazvftQAriT02Po_ybA@mail.gmail.com><CAM+dB2eG29P3UWmbfR2JxUTScB9uE=MN_baasJkzi3arnRodpg@mail.gmail.com><a06240800ca66b35a4c4a@192.168.2.101><8F77913624F7524AACD2A92EAF3BFA543C16565B49@11.stjude.org><alpine.BSF.2.00.1108091536380.47056@epsilon.pair.com><CAM+dB2dPYzDRcaOpfrday847aQ7npFN29qLCd-2=3wRNvh1o2w@mail.gmail.com><alpine.BSF.2.00.1108100554530.18190@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA543C16565B4F@11.stjude.org>
Dear John,

I have been discussing the meaning of a string that looks like a number
in the absence of a dictionary saying that that string should be treated
as something other than a number. When we were writing the CIF 1.1 spec
we did not intend to change the well-established CIF rule that such a
string should be treated as a number. If you have found an error in the
CIF 1.1 spec which leads to a reading saying something else, then you
should propose a correction to the CIF 1.1 spec to conform to the
well-established CIF rule, just as I have proposed fixing the error in
the CIF 1.1 spec to require the terminal whitespace at the end of a text
field.

I don't see any problem to be solved in the example you gave. If the
2 CIFs are intended to be processed without a dictionary, they are
equivalent. If one or the other is intended to be processed with a
dictionary, they may or may not be equivalent depending on what the
dictionaries say. How is that a problem? If I don't want any ambiguity,
I should quote the strings and they will be treated as different strings
even without a dictionary. Maybe I am missing something in what you are
saying.

This group started with a simple practical charge to get DDLm
dictionaries in use. We seem to have drifted very far from that highly
desirable goal. Instead we seem to be focused on creating a new and
different CIF data format. Why are we doing that? What problem are we
solving? I am used to the idea that you start with a user external
functional spec, freeze that, and then design to that spec. We seem to
be in the software engineering failure mode called specification creep.

Please, let us stop and start over.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Wed, 10 Aug 2011, Bollinger, John C wrote:

> Dear Herbert,
>
> On Wednesday, August 10, 2011 5:11 AM, you wrote:
>
>> I don't understand how John's suggestion would work in practice.
>> The most important thing is, I don't understand what problem is
>> being solved.
>
>
> During this discussion you have sometimes remarked (I paraphrase) that
> an unquoted data value having numeric form is of type 'numb'. The
> problem being solved is that you are wrong. The solution I propose is
> to adjust the CIF2 specifications so that you are right (for CIF2).
>
> In particular, your claim is wrong with respect to CIF 1.1 because
> dictionary definitions are specified to override any data type implicit
> in a value's lexical form. Thus the value 1 is *not* of type 'numb' if
> an in-scope dictionary definition declares otherwise. But that
> highlights an extra facet of the problem: the type of numeric-form data
> values is in fact _indeterminate_ outside any particular processing
> context, and CIF 1.1 actually requires that non-determinism to be
> resolved inconsistently. Thus, it is impossible to give a general answer
> to whether the following two CIFs mean the same thing:
>
> #\#CIF_1.1
> data_example
> _char_or_numb 0.01
> # End of CIF
>
> #\#CIF_1.1
> data_example
> _char_or_numb 1e-2
> # End of CIF
>
> In fact, the question would have no general answer even if the CIFs
> explicitly expressed compliance with a dictionary defining
> _char_or_numb, because processors are not required to consider that
> definition.
>
> As a practical matter, this problem is the source of user surprise when
> a program such as cif2cif reformats values that the CIF author did not
> intend to be interpreted as numbers.
> My proposed solution does not
> require different behavior from cif2cif; instead it requires different
> expectations from CIF authors. In particular, authors must expect that
> values that look like numbers will be treated as numbers.
>
> There does remain the issue of how to consistently handle the case where
> a provided value is a number, but the processor intends to honor an item
> definition requiring a string value. CIF's historical legacy demands
> that the value be provided to applications as a string (rather than, for
> example, a validation error being raised). CIF 1.1 dictates that that
> string in fact be the particular sequence of characters with which the
> number was expressed. CIF 1.1's requirement yields consistency, but if
> there is any meaningful distinction between numb and char, then the
> original character sequence is not inherent in numb values. CIF 1.1's
> prescription is therefore incompatible with determinate data typing.
>
> For that case I propose instead to decouple values' lexical typing from
> their dictionary-defined semantic typing. Instead of relying on
> numbers' extrinsic lexical form for a string representation, I propose
> to use a consistent, yet-to-be-determined form, dependent only on
> numbers' intrinsic characteristics. That would admittedly introduce
> differences between certain data values that a dictionary-driven CIF2
> processor would provide to an application vs. those that a similar CIF1
> processor would provide, but only where your advice to quote
> numeric-appearing 'char' values is not followed.
>
>
>> This really does remind me of the sterile negative prescriptions
>> for Fortran in the 1980's and early 90's until the focus changed
>> from rewriting the language to extending the language.
>> CIF works. Adding to it can be very useful, but adding new rules
>> that make it difficult for existing data and software to be used
>> can outweigh the utility of additions. It is a matter of balance.
>
>
> CIF does *not* presently work in this regard. Existing CIF software
> works consistently only for CIFs that follow additional construction
> rules beyond those in the CIF specifications. The CIF 1.1
> specifications actually require this inconsistency, so only the
> combination of CIF 1.1 + additional rules works. I am proposing,
> therefore, to add those additional rules to CIF2, so that we can indeed
> say that CIF works. This would affect existing data only insomuch as
> they do not already follow the rules needed to work consistently with
> CIF 1.1 applications.
>
>
>> Right now, I don't see reasonable balance with things much too
>> skewed toward rewriting CIF and not enough consideration for
>> continuity of existing uses.
>
>
> You can view my suggestion as being to return to something closer to
> the original CIF data typing rules, before dictionaries were
> introduced. Given that existing uses seem to continue to include
> applications from that era, and given that it has no effect on CIFs
> that already are written for consistent CIF 1.1 processing, I think my
> proposal promotes maintaining continuity of uses.
>
>
> John
>
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
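
The disagreement over the two example CIFs (`0.01` vs. `1e-2`) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the numeric pattern is a simplification of CIF's numeric form, and `interpret`, its `quoted` flag, and its `declared_type` argument are hypothetical stand-ins for a CIF processor and a dictionary lookup, not the API of any existing CIF library.

```python
import re
from decimal import Decimal

# Simplified (assumed) pattern for a numeric-form value, with an optional
# standard uncertainty in parentheses, e.g. 1.23(4). Not the full CIF grammar.
_NUMERIC = re.compile(r'^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$')


def interpret(raw, quoted=False, declared_type=None):
    """Hypothetical helper: return ('numb', Decimal) or ('char', str).

    quoted        -- True if the value was delimited by quotes in the file
    declared_type -- 'char' if an in-scope dictionary says so, else None
    """
    # Quoting, or a dictionary declaring 'char', forces a string value;
    # the original character sequence is then what the application sees.
    if quoted or declared_type == "char":
        return ("char", raw)
    # Otherwise an unquoted value of numeric form is treated as a number.
    if _NUMERIC.match(raw):
        number_part = raw.split("(")[0]  # drop any s.u. suffix
        return ("numb", Decimal(number_part))
    return ("char", raw)


# Without a dictionary the two example values are the same number.
print(interpret("0.01") == interpret("1e-2"))
# With a dictionary declaring 'char', they are different strings.
print(interpret("0.01", declared_type="char")
      == interpret("1e-2", declared_type="char"))
# Quoting removes the ambiguity even without a dictionary.
print(interpret("0.01", quoted=True) == interpret("1e-2", quoted=True))
```

Under these assumptions the first comparison is true and the other two are false, which is the behavior both sides of the thread describe: the point in dispute is whether the outcome should be allowed to depend on whether a processor consults the dictionary.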