Discussion List Archives

Re: [ddlm-group] CIF2 semantics

Dear James,

   We differ in our views.  This is very much the same as the
difference in view between languages requiring immediate evaluation
and lazy evaluation, or compile time binding versus run time binding.
It is a choice of efficiency versus power.  For my applications with
users pushing the limits of computer performance, I have to push
for efficiency, and would, therefore, not support the data model
you propose.
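
To make the trade-off concrete, here is a minimal Python sketch of the
two strategies. It is an illustration only and is not taken from any
existing CIF software. The names read_eager and LazyValue are
hypothetical, and the sketch assumes plain numbers with no standard
uncertainty in parentheses.

```python
from dataclasses import dataclass
from typing import Union


def read_eager(text: str, declared_type: str) -> Union[str, float]:
    """Convert at read time; needs the dictionary type before the data."""
    # Assumption: declared_type is "numb" or "char", as in CIF 1.1.
    return float(text) if declared_type == "numb" else text


@dataclass
class LazyValue:
    """Keep the raw text and convert only when a caller asks for a type."""
    text: str

    def as_numb(self) -> float:
        return float(self.text)

    def as_char(self) -> str:
        return self.text
```

The eager form stores one converted value and pays the conversion cost
once. The lazy form carries the source string for every value and
converts on demand, which is the cost I am concerned about for large
data sets.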

I don't understand how John's suggestion would work in practice.
More importantly, I don't understand what problem is being
solved.  This really does remind me of the sterile negative
prescriptions for Fortran in the 1980s and early 1990s, until the
focus changed from rewriting the language to extending the language.
CIF works.  Adding to it can be very useful, but adding new rules
that make it difficult for existing data and software to be used
can outweigh the utility of additions.  It is a matter of balance.
Right now, I don't see a reasonable balance: things are skewed much
too far toward rewriting CIF, with not enough consideration for
continuity of existing uses.

Regards,
   Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 10 Aug 2011, James Hester wrote:

> Dear Herbert:  I am not assuming that the dictionary will be accessed after
> parsing. I am merely allowing for the *possibility* that a dictionary will
> be accessed after parsing.  Given this *possibility*, the datamodel I have
> been discussing is applicable.  I repeat, I am not prescribing anything for
> implementations, so an application which already has dictionary information
> available is quite justified in directly converting a numeric string to a
> number immediately on input.  Such an implementation in no way contradicts
> the datamodel I am proposing, as it can be logically justified in terms of
> this datamodel.
> 
> In any case, I suggest we defer discussion of this point until a bit later
> as it does not materially affect actual programs.  A formal datamodel would,
> however, help in clarifying certain subtle aspects of CIF.
> 
> I'd be interested in hearing your thoughts on John B's suggestion to
> disallow interpretation of 'numb' strings as character strings, so that
> anything that looked like a number in the CIF file would necessarily be a
> number.
> 
> On Wed, Aug 10, 2011 at 5:40 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>       Normally, I read dictionaries before I read data, so the question
>       of having a number declared as such in a dictionary is known while
>       reading the data.  You seem to be assuming that data will be read
>       before the dictionary.  That certainly is a possible approach, but
>       really an implementation choice, not something intrinsic to CIF.
>       Lazy evaluation has both advantages and disadvantages.
>         -- Herbert
> 
> On Tue, 9 Aug 2011, Bollinger, John C wrote:
> 
> >
> > On Tuesday, August 09, 2011 5:08 AM, Herbert J. Bernstein wrote:
> >
> >>   The heart of the difference lies in "Therefore, datavalues that can
> >> be interpreted as numbers (CIF1.1 'numb' type) must retain knowledge
> >> of the source string so that the DDLs and dictionaries are free to
> >> interpret them as 'char' type (see Vol G 2.2.7.4.7.1(17), reproduced
> >> below)"  I do not see any requirement in Volume G that we must retain
> >> the original source string for _any_ data value, just that we must
> >> faithfully preserve the information in that data value.  In the case
> >> of a number, that is the numeric value, not the particular choice of
> >> character string used to represent that numeric value, so we may
> >> freely change from 123 to 1.23e2 and back.
> >
> >
> > James's is a practical argument, predicated on the idea of a
> > hypothetical CIF parser, operating without dictionary knowledge but
> > supporting dictionary-based applications.  The hypothetical
> > processor's output is a complex object conforming to the abstract CIF
> > data model that James is attempting to define.  He postulates that it
> > is both necessary and sufficient for such an abstract data model to
> > retain the original character sequence of every data value, including
> > those having numeric lexical form.
> >
> > James's argument is supported by Vol G 2.2.5.2, wherein it is
> > specified that given
> >
> > _unknown_data_name 1
> >
> > and a dictionary definition assigning type 'char' to that name, "the
> > value should be stored as the literal character 1."  As a practical
> > matter, then, a CIF 1.1 processor must retain the original character
> > sequence at least until it is known whether there is a definition.
> > Therefore James's abstract data model, output by a processor ignorant
> > of any dictionary yet supporting dictionary-based applications, indeed
> > must retain the original character sequence.
> >
> >
> >>   Saying, as 2.2.7.4.7.1.(17), that "it may be assumed that a
> >> character string interpretable as a number should be taken to
> >> represent an item of type 'numb'" does _not_ say that we need
> >> to retain the original source string
> >
> >
> > No, but saying, as 2.2.7.4.7.1.(17)'s next sentence does, "However,
> > an explicit dictionary declaration of type will override such an
> > assumption," _does_ require the original source string to be retained
> > in the event that a dictionary definition declares type 'char' for the
> > value.  James's hypothetical processor must retain the original source
> > string because it doesn't yet know whether there is such a definition.
> >
> > [...]
> >
> >
> >>   If we wish to preserve a particular string, we should quote it, but
> >> then it is type char, not type numb.
> >
> >
> > That is unquestionably the most pragmatic approach for writing CIFs,
> > but what does or should the CIF specifications require when that
> > approach is not taken?  There seems more of a sore spot here than I
> > appreciated before this discussion, but I now believe that CIF 1.1's
> > approach for determining data types is flawed, and furthermore that it
> > is inconsistently implemented in practice.  It is unsatisfactory that
> > CIF 1.1 does not require dictionaries to be used, yet mandates
> > different, incompatible, data typing analysis when they are used than
> > when they are not.
> >
> > I think CIF2 can and should adopt a different position, wherein
> > there are three base primitive data types for values (char, numb, and
> > null), and values are assigned to one of these base types upon
> > parsing, based on their lexical form.  If a dictionary definition
> > requires a different base type than the one a value was expressed in,
> > then the value must be coerced to the needed type after parsing,
> > according to a standard set of platform-independent coercion rules.
> > That would achieve a separation between the (optional) dictionary
> > layer and the underlying data model that CIF 1.1 lacks.  Without
> > something like that one needs the kind of workaround that James
> > describes, at least in principle.
> >
> >
> > John
> >
> > --
> > John C. Bollinger, Ph.D.
> > Department of Structural Biology
> > St. Jude Children's Research Hospital
> >
> >
> > Email Disclaimer:  www.stjude.org/emaildisclaimer
> >
> > _______________________________________________
> > ddlm-group mailing list
> > ddlm-group@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >
> 
> 
> 
> 
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> 
>
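
For readers following the thread, John's proposal quoted above (three
base types assigned from lexical form, then coercion when a dictionary
asks for a different type) might be sketched in Python as follows. This
is only a sketch under stated assumptions: the thread does not define
the coercion rules, so the ones here are invented for illustration, as
are the names Value, parse_value, and coerce. The numeric pattern is
simplified and ignores standard uncertainties, and the null type is
assumed to correspond to the CIF placeholders "?" and ".".

```python
import re
from dataclasses import dataclass
from typing import Union

# Simplified numeric pattern (assumption): sign, digits, fraction, exponent.
_NUMB = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


@dataclass(frozen=True)
class Value:
    base_type: str                    # "char", "numb", or "null"
    value: Union[str, float, None]


def parse_value(token: str, quoted: bool = False) -> Value:
    """Assign a base type from lexical form alone, with no dictionary."""
    if quoted:
        return Value("char", token)   # quoting forces char
    if token in ("?", "."):
        return Value("null", None)
    if _NUMB.fullmatch(token):
        return Value("numb", float(token))
    return Value("char", token)


def coerce(v: Value, wanted: str) -> Value:
    """Hypothetical coercion rules applied after parsing."""
    if v.base_type == wanted or v.base_type == "null":
        return v
    if wanted == "char":
        # numb -> char uses a canonical spelling, not the source text.
        x = v.value
        return Value("char", str(int(x)) if x.is_integer() else repr(x))
    if wanted == "numb" and _NUMB.fullmatch(v.value):
        return Value("numb", float(v.value))
    raise ValueError(f"cannot coerce {v!r} to {wanted}")
```

Under these assumed rules, an unquoted 1 coerced to char gives the
string "1", but 1.23e2 coerced to char gives "123" rather than the
original spelling. Whether that loss is acceptable is the question
being argued in this thread.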

International Union of Crystallography
