Discussion List Archives


Re: [ddlm-group] _enumerated_set.table_id

For better or for worse, "Table" is the CIF2 term for this data structure.  I do not think introducing an alias at this point would serve the interest of clarity, but I will try to remember to capitalize it when I use the word in the CIF2 sense.

John

> -----Original Message-----
> From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of
> Herbert J. Bernstein
> Sent: Thursday, April 23, 2015 4:52 AM
> To: Group finalising DDLm and associated dictionaries
> Subject: Re: [ddlm-group] _enumerated_set.table_id
> 
> May I suggest maintaining a clear distinction, at least by capitalizing the CIF2
> type, or better, by referring to it as a dictionary type, as in Python?
> 
> On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com>
> wrote:
> > Hi Herbert - the very important point here is that we are talking
> > about the 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and
> > most certainly not 'table' in the sense of 'relational database table'
> > (although you will appreciate the very close relationship between the
> > two data structures).
> >
> > all the best,
> > James.
> >
> > On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein
> > <yayahjb@gmail.com>
> > wrote:
> >>
> >> Dear Colleagues,
> >>
> >>   I am puzzled by the idea of constraints on table keys distinct
> >> from the constraints on the values and types for table columns.  From
> >> a database perspective, a table key is just a set of one or more
> >> columns that uniquely identify rows in a table by their contents.  If
> >> a column has been designated as a key or as a member of a composite
> >> key, the normal practice is to use the type and value constraints of
> >> the column as the only constraints on what you are allowed to use.
> >> Please explain what is gained by having additional constraints
> >> specified.  I would suggest we keep as close to a relational model
> >> for CIF2 tables as possible.
> >>
> >>   Regards,
> >>     Herbert
> >>
> >> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C
> >> <John.Bollinger@stjude.org> wrote:
> >> > Hi James,
> >> >
> >> > Comments inline below.  ((Lack of) formatting thanks to stupid
> >> > Microsoft limitations.)
> >> >
> >> >> > 4. Add a replacement mechanism to define constraints on table keys.
> >> >> > It might be sufficient, and consistent with the apparent intent
> >> >> > of the current dictionary, to establish a parallel to the
> >> >> > _enumeration_set category for constraining key values, maybe
> >> >> > _key_enumeration_set.  It would be a smaller change at the
> >> >> > dictionary level, however, to add a mechanism by which
> >> >> > constraints on key type could be defined by reference to the type
> >> >> > of another item (see also next).
> >> >>
> >> >> What is the advantage of being able to validate key strings?
> >> >
> >> > What is the advantage of validating *anything*?  If there is a
> >> > constraint on document form and content then one would like to be
> >> > able to determine whether instance documents comply with that
> >> > constraint.  It can be useful to perform such validation for its
> >> > own sake, or programs can validate up front in order to minimize or
> >> > eliminate the need to sprinkle hand-rolled validity testing
> >> > throughout their implementation code.
> >> >
> >> > I suppose the real question is about the advantage of defining
> >> > constraints on table keys in the first place.  There are all sorts
> >> > of possible examples, but for now let's stick with _import.get.  In
> >> > each element (a table) of the list value of that attribute, a few
> >> > specific possible keys are meaningful, and all others are
> >> > meaningless / erroneous.  We might like to be able to diagnose key
> >> > misspellings in those tables.  We might like to be able to process
> >> > the values as lists of (key, value) pairs without fear that any of
> >> > the keys are invalid.  We might simply like to provide a
> >> > machine-readable definition of which keys are meaningful / allowed.
> >> >
> >> >>  As outlined in my previous email, I don't see that validating the
> >> >> keys will have much benefit as tables are rarely used.  That
> >> >> aside, simply introducing an extra DDLm attribute is OK,
> >> >> especially since, as we are dropping _enumeration_set.table_id, we
> >> >> are not enlarging DDLm.
> >> >
> >> > If it were going to require a great deal of additional work and
> >> > complexity to provide for constraints on table keys then I would
> >> > hesitate to suggest doing so.  I don't think that's the case.
> >> >
> >> > As it is, the current DDLm dictionary provides a mechanism intended
> >> > to support constraining table keys, and it uses it, albeit only
> >> > once.  Removing that ability without replacement would not only
> >> > delete the ability it supports, it would also change the semantics
> >> > of the DDLm item that currently *uses* that ability.
> >> >
> >> > I am inclined to suppose that one reason tables are rarely used in
> >> > the current dictionaries is that the item descriptions in the 2012
> >> > DDLm dictionary do a poor job of explaining how to define items
> >> > taking tables as their values, especially with respect to
> >> > constraints.  Furthermore, all of the current dictionaries -- even
> >> > DDLm -- spring from a history and dictionary development tradition
> >> > that hadn't table values to rely on until now, so it is not
> >> > surprising that DDLm versions of those dictionaries have little
> >> > reliance on tables.  That does not mean that tables cannot serve
> >> > more prominently in future dictionaries, or future versions of the
> >> > current dictionaries.
> >> >
> >> >> > 5. Add a mechanism to allow items' content type to be defined by
> >> >> > reference to another item.  This could be signaled by a new code
> >> >> > for _type.contents, with a new attribute defining which other
> >> >> > item’s type is to be used.  I don’t think that the existing
> >> >> > contents code 'Inherited' can serve this purpose, but perhaps
> >> >> > I’m mistaken.
> >> >
> >> >> This is an intriguing idea.  As it happens, the demonstration DDLm
> >> >> dictionaries introduce setting the type of an item based on the
> >> >> type of a different item using a dREL-like function (although I
> >> >> have replaced these with explicit types in the latest version of
> >> >> the new cif_core dictionary).
> >> >> Your suggestion replaces this by a non-dREL approach, which is in
> >> >> general desirable for simple applications.  To check that I've
> >> >> understood your (corrected) example:
> >> >> (1) the elements of the _import.get List are items of the same
> >> >> type as _import.get_contents_type
> >> >
> >> > Yes.
> >> >
> >> >> (2) _import.get_contents_type is a Table, so _type.contents for it
> >> >> is the type of values in the table i.e. Text
> >> >
> >> > Yes.
> >> >
> >> >> (3) The possible key values are given by the possible values taken
> >> >> by the _type.key_type_reference dataname
> >> >
> >> > Yes, in this case.  My idea is that _type.keys would be parallel to
> >> > _type.contents, so that, for example, it might also take the value
> >> > 'Code' or 'Date' or 'Text' or an extension type, and in that case
> >> > not rely on a reference to a separate item definition.
> >> >
> >> >> We have two new 'internal' DDLm attributes as a result, as well as
> >> >> the new _type.keys, _type.key_content_reference and
> >> >> _type.key_type_reference datanames for a total of 5 new attributes.
> >> >
> >> > Those aren't exactly the data names I proposed, but yes, that's the
> >> > way my proposal plays out for DDLm.
> >> >
> >> >>  If we put the key list into the definition to which it relates,
> >> >> we can cut down on the number of new attributes, e.g:
> >> >> save_import.get_contents_type
> >> >>    # ...
> >> >>    _type.purpose             'Internal'
> >> >>    _type.container           'Table'
> >> >>    _type.contents            'Text'
> >> >>    loop_
> >> >>      _table_key_set.state
> >> >>      _table_key_set.detail
> >> >>        'file' 'filename/URI of source dictionary'
> >> >>        'save' 'save framecode of source definition'
> >> >>        'mode' 'mode for including save frames'
> >> >>        'dupl' 'option for duplicate entries'
> >> >>        'miss' 'option for missing duplicate entries'
> >> >> save_
> >> >
> >> > Yes, that would be a viable alternative to support the needs of
> >> > DDLm itself.  It would reduce the number of new items needed from 3
> >> > to 2 (the two other proposed new items being related to defining
> >> > table *contents* by reference, which is a separate issue).  The
> >> > statistics look different for dictionaries other than DDLm itself.
> >> >
> >> > Your alternative appears to be roughly what I described in passing
> >> > as "to establish a parallel to the _enumeration_set category for
> >> > constraining key values."  Although it serves DDLm's own needs just
> >> > fine, it may be too restrictive for other dictionaries that want to
> >> > define (and constrain) tables, as it supports only enumerable sets
> >> > of keys.  In some other uses one might instead want to constrain
> >> > keys to the same form that (for values) is represented by
> >> > _type.contents = 'Date' or 'Version' or some extension type, where
> >> > it is not possible to enumerate all possible keys.
> >> >
> >> >> which results in new attributes _type.key_content_reference,
> >> >> _table_key_set.state and _table_key_set.detail with one internal
> >> >> attribute _import.get_contents_type, and also reduces the
> >> >> non-locality of the definition - that is, one less reference to
> >> >> track through the file.
> >> >> _import.get is admittedly an extreme example, because it is the
> >> >> only occurrence of a list of tables rather than just a table,
> >> >> which is what requires the creation of the 'internal' data attribute.
> >> >
> >> > Yes and no.  The creation of the new 'Internal' value for
> >> > _type.purpose and of items that use it are more a consequence of my
> >> > approach to lightening the load on _type.dimension, whose current
> >> > description and use appear to task it with providing a complete
> >> > layout of values of the item being defined.  Note in particular the
> >> > dimension specified in the current definition of _import.get:
> >> > '[{}]'.  I don't think we want to continue in that direction.
> >> >
> >> > The structure of _import.get's values does not inherently require
> >> > internal types to be defined under my proposed structure.  If there
> >> > were an ordinary item in the dictionary that had the wanted type of
> >> > the elements of an _import.get list, then that type could be
> >> > referenced instead of an internal one.  I can imagine circumstances
> >> > under which such a reference would even be sensible.
> >> >
> >> >>  It is, however, a nice demonstration of how the attributes might
> >> >> work for future dictionary writers.  The new 'internal' dataname
> >> >> does have some meaning along the lines of 'a single import
> >> >> instruction' so a better dataname might be _import.single.
> >> >
> >> > Sure, that name would be fine with me.
> >> >
> >> >>  Is there any reason that you introduced a reference in order to
> >> >> specify the table keys?
> >> >
> >> > I introduced a reference in order to specify table keys so as to
> >> > provide for more alternatives than an enumeration of possible keys,
> >> > while minimizing the number of new DDLm items required.  Also,
> >> > inasmuch as I was already proposing type-by-reference for values,
> >> > it seemed consistent to follow a parallel approach for key constraints.
> >> >
> >> >>  And do you agree that the alternative I've proposed above would
> >> >> also be sufficient?
> >> >
> >> > I agree that your alternative would be sufficient *for DDLm
> >> > itself*, but I would prefer more flexibility to be available to
> >> > other dictionaries.
> >> > Because DDLm itself will be harder to change than other DDLm
> >> > dictionaries, I would like to avoid it being overly restrictive.
> >> > At the same time, I don't think we need to go crazy by trying to
> >> > make DDLm capable of defining completely arbitrary CIF2 data
> >> > structures.  I have tried to choose a happy medium that is
> >> > minimally disruptive for existing DDLm dictionaries and software.
> >> >
> >> >> On a final note for _import.get, the dREL is broken as it assumes
> >> >> that there is only one value for each of the constituent _import
> >> >> datanames, which would make a list superfluous (only one element),
> >> >> but what it really wants to do is to create a list from a loop of
> >> >> _import.file etc. values.  To do this it needs a sequence number,
> >> >> which isn't defined.  Once this *is* defined, we could
> >> >> alternatively present the import instructions as a loop over
> >> >> _import.sequence and _import.single, or else _import.sequence,
> >> >> _import.file etc.
> >> >
> >> > I can't say I'm much surprised.  _import.get shows evidence of
> >> > having gone through a change at some point, and I don't think that
> >> > was fully and consistently implemented.  I note in particular that
> >> > its description (in the 2012 version) is "A table of attributes
> >> > [...]", not "A list of tables of attributes [...]" or similar.  I
> >> > also note that its
> >> > _type.container is given as 'List[Table]', which is not among the
> >> > enumerated alternatives for values of that attribute.
> >> >
> >> > As for the dREL, though, why do you need a sequence number, and /
> >> > or why can the dREL not generate one itself as it iterates over the
> >> > values of _import.get?  Given that each value is a table providing
> >> > the attributes describing one import, co-occurrence in the same
> >> > table already associates the various attributes of each import together.
> >> >
> >> >> To wrap up, I like the suggestion of a _type.contents that can
> >> >> work by reference to another dataname.  I don't see a particular
> >> >> need for a similar reference for table keys, nor do I particularly
> >> >> think explicitly specifying the keys is likely to be that useful,
> >> >> but I'm not against adding this capability.  We envisage adding
> >> >> quite a few other attributes later on to improve DDL2 - DDLm
> >> >> translation anyway.
> >> >
> >> > I'm glad you like the idea of defining content type by reference.
> >> > I hope I've persuaded you about the keys, but even if not, I still
> >> > think that the ability to define machine-readable specifications of
> >> > allowed keys is important.  I'm not hung up on the exact
> >> > implementation I proposed, however.
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > John
> >> >
> >> > --
> >> > John C. Bollinger, Ph.D.
> >> > Computing and X-Ray Scientist
> >> > Department of Structural Biology
> >> > St. Jude Children's Research Hospital
> >> > John.Bollinger@StJude.org
> >> > (901) 595-3166 [office]
> >> > www.stjude.org
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > ddlm-group mailing list
> >> > ddlm-group@iucr.org
> >> > http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
> >
> >
> >
> >
> > --
> > T +61 (02) 9717 9907
> > F +61 (02) 9717 3145
> > M +61 (04) 0249 4148
> >
> >
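The two styles of key constraint discussed in this thread (an enumerated set of allowed keys, and keys constrained to a general form such as 'Date') can be illustrated with a minimal Python sketch.  It treats a CIF2 Table as a Python dict, as suggested earlier in the thread.  The function names, the date pattern, and the sample values are assumptions made for illustration only; they are not DDLm attributes or part of any proposal above.  Only the five key names are taken from the _table_key_set example.

```python
import re

# Allowed keys, taken from the _table_key_set example in the thread.
IMPORT_KEYS = {"file", "save", "mode", "dupl", "miss"}

# Hypothetical stand-in for a 'Date'-like key form (assumed yyyy-mm-dd).
DATE_KEY = re.compile(r"\d{4}-\d{2}-\d{2}")


def invalid_keys_enumerated(table, allowed):
    """Return, sorted, the keys of `table` that are not in the enumerated set."""
    return sorted(k for k in table if k not in allowed)


def invalid_keys_by_pattern(table, pattern):
    """Return, sorted, the keys of `table` that do not fully match `pattern`."""
    return sorted(k for k in table if pattern.fullmatch(k) is None)


# One import instruction with a misspelled key ('sav' instead of 'save').
# The values are made up for the example.
import_instruction = {"file": "core.dic", "sav": "atom_site", "mode": "Full"}
print(invalid_keys_enumerated(import_instruction, IMPORT_KEYS))  # ['sav']
```

The enumerated check is enough to diagnose a misspelled key in an import instruction.  The pattern check stands in for the case where the possible keys cannot all be listed in advance.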
