
Re: [ddlm-group] _enumerated_set.table_id

This gives rise to an interesting possible extension and simplification for the future: really making a Table into a table, as a way to carry all the information in an instance of a category as manipulable data. Then all the typing issues could be dealt with by existing CIF typing, and we would be able to carry multiple order-independent rows unambiguously.
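A minimal Python sketch of what "order-independent rows" could mean in practice. It assumes each row of a category instance is held as a dict (the Python analogue of a CIF2 Table) and that values are hashable. The helper names and the `label` / `type_symbol` keys are illustrative only and come from neither the thread nor any dictionary.

```python
from collections import Counter

def as_row_multiset(rows):
    """Order-independent form of a list of row dicts (duplicates are counted)."""
    return Counter(frozenset(row.items()) for row in rows)

def same_rows(a, b):
    """True when both lists hold the same rows, regardless of row order."""
    return as_row_multiset(a) == as_row_multiset(b)

# Hypothetical category instance: one dict per row, keyed by column name.
atoms = [
    {"label": "C1", "type_symbol": "C"},
    {"label": "O1", "type_symbol": "O"},
]
shuffled = list(reversed(atoms))
```

Here `same_rows(atoms, shuffled)` holds because row order is discarded before comparison, while dropping a row makes the comparison fail.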

On Thu, Apr 23, 2015 at 10:51 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
> For better or for worse, "Table" is the CIF2 term for this data structure.  I do not think introducing an alias at this point would serve the interest of clarity, but I will try to remember to capitalize when I use the word in the CIF2 sense.
>
> John
>
>> -----Original Message-----
>> From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of Herbert J. Bernstein
>> Sent: Thursday, April 23, 2015 4:52 AM
>> To: Group finalising DDLm and associated dictionaries
>> Subject: Re: [ddlm-group] _enumerated_set.table_id
>>
>> May I suggest maintaining a clear distinction, at least by capitalizing the CIF2
>> type, or better, by referring to it as a dictionary type, as in Python?
>>
>> On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com> wrote:
>> > Hi Herbert - the very important point here is that we are talking
>> > about the 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and
>> > most certainly not 'table' in the sense of 'relational database table'
>> > (although you will appreciate the very close relationship between the two
>> > datastructures).
>> >
>> > all the best,
>> > James.
>> >
>> > On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >>
>> >> Dear Colleagues,
>> >>
>> >>   I am puzzled by the idea of constraints on table keys distinct
>> >> from the constraints on the values and types for table columns.  From
>> >> a database perspective, a table key is just a set of one or more
>> >> columns that uniquely identify rows in a table by their contents.  If
>> >> a column has been designated as a key or as a member of a composite
>> >> key, the normal practice is to use the type and value constraints of
>> >> the column as the only constraints on what you are allowed to use.  Please
>> >> explain what is gained by having additional constraints specified?
>> >> I would suggest we keep as close to a relational model for CIF2 tables
>> >> as possible.
>> >>
>> >>   Regards,
>> >>     Herbert
>> >>
>> >> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>> >> > Hi James,
>> >> >
>> >> > Comments inline below.  ((Lack of) formatting thanks to stupid
>> >> > Microsoft limitations.)
>> >> >
>> >> >> > 4. Add a replacement mechanism to define constraints on table keys.
>> >> >> > It might be sufficient, and consistent with the apparent intent
>> >> >> > of the current dictionary, to establish a parallel to the
>> >> >> > _enumeration_set category for constraining key values, maybe
>> >> >> > _key_enumeration_set.  It would be a smaller change at the
>> >> >> > dictionary level, however, to add a mechanism by which
>> >> >> > constraints on key type could be defined by reference to the type of
>> >> >> > another item (see also next).
>> >> >>
>> >> >> What is the advantage of being able to validate key strings?
>> >> >
>> >> > What is the advantage of validating *anything*?  If there is a
>> >> > constraint on document form and content then one would like to be
>> >> > able to determine whether instance documents comply with that
>> >> > constraint.  It can be useful to perform such validation for its
>> >> > own sake, or programs can validate up front in order to minimize or
>> >> > eliminate the need to sprinkle hand-rolled validity testing throughout
>> >> > their implementation code.
>> >> >
>> >> > I suppose the real question is about the advantage of defining
>> >> > constraints on table keys in the first place.  There are all sorts
>> >> > of possible examples, but for now let's stick with _import.get.  In
>> >> > each element (a table) of the list value of that attribute, a few
>> >> > specific possible keys are meaningful, and all others are
>> >> > meaningless / erroneous.  We might like to be able to diagnose key
>> >> > misspellings in those tables.
>> >> > We might like to be able to process
>> >> > the values as lists of (key, value) pairs without fear that any of
>> >> > the keys are invalid.  We might simply like to provide a
>> >> > machine-readable definition of which keys are meaningful / allowed.
>> >> >
>> >> >>  As outlined in my previous email, I don't see that validating the
>> >> >> keys will have much benefit as tables are rarely used.  That
>> >> >> aside, simply introducing an extra DDLm attribute is OK,
>> >> >> especially as we are dropping _enumeration_set.table_id we are not
>> >> >> enlarging DDLm.
>> >> >
>> >> > If it were going to require a great deal of additional work and
>> >> > complexity to provide for constraints on table keys then I would
>> >> > hesitate to suggest doing so.  I don't think that's the case.
>> >> >
>> >> > As it is, the current DDLm dictionary provides a mechanism intended
>> >> > to support constraining table keys, and it uses it, albeit only
>> >> > once.  Removing that ability without replacement would not only
>> >> > delete the ability it supports, it would also change the semantics
>> >> > of the DDLm item that currently *uses* that ability.
>> >> >
>> >> > I am inclined to suppose that one reason tables are rarely used in
>> >> > the current dictionaries is that the item descriptions in the 2012
>> >> > DDLm dictionary do a poor job of explaining how to define items
>> >> > taking tables as their values, especially with respect to
>> >> > constraints.  Furthermore, all of the current dictionaries -- even
>> >> > DDLm -- spring from a history and dictionary development tradition
>> >> > that hadn't table values to rely on until now, so it is not
>> >> > surprising that DDLm versions of those dictionaries have little
>> >> > reliance on tables.  That does not mean that tables cannot serve
>> >> > more prominently in future dictionaries, or future versions of the
>> >> > current dictionaries.
>> >> >
>> >> >> > 5. Add a mechanism to allow items' content type to be defined by
>> >> >> > reference to another item.  This could be signaled by a new code
>> >> >> > for _type.contents, with a new attribute defining which other
>> >> >> > item’s type is to be used.  I don’t think that the existing
>> >> >> > contents code 'Inherited' can serve this purpose, but perhaps I’m
>> >> >> > mistaken.
>> >> >>
>> >> >> This is an intriguing idea.  As it happens, the demonstration DDLm
>> >> >> dictionaries introduce setting the type of an item based on the
>> >> >> type of a different item using a dREL-like function (although I
>> >> >> have replaced these with explicit types in the latest version of the new
>> >> >> cif_core dictionary).
>> >> >> Your suggestion replaces this by a non-dREL approach, which is in
>> >> >> general desirable for simple applications.  To check that I've
>> >> >> understood your (corrected) example:
>> >> >> (1) the elements of the _import.get List are items of the same
>> >> >> type as _import.get_contents_type
>> >> >
>> >> > Yes.
>> >> >
>> >> >> (2) _import.get_contents_type is a Table, so _type.contents for it
>> >> >> is the type of values in the table i.e. Text
>> >> >
>> >> > Yes.
>> >> >
>> >> >> (3) The possible key values are given by the possible values taken
>> >> >> by the _type.key_type_reference dataname
>> >> >
>> >> > Yes, in this case.
>> >> > My idea is that _type.keys would be parallel to
>> >> > _type.contents, so that, for example, it might also take the value
>> >> > 'Code' or 'Date' or 'Text' or an extension type, and in that case
>> >> > not rely on a reference to a separate item definition.
>> >> >
>> >> >> We have two new 'internal' DDLm attributes as a result, as well as
>> >> >> the new _type.keys, _type.key_content_reference and
>> >> >> _type.key_type_reference datanames for a total of 5 new attributes.
>> >> >
>> >> > Those aren't exactly the data names I proposed, but yes, that's the
>> >> > way my proposal plays out for DDLm.
>> >> >
>> >> >>  If we put the key list into the definition to which it relates,
>> >> >> we can cut down on the number of new attributes, e.g:
>> >> >> save_import.get_contents_type
>> >> >>    # ...
>> >> >>    _type.purpose             'Internal'
>> >> >>    _type.container           'Table'
>> >> >>    _type.contents            'Text'
>> >> >>    loop_
>> >> >>      _table_key_set.state
>> >> >>      _table_key_set.detail
>> >> >>        'file' 'filename/URI of source dictionary'
>> >> >>        'save' 'save framecode of source definition'
>> >> >>        'mode' 'mode for including save frames'
>> >> >>        'dupl' 'option for duplicate entries'
>> >> >>        'miss' 'option for missing duplicate entries'
>> >> >> save_
>> >> >
>> >> > Yes, that would be a viable alternative to support the needs of
>> >> > DDLm itself.  It would reduce the number of new items needed from 3
>> >> > to 2 (the two other proposed new items being related to defining
>> >> > table *contents* by reference, which is a separate issue).  The
>> >> > statistics look different for dictionaries other than DDLm itself.
>> >> >
>> >> > Your alternative appears to be roughly what I described in passing
>> >> > as "to establish a parallel to the _enumeration_set category for
>> >> > constraining key values."
>> >> > Although it serves DDLm's own needs just
>> >> > fine, it may be too restrictive for other dictionaries that want to
>> >> > define (and constrain) tables, as it supports only enumerable sets
>> >> > of keys.  In some other uses one might instead want to constrain
>> >> > keys to the same form that (for values) is represented by
>> >> > _type.contents = 'Date' or 'Version' or some extension type, where it is
>> >> > not possible to enumerate all possible keys.
>> >> >
>> >> >> which results in new attributes _type.key_content_reference,
>> >> >> _table_key_set.state and _table_key_set.detail with one internal
>> >> >> attribute _import.get_contents_type, and also reduces the
>> >> >> non-locality of the definition - that is, one less reference to track
>> >> >> through the file.
>> >> >> _import.get is admittedly an extreme example, because it is the
>> >> >> only occurrence of a list of tables rather than just a table,
>> >> >> which is what requires the creation of the 'internal' data attribute.
>> >> >
>> >> > Yes and no.  The creation of the new 'Internal' value for
>> >> > _type.purpose and of items that use it are more a consequence of my
>> >> > approach to lightening the load on _type.dimension, whose current
>> >> > description and use appear to task it with providing a complete
>> >> > layout of values of the item being defined.  Note in particular the
>> >> > dimension specified in the current definition of _import.get:
>> >> > '[{}]'.  I don't think we want to continue in that direction.
>> >> >
>> >> > The structure of _import.get's values does not inherently require
>> >> > internal types to be defined under my proposed structure.  If there
>> >> > were an ordinary item in the dictionary that had the wanted type of
>> >> > the elements of an _import.get list, then that type could be
>> >> > referenced instead of an internal one.
>> >> > I can imagine circumstances
>> >> > under which such a reference would even be sensible.
>> >> >
>> >> >>  It is, however, a nice demonstration of how the attributes might
>> >> >> work for future dictionary writers.  The new 'internal' dataname
>> >> >> does have some meaning along the lines of 'a single import
>> >> >> instruction' so a better dataname might be _import.single.
>> >> >
>> >> > Sure, that name would be fine with me.
>> >> >
>> >> >>  Is there any reason that you introduced a reference in order to
>> >> >> specify the table keys?
>> >> >
>> >> > I introduced a reference in order to specify table keys so as to
>> >> > provide for more alternatives than an enumeration of possible keys,
>> >> > while minimizing the number of new DDLm items required.  Also,
>> >> > inasmuch as I was already proposing type-by-reference for values,
>> >> > it seemed consistent to follow a parallel approach for key constraints.
>> >> >
>> >> >>  And do you agree that the alternative I've proposed above would
>> >> >> also be sufficient?
>> >> >
>> >> > I agree that your alternative would be sufficient *for DDLm
>> >> > itself*, but I would prefer more flexibility to be available to other
>> >> > dictionaries.
>> >> > Because DDLm itself will be harder to change than other DDLm
>> >> > dictionaries, I would like to avoid it being overly restrictive.
>> >> > At the same time, I don't think we need to go crazy by trying to
>> >> > make DDLm capable of defining completely arbitrary CIF2 data
>> >> > structures.  I have tried to choose a happy medium that is
>> >> > minimally disruptive for existing DDLm dictionaries and software.
>> >> >
>> >> >> On a final note for _import.get, the dREL is broken as it assumes
>> >> >> that there is only one value for each of the constituent _import
>> >> >> datanames, which would make a list superfluous (only one element),
>> >> >> but what it really wants to do is to create a list from a loop of
>> >> >> _import.file etc. values.
>> >> >> To do this it needs a sequence number,
>> >> >> which isn't defined.  Once this *is* defined, we could
>> >> >> alternatively present the import instructions as a loop over
>> >> >> _import.sequence and _import.single, or else _import.sequence,
>> >> >> _import.file etc.
>> >> >
>> >> > I can't say I'm much surprised.  _import.get shows evidence of
>> >> > having gone through a change at some point, and I don't think that
>> >> > was fully and consistently implemented.  I note in particular that
>> >> > its description (in the 2012 version) is "A table of attributes [...]",
>> >> > not "A list of tables of attributes [...]" or similar.  I also note that its
>> >> > _type.container is given as 'List[Table]', which is not among the
>> >> > enumerated alternatives for values of that attribute.
>> >> >
>> >> > As for the dREL, though, why do you need a sequence number, and /
>> >> > or why can the dREL not generate one itself as it iterates over the
>> >> > values of _import.get?  Given that each value is a table providing
>> >> > the attributes describing one import; co-occurrence in the same
>> >> > table already associates the various attributes of each import together.
>> >> >
>> >> >> To wrap up, I like the suggestion of a _type.contents that can
>> >> >> work by reference to another dataname.  I don't see a particular
>> >> >> need for a similar reference for table keys, nor do I particularly
>> >> >> think explicitly specifying the keys is likely to be that useful,
>> >> >> but I'm not against adding this capability.  We envisage adding
>> >> >> quite a few other attributes later on to improve DDL2 - DDLm
>> >> >> translation anyway.
>> >> >
>> >> > I'm glad you like the idea of defining content type by reference.
>> >> > I hope I've persuaded you about the keys, but even if not, I still
>> >> > think that the ability to define machine-readable specifications of
>> >> > allowed keys is important.
>> >> > I'm not hung up on the exact
>> >> > implementation I proposed, however.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > John
>> >> >
>> >> > --
>> >> > John C. Bollinger, Ph.D.
>> >> > Computing and X-Ray Scientist
>> >> > Department of Structural Biology
>> >> > St. Jude Children's Research Hospital
>> >> > John.Bollinger@StJude.org
>> >> > (901) 595-3166 [office]
>> >> > www.stjude.org
>> >> >
>> >> > _______________________________________________
>> >> > ddlm-group mailing list
>> >> > ddlm-group@iucr.org
>> >> > http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
>> >
>> > --
>> > T +61 (02) 9717 9907
>> > F +61 (02) 9717 3145
>> > M +61 (04) 0249 4148
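
The two styles of key constraint debated in the quoted thread (an enumerated key set versus a key *type* such as 'Date') can be sketched as follows. This is only an illustration under stated assumptions: a CIF2 Table is modelled as a Python dict, the five allowed keys are taken from the `_table_key_set` example in the thread, and the function names and the `YYYY-MM-DD` date pattern are hypothetical, not part of DDLm or of any CIF library.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Keys listed in the _table_key_set example quoted in the thread.
IMPORT_KEYS = ("file", "save", "mode", "dupl", "miss")

def enumerated(allowed: Iterable[str]) -> Callable[[str], bool]:
    """Key constraint given as an explicit list of allowed keys."""
    allowed_set = frozenset(allowed)
    return lambda key: key in allowed_set

def matching(pattern: str) -> Callable[[str], bool]:
    """Key constraint given as a form (type) rather than a list."""
    compiled = re.compile(pattern)
    return lambda key: compiled.fullmatch(key) is not None

def invalid_keys(table: Dict[str, str], key_ok: Callable[[str], bool]) -> List[str]:
    """Return the keys of one Table that violate the constraint, sorted."""
    return sorted(k for k in table if not key_ok(k))

def check_list_of_tables(value, key_ok) -> List[Tuple[int, List[str]]]:
    """Check a List of Tables; report (index, bad keys) for each failing Table."""
    report = []
    for index, table in enumerate(value):
        bad = invalid_keys(table, key_ok)
        if bad:
            report.append((index, bad))
    return report

# A value shaped like _import.get: a list of tables, one per import.
import_get = [
    {"file": "core.dic", "save": "atom_site", "mode": "Full"},
    {"flie": "core.dic", "save": "cell"},  # misspelled key
]
problems = check_list_of_tables(import_get, enumerated(IMPORT_KEYS))

# Assumed date form for illustration only (YYYY-MM-DD).
is_date_key = matching(r"\d{4}-\d{2}-\d{2}")
bad_dates = invalid_keys({"2015-04-23": "x", "yesterday": "y"}, is_date_key)
```

With the enumerated constraint the misspelled `flie` is reported for the second table, while the pattern-based constraint handles key sets that cannot be listed exhaustively.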
