
Re: [ddlm-group] _enumerated_set.table_id

May I suggest maintaining a clear distinction, at least by capitalizing the CIF2 type, or better, by referring to it as a dictionary type, as in Python?
On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com> wrote:
> Hi Herbert - the very important point here is that we are talking about the
> 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and most certainly
> not 'table' in the sense of 'relational database table' (although you will
> appreciate the very close relationship between the two datastructures).
>
> all the best,
> James.
>
> On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein <yayahjb@gmail.com>
> wrote:
>>
>> Dear Colleagues,
>>
>>   I am puzzled by the idea of constraints on table keys distinct from
>> the constraints on the values and types for table columns.  From a
>> database perspective, a table key is just a set of one or more columns
>> that uniquely identify rows in a table by their contents.  If a column
>> has been designated as a key or as a member of a composite key, the
>> normal practice is to use the type and value constraints of the column
>> as the only constraints on what you are allowed to use.  Please
>> explain what is gained by having additional constraints specified?   I
>> would suggest we keep as close to a relational model for CIF2 tables
>> as possible.
>>
>>   Regards,
>>     Herbert
>>
>> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>> > Hi James,
>> >
>> > Comments inline below.  ((Lack of) formatting thanks to stupid Microsoft
>> > limitations.)
>> >
>> >> > 4. Add a replacement mechanism to define constraints on table keys.
>> >> > It might be sufficient, and consistent with the apparent intent of the
>> >> > current dictionary, to establish a parallel to the _enumeration_set category
>> >> > for constraining key values, maybe _key_enumeration_set.  It would be a
>> >> > smaller change at the dictionary level, however, to add a mechanism by which
>> >> > constraints on key type could be defined by reference to the type of another
>> >> > item (see also next).
>> >>
>> >> What is the advantage of being able to validate key strings?
>> >
>> > What is the advantage of validating *anything*?  If there is a
>> > constraint on document form and content then one would like to be able to
>> > determine whether instance documents comply with that constraint.  It can be
>> > useful to perform such validation for its own sake, or programs can validate
>> > up front in order to minimize or eliminate the need to sprinkle hand-rolled
>> > validity testing throughout their implementation code.
>> >
>> > I suppose the real question is about the advantage of defining
>> > constraints on table keys in the first place.  There are all sorts of
>> > possible examples, but for now let's stick with _input.get.  In each element
>> > (a table) of the list value of that attribute, a few specific possible keys
>> > are meaningful, and all others are meaningless / erroneous.  We might like
>> > to be able to diagnose key misspellings in those tables.  We might like to
>> > be able to process the values as lists of (key, value) pairs without fear
>> > that any of the keys are invalid.  We might simply like to provide a
>> > machine-readable definition of which keys are meaningful / allowed.
>> >
>> >>  As outlined in my previous email, I don't see that validating the keys
>> >> will have much benefit as tables are rarely used.  That aside, simply
>> >> introducing an extra DDLm attribute is OK, especially as we are dropping
>> >> _enumeration_set.table_id we are not enlarging DDLm.
>> >
>> > If it were going to require a great deal of additional work and
>> > complexity to provide for constraints on table keys then I would hesitate to
>> > suggest doing so.  I don't think that's the case.
>> >
>> > As it is, the current DDLm dictionary provides a mechanism intended to
>> > support constraining table keys, and it uses it, albeit only once.  Removing
>> > that ability without replacement would not only delete the ability it
>> > supports, it would also change the semantics of the DDLm item that currently
>> > *uses* that ability.
>> >
>> > I am inclined to suppose that one reason tables are rarely used in the
>> > current dictionaries is that the item descriptions in the 2012 DDLm
>> > dictionary do a poor job of explaining how to define items taking tables as
>> > their values, especially with respect to constraints.  Furthermore, all of
>> > the current dictionaries -- even DDLm -- spring from a history and
>> > dictionary development tradition that hadn't table values to rely on until
>> > now, so it is not surprising that DDLm versions of those dictionaries have
>> > little reliance on tables.  That does not mean that tables cannot serve more
>> > prominently in future dictionaries, or future versions of the current
>> > dictionaries.
>> >
>> >> > 5. Add a mechanism to allow items' content type to be defined by
>> >> > reference to another item.  This could be signaled by a new code for
>> >> > _type.contents, with a new attribute defining which other item’s type is to
>> >> > be used.  I don’t think that the existing contents code 'Inherited' can
>> >> > serve this purpose, but perhaps I’m mistaken.
>> >>
>> >> This is an intriguing idea.  As it happens, the demonstration DDLm
>> >> dictionaries introduce setting the type of an item based on the type of a
>> >> different item using a dREL-like function (although I have replaced these
>> >> with explicit types in the latest version of the new cif_core dictionary).
>> >> Your suggestion replaces this by a non-dREL approach, which is in general
>> >> desirable for simple applications.  To check that I've understood your
>> >> (corrected) example:
>> >> (1) the elements of the _import.get List are items of the same type as
>> >> _import.get_contents_type
>> >
>> > Yes.
>> >
>> >> (2) _import.get_contents_type is a Table, so _type.contents for it is
>> >> the type of values in the table i.e. Text
>> >
>> > Yes.
>> >
>> >> (3) The possible key values are given by the possible values taken by
>> >> the _type.key_type_reference dataname
>> >
>> > Yes, in this case.  My idea is that _type.keys would be parallel to
>> > _type.contents, so that, for example, it might also take the value 'Code' or
>> > 'Date' or 'Text' or an extension type, and in that case not rely on a
>> > reference to a separate item definition.
>> >
>> >> We have two new 'internal' DDLm attributes as a result, as well as the
>> >> new _type.keys, _type.key_content_reference and _type.key_type_reference
>> >> datanames for a total of 5 new attributes.
>> >
>> > Those aren't exactly the data names I proposed, but yes, that's the way
>> > my proposal plays out for DDLm.
>> >
>> >>  If we put the key list into the definition to which it relates, we can
>> >> cut down on the number of new attributes, e.g:
>> >> save_import.get_contents_type
>> >>    # ...
>> >>    _type.purpose             'Internal'
>> >>    _type.container           'Table'
>> >>    _type.contents            'Text'
>> >>    loop_
>> >>      _table_key_set.state
>> >>      _table_key_set.detail
>> >>        'file' 'filename/URI of source dictionary'
>> >>        'save' 'save framecode of source definition'
>> >>        'mode' 'mode for including save frames'
>> >>        'dupl' 'option for duplicate entries'
>> >>        'miss' 'option for missing duplicate entries'
>> >> save_
>> >
>> > Yes, that would be a viable alternative to support the needs of DDLm
>> > itself.  It would reduce the number of new items needed from 3 to 2 (the two
>> > other proposed new items being related to defining table *contents* by
>> > reference, which is a separate issue).  The statistics look different for
>> > dictionaries other than DDLm itself.
>> >
>> > Your alternative appears to be roughly what I described in passing as
>> > "to establish a parallel to the _enumeration_set category for constraining
>> > key values."  Although it serves DDLm's own needs just fine, it may be too
>> > restrictive for other dictionaries that want to define (and constrain)
>> > tables, as it supports only enumerable sets of keys.  In some other uses one
>> > might instead want to constrain keys to the same form that (for values) is
>> > represented by _type.contents = 'Date' or 'Version' or some extension type,
>> > where it is not possible to enumerate all possible keys.
>> >
>> >> which results in new attributes _type.key_content_reference,
>> >> _table_key_set.state and _table_key_set.detail with one internal attribute
>> >> _import.get_contents_type, and also reduces the non-locality of the
>> >> definition - that is, one less reference to track through the file.
>> >> _import.get is admittedly an extreme example, because it is the only
>> >> occurrence of a list of tables rather than just a table, which is what
>> >> requires the creation of the 'internal' data attribute.
>> >
>> > Yes and no.  The creation of the new 'Internal' value for _type.purpose
>> > and of items that use it are more a consequence of my approach to lightening
>> > the load on _type.dimension, whose current description and use appear to
>> > task it with providing a complete layout of values of the item being
>> > defined.  Note in particular the dimension specified in the current
>> > definition of _import.get: '[{}]'.  I don't think we want to continue in
>> > that direction.
>> >
>> > The structure of _import.get's values does not inherently require
>> > internal types to be defined under my proposed structure.  If there were an
>> > ordinary item in the dictionary that had the wanted type of the elements of
>> > an _import.get list, then that type could be referenced instead of an
>> > internal one.  I can imagine circumstances under which such a reference
>> > would even be sensible.
>> >
>> >>  It is, however, a nice demonstration of how the attributes might work
>> >> for future dictionary writers.  The new 'internal' dataname does have some
>> >> meaning along the lines of 'a single import instruction' so a better
>> >> dataname might be _import.single.
>> >
>> > Sure, that name would be fine with me.
>> >
>> >>  Is there any reason that you introduced a reference in order to
>> >> specify the table keys?
>> >
>> > I introduced a reference in order to specify table keys so as to provide
>> > for more alternatives than an enumeration of possible keys, while minimizing
>> > the number of new DDLm items required.  Also, inasmuch as I was already
>> > proposing type-by-reference for values, it seemed consistent to follow a
>> > parallel approach for key constraints.
>> >
>> >>  And do you agree that the alternative I've proposed above would also
>> >> be sufficient?
>> >
>> > I agree that your alternative would be sufficient *for DDLm itself*, but
>> > I would prefer more flexibility to be available to other dictionaries.
>> > Because DDLm itself will be harder to change than other DDLm dictionaries, I
>> > would like to avoid it being overly restrictive.  At the same time, I don't
>> > think we need to go crazy by trying to make DDLm capable of defining
>> > completely arbitrary CIF2 data structures.  I have tried to choose a happy
>> > medium that is minimally disruptive for existing DDLm dictionaries and
>> > software.
>> >
>> >> On a final note for _import.get, the dREL is broken as it assumes that
>> >> there is only one value for each of the constituent _import datanames, which
>> >> would make a list superfluous (only one element), but what it really wants
>> >> to do is to create a list from a loop of _import.file etc. values.  To do
>> >> this it needs a sequence number, which isn't defined.  Once this *is*
>> >> defined, we could alternatively present the import instructions as a loop
>> >> over _import.sequence and _import.single, or else _import.seqence,
>> >> _import.file etc.
>> >
>> > I can't say I'm much surprised.  _import.get shows evidence of having
>> > gone through a change at some point, and I don't think that was fully and
>> > consistently implemented.  I note in particular that its description (in the
>> > 2012 version) is "A table of attributes [...]", not "A list of tables of
>> > attributes [...]" or similar.  I also note that its _type.container is given
>> > as 'List[Table]', which is not among the enumerated alternatives for values
>> > of that attribute.
>> >
>> > As for the dREL, though, why do you need a sequence number, and / or why
>> > can the dREL not generate one itself as it iterates over the values of
>> > _import.get?  Given that each value is a table providing the attributes
>> > describing one import; co-occurrence in the same table already associates
>> > the various attributes of each import together.
>> >
>> >> To wrap up, I like the suggestion of a _type.contents that can work by
>> >> reference to another dataname.  I don't see a particular need for a similar
>> >> reference for table keys, nor do I particularly think explicitly specifying
>> >> the keys is likely to be that useful, but I'm not against adding this
>> >> capability.  We envisage adding quite a few other attributes later on to
>> >> improve DDL2 - DDLm translation anyway.
>> >
>> > I'm glad you like the idea of defining content type by reference.  I
>> > hope I've persuaded you about the keys, but even if not, I still think that
>> > the ability to define machine-readable specifications of allowed keys is
>> > important.  I'm not hung up on the exact implementation I proposed, however.
>> >
>> >
>> > Cheers,
>> >
>> > John
>> >
>> > --
>> > John C. Bollinger, Ph.D.
>> > Computing and X-Ray Scientist
>> > Department of Structural Biology
>> > St. Jude Children's Research Hospital
>> > John.Bollinger@StJude.org
>> > (901) 595-3166 [office]
>> > www.stjude.org
>> >
>> > _______________________________________________
>> > ddlm-group mailing list
>> > ddlm-group@iucr.org
>> > http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
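
For readers following the thread, the two kinds of key constraint discussed above (an enumerated key set, as in the _table_key_set suggestion, and a type-based constraint, as in the _type.keys suggestion) can be sketched roughly in Python, treating a CIF2 Table as a dict. This is only an illustration under stated assumptions: the names ALLOWED_IMPORT_KEYS, is_iso_date and invalid_keys are hypothetical and are not part of DDLm, dREL or any CIF library, and the sample import_get values are made up.

```python
from datetime import date

# Hypothetical enumerated key set, modelled on the _table_key_set loop
# quoted in the thread. Not part of any real dictionary or library.
ALLOWED_IMPORT_KEYS = {"file", "save", "mode", "dupl", "miss"}


def is_iso_date(key):
    """Return True if key parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(key)
    except ValueError:
        return False
    return True


def invalid_keys(table, allowed=None, key_check=None):
    """Return the keys of `table` that violate the given constraint.

    `allowed` models an enumerated set of keys; `key_check` models a
    type-like constraint (e.g. 'Date') whose keys cannot be enumerated.
    """
    bad = []
    for key in table:
        if allowed is not None and key not in allowed:
            bad.append(key)
        elif key_check is not None and not key_check(key):
            bad.append(key)
    return bad


# Made-up value of a list-of-tables item in the style of _import.get.
import_get = [
    {"file": "core.dic", "save": "atom_site", "mode": "Full"},
    {"flie": "core.dic", "save": "cell"},  # misspelled key
]

for table in import_get:
    print(invalid_keys(table, allowed=ALLOWED_IMPORT_KEYS))
```

The enumerated form catches the misspelled 'flie' key in the second table; the key_check form covers the case where keys must merely have a given form, which an enumeration cannot express.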

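On the question raised in the thread of whether a sequence number has to be defined explicitly, a rough Python sketch (not dREL) of deriving one while iterating over a list of import tables might look like the following. The function name import_rows and the 'sequence' field are hypothetical, chosen only for illustration.

```python
def import_rows(import_get):
    """Turn a list of import tables into loop-style rows.

    The sequence number is generated from the list position, so it does
    not need to be stored in the tables themselves.
    """
    rows = []
    for sequence, table in enumerate(import_get, start=1):
        row = {"sequence": sequence}
        row.update(table)  # attributes of one import stay together
        rows.append(row)
    return rows


rows = import_rows([
    {"file": "core.dic", "save": "atom_site"},
    {"file": "core.dic", "save": "cell"},
])
for row in rows:
    print(row["sequence"], row["file"], row["save"])
```

Each table already groups the attributes of one import, so the list position is enough to number them.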