Discussion List Archives


Re: [ddlm-group] _enumerated_set.table_id

May I suggest maintaining a clear distinction, at least by capitalizing the CIF2 type, or better, by referring to it as a dictionary type, as in Python?
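
To illustrate the distinction, here is a minimal Python sketch. The file names and values are hypothetical and only serve to contrast a CIF2 Table value (a dictionary-like mapping) with a relational table, where a key is a set of columns that identifies rows; `is_unique_key` is an illustrative helper, not part of any CIF or DDLm software.

```python
# A CIF2 'Table' value behaves like a Python dict: string keys mapped to values.
# (Hypothetical contents, loosely modelled on one import instruction.)
cif2_table = {"file": "core.dic", "save": "atom_site", "mode": "Full"}

# A relational table is a collection of rows that share the same columns.
# A relational key is a column (or set of columns) whose contents
# identify each row uniquely.
relational_rows = [
    {"id": 1, "file": "core.dic"},
    {"id": 2, "file": "other.dic"},
]


def is_unique_key(rows, columns):
    """Return True if the given columns identify every row uniquely."""
    seen = set()
    for row in rows:
        ident = tuple(row[c] for c in columns)
        if ident in seen:
            return False
        seen.add(ident)
    return True
```

In the first structure the "keys" are the strings on the left of each pair; in the second, "key" refers to column contents, which is the sense used in the database perspective quoted below.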
On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com> wrote:
> Hi Herbert - the very important point here is that we are talking about the
> 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and most certainly
> not 'table' in the sense of 'relational database table' (although you will
> appreciate the very close relationship between the two datastructures).
>
> all the best,
> James.
>
> On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein <yayahjb@gmail.com>
> wrote:
>>
>> Dear Colleagues,
>>
>>   I am puzzled by the idea of constraints on table keys distinct from
>> the constraints on the values and types for table columns. From a
>> database perspective, a table key is just a set of one or more columns
>> that uniquely identify rows in a table by their contents. If a column
>> has been designated as a key or as a member of a composite key, the
>> normal practice is to use the type and value constraints of the column
>> as the only constraints on what you are allowed to use. Please
>> explain what is gained by having additional constraints specified? I
>> would suggest we keep as close to a relational model for CIF2 tables
>> as possible.
>>
>>   Regards,
>>     Herbert
>>
>> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>> > Hi James,
>> >
>> > Comments inline below. ((Lack of) formatting thanks to stupid Microsoft
>> > limitations.)
>> >
>> >> > 4. Add a replacement mechanism to define constraints on table keys.
>> >> > It might be sufficient, and consistent with the apparent intent of the
>> >> > current dictionary, to establish a parallel to the _enumeration_set category
>> >> > for constraining key values, maybe _key_enumeration_set.
>> >> > It would be a smaller change at the dictionary level, however, to add a
>> >> > mechanism by which constraints on key type could be defined by reference
>> >> > to the type of another item (see also next).
>> >>
>> >> What is the advantage of being able to validate key strings?
>> >
>> > What is the advantage of validating *anything*? If there is a
>> > constraint on document form and content then one would like to be able to
>> > determine whether instance documents comply with that constraint. It can be
>> > useful to perform such validation for its own sake, or programs can validate
>> > up front in order to minimize or eliminate the need to sprinkle hand-rolled
>> > validity testing throughout their implementation code.
>> >
>> > I suppose the real question is about the advantage of defining
>> > constraints on table keys in the first place. There are all sorts of
>> > possible examples, but for now let's stick with _input.get. In each element
>> > (a table) of the list value of that attribute, a few specific possible keys
>> > are meaningful, and all others are meaningless / erroneous. We might like
>> > to be able to diagnose key misspellings in those tables. We might like to
>> > be able to process the values as lists of (key, value) pairs without fear
>> > that any of the keys are invalid. We might simply like to provide a
>> > machine-readable definition of which keys are meaningful / allowed.
>> >
>> >> As outlined in my previous email, I don't see that validating the keys
>> >> will have much benefit as tables are rarely used. That aside, simply
>> >> introducing an extra DDLm attribute is OK, especially as we are dropping
>> >> _enumeration_set.table_id we are not enlarging DDLm.
>> >
>> > If it were going to require a great deal of additional work and
>> > complexity to provide for constraints on table keys then I would hesitate to
>> > suggest doing so.
>> > I don't think that's the case.
>> >
>> > As it is, the current DDLm dictionary provides a mechanism intended to
>> > support constraining table keys, and it uses it, albeit only once. Removing
>> > that ability without replacement would not only delete the ability it
>> > supports, it would also change the semantics of the DDLm item that currently
>> > *uses* that ability.
>> >
>> > I am inclined to suppose that one reason tables are rarely used in the
>> > current dictionaries is that the item descriptions in the 2012 DDLm
>> > dictionary do a poor job of explaining how to define items taking tables as
>> > their values, especially with respect to constraints. Furthermore, all of
>> > the current dictionaries -- even DDLm -- spring from a history and
>> > dictionary development tradition that hadn't table values to rely on until
>> > now, so it is not surprising that DDLm versions of those dictionaries have
>> > little reliance on tables. That does not mean that tables cannot serve more
>> > prominently in future dictionaries, or future versions of the current
>> > dictionaries.
>> >
>> >> > 5. Add a mechanism to allow items' content type to be defined by
>> >> > reference to another item. This could be signaled by a new code for
>> >> > _type.contents, with a new attribute defining which other item’s type is to
>> >> > be used. I don’t think that the existing contents code 'Inherited' can
>> >> > serve this purpose, but perhaps I’m mistaken.
>> >
>> >> This is an intriguing idea. As it happens, the demonstration DDLm
>> >> dictionaries introduce setting the type of an item based on the type of a
>> >> different item using a dREL-like function (although I have replaced these
>> >> with explicit types in the latest version of the new cif_core dictionary).
>> >> Your suggestion replaces this by a non-dREL approach, which is in general
>> >> desirable for simple applications.
>> >> To check that I've understood your (corrected) example:
>> >> (1) the elements of the _import.get List are items of the same type as
>> >> _import.get_contents_type
>> >
>> > Yes.
>> >
>> >> (2) _import.get_contents_type is a Table, so _type.contents for it is
>> >> the type of values in the table i.e. Text
>> >
>> > Yes.
>> >
>> >> (3) The possible key values are given by the possible values taken by
>> >> the _type.key_type_reference dataname
>> >
>> > Yes, in this case. My idea is that _type.keys would be parallel to
>> > _type.contents, so that, for example, it might also take the value 'Code' or
>> > 'Date' or 'Text' or an extension type, and in that case not rely on a
>> > reference to a separate item definition.
>> >
>> >> We have two new 'internal' DDLm attributes as a result, as well as the
>> >> new _type.keys, _type.key_content_reference and _type.key_type_reference
>> >> datanames for a total of 5 new attributes.
>> >
>> > Those aren't exactly the data names I proposed, but yes, that's the way
>> > my proposal plays out for DDLm.
>> >
>> >> If we put the key list into the definition to which it relates, we can
>> >> cut down on the number of new attributes, e.g:
>> >> save_import.get_contents_type
>> >>    # ...
>> >>    _type.purpose             'Internal'
>> >>    _type.container           'Table'
>> >>    _type.contents            'Text'
>> >>    loop_
>> >>      _table_key_set.state
>> >>      _table_key_set.detail
>> >>        'file' 'filename/URI of source dictionary'
>> >>        'save' 'save framecode of source definition'
>> >>        'mode' 'mode for including save frames'
>> >>        'dupl' 'option for duplicate entries'
>> >>        'miss' 'option for missing duplicate entries'
>> >> save_
>> >
>> > Yes, that would be a viable alternative to support the needs of DDLm
>> > itself. It would reduce the number of new items needed from 3 to 2 (the two
>> > other proposed new items being related to defining table *contents* by
>> > reference, which is a separate issue).
>> > The statistics look different for dictionaries other than DDLm itself.
>> >
>> > Your alternative appears to be roughly what I described in passing as
>> > "to establish a parallel to the _enumeration_set category for constraining
>> > key values." Although it serves DDLm's own needs just fine, it may be too
>> > restrictive for other dictionaries that want to define (and constrain)
>> > tables, as it supports only enumerable sets of keys. In some other uses one
>> > might instead want to constrain keys to the same form that (for values) is
>> > represented by _type.contents = 'Date' or 'Version' or some extension type,
>> > where it is not possible to enumerate all possible keys.
>> >
>> >> which results in new attributes _type.key_content_reference,
>> >> _table_key_set.state and _table_key_set.detail with one internal attribute
>> >> _import.get_contents_type, and also reduces the non-locality of the
>> >> definition - that is, one less reference to track through the file.
>> >> _import.get is admittedly an extreme example, because it is the only
>> >> occurrence of a list of tables rather than just a table, which is what
>> >> requires the creation of the 'internal' data attribute.
>> >
>> > Yes and no. The creation of the new 'Internal' value for _type.purpose
>> > and of items that use it are more a consequence of my approach to lightening
>> > the load on _type.dimension, whose current description and use appear to
>> > task it with providing a complete layout of values of the item being
>> > defined. Note in particular the dimension specified in the current
>> > definition of _import.get: '[{}]'. I don't think we want to continue in
>> > that direction.
>> >
>> > The structure of _import.get's values does not inherently require
>> > internal types to be defined under my proposed structure. If there were an
>> > ordinary item in the dictionary that had the wanted type of the elements of
>> > an _import.get list, then that type could be referenced instead of an
>> > internal one.
>> > I can imagine circumstances under which such a reference would even be
>> > sensible.
>> >
>> >> It is, however, a nice demonstration of how the attributes might work
>> >> for future dictionary writers. The new 'internal' dataname does have some
>> >> meaning along the lines of 'a single import instruction' so a better
>> >> dataname might be _import.single.
>> >
>> > Sure, that name would be fine with me.
>> >
>> >> Is there any reason that you introduced a reference in order to
>> >> specify the table keys?
>> >
>> > I introduced a reference in order to specify table keys so as to provide
>> > for more alternatives than an enumeration of possible keys, while minimizing
>> > the number of new DDLm items required. Also, inasmuch as I was already
>> > proposing type-by-reference for values, it seemed consistent to follow a
>> > parallel approach for key constraints.
>> >
>> >> And do you agree that the alternative I've proposed above would also
>> >> be sufficient?
>> >
>> > I agree that your alternative would be sufficient *for DDLm itself*, but
>> > I would prefer more flexibility to be available to other dictionaries.
>> > Because DDLm itself will be harder to change than other DDLm dictionaries, I
>> > would like to avoid it being overly restrictive. At the same time, I don't
>> > think we need to go crazy by trying to make DDLm capable of defining
>> > completely arbitrary CIF2 data structures. I have tried to choose a happy
>> > medium that is minimally disruptive for existing DDLm dictionaries and
>> > software.
>> >
>> >> On a final note for _import.get, the dREL is broken as it assumes that
>> >> there is only one value for each of the constituent _import datanames, which
>> >> would make a list superfluous (only one element), but what it really wants
>> >> to do is to create a list from a loop of _import.file etc. values. To do
>> >> this it needs a sequence number, which isn't defined.
>> >> Once this *is* defined, we could alternatively present the import
>> >> instructions as a loop over _import.sequence and _import.single, or else
>> >> _import.seqence, _import.file etc.
>> >
>> > I can't say I'm much surprised. _import.get shows evidence of having
>> > gone through a change at some point, and I don't think that was fully and
>> > consistently implemented. I note in particular that its description (in the
>> > 2012 version) is "A table of attributes [...]", not "A list of tables of
>> > attributes [...]" or similar. I also note that its _type.container is given
>> > as 'List[Table]', which is not among the enumerated alternatives for values
>> > of that attribute.
>> >
>> > As for the dREL, though, why do you need a sequence number, and / or why
>> > can the dREL not generate one itself as it iterates over the values of
>> > _import.get? Given that each value is a table providing the attributes
>> > describing one import; co-occurrence in the same table already associates
>> > the various attributes of each import together.
>> >
>> >> To wrap up, I like the suggestion of a _type.contents that can work by
>> >> reference to another dataname. I don't see a particular need for a similar
>> >> reference for table keys, nor do I particularly think explicitly specifying
>> >> the keys is likely to be that useful, but I'm not against adding this
>> >> capability. We envisage adding quite a few other attributes later on to
>> >> improve DDL2 - DDLm translation anyway.
>> >
>> > I'm glad you like the idea of defining content type by reference. I
>> > hope I've persuaded you about the keys, but even if not, I still think that
>> > the ability to define machine-readable specifications of allowed keys is
>> > important. I'm not hung up on the exact implementation I proposed, however.
>> >
>> >
>> > Cheers,
>> >
>> > John
>> >
>> > --
>> > John C. Bollinger, Ph.D.
>> > Computing and X-Ray Scientist
>> > Department of Structural Biology
>> > St. Jude Children's Research Hospital
>> > John.Bollinger@StJude.org
>> > (901) 595-3166 [office]
>> > www.stjude.org
>> >
>> > _______________________________________________
>> > ddlm-group mailing list
>> > ddlm-group@iucr.org
>> > http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
>>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
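
For readers following the key-constraint discussion in the quoted thread, the two styles of constraint can be sketched in Python. This is only an illustration under stated assumptions: the allowed key names are taken from the `_table_key_set` example in the thread, while the function names, the sample values, and the use of ISO dates as a stand-in for a 'Date'-typed key are hypothetical and do not correspond to any existing DDLm software.

```python
from datetime import date

# Enumerated keys, as listed in the _table_key_set example in the thread.
IMPORT_KEYS = {"file", "save", "mode", "dupl", "miss"}


def check_enumerated_keys(table, allowed):
    """Return the sorted keys of `table` that are not in the allowed set."""
    return sorted(k for k in table if k not in allowed)


def is_iso_date(key):
    """Stand-in for a 'Date'-typed key: such keys cannot be enumerated."""
    try:
        date.fromisoformat(key)
    except ValueError:
        return False
    return True


def check_typed_keys(table, predicate):
    """Return the sorted keys of `table` that fail a type predicate."""
    return sorted(k for k in table if not predicate(k))


def validate_import_get(value):
    """Validate a list of tables; report (position, bad_key) pairs.

    The position comes from enumerate(), so no separate sequence
    number has to be stored alongside each table.
    """
    problems = []
    for index, table in enumerate(value):
        for key in check_enumerated_keys(table, IMPORT_KEYS):
            problems.append((index, key))
    return problems
```

A call such as `validate_import_get([{"file": "a.dic"}, {"flie": "b.dic"}])` would flag the misspelled key in the second table, which is the kind of diagnosis an enumerated key set allows. The predicate form covers the case where keys follow a type rather than a fixed list.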

International Union of Crystallography
