Re: [ddlm-group] _enumerated_set.table_id
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] _enumerated_set.table_id
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Thu, 23 Apr 2015 14:51:27 +0000
For better or for worse, "Table" is the CIF2 term for this data structure.
I do not think introducing an alias at this point would serve the interest
of clarity, but I will try to remember to capitalize when I use the word in
the CIF2 sense.

John

> -----Original Message-----
> From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of
> Herbert J. Bernstein
> Sent: Thursday, April 23, 2015 4:52 AM
> To: Group finalising DDLm and associated dictionaries
> Subject: Re: [ddlm-group] _enumerated_set.table_id
>
> May I suggest maintaining a clear distinction, at least by capitalizing
> the CIF2 type, or better, by referring to it as a dictionary type, as in
> Python?
>
> On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com>
> wrote:
> > Hi Herbert - the very important point here is that we are talking
> > about the 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and
> > most certainly not 'table' in the sense of 'relational database table'
> > (although you will appreciate the very close relationship between the
> > two datastructures).
> >
> > all the best,
> > James.
> >
> > On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein
> > <yayahjb@gmail.com>
> > wrote:
> >>
> >> Dear Colleagues,
> >>
> >> I am puzzled by the idea of constraints on table keys distinct
> >> from the constraints on the values and types for table columns.  From
> >> a database perspective, a table key is just a set of one or more
> >> columns that uniquely identify rows in a table by their contents.  If
> >> a column has been designated as a key or as a member of a composite
> >> key, the normal practice is to use the type and value constraints of
> >> the column as the only constraints on what you are allowed to use.
> >> Please explain what is gained by having additional constraints
> >> specified?  I would suggest we keep as close to a relational model
> >> for CIF2 tables as possible.
> >>
> >> Regards,
> >> Herbert
> >>
> >> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C
> >> <John.Bollinger@stjude.org> wrote:
> >> > Hi James,
> >> >
> >> > Comments inline below.  ((Lack of) formatting thanks to stupid
> >> > Microsoft limitations.)
> >> >
> >> >> > 4. Add a replacement mechanism to define constraints on table
> >> >> > keys.  It might be sufficient, and consistent with the apparent
> >> >> > intent of the current dictionary, to establish a parallel to the
> >> >> > _enumeration_set category for constraining key values, maybe
> >> >> > _key_enumeration_set.  It would be a smaller change at the
> >> >> > dictionary level, however, to add a mechanism by which
> >> >> > constraints on key type could be defined by reference to the
> >> >> > type of another item (see also next).
> >> >>
> >> >> What is the advantage of being able to validate key strings?
> >> >
> >> > What is the advantage of validating *anything*?  If there is a
> >> > constraint on document form and content then one would like to be
> >> > able to determine whether instance documents comply with that
> >> > constraint.  It can be useful to perform such validation for its
> >> > own sake, or programs can validate up front in order to minimize or
> >> > eliminate the need to sprinkle hand-rolled validity testing
> >> > throughout their implementation code.
> >> >
> >> > I suppose the real question is about the advantage of defining
> >> > constraints on table keys in the first place.  There are all sorts
> >> > of possible examples, but for now let's stick with _import.get.  In
> >> > each element (a table) of the list value of that attribute, a few
> >> > specific possible keys are meaningful, and all others are
> >> > meaningless / erroneous.  We might like to be able to diagnose key
> >> > misspellings in those tables.  We might like to be able to process
> >> > the values as lists of (key, value) pairs without fear that any of
> >> > the keys are invalid.
> >> > We might simply like to provide a machine-readable definition of
> >> > which keys are meaningful / allowed.
> >> >
> >> >> As outlined in my previous email, I don't see that validating the
> >> >> keys will have much benefit as tables are rarely used.  That
> >> >> aside, simply introducing an extra DDLm attribute is OK,
> >> >> especially as we are dropping _enumeration_set.table_id we are not
> >> >> enlarging DDLm.
> >> >
> >> > If it were going to require a great deal of additional work and
> >> > complexity to provide for constraints on table keys then I would
> >> > hesitate to suggest doing so.  I don't think that's the case.
> >> >
> >> > As it is, the current DDLm dictionary provides a mechanism intended
> >> > to support constraining table keys, and it uses it, albeit only
> >> > once.  Removing that ability without replacement would not only
> >> > delete the ability it supports, it would also change the semantics
> >> > of the DDLm item that currently *uses* that ability.
> >> >
> >> > I am inclined to suppose that one reason tables are rarely used in
> >> > the current dictionaries is that the item descriptions in the 2012
> >> > DDLm dictionary do a poor job of explaining how to define items
> >> > taking tables as their values, especially with respect to
> >> > constraints.  Furthermore, all of the current dictionaries -- even
> >> > DDLm -- spring from a history and dictionary development tradition
> >> > that hadn't table values to rely on until now, so it is not
> >> > surprising that DDLm versions of those dictionaries have little
> >> > reliance on tables.  That does not mean that tables cannot serve
> >> > more prominently in future dictionaries, or future versions of the
> >> > current dictionaries.
> >> >
> >> >> > 5. Add a mechanism to allow items' content type to be defined by
> >> >> > reference to another item.
> >> >> > This could be signaled by a new code for _type.contents, with a
> >> >> > new attribute defining which other item’s type is to be used.  I
> >> >> > don’t think that the existing contents code 'Inherited' can
> >> >> > serve this purpose, but perhaps I’m mistaken.
> >> >
> >> >> This is an intriguing idea.  As it happens, the demonstration DDLm
> >> >> dictionaries introduce setting the type of an item based on the
> >> >> type of a different item using a dREL-like function (although I
> >> >> have replaced these with explicit types in the latest version of
> >> >> the new cif_core dictionary).
> >> >> Your suggestion replaces this by a non-dREL approach, which is in
> >> >> general desirable for simple applications.  To check that I've
> >> >> understood your (corrected) example:
> >> >> (1) the elements of the _import.get List are items of the same
> >> >> type as _import.get_contents_type
> >> >
> >> > Yes.
> >> >
> >> >> (2) _import.get_contents_type is a Table, so _type.contents for it
> >> >> is the type of values in the table i.e. Text
> >> >
> >> > Yes.
> >> >
> >> >> (3) The possible key values are given by the possible values taken
> >> >> by the _type.key_type_reference dataname
> >> >
> >> > Yes, in this case.  My idea is that _type.keys would be parallel to
> >> > _type.contents, so that, for example, it might also take the value
> >> > 'Code' or 'Date' or 'Text' or an extension type, and in that case
> >> > not rely on a reference to a separate item definition.
> >> >
> >> >> We have two new 'internal' DDLm attributes as a result, as well as
> >> >> the new _type.keys, _type.key_content_reference and
> >> >> _type.key_type_reference datanames for a total of 5 new attributes.
> >> >
> >> > Those aren't exactly the data names I proposed, but yes, that's the
> >> > way my proposal plays out for DDLm.
> >> >
> >> >> If we put the key list into the definition to which it relates,
> >> >> we can cut down on the number of new attributes, e.g:
> >> >> save_import.get_contents_type
> >> >> # ...
> >> >>     _type.purpose     'Internal'
> >> >>     _type.container   'Table'
> >> >>     _type.contents    'Text'
> >> >>     loop_
> >> >>       _table_key_set.state
> >> >>       _table_key_set.detail
> >> >>         'file'   'filename/URI of source dictionary'
> >> >>         'save'   'save framecode of source definition'
> >> >>         'mode'   'mode for including save frames'
> >> >>         'dupl'   'option for duplicate entries'
> >> >>         'miss'   'option for missing duplicate entries'
> >> >> save_
> >> >
> >> > Yes, that would be a viable alternative to support the needs of
> >> > DDLm itself.  It would reduce the number of new items needed from 3
> >> > to 2 (the two other proposed new items being related to defining
> >> > table *contents* by reference, which is a separate issue).  The
> >> > statistics look different for dictionaries other than DDLm itself.
> >> >
> >> > Your alternative appears to be roughly what I described in passing
> >> > as "to establish a parallel to the _enumeration_set category for
> >> > constraining key values."  Although it serves DDLm's own needs just
> >> > fine, it may be too restrictive for other dictionaries that want to
> >> > define (and constrain) tables, as it supports only enumerable sets
> >> > of keys.  In some other uses one might instead want to constrain
> >> > keys to the same form that (for values) is represented by
> >> > _type.contents = 'Date' or 'Version' or some extension type, where
> >> > it is not possible to enumerate all possible keys.
> >> >
> >> >> which results in new attributes _type.key_content_reference,
> >> >> _table_key_set.state and _table_key_set.detail with one internal
> >> >> attribute _import.get_contents_type, and also reduces the
> >> >> non-locality of the definition - that is, one less reference to
> >> >> track through the file.
> >> >> _import.get is admittedly an extreme example, because it is the
> >> >> only occurrence of a list of tables rather than just a table,
> >> >> which is what requires the creation of the 'internal' data
> >> >> attribute.
> >> >
> >> > Yes and no.  The creation of the new 'Internal' value for
> >> > _type.purpose and of items that use it are more a consequence of my
> >> > approach to lightening the load on _type.dimension, whose current
> >> > description and use appear to task it with providing a complete
> >> > layout of values of the item being defined.  Note in particular the
> >> > dimension specified in the current definition of _import.get:
> >> > '[{}]'.  I don't think we want to continue in that direction.
> >> >
> >> > The structure of _import.get's values does not inherently require
> >> > internal types to be defined under my proposed structure.  If there
> >> > were an ordinary item in the dictionary that had the wanted type of
> >> > the elements of an _import.get list, then that type could be
> >> > referenced instead of an internal one.  I can imagine circumstances
> >> > under which such a reference would even be sensible.
> >> >
> >> >> It is, however, a nice demonstration of how the attributes might
> >> >> work for future dictionary writers.  The new 'internal' dataname
> >> >> does have some meaning along the lines of 'a single import
> >> >> instruction' so a better dataname might be _import.single.
> >> >
> >> > Sure, that name would be fine with me.
> >> >
> >> >> Is there any reason that you introduced a reference in order to
> >> >> specify the table keys?
> >> >
> >> > I introduced a reference in order to specify table keys so as to
> >> > provide for more alternatives than an enumeration of possible keys,
> >> > while minimizing the number of new DDLm items required.  Also,
> >> > inasmuch as I was already proposing type-by-reference for values,
> >> > it seemed consistent to follow a parallel approach for key
> >> > constraints.
> >> >
> >> >> And do you agree that the alternative I've proposed above would
> >> >> also be sufficient?
> >> >
> >> > I agree that your alternative would be sufficient *for DDLm
> >> > itself*, but I would prefer more flexibility to be available to
> >> > other dictionaries.
> >> > Because DDLm itself will be harder to change than other DDLm
> >> > dictionaries, I would like to avoid it being overly restrictive.
> >> > At the same time, I don't think we need to go crazy by trying to
> >> > make DDLm capable of defining completely arbitrary CIF2 data
> >> > structures.  I have tried to choose a happy medium that is
> >> > minimally disruptive for existing DDLm dictionaries and software.
> >> >
> >> >> On a final note for _import.get, the dREL is broken as it assumes
> >> >> that there is only one value for each of the constituent _import
> >> >> datanames, which would make a list superfluous (only one element),
> >> >> but what it really wants to do is to create a list from a loop of
> >> >> _import.file etc. values.  To do this it needs a sequence number,
> >> >> which isn't defined.  Once this *is* defined, we could
> >> >> alternatively present the import instructions as a loop over
> >> >> _import.sequence and _import.single, or else _import.sequence,
> >> >> _import.file etc.
> >> >
> >> > I can't say I'm much surprised.  _import.get shows evidence of
> >> > having gone through a change at some point, and I don't think that
> >> > was fully and consistently implemented.  I note in particular that
> >> > its description (in the 2012 version) is "A table of attributes
> >> > [...]", not "A list of tables of attributes [...]" or similar.  I
> >> > also note that its _type.container is given as 'List[Table]', which
> >> > is not among the enumerated alternatives for values of that
> >> > attribute.
> >> >
> >> > As for the dREL, though, why do you need a sequence number, and /
> >> > or why can the dREL not generate one itself as it iterates over the
> >> > values of _import.get?
> >> > Given that each value is a table providing the attributes
> >> > describing one import, co-occurrence in the same table already
> >> > associates the various attributes of each import together.
> >> >
> >> >> To wrap up, I like the suggestion of a _type.contents that can
> >> >> work by reference to another dataname.  I don't see a particular
> >> >> need for a similar reference for table keys, nor do I particularly
> >> >> think explicitly specifying the keys is likely to be that useful,
> >> >> but I'm not against adding this capability.  We envisage adding
> >> >> quite a few other attributes later on to improve DDL2 - DDLm
> >> >> translation anyway.
> >> >
> >> > I'm glad you like the idea of defining content type by reference.
> >> > I hope I've persuaded you about the keys, but even if not, I still
> >> > think that the ability to define machine-readable specifications of
> >> > allowed keys is important.  I'm not hung up on the exact
> >> > implementation I proposed, however.
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > John
> >> >
> >> > --
> >> > John C. Bollinger, Ph.D.
> >> > Computing and X-Ray Scientist
> >> > Department of Structural Biology
> >> > St. Jude Children's Research Hospital
> >> > John.Bollinger@StJude.org
> >> > (901) 595-3166 [office]
> >> > www.stjude.org
> >
> > --
> > T +61 (02) 9717 9907
> > F +61 (02) 9717 3145
> > M +61 (04) 0249 4148

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
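
The two kinds of key constraint debated in this thread (an enumerated key set versus a type-like constraint such as 'Date'), together with the point that a sequence number can be generated while iterating over an `_import.get` list, can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: a CIF2 Table is modelled as a `dict` and a List as a `list`; the function names `is_date_key`, `check_table_keys` and `check_import_list` are hypothetical and belong to neither DDLm nor any CIF library; the five allowed keys are taken from the `_table_key_set` example quoted in the thread; the file and save-frame names in the sample data are made up.

```python
from datetime import date

# Assumption: the enumerated keys from the _table_key_set example in the thread.
IMPORT_KEYS = frozenset({"file", "save", "mode", "dupl", "miss"})


def is_date_key(key):
    """Type-style constraint: accept any key that parses as an ISO 8601 date."""
    try:
        date.fromisoformat(key)
    except ValueError:
        return False
    return True


def check_table_keys(table, allowed):
    """Return, sorted, the keys of a Table (a dict here) that violate a constraint.

    `allowed` is either a set of enumerated keys or a predicate on one key.
    """
    accept = allowed if callable(allowed) else allowed.__contains__
    return sorted(key for key in table if not accept(key))


def check_import_list(imports):
    """Check every Table in a List against IMPORT_KEYS.

    The sequence number is produced by the iteration itself, so the data
    do not need to carry one.
    """
    problems = []
    for seq, table in enumerate(imports, start=1):
        for key in check_table_keys(table, IMPORT_KEYS):
            problems.append((seq, key))
    return problems


# Illustrative data only: two import instructions, the second with a typo.
imports = [
    {"file": "templ_attr.cif", "save": "atom_site_label", "mode": "Full"},
    {"file": "templ_enum.cif", "svae": "units_code"},
]
print(check_import_list(imports))  # [(2, 'svae')]
```

Passing a predicate such as `is_date_key` instead of a key set shows the case where the permitted keys cannot be enumerated, which is the flexibility argued for above; an enumerated set alone covers only the `_import.get` style of use.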