Re: [ddlm-group] _enumerated_set.table_id
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] _enumerated_set.table_id
- From: "Herbert J. Bernstein" <yayahjb@gmail.com>
- Date: Thu, 23 Apr 2015 05:52:07 -0400
- In-Reply-To: <CAM+dB2eCdu-Orz5iiigw3exnoas1Spzw4o50yxcey6V-qcV6wg@mail.gmail.com>
- References: <CAM+dB2ecOvjBT8OnV2tLy6rpJF2s=j4mLwJ09+x9AePUiByyXQ@mail.gmail.com><BY2PR0401MB0936963785E7A96CE3BBDE7AE0E00@BY2PR0401MB0936.namprd04.prod.outlook.com><CAM+dB2covps-EK0K-kpz9j_E1nJUmVvdNraHTSGJXKst=mo=SQ@mail.gmail.com><BY2PR0401MB0936F054069A9F1DEA47ABD7E0EF0@BY2PR0401MB0936.namprd04.prod.outlook.com><CAM+dB2ffNROJiGQQpL+ySL96MQaR=kgJ90MnYXXU70DEOTk5AQ@mail.gmail.com><BN1PR0401MB093231E98CCA3D98C5165B8DE0EE0@BN1PR0401MB0932.namprd04.prod.outlook.com><CABcsX24NNmfvacYYwRTBYaTysC=M_F9tyyQcDApT2A59ydDn6A@mail.gmail.com><CAM+dB2eCdu-Orz5iiigw3exnoas1Spzw4o50yxcey6V-qcV6wg@mail.gmail.com>
May I suggest maintaining a clear distinction, at least by
capitalizing the CIF2 type, or better, by referring to it as a
dictionary type, as in Python?

On Wed, Apr 22, 2015 at 11:25 PM, James Hester <jamesrhester@gmail.com> wrote:
> Hi Herbert - the very important point here is that we are talking about the
> 'Table' type in CIF2 i.e. {"key1":value "key2":value}, and most certainly
> not 'table' in the sense of 'relational database table' (although you will
> appreciate the very close relationship between the two data structures).
>
> all the best,
> James.
>
> On Thu, Apr 23, 2015 at 1:35 AM, Herbert J. Bernstein <yayahjb@gmail.com>
> wrote:
>>
>> Dear Colleagues,
>>
>> I am puzzled by the idea of constraints on table keys distinct from
>> the constraints on the values and types for table columns. From a
>> database perspective, a table key is just a set of one or more columns
>> that uniquely identify rows in a table by their contents. If a column
>> has been designated as a key or as a member of a composite key, the
>> normal practice is to use the type and value constraints of the column
>> as the only constraints on what you are allowed to use. Please
>> explain what is gained by having additional constraints specified? I
>> would suggest we keep as close to a relational model for CIF2 tables
>> as possible.
>>
>> Regards,
>> Herbert
>>
>> On Wed, Apr 22, 2015 at 11:14 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>> > Hi James,
>> >
>> > Comments inline below. ((Lack of) formatting thanks to stupid Microsoft
>> > limitations.)
>> >
>> >> > 4. Add a replacement mechanism to define constraints on table keys.
>> >> > It might be sufficient, and consistent with the apparent intent of the
>> >> > current dictionary, to establish a parallel to the _enumeration_set category
>> >> > for constraining key values, maybe _key_enumeration_set. It would be a
>> >> > smaller change at the dictionary level, however, to add a mechanism by which
>> >> > constraints on key type could be defined by reference to the type of another
>> >> > item (see also next).
>> >>
>> >> What is the advantage of being able to validate key strings?
>> >
>> > What is the advantage of validating *anything*? If there is a
>> > constraint on document form and content then one would like to be able to
>> > determine whether instance documents comply with that constraint. It can be
>> > useful to perform such validation for its own sake, or programs can validate
>> > up front in order to minimize or eliminate the need to sprinkle hand-rolled
>> > validity testing throughout their implementation code.
>> >
>> > I suppose the real question is about the advantage of defining
>> > constraints on table keys in the first place. There are all sorts of
>> > possible examples, but for now let's stick with _import.get. In each element
>> > (a table) of the list value of that attribute, a few specific possible keys
>> > are meaningful, and all others are meaningless / erroneous. We might like
>> > to be able to diagnose key misspellings in those tables. We might like to
>> > be able to process the values as lists of (key, value) pairs without fear
>> > that any of the keys are invalid. We might simply like to provide a
>> > machine-readable definition of which keys are meaningful / allowed.
>> >
>> >> As outlined in my previous email, I don't see that validating the keys
>> >> will have much benefit as tables are rarely used. That aside, simply
>> >> introducing an extra DDLm attribute is OK, especially as we are dropping
>> >> _enumeration_set.table_id we are not enlarging DDLm.
>> >
>> > If it were going to require a great deal of additional work and
>> > complexity to provide for constraints on table keys then I would hesitate to
>> > suggest doing so. I don't think that's the case.
>> >
>> > As it is, the current DDLm dictionary provides a mechanism intended to
>> > support constraining table keys, and it uses it, albeit only once. Removing
>> > that ability without replacement would not only delete the ability it
>> > supports, it would also change the semantics of the DDLm item that currently
>> > *uses* that ability.
>> >
>> > I am inclined to suppose that one reason tables are rarely used in the
>> > current dictionaries is that the item descriptions in the 2012 DDLm
>> > dictionary do a poor job of explaining how to define items taking tables as
>> > their values, especially with respect to constraints. Furthermore, all of
>> > the current dictionaries -- even DDLm -- spring from a history and
>> > dictionary development tradition that hadn't table values to rely on until
>> > now, so it is not surprising that DDLm versions of those dictionaries have
>> > little reliance on tables. That does not mean that tables cannot serve more
>> > prominently in future dictionaries, or future versions of the current
>> > dictionaries.
>> >
>> >> > 5. Add a mechanism to allow items' content type to be defined by
>> >> > reference to another item. This could be signaled by a new code for
>> >> > _type.contents, with a new attribute defining which other item’s type is to
>> >> > be used. I don’t think that the existing contents code 'Inherited' can
>> >> > serve this purpose, but perhaps I’m mistaken.
>> >
>> >> This is an intriguing idea. As it happens, the demonstration DDLm
>> >> dictionaries introduce setting the type of an item based on the type of a
>> >> different item using a dREL-like function (although I have replaced these
>> >> with explicit types in the latest version of the new cif_core dictionary).
>> >> Your suggestion replaces this by a non-dREL approach, which is in general
>> >> desirable for simple applications. To check that I've understood your
>> >> (corrected) example:
>> >> (1) the elements of the _import.get List are items of the same type as
>> >> _import.get_contents_type
>> >
>> > Yes.
>> >
>> >> (2) _import.get_contents_type is a Table, so _type.contents for it is
>> >> the type of values in the table i.e. Text
>> >
>> > Yes.
>> >
>> >> (3) The possible key values are given by the possible values taken by
>> >> the _type.key_type_reference dataname
>> >
>> > Yes, in this case. My idea is that _type.keys would be parallel to
>> > _type.contents, so that, for example, it might also take the value 'Code' or
>> > 'Date' or 'Text' or an extension type, and in that case not rely on a
>> > reference to a separate item definition.
>> >
>> >> We have two new 'internal' DDLm attributes as a result, as well as the
>> >> new _type.keys, _type.key_content_reference and _type.key_type_reference
>> >> datanames for a total of 5 new attributes.
>> >
>> > Those aren't exactly the data names I proposed, but yes, that's the way
>> > my proposal plays out for DDLm.
>> >
>> >> If we put the key list into the definition to which it relates, we can
>> >> cut down on the number of new attributes, e.g.:
>> >> save_import.get_contents_type
>> >> # ...
>> >> _type.purpose 'Internal'
>> >> _type.container 'Table'
>> >> _type.contents 'Text'
>> >> loop_
>> >> _table_key_set.state
>> >> _table_key_set.detail
>> >> 'file' 'filename/URI of source dictionary'
>> >> 'save' 'save framecode of source definition'
>> >> 'mode' 'mode for including save frames'
>> >> 'dupl' 'option for duplicate entries'
>> >> 'miss' 'option for missing duplicate entries'
>> >> save_
>> >
>> > Yes, that would be a viable alternative to support the needs of DDLm
>> > itself. It would reduce the number of new items needed from 3 to 2 (the two
>> > other proposed new items being related to defining table *contents* by
>> > reference, which is a separate issue). The statistics look different for
>> > dictionaries other than DDLm itself.
>> >
>> > Your alternative appears to be roughly what I described in passing as
>> > "to establish a parallel to the _enumeration_set category for constraining
>> > key values." Although it serves DDLm's own needs just fine, it may be too
>> > restrictive for other dictionaries that want to define (and constrain)
>> > tables, as it supports only enumerable sets of keys. In some other uses one
>> > might instead want to constrain keys to the same form that (for values) is
>> > represented by _type.contents = 'Date' or 'Version' or some extension type,
>> > where it is not possible to enumerate all possible keys.
>> >
>> >> which results in new attributes _type.key_content_reference,
>> >> _table_key_set.state and _table_key_set.detail with one internal attribute
>> >> _import.get_contents_type, and also reduces the non-locality of the
>> >> definition - that is, one less reference to track through the file.
>> >> _import.get is admittedly an extreme example, because it is the only
>> >> occurrence of a list of tables rather than just a table, which is what
>> >> requires the creation of the 'internal' data attribute.
>> >
>> > Yes and no. The creation of the new 'Internal' value for _type.purpose
>> > and of items that use it are more a consequence of my approach to lightening
>> > the load on _type.dimension, whose current description and use appear to
>> > task it with providing a complete layout of values of the item being
>> > defined. Note in particular the dimension specified in the current
>> > definition of _import.get: '[{}]'. I don't think we want to continue in
>> > that direction.
>> >
>> > The structure of _import.get's values does not inherently require
>> > internal types to be defined under my proposed structure. If there were an
>> > ordinary item in the dictionary that had the wanted type of the elements of
>> > an _import.get list, then that type could be referenced instead of an
>> > internal one. I can imagine circumstances under which such a reference
>> > would even be sensible.
>> >
>> >> It is, however, a nice demonstration of how the attributes might work
>> >> for future dictionary writers. The new 'internal' dataname does have some
>> >> meaning along the lines of 'a single import instruction' so a better
>> >> dataname might be _import.single.
>> >
>> > Sure, that name would be fine with me.
>> >
>> >> Is there any reason that you introduced a reference in order to
>> >> specify the table keys?
>> >
>> > I introduced a reference in order to specify table keys so as to provide
>> > for more alternatives than an enumeration of possible keys, while minimizing
>> > the number of new DDLm items required. Also, inasmuch as I was already
>> > proposing type-by-reference for values, it seemed consistent to follow a
>> > parallel approach for key constraints.
>> >
>> >> And do you agree that the alternative I've proposed above would also
>> >> be sufficient?
>> >
>> > I agree that your alternative would be sufficient *for DDLm itself*, but
>> > I would prefer more flexibility to be available to other dictionaries.
>> > Because DDLm itself will be harder to change than other DDLm dictionaries, I
>> > would like to avoid it being overly restrictive. At the same time, I don't
>> > think we need to go crazy by trying to make DDLm capable of defining
>> > completely arbitrary CIF2 data structures. I have tried to choose a happy
>> > medium that is minimally disruptive for existing DDLm dictionaries and
>> > software.
>> >
>> >> On a final note for _import.get, the dREL is broken as it assumes that
>> >> there is only one value for each of the constituent _import datanames, which
>> >> would make a list superfluous (only one element), but what it really wants
>> >> to do is to create a list from a loop of _import.file etc. values. To do
>> >> this it needs a sequence number, which isn't defined. Once this *is*
>> >> defined, we could alternatively present the import instructions as a loop
>> >> over _import.sequence and _import.single, or else _import.sequence,
>> >> _import.file etc.
>> >
>> > I can't say I'm much surprised. _import.get shows evidence of having
>> > gone through a change at some point, and I don't think that was fully and
>> > consistently implemented. I note in particular that its description (in the
>> > 2012 version) is "A table of attributes [...]", not "A list of tables of
>> > attributes [...]" or similar. I also note that its _type.container is given
>> > as 'List[Table]', which is not among the enumerated alternatives for values
>> > of that attribute.
>> >
>> > As for the dREL, though, why do you need a sequence number, and / or why
>> > can the dREL not generate one itself as it iterates over the values of
>> > _import.get? Given that each value is a table providing the attributes
>> > describing one import, co-occurrence in the same table already associates
>> > the various attributes of each import together.
>> >
>> >> To wrap up, I like the suggestion of a _type.contents that can work by
>> >> reference to another dataname. I don't see a particular need for a similar
>> >> reference for table keys, nor do I particularly think explicitly specifying
>> >> the keys is likely to be that useful, but I'm not against adding this
>> >> capability. We envisage adding quite a few other attributes later on to
>> >> improve DDL2 - DDLm translation anyway.
>> >
>> > I'm glad you like the idea of defining content type by reference. I
>> > hope I've persuaded you about the keys, but even if not, I still think that
>> > the ability to define machine-readable specifications of allowed keys is
>> > important. I'm not hung up on the exact implementation I proposed, however.
>> >
>> >
>> > Cheers,
>> >
>> > John
>> >
>> > --
>> > John C. Bollinger, Ph.D.
>> > Computing and X-Ray Scientist
>> > Department of Structural Biology
>> > St. Jude Children's Research Hospital
>> > John.Bollinger@StJude.org
>> > (901) 595-3166 [office]
>> > www.stjude.org
>> >
>> > _______________________________________________
>> > ddlm-group mailing list
>> > ddlm-group@iucr.org
>> > http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
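
For readers following the key-constraint discussion above, a minimal Python sketch of what validating table keys could look like may help. It is only an illustration under stated assumptions, not DDLm software or any proposed implementation: the function names, the example file names, and the date pattern are all hypothetical. Only the five allowed keys are taken from the _table_key_set loop quoted in the thread. The first check corresponds to an enumerated key set; the second stands in for a non-enumerable key type such as 'Date'.

```python
import re

# Allowed keys copied from the _table_key_set loop quoted in the thread.
ALLOWED_IMPORT_KEYS = {"file", "save", "mode", "dupl", "miss"}

# Hypothetical stand-in for a non-enumerable key type such as 'Date'.
DATE_KEY = re.compile(r"\d{4}-\d{2}-\d{2}")


def invalid_keys_enumerated(table, allowed):
    """Return the keys of `table` that are not in the enumerated set, sorted."""
    return sorted(k for k in table if k not in allowed)


def invalid_keys_by_pattern(table, pattern):
    """Return the keys of `table` that do not fully match `pattern`, sorted."""
    return sorted(k for k in table if pattern.fullmatch(k) is None)


def check_import_get(value):
    """Check a list of tables (an _import.get-like value).

    Returns (index, bad_keys) pairs for each element with invalid keys.
    """
    problems = []
    for index, table in enumerate(value):
        bad = invalid_keys_enumerated(table, ALLOWED_IMPORT_KEYS)
        if bad:
            problems.append((index, bad))
    return problems


# Hypothetical example value: a list of tables, the second with a misspelled key.
example = [
    {"file": "templ_attr.cif", "save": "atom_site_label", "mode": "Full"},
    {"file": "templ_enum.cif", "sav": "units_code"},
]
```

Calling check_import_get(example) reports the misspelled key in the second table, which is the kind of up-front diagnosis John describes. The pattern-based check shows why a purely enumerated key set may be too narrow for some dictionaries: keys of a 'Date'-like form cannot be listed in advance, but can still be tested.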
- Follow-Ups:
- Re: [ddlm-group] _enumerated_set.table_id (Bollinger, John C)
- References:
- [ddlm-group] _enumerated_set.table_id (James Hester)
- Re: [ddlm-group] _enumerated_set.table_id (Bollinger, John C)
- Re: [ddlm-group] _enumerated_set.table_id (James Hester)
- Re: [ddlm-group] _enumerated_set.table_id (Bollinger, John C)
- Re: [ddlm-group] _enumerated_set.table_id (James Hester)
- Re: [ddlm-group] _enumerated_set.table_id (Bollinger, John C)
- Re: [ddlm-group] _enumerated_set.table_id (Herbert J. Bernstein)
- Re: [ddlm-group] _enumerated_set.table_id (James Hester)