
Re: [ddlm-group] _enumerated_set.table_id

Hi John B,

I'm happy to go with your original proposal (i.e. including a reference link where the set of allowable Table keys is specified).  I suggest that you and I take this offline in order to put together a formal proposal which would include writing out the full DDLm definitions of the new DDLm attributes, and a list of any consequent changes to the current DDLm dictionary, which we can then post here for any final comment.

all the best,
James.

On Thu, Apr 23, 2015 at 1:14 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Hi James,

Comments inline below.  ((Lack of) formatting thanks to stupid Microsoft limitations.)
 
> > 4. Add a replacement mechanism to define constraints on table keys.  It might be sufficient, and consistent with the apparent intent of the current dictionary, to establish a parallel to the _enumeration_set category for constraining key values, maybe _key_enumeration_set.  It would be a smaller change at the dictionary level, however, to add a mechanism by which constraints on key type could be defined by reference to the type of another item (see also next).
>
> What is the advantage of being able to validate key strings?

What is the advantage of validating *anything*?  If there is a constraint on document form and content then one would like to be able to determine whether instance documents comply with that constraint.  It can be useful to perform such validation for its own sake, or programs can validate up front in order to minimize or eliminate the need to sprinkle hand-rolled validity testing throughout their implementation code.

I suppose the real question is about the advantage of defining constraints on table keys in the first place.  There are all sorts of possible examples, but for now let's stick with _import.get.  In each element (a table) of the list value of that attribute, a few specific keys are meaningful, and all others are meaningless / erroneous.  We might like to be able to diagnose key misspellings in those tables.  We might like to be able to process the values as lists of (key, value) pairs without fear that any of the keys are invalid.  We might simply like to provide a machine-readable definition of which keys are meaningful / allowed.
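
As an illustration of the misspelling case, here is a minimal Python sketch.  It is not taken from any existing CIF software: the function name check_table_keys, the use of plain dicts to stand in for CIF2 tables, and the file names are all assumptions made for the example.  Only the five key names come from the _import.get discussion in this thread.

```python
import difflib

# Keys that are meaningful in each table of an _import.get list,
# as named in this thread.  Any other key is treated as an error.
ALLOWED_IMPORT_KEYS = {"file", "save", "mode", "dupl", "miss"}


def check_table_keys(table, allowed=ALLOWED_IMPORT_KEYS):
    """Return a list of diagnostics for keys of `table` not in `allowed`.

    `table` is a plain dict standing in for a CIF2 table value
    (an assumption of this sketch, not a real CIF API).
    """
    problems = []
    for key in table:
        if key not in allowed:
            # Suggest the closest allowed key, if one is similar enough.
            guess = difflib.get_close_matches(key, sorted(allowed), n=1)
            hint = f" (did you mean '{guess[0]}'?)" if guess else ""
            problems.append(f"unknown key '{key}'{hint}")
    return problems


# Hypothetical instance data: the second table misspells 'file'.
import_get = [
    {"file": "example_a.dic", "save": "frame_one"},
    {"flie": "example_b.dic", "mode": "Full"},
]
for index, table in enumerate(import_get):
    for problem in check_table_keys(table):
        print(f"_import.get[{index}]: {problem}")
```

With a machine-readable key list in the dictionary, the set hard-coded above could instead be read from the item definition, which is the point of the proposal.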

>  As outlined in my previous email, I don't see that validating the keys will have much benefit as tables are rarely used.  That aside, simply introducing an extra DDLm attribute is OK, especially as we are dropping _enumeration_set.table_id we are not enlarging DDLm.

If it were going to require a great deal of additional work and complexity to provide for constraints on table keys then I would hesitate to suggest doing so.  I don't think that's the case.

As it is, the current DDLm dictionary provides a mechanism intended to support constraining table keys, and it uses it, albeit only once.  Removing that ability without replacement would not only delete the ability it supports, it would also change the semantics of the DDLm item that currently *uses* that ability.

I am inclined to suppose that one reason tables are rarely used in the current dictionaries is that the item descriptions in the 2012 DDLm dictionary do a poor job of explaining how to define items taking tables as their values, especially with respect to constraints.  Furthermore, all of the current dictionaries -- even DDLm -- spring from a history and dictionary development tradition that hadn't table values to rely on until now, so it is not surprising that DDLm versions of those dictionaries have little reliance on tables.  That does not mean that tables cannot serve more prominently in future dictionaries, or future versions of the current dictionaries.

> > 5. Add a mechanism to allow items' content type to be defined by reference to another item.  This could be signaled by a new code for _type.contents, with a new attribute defining which other item’s type is to be used.  I don’t think that the existing contents code 'Inherited' can serve this purpose, but perhaps I’m mistaken.
 
> This is an intriguing idea.  As it happens, the demonstration DDLm dictionaries introduce setting the type of an item based on the type of a different item using a dREL-like function (although I have replaced these with explicit types in the latest version of the new cif_core dictionary).  Your suggestion replaces this by a non-dREL approach, which is in general desirable for simple applications.  To check that I've understood your (corrected) example:
> (1) the elements of the _import.get List are items of the same type as _import.get_contents_type

Yes.

> (2) _import.get_contents_type is a Table, so _type.contents for it is the type of values in the table i.e. Text 

Yes.

> (3) The possible key values are given by the possible values taken by the _type.key_type_reference dataname

Yes, in this case.  My idea is that _type.keys would be parallel to _type.contents, so that, for example, it might also take the value 'Code' or 'Date' or 'Text' or an extension type, and in that case not rely on a reference to a separate item definition.

> We have two new 'internal' DDLm attributes as a result, as well as the new _type.keys, _type.key_content_reference and _type.key_type_reference datanames for a total of 5 new attributes.

Those aren't exactly the data names I proposed, but yes, that's the way my proposal plays out for DDLm.

>  If we put the key list into the definition to which it relates, we can cut down on the number of new attributes, e.g:
> save_import.get_contents_type
>    # ...
>    _type.purpose             'Internal'
>    _type.container           'Table'
>    _type.contents            'Text'
>    loop_
>      _table_key_set.state
>      _table_key_set.detail 
>        'file' 'filename/URI of source dictionary'
>        'save' 'save framecode of source definition'
>        'mode' 'mode for including save frames'
>        'dupl' 'option for duplicate entries'
>        'miss' 'option for missing duplicate entries'
> save_
 
Yes, that would be a viable alternative to support the needs of DDLm itself.  It would reduce the number of new items needed from 3 to 2 (the two other proposed new items being related to defining table *contents* by reference, which is a separate issue).  The statistics look different for dictionaries other than DDLm itself.

Your alternative appears to be roughly what I described in passing as "to establish a parallel to the _enumeration_set category for constraining key values."  Although it serves DDLm's own needs just fine, it may be too restrictive for other dictionaries that want to define (and constrain) tables, as it supports only enumerable sets of keys.  In some other uses one might instead want to constrain keys to the same form that (for values) is represented by _type.contents = 'Date' or 'Version' or some extension type, where it is not possible to enumerate all possible keys.
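
To make the difference concrete, here is a rough Python sketch of the two kinds of key constraint.  Everything in it is illustrative: the item names, the KEY_CONSTRAINTS mapping, and the choice of ISO 8601 calendar dates as the form of a 'Date' key are assumptions, not anything defined by DDLm.

```python
from datetime import date


def make_enumerated(allowed):
    """Constraint for keys drawn from a finite, listable set."""
    allowed = frozenset(allowed)
    return lambda key: key in allowed


def is_date_key(key):
    """Constraint for keys of a given *form*: here, an ISO 8601 date.

    The set of valid keys cannot be enumerated, only recognized.
    """
    try:
        date.fromisoformat(key)
    except ValueError:
        return False
    return True


# Hypothetical item names mapped to their key constraints.
KEY_CONSTRAINTS = {
    "import_instruction": make_enumerated(
        ["file", "save", "mode", "dupl", "miss"]
    ),
    "readings_by_date": is_date_key,
}


def invalid_keys(item_name, table):
    """Return the keys of `table` rejected by the item's constraint."""
    accepts = KEY_CONSTRAINTS[item_name]
    return [key for key in table if not accepts(key)]


print(invalid_keys("import_instruction", {"file": "x.dic", "mood": "Full"}))
print(invalid_keys("readings_by_date", {"2015-04-23": "1.0", "yesterday": "2.0"}))
```

An enumeration-only mechanism covers the first entry but has no way to express the second.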

> which results in new attributes _type.key_content_reference, _table_key_set.state and _table_key_set.detail with one internal attribute _import.get_contents_type, and also reduces the non-locality of the definition - that is, one less reference to track through the file.  _import.get is admittedly an extreme example, because it is the only occurrence of a list of tables rather than just a table, which is what requires the creation of the 'internal' data attribute.

Yes and no.  The creation of the new 'Internal' value for _type.purpose, and of items that use it, is more a consequence of my approach to lightening the load on _type.dimension, whose current description and use appear to task it with providing a complete layout of values of the item being defined.  Note in particular the dimension specified in the current definition of _import.get: '[{}]'.  I don't think we want to continue in that direction.

The structure of _import.get's values does not inherently require internal types to be defined under my proposed structure.  If there were an ordinary item in the dictionary that had the wanted type of the elements of an _import.get list, then that type could be referenced instead of an internal one.  I can imagine circumstances under which such a reference would even be sensible.
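
A small Python sketch of how software might look up an element type defined by reference is given below.  The 'ByReference' code and the contents_reference field are placeholder names invented for this example (no such names have been agreed), and definitions are modelled as plain dicts rather than parsed save frames.

```python
# Toy dictionary: each definition records its container and contents.
# 'ByReference' and 'contents_reference' are placeholder names only.
DEFINITIONS = {
    "_import.get": {
        "container": "List",
        "contents": "ByReference",
        "contents_reference": "_import.get_contents_type",
    },
    "_import.get_contents_type": {
        "container": "Table",
        "contents": "Text",
    },
    "_example.plain_list": {  # hypothetical ordinary item
        "container": "List",
        "contents": "Code",
    },
}


def element_type(name, definitions=DEFINITIONS):
    """Return (container, contents) describing each element of `name`.

    If the contents are given by reference, the referenced item's own
    container and contents describe the elements.  Otherwise each
    element is a single value of the stated contents type.
    """
    definition = definitions[name]
    if definition["contents"] == "ByReference":
        target = definitions[definition["contents_reference"]]
        return target["container"], target["contents"]
    return "Single", definition["contents"]


print(element_type("_import.get"))
print(element_type("_example.plain_list"))
```

The referenced definition here happens to be an internal one, but the lookup is the same if it names an ordinary item.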

>  It is, however, a nice demonstration of how the attributes might work for future dictionary writers.  The new 'internal' dataname does have some meaning along the lines of 'a single import instruction' so a better dataname might be _import.single.

Sure, that name would be fine with me.

>  Is there any reason that you introduced a reference in order to specify the table keys?

I introduced a reference in order to specify table keys so as to provide for more alternatives than an enumeration of possible keys, while minimizing the number of new DDLm items required.  Also, inasmuch as I was already proposing type-by-reference for values, it seemed consistent to follow a parallel approach for key constraints.

>  And do you agree that the alternative I've proposed above would also be sufficient?

I agree that your alternative would be sufficient *for DDLm itself*, but I would prefer more flexibility to be available to other dictionaries.  Because DDLm itself will be harder to change than other DDLm dictionaries, I would like to avoid it being overly restrictive.  At the same time, I don't think we need to go crazy by trying to make DDLm capable of defining completely arbitrary CIF2 data structures.  I have tried to choose a happy medium that is minimally disruptive for existing DDLm dictionaries and software.

> On a final note for _import.get, the dREL is broken as it assumes that there is only one value for each of the constituent _import datanames, which would make a list superfluous (only one element), but what it really wants to do is to create a list from a loop of _import.file etc. values.  To do this it needs a sequence number, which isn't defined.  Once this *is* defined, we could alternatively present the import instructions as a loop over _import.sequence and _import.single, or else _import.sequence, _import.file etc.

I can't say I'm much surprised.  _import.get shows evidence of having gone through a change at some point, and I don't think that was fully and consistently implemented.  I note in particular that its description (in the 2012 version) is "A table of attributes [...]", not "A list of tables of attributes [...]" or similar.  I also note that its _type.container is given as 'List[Table]', which is not among the enumerated alternatives for values of that attribute.

As for the dREL, though, why do you need a sequence number, and / or why can the dREL not generate one itself as it iterates over the values of _import.get?  Each value is a table providing the attributes describing one import, so co-occurrence in the same table already associates the various attributes of each import with one another.
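
The idea, written in Python rather than dREL, is sketched below.  It says nothing about whether dREL can express the same loop, which is the open question; the function name and the sample values are made up for the illustration.

```python
def numbered_imports(import_list):
    """Pair each import table with a 1-based sequence number.

    The number is generated during iteration.  It is not stored
    anywhere in the data.
    """
    return list(enumerate(import_list, start=1))


# Hypothetical _import.get value: a list of tables, one per import.
import_get = [
    {"file": "first.dic", "save": "frame_a"},
    {"file": "second.dic", "save": "frame_b", "mode": "Full"},
]

for sequence, table in numbered_imports(import_get):
    # All attributes of one import are already together in `table`.
    print(sequence, table["file"], table["save"], table.get("mode", "<default>"))
```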

> To wrap up, I like the suggestion of a _type.contents that can work by reference to another dataname.  I don't see a particular need for a similar reference for table keys, nor do I particularly think explicitly specifying the keys is likely to be that useful, but I'm not against adding this capability.  We envisage adding quite a few other attributes later on to improve DDL2 - DDLm translation anyway.

I'm glad you like the idea of defining content type by reference.  I hope I've persuaded you about the keys, but even if not, I still think that the ability to define machine-readable specifications of allowed keys is important.  I'm not hung up on the exact implementation I proposed, however.


Cheers,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
(901) 595-3166 [office]
www.stjude.org



_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148