
Re: [ddlm-group] _enumerated_set.table_id

Hi John, see inline comments.

On Wed, Apr 22, 2015 at 2:38 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Hi James,

 

Yes, we agree that _enumeration_set.table_id can be dropped.  I am less certain that we agree on whether it should be replaced with something else.

 

I am prepared to accept these limitations on the data types that can be defined by a DDLm dictionary (including DDLm itself), if indeed DDLm itself and the other existing DDLm dictionaries can be expressed adequately under such constraints:

 

- The allowed types of values within a list cannot depend on their position in the list

- The allowed types of values within a table cannot depend on their associated keys

 

These assign primacy to categories / loops for defining complex, heterogeneous data, so that it is unnecessary (I think) to be able to define data types that use lists and / or tables analogously to C structs.
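As a rough illustration of the two limitations above, a validator only ever needs one element type per container, independent of position or key. The Python sketch below is illustrative only: the function names are hypothetical, and Python types stand in for DDLm content types.

```python
def list_is_homogeneous(values, element_type):
    """True if every element has the single allowed type, whatever its position."""
    return all(isinstance(v, element_type) for v in values)


def table_is_homogeneous(table, value_type):
    """True if every value has the single allowed type, whatever its key."""
    return all(isinstance(v, value_type) for v in table.values())


if __name__ == "__main__":
    print(list_is_homogeneous([1.0, 2.5, 3.0], float))
    print(table_is_homogeneous({"file": "core.dic", "mode": "Full"}, str))
    print(list_is_homogeneous(["a", 1.0], str))
```

A struct-like type, where position 0 holds text and position 1 holds a number, cannot be expressed by such a check, which is the point of the limitation: heterogeneous data goes into categories / loops instead.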


Yes, I agree with the above statements.

 

I am inclined to think that one of the greater weaknesses of the 2012 version of the DDLm dictionary is its provisions for defining complex data types.  They are somewhat inconsistent, and the provided definition text is unclear about exactly how one would go about defining complex data.  Moreover, if _type.dimension is intended to be the primary vehicle for defining complex internal structure then it must bear the weight of an entire schema language.  That seems to be exactly what it’s trying to do, but the details of that language are by no means adequately documented, and it seems an odd approach given that it’s hosted inside another language that itself can serve as a schema language.

 

This is what I think we should do:

 

1. Remove _enumeration_set.table_id.  It doesn’t work well for its intended purpose.

  

Agreed.
 

2. Redefine _type.dimension so that it is used only to specify the dimension(s) of values of items having _type.container in { 'List', 'Array', 'Matrix' }.  Relieve it of any responsibility for defining element types.  Possibly remove the ability to define ragged multi-dimensional arrays (which conflict with the proposed limitation that allowed types of values within a list cannot depend on their position in the list).


Agreed

 

3. Clarify that when _type.container has value 'Table', _type.contents defines the characteristics of the *values* in the table.


Agreed

 

4. Add a replacement mechanism to define constraints on table keys.  It might be sufficient, and consistent with the apparent intent of the current dictionary, to establish a parallel to the _enumeration_set category for constraining key values, maybe _key_enumeration_set.  It would be a smaller change at the dictionary level, however, to add a mechanism by which constraints on key type could be defined by reference to the type of another item (see also next).


What is the advantage of being able to validate key strings?  As outlined in my previous email, I don't see that validating the keys will have much benefit, as tables are rarely used.  That aside, simply introducing an extra DDLm attribute is OK, especially since we are dropping _enumeration_set.table_id and so are not enlarging DDLm.

 

5. Add a mechanism to allow items' content type to be defined by reference to another item.  This could be signaled by a new code for _type.contents, with a new attribute defining which other item’s type is to be used.  I don’t think that the existing contents code 'Inherited' can serve this purpose, but perhaps I’m mistaken.

 

This is an intriguing idea.  As it happens, the demonstration DDLm dictionaries introduce setting the type of an item based on the type of a different item using a dREL-like function (although I have replaced these with explicit types in the latest version of the new cif_core dictionary).  Your suggestion replaces this by a non-dREL approach, which is in general desirable for simple applications.  To check that I've understood your (corrected) example:
(1) the elements of the _import.get List are items of the same type as _import.get_contents_type
(2) _import.get_contents_type is a Table, so _type.contents for it is the type of values in the table i.e. Text 
(3) The possible key values are given by the possible values taken by the _type.key_type_reference dataname

We have two new 'internal' DDLm attributes as a result, as well as the new _type.keys, _type.key_content_reference and _type.key_type_reference datanames, for a total of 5 new attributes.  If we put the key list into the definition to which it relates, we can cut down on the number of new attributes, e.g.:

save_import.get_contents_type
    # ...
    _type.purpose             'Internal'
    _type.container           'Table'
    _type.contents            'Text'
    loop_
      _table_key_set.state
      _table_key_set.detail
        'file' 'filename/URI of source dictionary'
        'save' 'save framecode of source definition'
        'mode' 'mode for including save frames'
        'dupl' 'option for duplicate entries'
        'miss' 'option for missing duplicate entries'
save_

 

which results in new attributes _type.key_content_reference, _table_key_set.state and _table_key_set.detail with one internal attribute _import.get_contents_type, and also reduces the non-locality of the definition - that is, one less reference to track through the file.  _import.get is admittedly an extreme example, because it is the only occurrence of a list of tables rather than just a table, which is what requires the creation of the 'internal' data attribute.  It is, however, a nice demonstration of how the attributes might work for future dictionary writers.  The new 'internal' dataname does have some meaning along the lines of 'a single import instruction' so a better dataname might be _import.single.  Is there any reason that you introduced a reference in order to specify the table keys?  And do you agree that the alternative I've proposed above would also be sufficient?
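To make the intended check concrete, here is a minimal Python sketch of how software might validate an _import.get value against such a definition: a list of tables whose values are all Text and whose keys come from the _table_key_set.state list. The function name and the way the definition is held in memory are assumptions for illustration, not part of any existing DDLm tool.

```python
# Keys copied from the _table_key_set.state loop in the example definition.
ALLOWED_KEYS = {"file", "save", "mode", "dupl", "miss"}


def check_import_get(value):
    """Return a list of problems found in a list-of-tables value."""
    if not isinstance(value, list):
        return ["value is not a List"]
    problems = []
    for i, table in enumerate(value):
        if not isinstance(table, dict):
            problems.append(f"element {i} is not a Table")
            continue
        for key, item in table.items():
            if key not in ALLOWED_KEYS:
                problems.append(f"element {i}: unexpected key {key!r}")
            if not isinstance(item, str):
                problems.append(f"element {i}: value for {key!r} is not Text")
    return problems


if __name__ == "__main__":
    # The file and save frame names here are made up for the example.
    value = [{"file": "templ_attr.cif", "save": "example_frame", "mode": "Full"}]
    print(check_import_get(value))
```

Note that the value check is the same for every key, consistent with the limitation that allowed value types cannot depend on the associated key.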

On a final note for _import.get, the dREL is broken: it assumes that there is only one value for each of the constituent _import datanames, which would make a list superfluous (only one element). What it really wants to do is to create a list from a loop of _import.file etc. values, and for that it needs a sequence number, which isn't defined.  Once this *is* defined, we could alternatively present the import instructions as a loop over _import.sequence and _import.single, or else _import.sequence, _import.file etc.
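What the dREL would need to do can be sketched in Python as follows. This is only an outline under the assumption that a sequence number exists: each loop row is represented as a dict, and the "sequence" key is hypothetical, standing in for the not-yet-defined sequence dataname.

```python
def import_list_from_loop(rows):
    """Build a list of tables from loop rows, ordered by their sequence number."""
    ordered = sorted(rows, key=lambda row: row["sequence"])
    # Drop the ordering column; the remaining columns form one table per row.
    return [
        {key: val for key, val in row.items() if key != "sequence"}
        for row in ordered
    ]


if __name__ == "__main__":
    # Two rows of a hypothetical loop over sequence, file and save values.
    rows = [
        {"sequence": 2, "file": "b.dic", "save": "frame_b"},
        {"sequence": 1, "file": "a.dic", "save": "frame_a"},
    ]
    print(import_list_from_loop(rows))
```

Without the sequence number the order of the resulting list would depend on the order of rows in the loop, which is not significant in CIF.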

To wrap up, I like the suggestion of a _type.contents that can work by reference to another dataname.  I don't see a particular need for a similar reference for table keys, nor do I particularly think explicitly specifying the keys is likely to be that useful, but I'm not against adding this capability.  We envisage adding quite a few other attributes later on to improve DDL2 - DDLm translation anyway.

all the best,
James.


Allowing types of keys / values to be defined by reference to the types of other items raises the possibility that dictionaries will occasionally want to define items solely for the purpose of defining their content type for reference by other definitions.  I don’t think this is harmful, but it might be best supported by a new value for _type.purpose, as demonstrated below.

 

If all those changes were implemented then the definition for DDLm _import.get might be revised like so:

 

    _type.purpose             'Import'
    _type.container           'List'
    _type.contents            'Text'
    _type.keys                'ByReference'
    _type.key_type_reference  'import.get_key_type'

 

That would require addition of a new attribute to category IMPORT, its definition containing the following (among other necessary attributes not shown):

 

save_import.get_key_type
    # ...
    _type.purpose             'Internal'  # New value
    _type.container           'Single'
    _type.contents            'Code'

    loop_
      _enumeration_set.state
      _enumeration_set.detail
        'file' 'filename/URI of source dictionary'
        'save' 'save framecode of source definition'
        'mode' 'mode for including save frames'
        'dupl' 'option for duplicate entries'
        'miss' 'option for missing duplicate entries'
save_

 

Additional attributes needed in category TYPE would be _type.keys (accepting the same values as _type.contents where those values describe string data), _type.key_type_reference (containing the _definition.id of the referenced item), and _type.contents_type_reference (not demonstrated; analogous to _type.key_type_reference).
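A dictionary processor would have to follow such references to find the effective type. The Python sketch below shows one way this could work. It assumes, purely for illustration, that the new _type.contents code is spelled 'ByReference' (the proposal names that value only for _type.keys) and that definitions are held in a dict keyed by _definition.id; neither is fixed by the proposal.

```python
def resolve_contents(definitions, item_id):
    """Follow contents-type references until a concrete contents code is found."""
    seen = set()
    while True:
        if item_id in seen:
            raise ValueError(f"circular type reference at {item_id}")
        seen.add(item_id)
        definition = definitions[item_id]
        if definition["contents"] != "ByReference":
            return definition["contents"]
        # Hypothetical in-memory form of _type.contents_type_reference.
        item_id = definition["contents_type_reference"]


if __name__ == "__main__":
    definitions = {
        "import.get": {
            "contents": "ByReference",
            "contents_type_reference": "import.get_contents_type",
        },
        "import.get_contents_type": {"contents": "Text"},
    }
    print(resolve_contents(definitions, "import.get"))
```

The cycle check matters because nothing in the proposal so far prevents two definitions from referring to each other.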

 

 

John

 



