Re: [ddlm-group] _enumerated_set.table_id
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] _enumerated_set.table_id
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Tue, 21 Apr 2015 16:38:59 +0000
Hi James,

Yes, we agree that _enumeration_set.table_id can be dropped. I am uncertain whether we agree about whether it should be replaced with something else.

I am prepared to accept these limitations on the data types that can be defined by a DDLm dictionary (including DDLm itself), if indeed DDLm itself and the other existing DDLm dictionaries can be expressed adequately under such constraints:

- The allowed types of values within a list cannot depend on their position in the list
- The allowed types of values within a table cannot depend on their associated keys

These assign primacy to categories / loops for defining complex, heterogeneous data, so that it is unnecessary (I think) to be able to define data types that use lists and / or tables analogously to C structs.

I am inclined to think that one of the greater weaknesses of the 2012 version of the DDLm dictionary is its provisions for defining complex data types. They are somewhat inconsistent, and the provided definition text is unclear about exactly how one would go about defining complex data. Moreover, if _type.dimension is intended to be the primary vehicle for defining complex internal structure then it must bear the weight of an entire schema language. That seems to be exactly what it’s trying to do, but the details of that language are by no means adequately documented, and it seems an odd approach given that it’s hosted inside another language that itself can serve as a schema language.

This is what I think we should do:

1. Remove _enumeration_set.table_id. It doesn’t work well for its intended purpose.

2. Redefine _type.dimension so that it is used only to specify the dimension(s) of values of items having _type.container in { 'List', 'Array', 'Matrix' }. Relieve it of any responsibility for defining element types. Possibly remove the ability to define ragged multi-dimensional arrays (which conflict with the proposed limitation that allowed types of values within a list cannot depend on their position in the list).

3. Clarify that when _type.container has value 'Table', _type.contents defines the characteristics of the *values* in the table.

4. Add a replacement mechanism to define constraints on table keys. It might be sufficient, and consistent with the apparent intent of the current dictionary, to establish a parallel to the _enumeration_set category for constraining key values, maybe _key_enumeration_set. It would be a smaller change at the dictionary level, however, to add a mechanism by which constraints on key type could be defined by reference to the type of another item (see also next).

5. Add a mechanism to allow items' content type to be defined by reference to another item. This could be signaled by a new code for _type.contents, with a new attribute defining which other item’s type is to be used. I don’t think that the existing contents code 'Inherited' can serve this purpose, but perhaps I’m mistaken.

Allowing types of keys / values to be defined by reference to the types of other items raises the possibility that dictionaries will occasionally want to define items solely for the purpose of defining their content type for reference by other definitions. I don’t think this is harmful, but it might be best supported by a new value for _type.purpose, as demonstrated below.

If all those changes were implemented then the definition for DDLm_import.get might be revised like so:

    _type.purpose             'Import'
    _type.container           'List'
    _type.contents            'Text'
    _type.keys                'ByReference'
    _type.key_type_reference  'import.get_key_type'

That would require addition of a new attribute to category IMPORT, its definition containing the following (among other necessary attributes not shown):

    save_import.get_key_type
        # ...
        _type.purpose    'Internal'  # New value
        _type.container  'Single'
        _type.contents   'Code'
        loop_
        _enumeration_set.state
        _enumeration_set.detail
            'file'  'filename/URI of source dictionary'
            'save'  'save framecode of source definition'
            'mode'  'mode for including save frames'
            'dupl'  'option for duplicate entries'
            'miss'  'option for missing duplicate entries'
    save_

Additional attributes needed in category TYPE would be _type.keys (accepting the same values as _type.contents where those values describe string data), _type.key_type_reference (containing the _definition.id of the referenced item), and _type.contents_type_reference (not demonstrated; analogous to _type.key_type_reference).

John

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org]
On Behalf Of James Hester

Hi John:

On Tue, Apr 21, 2015 at 1:19 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Hi James, I agree that _enumeration_set.table_id seems a misfit. Moreover, I observe that it is not documented in the 2008 DDLm paper. That paper is aging a bit, but I take the attribute’s omission as an additional signal that it does not serve a role of any major import.

The canonical reference is now J. Chem. Inf. Model., 2012, 52(8), pp 1907-1916.

I think we agree then that it is superfluous and can be dropped (or simply not picked up by COMCIFS).

Judging from the demonstration DDLm dictionaries, CIF2 tables are quite rare, and strictly speaking superfluous, as they can be directly transformed into a CIF loop structure with a small loss in concision. They are used once in cif_core.dic to carry the individual atom form factor contributions to each hkl reflection so that a separate loop keyed on h,k,l and atom type doesn't have to be defined for such intermediate values. I think that anything remotely complicated (e.g. optional keys) would be better described using looped datanames. This policy would allow us to restrict ourselves to simple cases. Therefore, we could settle for Doug's solution (but see below), with the meaning that the keys given in the _type.contents entry must be present for the item to be valid. I would however be unruffled if DDLm *didn't* have a mechanism to constrain the form that tables may take on a per-item basis, for the above reasons and those in my next paragraph below.
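To make the table-to-loop transformation concrete, here is a minimal Python sketch. It assumes a table is held in memory as a plain dict, and the datanames `_contribution.atom_type` and `_contribution.value` are hypothetical, chosen only for illustration (they are not taken from cif_core.dic). A real conversion of the per-reflection case would also carry the h,k,l key columns, and the quoting here does not handle keys that contain an apostrophe.

```python
def table_to_loop(table, key_name, value_name):
    """Render a table (a dict) as the text of a two-column CIF loop.

    Each key/value pair of the table becomes one row (packet) of the loop.
    """
    lines = ["loop_", f"  {key_name}", f"  {value_name}"]
    for key, value in table.items():
        # Keys are written as single-quoted strings; values are written as-is.
        lines.append(f"  '{key}' {value}")
    return "\n".join(lines)


# Hypothetical table of per-atom-type contributions for one reflection.
contributions = {"C": 1.25, "O": 2.5}
loop_text = table_to_loop(
    contributions, "_contribution.atom_type", "_contribution.value"
)
print(loop_text)
```

The loop form costs two extra dataname lines compared with the inline table, which is the "small loss in concision" referred to above.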
I have lately been contemplating the level of datavalue complexity we should actually cover in DDLm. The initial assumption in writing DDLm was that the _type.contents and _type.dimension attributes should be able to describe arbitrarily complex datastructures. I now think that this is unnecessary, because any inhomogeneous datastructure can be split into its component parts, each of which, I would assert, has a well-defined individual meaning. The dictionary will necessarily need to describe those individual parts. The *only* use-case I can find (counterexamples welcome) for inhomogeneous datastructures in the demonstration DDLm dictionaries is to conveniently create single-dataitem keys for joined categories, but even this use case can be replaced by e.g. a simple string concatenation. Any use of the composite structure can be replaced in dREL by access to the individual components - which must be happening already anyway, because the values are inhomogeneous and so must be treated differently.

I am therefore planning to suggest that COMCIFS adopt a dictionary authoring policy which explicitly avoids using inhomogeneous datastructures (i.e. Arrays and Tables with values of a single type are OK, mixtures and irregular nesting are not).
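One way to read the proposed policy is as a recursive check on a value's shape. The Python sketch below is an illustration under stated assumptions, not anything defined by DDLm: lists and tables are modelled as Python lists and dicts, the function names are hypothetical, and an empty container is treated as having its own distinct shape, which is a simplification.

```python
def shape_of(value):
    """Return a hashable description of a value's shape, or None if mixed.

    Scalars are described by their Python type name. A list or table is
    described by its container kind plus the single shape shared by all
    of its elements; if the elements disagree, the result is None.
    """
    if isinstance(value, dict):
        kind, children = "Table", list(value.values())
    elif isinstance(value, (list, tuple)):
        kind, children = "List", list(value)
    else:
        return type(value).__name__

    shapes = {shape_of(child) for child in children}
    if None in shapes or len(shapes) > 1:
        return None  # mixture of types, or irregular nesting
    element = shapes.pop() if shapes else "empty"
    return (kind, element)


def is_homogeneous(value):
    """True if no list or table inside `value` mixes element shapes."""
    return shape_of(value) is not None


print(is_homogeneous([1, 2, 3]))             # list of one scalar type
print(is_homogeneous({"a": 1.0, "b": 2.0}))  # table with values of one type
print(is_homogeneous([1, "x"]))              # mixture of types
print(is_homogeneous([1, [2, 3]]))           # irregular nesting
```

Note that the check never looks at list positions or table keys, so it also matches the two limitations John lists at the top of the thread: allowed value types cannot depend on position or on the associated key.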
Yes indeed, a _type.contents value which is a table with arbitrary keys, as suggested by Doug, can't be part of a (finite) _type.contents enumerated list of datavalues, and so the current approach to _type.contents wouldn't work. Frankly, however, I think that such tables are not something we need to particularly support (see above), so I would be happy for us to use 'Table' as the _type.contents of _import.get and either leave any detailed validation to software that wishes to execute the dREL method, or define a _type.contents_regex and do regular expression matching.

all the best,
James.
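As an illustration of the kind of key validation discussed in this thread, whether it is driven by an enumeration of permitted keys or left to the consuming software, here is a minimal Python sketch. The set of keys is the one listed in John's message; the function names and the file and frame names in the example are hypothetical, and the sketch only reports unknown keys (it does not address which keys are required).

```python
# Permitted table keys, as enumerated in John's message above.
IMPORT_GET_KEYS = {"file", "save", "mode", "dupl", "miss"}


def invalid_keys(table, allowed_keys):
    """Return, in sorted order, the keys of `table` not in `allowed_keys`."""
    return sorted(set(table) - set(allowed_keys))


def check_import_get(value, allowed_keys=IMPORT_GET_KEYS):
    """Check a list of import tables; return (index, bad_keys) per bad table."""
    problems = []
    for index, table in enumerate(value):
        bad = invalid_keys(table, allowed_keys)
        if bad:
            problems.append((index, bad))
    return problems


# Hypothetical value: the second table misspells the 'save' key.
example = [
    {"file": "example.dic", "save": "some_frame"},
    {"file": "example.dic", "sav": "some_frame"},
]
problems = check_import_get(example)
print(problems)
```

Keeping the permitted keys in one place, separate from the checking logic, mirrors the idea of defining the key type once and referring to it from other definitions.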