Re: [ddlm-group] Multi block principles
- To: Group finalising DDLm and associated dictionaries <firstname.lastname@example.org>
- Subject: Re: [ddlm-group] Multi block principles
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Tue, 16 Nov 2021 16:40:52 +0000
- In-Reply-To: <CAM+dB2fajH1c1vhrCJU9v-QQw0kt4Y2udDEx4HBK9QzDq=LD3w@mail.gmail.com>
- References: <CAM+dB2fajH1c1vhrCJU9v-QQw0kt4Y2udDEx4HBK9QzDq=LD3w@mail.gmail.com>
Dear DDLm group,
I have no fundamental objection to setting out specific principles guiding the use of multiple data blocks to represent coherent collections of data. I do, however, have some comments about the specifics of the draft as presented. Most of these are technical, but a couple are editorial:
1. “Choice” is widely considered desirable. How about choosing a less positively connoted word to use in the introduction, such as “variability”?
2. The document refers to _audit.schema a lot, and in places it recommends defining new values for it. It would be helpful to clarify that _audit.schema is defined in the Core dictionary, and especially to give, or point to, some guidance on exactly how one would define new values for it. I suggest also clarifying that although the values of this item do indicate which categories are Set categories, they do not provide direct support for enumerating those categories programmatically.
3. It might be useful to clarify that writing data blocks using the _audit.schema 'Base' implies that all the categories on which each category depends -- both Set categories and related Loop categories -- are presented in the same data block. In relational terms, one might say that each data block provides a distinct, implicit key value that associates the categories presented within. I would like to avoid giving the impression that multiple data blocks, each specifying _audit.schema 'Base' and valid against (say) the Core dictionary, and without any duplicate items or key conflicts among them, can or should be interpreted the same as a single data block containing the union of the multiple blocks’ contents.
4. Speaking of data blocks defining an implicit key, I think the draft overemphasizes the relationships between Loop categories and Set categories. When considering _audit.schema values other than 'Base', one has to recognize and account for the fact that there are relationships between pairs of Set categories, too. These tend to be weak in the Core dictionary because it is fairly well factored, but consider, for example, the Set categories _exptl_crystal and _chemical_formula. There is a non-trivial dependency there via (at least) _exptl_crystal.density_diffrn. Also along these lines, it would be appropriate to say not that Set categories *may be* equipped with a category key, but that they *are* equipped with one. If that can’t be considered technically correct, then we should make it so. We could introduce the possibility of a zero-column key for this, which would offer some mathematical consistency both with there being only one possible category key value for Set categories, and with the effects of expanding that key with additional columns.
5. “categories whose values do not depend on any of the `Set` category values” does not make sense to me. I understand that the objective is to specify that unnecessary data duplication should be avoided, but surely it is a matter of the selected _audit.schema which categories need to be presented in which data blocks. Right? If this is about schema design, then maybe it would be better to express the principle in terms of the non-duplication objective.
6. It is unclear what “allow[ing] the context to determine aggregation” means. It could be taken to imply that there is some well-defined contextual mechanism available. I think less would be more here: “The CIF standard does not stipulate how to identify data blocks belonging to a single data set. Optionally, dictionaries may define data names that help in this task.” Or maybe only the first of those sentences.
7. I don’t understand in any significant detail what the description of summary blocks is trying to say. Perhaps it would be more meaningful to someone experienced in using powder or modulated structure CIF, but the name and description convey only a vague impression to me.
8. I disfavor relying on parent categories to identify their child categories. That approach already constrains how DDL2 dictionaries can be supplemented by extension dictionaries, even in the narrower setting where Set categories need not participate in child declarations, and especially if one wants to use multiple extension dictionaries together. I just don’t see it being sustainable in an environment where we must consider substantially every category to be a potential Loop category. A plan that localizes the required definition changes as much as possible is to be preferred. As an alternative, it may be useful to come up with a standard way to encode the additional dependency information into DDLm dictionaries *now*. That could at least provide for automating the generation of the needed additional definitions in extension dictionaries.
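To make points 3 and 4 concrete, here is a minimal Python sketch. It assumes a hypothetical model in which a category is a list of rows (dicts) and its category key is a list of column names; it is not any CIF or DDLm library API, and the names `key_of`, `keys_are_unique`, `tag_with_block` and the `block_id` column are invented for illustration. With a zero-column key every row has the key value `()`, so a Set category can hold at most one row. Pooling the rows of two 'Base' blocks therefore breaks that key, whereas making each block's implicit key explicit as an extra column does not.

```python
# Hypothetical model for illustration only; not part of any CIF or DDLm API.
from typing import Dict, List, Tuple

Row = Dict[str, object]


def key_of(row: Row, key: List[str]) -> Tuple[object, ...]:
    """Return a row's category-key value: the tuple of its key columns."""
    return tuple(row[name] for name in key)


def keys_are_unique(rows: List[Row], key: List[str]) -> bool:
    """True if no two rows share the same category-key value."""
    seen = set()
    for row in rows:
        value = key_of(row, key)
        if value in seen:
            return False
        seen.add(value)
    return True


def tag_with_block(block_id: str, rows: List[Row]) -> List[Row]:
    """Make a data block's implicit key explicit as an extra column."""
    return [{"block_id": block_id, **row} for row in rows]


# One exptl_crystal row per 'Base' data block (values are made up).
block_a = [{"density_diffrn": 1.52}]
block_b = [{"density_diffrn": 1.61}]

# Zero-column key: every row's key value is (), so at most one row fits.
print(keys_are_unique(block_a, []))            # True
print(keys_are_unique(block_a + block_b, []))  # False: naive union breaks the key

# Expanding the key with the block column restores uniqueness.
tagged = tag_with_block("a", block_a) + tag_with_block("b", block_b)
print(keys_are_unique(tagged, ["block_id"]))   # True
```

Expanding the key from zero columns to one is the same move that turns a Set category into a Loop category, which is the consistency argument made in point 4.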
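For point 8, one way to keep changes local is for each category to declare the categories it depends on in its own definition, with parent-to-child links derived mechanically afterwards. The minimal Python sketch below assumes a hypothetical child-to-parents mapping; DDLm has no such standard encoding today, and `children_by_parent` and `my_ext_category` are invented names. The only dependency taken from this discussion is that of exptl_crystal on chemical_formula.

```python
# Hypothetical encoding for illustration only; DDLm does not define this format.
from collections import defaultdict
from typing import Dict, List, Set


def children_by_parent(depends_on: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert child->parents declarations into sorted parent->children lists.

    Each child lists only its own dependencies, so adding an extension
    dictionary means adding entries here, never editing a parent's definition.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for child, parents in depends_on.items():
        for parent in parents:
            result[parent].append(child)
    return {parent: sorted(kids) for parent, kids in result.items()}


# Declarations from two sources with disjoint keys, merged without editing either.
core_deps = {"exptl_crystal": {"chemical_formula"}}
extension_deps = {"my_ext_category": {"exptl_crystal"}}  # hypothetical extension

merged = {**core_deps, **extension_deps}
print(children_by_parent(merged))
# {'chemical_formula': ['exptl_crystal'], 'exptl_crystal': ['my_ext_category']}
```

Adding a further extension dictionary then means adding entries on the child side only; no parent definition has to be touched.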
-----Original Message-----
From: James H
Dear DDLm group,
Please see below a draft version of principles guiding the use of multiple data blocks for encapsulating CIF data. Something similar to this has long been in use for powder data and modulated structure data, and this is essentially an attempt to formalise that approach in terms of DDLm. Getting this right has implications both for those two dictionaries and for how we combine imgCIF data names with data names from our other dictionaries. As far as I can tell, what I am proposing conforms pretty closely to what we have already agreed. Please comment either here or in the repository (github.com/COMCIFS/comcifs.github.io).
This document can also be viewed at https://github.com/COMCIFS/comcifs.github.io/draft/multi-block-summary.md
# Principles for reading and writing CIF information using multiple data blocks