- To: Group finalising DDLm and associated dictionaries <email@example.com>
- Subject: [ddlm-group] (no subject)
- From: James Hester <firstname.lastname@example.org>
- Date: Thu, 16 Jun 2016 09:39:29 +1000
John's points have underlined that, to work effectively, the _audit.schema proposal would require any CIF-reading software that handles a non-default value of _audit.schema to parse and check the dictionaries listed in _audit_conform in order to understand which categories have been given a new key, as this information could change with dictionary version. Dictionary parsing and analysis is a significant extra burden, which I have argued is unreasonable in the case of the default _audit.schema value. I am less concerned about non-default, non-mainstream situations, as long as the system can be made to work.
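To make the burden concrete, here is a rough Python sketch of the check a reader would need to perform. Everything in it is an assumption made for illustration: the default schema value "Base", the dict-based representation of a data block, the `load_dictionary` callback, and the function name are all hypothetical and not part of any proposal text.

```python
# Hypothetical sketch only: data layout, names, and the default value are assumptions.
DEFAULT_SCHEMA = "Base"  # assumed default for _audit.schema


def rekeyed_categories(data_block, load_dictionary, assumed_single_valued):
    """Return {category: set of key data names} for categories that the reader
    treats as single-valued but that the listed dictionaries give a key.

    data_block            -- dict of data names to values (hypothetical layout);
                             "_audit_conform" maps to a list of (name, version)
    load_dictionary       -- callable (name, version) -> {category: [key names]}
    assumed_single_valued -- set of category names the reader treats as Sets
    """
    schema = data_block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema == DEFAULT_SCHEMA:
        # Default schema: no dictionary parsing should be needed.
        return {}

    rekeyed = {}
    # Non-default schema: every listed dictionary has to be parsed, because
    # the set of keyed categories may change with the dictionary version.
    for name, version in data_block.get("_audit_conform", []):
        category_keys = load_dictionary(name, version)
        for category, keys in category_keys.items():
            if category in assumed_single_valued and keys:
                rekeyed.setdefault(category, set()).update(keys)
    return rekeyed
```

The early return is the point of the argument above: with the default value a reader does no dictionary work at all, while any other value forces a full pass over the dictionaries named in _audit_conform.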
Here are what I see as the essential issues we are wrangling over with respect to the Set / Loop problem:
1. Whether we require a solution that prevents future data files from being misinterpreted by current software.
Inasmuch as proposal #2 is not such a solution, we seem to have settled on "no".
2. Whether we require a solution that allows software to insulate itself against future Set / Loop changes, and if so, how.
That this is a desirable characteristic seems uncontroversial, but the "how" part is not settled.
In particular, proposal #2 does not provide a complete solution to this issue. It provides for declaring which Set categories have been or may have been presented with multiple values, but it does not provide for defining the dimension(s) along which the values vary, and there could be more than one alternative for that. The existing audit_conform category does offer a complete solution (or would do if changed to a Loop to match its mmCIF and DDL1 Core analogs), but it is not as precise as proposal #2’s _audit.schema would be.
3. Whether we want to provide for Sets of items that can take multiple values, or whether we must convert Set categories to Loops to enable their items to take multiple values.
This is to some extent a philosophical difference; it is not particularly relevant to actually writing or reading data files, though it does bear on the next issue. Having a category key is a defining characteristic of Loop categories, as evidenced by DDLm’s definitions of _definition.class, _category.key_id, and _category_key.name. Having at most one value per item is a defining characteristic of Set categories. I disfavor changing that, especially to support a use case expected to be uncommon, and I see no particular need to do so. I would rather convert Sets to Loops, either as-needed or proactively.
James has argued that keeping current Set categories as Sets but giving them category keys where needed would make the implicit assertion that providing multiple values for the items in such categories is exceptional. I don’t disagree with that, but I think the same assertion is implicit in defining a default value for the keys of such categories, which presumably we would want to do whether we convert Sets to Loops or not. Moreover, making assumptions about what is or is not normal or expected is exactly how we got into this situation. If we are going to double down on that, then I think we need to first formulate a clearer strategy on when and how to make such assumptions.
4. Whether we need to change DDLm itself, or whether the needed changes can be restricted to dictionaries.
It’s not clear to me that we can resolve the issue without modifying DDLm, but I would prefer to avoid modifying it if that is possible.
5. Whether all category attributes need to be global.
In particular, I raised the possibility that some category attributes, especially keys and therefore the nature of some of the relationships among categories, could be specified on a per-dictionary basis instead of globally. This approach promotes dictionaries over individual definitions as the vehicle for addressing the problem, and it’s not so far away from proposal #2 in that prop2 also involves multiple dictionaries in providing full definitions for each category. The main difference is that under prop2 there is (only) a single aggregate definition for each category, and that definition contains all possible keys, whereas per-dictionary category relationships allow for a subset appropriate to the data domain to be selected and used, simply by choice of dictionary.
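The difference between the two approaches can be sketched in a few lines of Python. The category name, key data names, and dictionary file names below are invented purely for illustration and do not correspond to any real definitions.

```python
# Hypothetical illustration: all names below are made up.

# prop2-style: a single aggregate definition per category, listing every
# possible key regardless of which dictionary introduced it.
AGGREGATE_KEYS = {
    "example_set": ["_example_set.dim_a", "_example_set.dim_b"],
}

# Per-dictionary style: each dictionary states only the keys relevant to
# its data domain, so choosing a dictionary selects the subset in force.
PER_DICTIONARY_KEYS = {
    "single_value.dic": {"example_set": []},
    "varies_by_a.dic": {"example_set": ["_example_set.dim_a"]},
    "varies_by_b.dic": {"example_set": ["_example_set.dim_b"]},
}


def keys_for(category, dictionary_name):
    """Return the keys a category has under the chosen dictionary."""
    return PER_DICTIONARY_KEYS[dictionary_name].get(category, [])
```

Under the per-dictionary scheme, `keys_for("example_set", "varies_by_a.dic")` yields only the one key that dictionary declares, whereas the aggregate table always carries both.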