Re: [ddlm-group] Second proposal to allow looping of 'Set' categories
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Wed, 15 Jun 2016 16:20:26 +0000
- In-Reply-To: <CAM+dB2c-0vPf38bPPximBB4ssOspLKdPJ=q=+sLXPhpo5T_1mw@mail.gmail.com>
- References: <CAM+dB2fTUYtNQNaFMGQFnNyqnAgmU4koexAu-ZsiKm5L+S7qBg@mail.gmail.com><BY2PR0401MB09365AE6ADE7ECE63BF2B300E0500@BY2PR0401MB0936.namprd04.prod.outlook.com><BY2PR0401MB093608D8433F50A7B7BA1E4FE0500@BY2PR0401MB0936.namprd04.prod.outlook.com><CABcsX26ioPbqb_NCj5M2sPr3gd2es3YhYXr2Ub2Z9R5+ewUcYg@mail.gmail.com><CAM+dB2eGqm46GdDtTAt9snt1GbQqWeGUJzwzQsQMVH2ctekSMg@mail.gmail.com><CABcsX27Srairc6-PC9ZQuNw8DP6ceTKbH0g1L48hTRQm5NqWtQ@mail.gmail.com><CAM+dB2c-0vPf38bPPximBB4ssOspLKdPJ=q=+sLXPhpo5T_1mw@mail.gmail.com>
Dear All,

I sent a previous version of this to the group yesterday, but I never received a copy back from the list server, so it seems to have been eaten. My apologies if anyone receives both. This version is revised with respect to yesterday’s, so if anyone does receive both then please disregard the other.

----

Here are what I see as the essential issues we are wrangling over with respect to the Set / Loop problem:

1. Whether we require a solution that prevents future data files from being misinterpreted by current software.

Inasmuch as proposal #2 is not such a solution, we seem to have settled on "no". I mention it now in part to afford everyone a chance to object, and in part to observe that other solutions that were rejected or would have been rejected on the basis of leaving data files open to misinterpretation should now be open for reconsideration.

2. Whether we require a solution that allows software to easily insulate itself against future Set / Loop changes, and if so, how.

That this is a desirable characteristic seems uncontroversial, but the "how" part is not settled. In particular, proposal #2, as I originally understood it, does not provide a complete solution to this issue. It provides for declaring what Set categories have been or may have been presented with multiple values, but I did not interpret it to provide for defining the dimension(s) along which the values vary, and there could be more than one alternative for that. James’s subsequent comments suggest that I have misunderstood, so this bears further discussion.

On the other hand, the existing audit_conform category offers several versions of a complete solution (or would do if changed to a Loop to match its mmCIF and the DDL1 Core analogs). This should be unsurprising, as the problem is at minimum closely related to audit_conform’s purpose.

One specific option would be to put the extra category keys in a separate dictionary (as P2 also proposes), and for data files to be expected to specify conformance with such additional dictionaries when they in fact rely on them. The possibly-multiple values of _audit_conform.dict_name could then be used in a manner very similar to P2’s use of _audit.schema. Furthermore, this would provide for a reasonably agile approach to validation of data files relying on the added keys.

Alternatively, suppose we fully commit to semantic versioning (http://semver.org) of dictionaries. An application could then test the first segment of the value of _audit_conform.dict_version to determine whether data files rely on / require a library version incompatible with the one assumed by the application. In this case, converting one or more Set categories to allow them to take multiple values would require an increment to the affected library’s major version number. This is not as precise as _audit.schema would be, but it follows a pattern that I think is well understood by most programmers.

I have also argued that P2 ultimately does not offer a reliable solution to this problem. Neither, for that matter, does any use I can think of for audit_conform.
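
For concreteness, the two audit_conform options above might be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary names, and the shape of the inputs (pairs of _audit_conform.dict_name and _audit_conform.dict_version values already extracted from a data file) are all hypothetical, and no particular CIF library is implied.

```python
def major_version(dict_version):
    """Return the first dot-separated segment of a version string as an int."""
    first_segment = dict_version.strip().split(".", 1)[0]
    return int(first_segment)


def incompatible_dictionaries(conformance_rows, supported_majors):
    """Return the (name, version) pairs the application should be wary of.

    conformance_rows: iterable of (dict_name, dict_version) pairs, standing in
        for the possibly-looped _audit_conform values of one data file.
    supported_majors: mapping from dictionary name to the major version the
        application was written against (hypothetical application config).

    A row is flagged if the dictionary is unknown to the application, or if
    its major version differs from the one the application assumes.
    """
    problems = []
    for name, version in conformance_rows:
        expected = supported_majors.get(name)
        if expected is None or major_version(version) != expected:
            problems.append((name, version))
    return problems


# Hypothetical example: the file declares a dictionary of extra category
# keys that this application knows nothing about.
rows = [("CORE_DIC", "3.0.4"), ("EXTRA_KEYS_DIC", "1.0.0")]
supported = {"CORE_DIC": 3}
flagged = incompatible_dictionaries(rows, supported)
```

Here `flagged` contains only the EXTRA_KEYS_DIC row, since CORE_DIC matches the assumed major version. A check of this kind is advisory at best: it trusts what the data file declares about itself.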
The only reliable solution is for software to affirmatively check whether its inputs conform to its expectations. How much added weight should be attributed to solutions that offer additional, less reliable checks is a point on which it seems we are unlikely to come to consensus.

3. Whether we want to provide for Sets of items that can take multiple values, or whether we must convert Set categories to Loops to enable their items to take multiple values.

This is to some extent a philosophical difference; it is not particularly relevant to actually writing or reading data files, though it does bear on the next issue.

Having a category key is a defining characteristic of Loop categories, as evidenced by DDLm’s definitions of _definition.class, _category.key_id, and _category_key.name. Having one value per item is a defining characteristic of Set categories. I disfavor changing that, especially to support a use case expected to be uncommon, and I see no particular need to do so. I would rather convert Sets to Loops, either as-needed or proactively.

James has argued that keeping current Set categories as Sets but giving them category keys where needed would make the implicit assertion that providing multiple values for the items in such categories is exceptional. I don’t disagree with that, but I think the same assertion is implicit in defining a default value for the keys of such categories, which we would want to do whether we convert Sets to Loops or not.

4. Whether we need to change DDLm itself, or whether the needed changes can be restricted to dictionaries.

It’s not clear to me that we can resolve the issue without modifying DDLm, but I would prefer a solution that only modifies data dictionaries.

----

Regards,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]

Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer