Re: [ddlm-group] Proposal to enhance the behaviour of a DDLm "Set" category: please consider
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- From: James Hester <jamesrhester@gmail.com>
- Date: Sat, 28 May 2016 12:47:43 +1000
Dear James and DDLm group,
I’m not sure that I have fully comprehended the proposal to alter the meaning of the 'Set' definition class, so let me try to summarize in my own words:
(*) Presently, DDLm categories defined as 'Sets' contain items that must not be looped, or at least must not appear in multi-packet loops. Items in such categories take at most one value per data block or save frame.
(*) The choice between the 'Set' and 'Loop' category classes is made by dictionary developers based on the envisioned use of the category in data files. For example, the SYMMETRY category in the DDLm version of the core dictionary is defined to be a 'Set' because the dictionary is structured around the idea that each data block or save frame in a data file describes at most one structure, and a structure has exactly one set of symmetry information.
(*) Substantially the same item may be relevant to different kinds of overall data sets, and the appropriate choice between 'Set' and 'Loop' (as they are presently defined) may vary between kinds of data sets. This mismatch prevents some desired re-uses of definitions across dictionaries.
(*) To enable the desired kinds of re-use, it is proposed that the 'Set' category class be redefined to require uniqueness only with respect to a category key. New constraints are placed on the other categories that can appear in the same block or frame, so as to ensure that each datum can be associated with at most one value for any item in any 'Set' category.
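The two readings of 'Set' summarized in the points above can be sketched roughly as follows. This is only an illustration, not DDLm itself: the representation of a category as a list of packet dictionaries, the function names, and the key name `id` are all assumptions, and the additional constraints the proposal places on other categories in the same block are not modelled.

```python
def valid_current_set(packets):
    """Current rule: a 'Set' category has at most one packet per block or frame."""
    return len(packets) <= 1


def valid_proposed_set(packets, key):
    """Proposed rule as summarized above: packets must be unique on the category key."""
    keys = [packet[key] for packet in packets]
    return len(keys) == len(set(keys))


# Hypothetical category contents; item names are illustrative only.
one_space_group = [{"id": "a", "space_group": "P 21/c"}]
two_space_groups = [
    {"id": "a", "space_group": "P 21/c"},
    {"id": "b", "space_group": "C 2/c"},
]
```

Under these assumptions, `two_space_groups` fails the current rule but passes the proposed one, which is the behavioural change under discussion.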
Based on that understanding of the proposal:
1. I am concerned about the proposed new constraint on other categories that may appear in the same container with a 'Set' category. I think I understand the purpose, but I also think this will be easier to get wrong and more complicated to validate. Moreover, it introduces an unresolved conflict with categories that really ought to be 'Sets' as they currently are defined, as the proposal itself acknowledges with respect to the AUDIT category.
2. The proposed change almost completely erases the distinction between 'Set' and 'Loop' categories. I am not convinced that retaining the two as separate classes with such a fine distinction between them is the best course of action.
So the distinction is not so much between 'Set' and 'Loop' as between 'overall information' and 'per datum information'. What the proposal describes is the circumstances under which an 'overall information' dataname can be re-used as a 'per datum' dataname.
3. I am not fond of how conditional the proposed new definition text is.
4. It seems likely that all existing methods of current 'Set' items would be broken by the proposed change.
My present thinking is that changing specific 'Set' categories into bona fide 'Loop' categories would be better than making all 'Sets' loop-like without actually making them 'Loops'. This could be reconciled with existing data files by introducing a mechanism for defaulting category key values or by allowing category keys to be omitted from category data when only one set of data from that category is presented. I think an approach along these lines could solve the problem at hand while addressing my concerns 1-3. I am uncertain whether a solution is possible that fully addresses my concern #4, but if we convert 'Sets' into 'Loops' only selectively, then at least we narrow the scope of the problems with methods, and perhaps also allow an incremental approach to be taken for updating dictionaries.
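One possible reading of the key-defaulting idea, sketched in Python purely for illustration. The default value `"default"`, the function name, and the packet representation are assumptions; no such mechanism is defined in DDLm at present.

```python
DEFAULT_KEY = "default"  # hypothetical default key value, not defined by DDLm


def normalize_packets(packets, key):
    """Fill in an omitted category key, allowed only when a single packet is present."""
    missing = [packet for packet in packets if key not in packet]
    if not missing:
        # Every packet already carries its key; return copies unchanged.
        return [dict(packet) for packet in packets]
    if len(packets) > 1:
        raise ValueError(
            "category key may be omitted only when a single packet is present"
        )
    # Exactly one packet and its key is missing: supply the default.
    return [{**packets[0], key: DEFAULT_KEY}]


# A legacy-style category with one packet and no key value.
legacy = [{"space_group": "P 21/c"}]
normalized = normalize_packets(legacy, "id")
```

With this reading, an existing single-packet data file would be treated as a one-row loop, while a multi-packet category without explicit keys would be rejected rather than guessed at.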
Regarding (i), while a system of defaulting key values and omitting them if only a single item is defined is a consistent description of currently-existing datafiles, the real issue is the opposite: what will happen when current software is faced with a file that *does* have multiple values in a 'Set' category? Are we sure that it will not silently, for example, calculate too many atomic sites because we have listed symmetry operators from multiple space groups?
Regarding (ii), we gain nothing by use of the PDB 'entry.id' trick as we are once again left with single-valued categories that can't have more than one datum.
Regards,
John
--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]