Re: [ddlm-group] Finalizing DDLm
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Finalizing DDLm
- From: John Westbrook <jwest@rcsb.rutgers.edu>
- Date: Wed, 31 Mar 2010 20:41:44 -0400
- In-Reply-To: <alpine.BSF.2.00.1003311021180.14239@epsilon.pair.com>
- References: <C7D8E0F6.13106%nick@csse.uwa.edu.au><alpine.BSF.2.00.1003311021180.14239@epsilon.pair.com>
Hi all,

Coming in late on this in support of Herb's position.

I have never understood the necessity of marking a category as a 'list'
type in the dictionary in the early CIF DDL, and in DDLm I find this even
more confusing. Given that DDLm supports a category key which provides a
well defined basis for each category, this alone would seem to provide
the appropriate expression of cardinality.

The choice of exporting a category with a single row as a collection of
keyword-value pairs or using a table format via a loop_ seems like a
presentation style matter rather than a dictionary issue. As Herb has
observed, the vast majority of DDL2 files opt for key-value output for
any category with a single row. I do not see what additional semantics
are conveyed by regulating the manner of data presentation in these cases.

John

On 3/31/10 10:28 AM, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> I just checked the RCSB mmCIF of 4INS, and it seems to be following the
> consistent practice of doing all one-row categories as individual tags
> and values, e.g.:
>
> _struct_sheet.id B
> _struct_sheet.type ?
> _struct_sheet.number_strands 2
> _struct_sheet.details ?
>
> As part of the CIF2 transition, are we going to tell people that the
> very large number of CIFs written this way all need to be rewritten?
> Why? There is no ambiguity here. There is no conflict here. It is
> algorithmically trivial to handle this as a one-row loop.
>
> This reminds me very much of why Pascal died as a CS language -- being
> too fussy about the wrong things.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Wed, 31 Mar 2010, Nick Spadaccini wrote:
>
>> On 31/03/10 10:51 AM, "Herbert J. Bernstein"
>> <yaya@bernstein-plus-sons.com> wrote:
>>
>>> I really don't understand why
>>>
>>> _atom_site_label Cu
>>> _atom_site.fract_x 0.0
>>> _atom_site.fract_y 0.0
>>> _atom_site.fract_z 0.0
>>>
>>> is a problem. It looks very clear and is easily coerced to
>>>
>>> loop_
>>> _atom_site_label
>>> _atom_site.fract_x
>>> _atom_site.fract_y
>>> _atom_site.fract_z
>>> Cu 0.0 0.0 0.0
>>>
>>> when a dictionary is available to tell you that this is supposed to be a
>>> looped list.
>>
>> While I am a big fan of using dictionaries I am repeatedly told there are
>> many users who don't and won't. Those people will not know it is a List
>> category in the former example.
>>
>> David's other example below that you refer to is just de-normalising the
>> data. Considering DB courses all over the world spend so much time on
>> normalisation begs the question why it is attractive to undo all that.
>> But if people want to repeat data unnecessarily so be it.
>>
>> But I think what David wants is a consequence of the category-subcategory
>> semantics built in to DDLm. In the specification category-subcategory
>> loops can appear separately or as an outer/inner join. The former makes
>> mm people happy, the latter makes small molecule people happy.
>>
>> Because we have focused on List categories there is a key on which to
>> join. Though Syd and I didn't consider the case, a semantically
>> consistent view for a Set subcategory of a List parent category, would be
>> a Cartesian Product. This would repeat the Set data for every row of the
>> List. In some sense this is already catered for within the semantics of
>> category-subcategory relationships.
>>
>> However the inverse case, unrolling List data into Set data is not part
>> of DDLm. But as I have repeatedly stated, the IUCr is free to extend the
>> specification of DDLm to suit its purposes for local implementation.
>>
>>>
>>> And, while
>>>
>>> _atom_site_label Cu
>>> _atom_site.fract_x 0.0
>>>
>>> loop_
>>> _atom_site.fract_y 0.0
>>> _atom_site.fract_z 0.0
>>>
>>> is a little strange it is just as clear. The more interesting case is
>>> related to what David suggested, of mixing a single tag value pair
>>> with a loop with more than one row. That seems very useful and echoes
>>> some of what was already done in the mmCIF dictionary.
>>>
>>> If a sensible coercion is clear from the dictionary, why not just do it?
>>>
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Wed, 31 Mar 2010, Nick Spadaccini wrote:
>>>
>>>> On 31/03/10 12:23 AM, "Herbert J. Bernstein"
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>>> 1. Existing DDL2 style dictionaries have many looped categories with
>>>>> subcategories.
>>>>
>>>> As do the dictionaries written in the DDLm.
>>>>
>>>>> 2. I do not understand why there is any special need to forbid the
>>>>> presentation of a single row list category as separate tags and
>>>>> values, nor why there is any special need to forbid the presentation
>>>>> of an unlooped category as a single row loop. Any necessary coercion
>>>>> is easily done in either case, and deserves at most a warning, if even
>>>>> that.
>>>>
>>>> The reasoning is that you are enforcing its "type" specification. It
>>>> is a List category object. ALL List category objects MUST be
>>>> syntactically presented by the loop_ keyword, followed by a sequence
>>>> of tags, followed by a list of values whose type then matches the tag
>>>> type. Anybody reading the data in the absence of a dictionary will
>>>> immediately know it is a List object.
>>>>
>>>> I must say Syd and my reasoning is pretty clear as to why we enforce
>>>> it the way we do. However I can't see the reasoning behind your and
>>>> David's desire to allow for,
>>>>
>>>> _atom_site_label Cu
>>>> _atom_site.fract_x 0.0
>>>>
>>>> loop_
>>>> _atom_site.fract_y 0.0
>>>>
>>>> _atom_site.fract_z 0.0
>>>>
>>>> The above is a logical consequence of what is being suggested. You
>>>> may not intend it, but "when you'se open the can, you'se eat the worms".
>>>>
>>>> Again (and again and again) I repeat IF the IUCr wishes this to be an
>>>> extension in its implementation of DDLm that is for it to decide.
>>>>
>>>>> =====================================================
>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>> +1-631-244-3035
>>>>> yaya@dowling.edu
>>>>> =====================================================
>>>>>
>>>>> On Tue, 30 Mar 2010, David Brown wrote:
>>>>>
>>>>>> James seems to have summarized matters pretty well. The implication
>>>>>> is that a list category must be the end of the line - it cannot have
>>>>>> a subcategory. My real question was whether a list category must be
>>>>>> explicitly included as a loop, or whether the loop structure is
>>>>>> unnecessary if it only contained a single row. It is easy enough to
>>>>>> be safe by always including the loop, and I will probably arrange to
>>>>>> do this in the dictionaries. There are likely to be several places
>>>>>> where single row loops appear, e.g., in examples or aliases.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> James Hester wrote:
>>>>>> Thanks Nick for clarifying this. We then return to David's
>>>>>> question. If we assume that a 'Set' category cannot be a child
>>>>>> of a 'List' category (I hope this is written down somewhere if
>>>>>> it is the case) then my originally proposed solution would be
>>>>>> impossible. Therefore, what David should do is to put the
>>>>>> invariant items into a *parent* 'Set' category and state that
>>>>>> the child 'List' category. That would solve the immediate issue
>>>>>> of separating out looped and unlooped datanames. If some
>>>>>> convenience is desired for dREL processing, the child 'List'
>>>>>> category could be made joinable to that parent category, thereby
>>>>>> making both invariant and looped items available in shorthand
>>>>>> form by looping over the parent category. Of course, even if
>>>>>> the child 'List' category is not explicitly joined to the parent
>>>>>> 'Set' category, the parent category can be explicitly referenced
>>>>>> in any dREL method using the full dataname.
>>>>>>
>>>>>> Nick may wish to confirm that I have correctly understood the
>>>>>> proposed behaviour of DDLm.
>>>>>>
>>>>>> James.
>>>>>>
>>>>>> On Thu, Mar 25, 2010 at 3:18 PM, Nick Spadaccini
>>>>>> <nick@csse.uwa.edu.au> wrote:
>>>>>> I have not had any time to respond to David's original email or the
>>>>>> subsequent discussions. However I have discussed it with Syd.
>>>>>>
>>>>>> DDLm defines loopable categories strictly and sub-categories of
>>>>>> those have strictly enforced outer joins (given we have ONLY
>>>>>> considered sub-categories that are List). This is how we overcome
>>>>>> the split versus non-split versions of atom_site and atom_site_aniso
>>>>>> loops. List is strictly looped, Set is strictly non-looped -
>>>>>> contrary to your reading, and possibly Syd's original text. He has
>>>>>> since re-read what he wrote and clarified the ambiguity. I will send
>>>>>> you that re-write shortly.
>>>>>>
>>>>>> We had not considered a SET category being a sub-category of a List
>>>>>> category, but if it is allowed then it would not be an outer-join as
>>>>>> you suggested in your previous contribution but a relational
>>>>>> Cartesian product (which is very different).
>>>>>>
>>>>>> The DDLm specification has that List category data MUST appear in a
>>>>>> loop (irrespective of how many rows there are), and SET categories
>>>>>> are strictly singular (non-looped data). The formal specification
>>>>>> will formally remain that way.
>>>>>>
>>>>>> HOWEVER the IUCr is free to "extend" the specification of the DDLm
>>>>>> for its own internal and private use, so long as you appreciate that
>>>>>> the FORMAL published specification of what Syd and I have created
>>>>>> can't include it.
>>>>>>
>>>>>> You might explain as to why you feel you have trouble with
>>>>>>
>>>>>> loop_
>>>>>> _atom_site.label
>>>>>> _atom_site.frac_x
>>>>>> _atom_site.frac_y
>>>>>> _atom_site.frac_z
>>>>>> Cu 0 0 0
>>>>>>
>>>>>> And yet
>>>>>>
>>>>>> _atom_site.label Cu
>>>>>> _atom_site.frac_x 0.
>>>>>> _atom_site.frac_y 0.
>>>>>> _atom_site.frac_z 0.
>>>>>>
>>>>>> Is so much more obvious? Given that people understand what the loop
>>>>>> is, I can't see what they would gain from the unrolled version
>>>>>> (apart from confusion). The real danger is those less experienced
>>>>>> who DON'T read a dictionary and read the latter form may be
>>>>>> encouraged to replicate it when there is more than 1 atom (thus
>>>>>> corrupting the CIF structure).
>>>>>>
>>>>>> However these are just personal observations, and if the IUCr wants
>>>>>> to qualify the use of DDLm with its own tweaks there is nothing
>>>>>> stopping them from doing so.
>>>>>>
>>>>>>>>>>
>>>>>>>>>> At 10:35 AM -0500 3/11/10, David Brown wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Colleagues,
>>>>>>>>>>>
>>>>>>>>>>> I assume that we are essentially finished in resolving syntax
>>>>>>>>>>> problems, but in that discussion some items were identified as
>>>>>>>>>>> being related to DDLm rather than syntax, so before we settle
>>>>>>>>>>> into serious dictionary writing we need to understand the DDLm
>>>>>>>>>>> rules.
>>>>>>>>>>>
>>>>>>>>>>> One item that I believe was raised under this heading was
>>>>>>>>>>> whether, if a loop contained a single set of items, it was
>>>>>>>>>>> necessary to formally include this in a loop structure. If this
>>>>>>>>>>> is deemed to be necessary, then there has to be some way of
>>>>>>>>>>> identifying the items that must appear in a loop. The presence
>>>>>>>>>>> in the dictionary of a _category_key.* item would seem to flag
>>>>>>>>>>> this, but it is applied at the level of the category rather
>>>>>>>>>>> than at the level of an individual item. If the requirement is
>>>>>>>>>>> that the loop structure must always be used, then all the items
>>>>>>>>>>> in the category must be loopable, i.e., the category cannot
>>>>>>>>>>> include items that would not normally be included in the loop,
>>>>>>>>>>> items for example that apply equally to all the listed items
>>>>>>>>>>> such as a scale factor that is the same for all the structure
>>>>>>>>>>> factors in a loop. This seems to be workable, but I am not sure
>>>>>>>>>>> how the legacy CIFs would fit in, since categories may include
>>>>>>>>>>> some listable items and some non-listable items, and I am sure
>>>>>>>>>>> the listable items do not always appear in a loop if there is
>>>>>>>>>>> only one set of such items reported in the CIF.
>>>>>>>>>>>
>>>>>>>>>>> Is this something that can be clarified fairly easily? It has an
>>>>>>>>>>> important bearing on how the CIF dictionaries are written.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> Attachment converted: Macintosh HD:idbrown 55.vcf (TEXT/ttxt) (0046DFC7)
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> ddlm-group mailing list
>>>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> =====================================================
>>>>>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>>>>
>>>>>>>>>> +1-631-244-3035
>>>>>>>>>> yaya@dowling.edu
>>>>>>>>>> =====================================================
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ddlm-group mailing list
>>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> T +61 (02) 9717 9907
>>>>>>>>> F +61 (02) 9717 3145
>>>>>>>>> M +61 (04) 0249 4148
>>>>>>>>> _______________________________________________
>>>>>>>>> ddlm-group mailing list
>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ddlm-group mailing list
>>>>>>>> ddlm-group@iucr.org
>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> --------------------------------
>>>>>> Associate Professor N. Spadaccini, PhD
>>>>>> School of Computer Science & Software Engineering
>>>>>>
>>>>>> The University of Western Australia t: +61 (0)8 6488 3452
>>>>>> 35 Stirling Highway f: +61 (0)8 6488 1089
>>>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
>>>>>> MBDP M002
>>>>>>
>>>>>> CRICOS Provider Code: 00126G
>>>>>>
>>>>>> e: Nick.Spadaccini@uwa.edu.au
>>>>>>
>>>>>> _______________________________________________
>>>>>> ddlm-group mailing list
>>>>>> ddlm-group@iucr.org
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>
>>>>>> --
>>>>>> T +61 (02) 9717 9907
>>>>>> F +61 (02) 9717 3145
>>>>>> M +61 (04) 0249 4148
>>>>>>
>>>>>> ____________________________________________________________________
>>>>>>
>>>>>> _______________________________________________
>>>>>> ddlm-group mailing list
>>>>>> ddlm-group@iucr.org
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>>> cheers
>>>>
>>>> Nick
>>>>
>>>> --------------------------------
>>>> Associate Professor N. Spadaccini, PhD
>>>> School of Computer Science & Software Engineering
>>>>
>>>> The University of Western Australia t: +61 (0)8 6488 3452
>>>> 35 Stirling Highway f: +61 (0)8 6488 1089
>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
>>>> MBDP M002
>>>>
>>>> CRICOS Provider Code: 00126G
>>>>
>>>> e: Nick.Spadaccini@uwa.edu.au
>>>>
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>
>> cheers
>>
>> Nick
>>
>> --------------------------------
>> Associate Professor N. Spadaccini, PhD
>> School of Computer Science & Software Engineering
>>
>> The University of Western Australia t: +61 (0)8 6488 3452
>> 35 Stirling Highway f: +61 (0)8 6488 1089
>> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
>> MBDP M002
>>
>> CRICOS Provider Code: 00126G
>>
>> e: Nick.Spadaccini@uwa.edu.au
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290 Fax: (732) 445-4320
******************************************************************
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
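
The coercion argued for in this thread, treating the separate tags and values of a single-row category as a one-row loop, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tags are taken to follow the DDL2 `_category.item` form, the input is assumed to be already tokenised into (tag, value) pairs, and the names `category_of`, `coerce_to_loops` and `format_loop` are hypothetical helpers, not part of any CIF library or of the DDLm specification.

```python
def category_of(tag):
    """Return the category part of a DDL2-style tag such as '_struct_sheet.id'.

    Assumption: tags have the form '_category.item'; anything else is rejected.
    """
    name = tag.lstrip("_")
    if "." not in name:
        raise ValueError(f"not a '_category.item' tag: {tag!r}")
    return name.split(".", 1)[0]


def coerce_to_loops(pairs):
    """Group (tag, value) pairs by category, one single-row loop per category."""
    loops = {}
    for tag, value in pairs:
        loop = loops.setdefault(category_of(tag), {"tags": [], "rows": [[]]})
        if tag in loop["tags"]:
            # A repeated unlooped tag cannot be read as a single row.
            raise ValueError(f"tag given twice outside a loop: {tag!r}")
        loop["tags"].append(tag)
        loop["rows"][0].append(value)
    return loops


def format_loop(loop):
    """Render a loop as text: the loop_ keyword, the tags, then one line per row.

    Values are joined with spaces; quoting of values that contain
    whitespace is deliberately left out of this sketch.
    """
    lines = ["loop_"] + loop["tags"] + [" ".join(row) for row in loop["rows"]]
    return "\n".join(lines)


# The one-row category quoted from the 4INS mmCIF in this thread.
pairs = [
    ("_struct_sheet.id", "B"),
    ("_struct_sheet.type", "?"),
    ("_struct_sheet.number_strands", "2"),
    ("_struct_sheet.details", "?"),
]
loops = coerce_to_loops(pairs)
print(format_loop(loops["struct_sheet"]))
```

Grouping by category name alone needs no dictionary; deciding whether a category is meant to be a List or a Set, which is the point under dispute above, would still require one.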
- Follow-Ups:
- Re: [ddlm-group] Finalizing DDLm (James Hester)
- References:
- Re: [ddlm-group] Finalizing DDLm (Nick Spadaccini)
- Re: [ddlm-group] Finalizing DDLm (Herbert J. Bernstein)