Discussion List Archives


Re: [ddlm-group] Finalizing DDLm

This doesn’t actually make things clearer or easier, James. I will repeat myself.

Strict adherence to the formal specification of DDLm REQUIRES List categories to be presented as looped data, even if there is only one row.

If the IUCr wishes to universally adopt the convention that instances of List categories containing only one row may be presented as a Set category, then it can do so as an accepted extension. This of course requires every application writer to read the dictionary to establish whether the data really belong to a Set category or to a List category. Once the IUCr adopts this extension to DDLm within its implementation, I would assume every application writer would be required to adhere to it.
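
To make the dictionary lookup described above concrete, here is a minimal Python sketch. It assumes a drastically simplified dictionary that maps each category name to its class; the `CATEGORY_CLASS` table and the function names are hypothetical illustrations and are not part of DDLm or of any existing CIF library.

```python
# Hypothetical, simplified stand-in for a DDLm dictionary:
# category name -> category class ("Set" or "List").
CATEGORY_CLASS = {"atom_site": "List", "cell": "Set"}


def category_of(tag):
    """Return the category part of a tag such as '_atom_site.fract_x'."""
    return tag[1:].split(".", 1)[0]


def classify_unlooped(items, category_class=CATEGORY_CLASS):
    """Group unlooped tag-value pairs by category and attach each category's class.

    Categories missing from the dictionary are reported as "unknown", which is
    the situation a dictionary-less reader is always in.
    """
    grouped = {}
    for tag, value in items.items():
        grouped.setdefault(category_of(tag), {})[tag] = value
    return {
        cat: (category_class.get(cat, "unknown"), pairs)
        for cat, pairs in grouped.items()
    }
```

Under the extension, a reader that meets `_atom_site.label Cu` outside a loop can only tell that it is really a one-row List by consulting a table like this; without the dictionary every unlooped item looks like Set data.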

On 14/04/10 3:29 PM, "James Hester" <jamesrhester@gmail.com> wrote:

Dear all,

Both John and Herb have come out in favour of allowing one-row loops to be unrolled.  Nick and I are both sceptical about the value of this idea.  We have a few options:

1.  Disallow loop unrolling altogether (as in DDL1).
2.  Allow loop unrolling for all DDLm dictionaries
3.  Add a category-scope DDLm attribute stating that one-row loops in this category and child categories may be unrolled.  If it appears in the 'Head' category of the dictionary, it would mean that all categories in the dictionary could be unrolled.

We have not discussed option 3: it basically means deferring the decision on loop unrolling to the dictionary writers.  It also means that programmers of generic CIF software will need to be prepared for either behaviour, so in that sense it is slightly more burdensome than option 2.
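
As a rough illustration of what option 3 would ask of generic software, the following Python sketch resolves whether a category may be unrolled by walking up the category tree. The tree, the set of flagged categories, and the function name are all hypothetical; no such DDLm attribute exists at this point in the discussion.

```python
# Hypothetical category tree: category name -> parent category (None for 'Head').
PARENT = {
    "head": None,
    "atom_site": "head",
    "atom_site_aniso": "atom_site",
    "refln": "head",
}

# Hypothetical: categories whose definition carries the proposed
# "one-row loops may be unrolled" attribute.
UNROLLABLE = {"atom_site"}


def may_unroll(name, parent=PARENT, unrollable=UNROLLABLE):
    """Return True if the category or any of its ancestors carries the attribute."""
    while name is not None:
        if name in unrollable:
            return True
        name = parent[name]
    return False
```

With the attribute on `atom_site`, `may_unroll("atom_site_aniso")` is true and `may_unroll("refln")` is false; placing it on `head` instead makes every category unrollable, which is the dictionary-wide case described in option 3.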

Unless the silent majority would like to contribute further thoughts on this matter, I suggest that we vote and move on.  I discern that the voting so far would be:

Option 1: James, Nick
Option 2: John, Herb
Option 3: ?

(Some comments on John's post are inserted below).

James.
On Thu, Apr 1, 2010 at 10:41 AM, John Westbrook <jwest@rcsb.rutgers.edu> wrote:
Hi all,

Coming in late on this in support of Herb's position.

I  have never understood the necessity of marking a category as
a 'list' type in the dictionary in the early CIF DDL,
and in DDLm I find this even more confusing.   Given
that DDLm supports a category key which provides a
well defined basis for each category, this alone
would seem to provide the appropriate expression of
cardinality.

Absolutely agree. My objection is not to the loss of some packet ordering information, which is explicitly excluded from the infoset produced by the parser in any case.


The choice of exporting a category with a single row as
a collection of keyword-value pairs or  using a table
format via a loop_  seems like a presentation style
matter rather than dictionary issue.

It is more than a presentation issue, as you have lost the information that those key-value pairs belong together, and so you need to refer to your dictionary to reconstitute them as a group.  And if you allow the possibility of unrolling single-row loops for all categories, then significant extra work must be done to check the internal representation and, if necessary, transform it back to a canonical looped form.  This reconstruction of the canonical form is highly desirable in a DDLm context, where we often wish to apply dREL operations to all packets in a loop.
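
A minimal Python sketch of the check-and-transform step described above, under stated assumptions: the parsed data block is modelled as a plain dict of unlooped items plus a list of loops, and the set of List category names is assumed to have been extracted from the dictionary beforehand. The data layout and the names `canonicalise` and `packets` are hypothetical and do not correspond to any existing CIF library.

```python
def canonicalise(items, loops, list_categories):
    """Re-loop unrolled List-category items as one-row loops.

    items           -- dict of unlooped tag -> value pairs
    loops           -- list of {"tags": [...], "rows": [[...], ...]} dicts
    list_categories -- set of category names the dictionary declares as List
    Returns (remaining_items, new_loops) without modifying the inputs.
    """
    kept = {}
    pending = {}  # category name -> {tag: value} awaiting re-looping
    for tag, value in items.items():
        category = tag[1:].split(".", 1)[0]
        if category in list_categories:
            pending.setdefault(category, {})[tag] = value
        else:
            kept[tag] = value
    new_loops = list(loops)
    for pairs in pending.values():
        new_loops.append({"tags": list(pairs), "rows": [list(pairs.values())]})
    return kept, new_loops


def packets(loop):
    """Yield each row of a loop as a tag -> value dict (one 'packet' per row)."""
    for row in loop["rows"]:
        yield dict(zip(loop["tags"], row))
```

Once every List category is in looped form, code that iterates over packets can treat one-row and many-row categories identically. The cost is that this pass has to run over every data block, and it cannot run at all without the dictionary.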
 

As Herb has observed, the vast majority of DDL2 files opt
for key-value output for any category with a single row.
I do not see what additional semantics are conveyed by
regulating the manner of data presentation in these cases.

See above - some semantic information is lost.
 

John

On 3/31/10 10:28 AM, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> I just checked the RCSB mmCIF of 4INS, and it seems to be following the
> consistent practice of doing all one-row categories as individual tags
> and values, e.g.:
>
> _struct_sheet.id B
> _struct_sheet.type ?
> _struct_sheet.number_strands 2
> _struct_sheet.details ?
>
> As part of the CIF2 transition, are we going to tell people that the
> very large number of CIFS written this way all need to be rewritten?
> Why? There is no ambiguity here. There is no conflict here. It is
> algorithmically trivial to handle this as a one-row loop.
>
> This reminds me very much of why Pascal died as a CS language -- being
> too fussy about the wrong things.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Wed, 31 Mar 2010, Nick Spadaccini wrote:
>
>>
>>
>>
>> On 31/03/10 10:51 AM, "Herbert J. Bernstein"
>> <yaya@bernstein-plus-sons.com>
>> wrote:
>>
>>> I really don't understand why
>>>
>>> _atom_site_label Cu
>>> _atom_site.fract_x 0.0
>>> _atom_site.fract_y 0.0
>>> _atom_site.fract_z 0.0
>>>
>>> is a problem. It looks very clear and is easily coerced to
>>>
>>> loop_
>>> _atom_site_label
>>> _atom_site.fract_x
>>> _atom_site.fract_y
>>> _atom_site.fract_z
>>> Cu 0.0 0.0 0.0
>>>
>>> when a dictionary is available to tell you that this is supposed to be a
>>> looped list.
>>
>> While I am a big fan of using dictionaries I am repeatedly told there are
>> many users who don't and won't. Those people will not know it is a List
>> category in the former example.
>>
>> David's other example below that you refer to is just de-normalising the
>> data. Considering that DB courses all over the world spend so much time on
>> normalisation, it raises the question of why it is attractive to undo all
>> that. But if people want to repeat data unnecessarily, so be it.
>>
>> But I think what David wants is a consequence of the category-subcategory
>> semantics built in to DDLm. In the specification, category-subcategory loops
>> can appear separately or as an outer/inner join. The former makes mm people
>> happy, the latter makes small molecule people happy.
>>
>> Because we have focused on List categories there is a key on which to join.
>> Though Syd and I didn't consider the case, a semantically consistent view
>> for a Set subcategory of a List parent category would be a Cartesian
>> Product. This would repeat the Set data for every row of the List. In some
>> sense this is already catered for within the semantics of
>> category-subcategory relationships.
>>
>> However the inverse case, unrolling List data into Set data, is not part of
>> DDLm. But as I have repeatedly stated, the IUCr is free to extend the
>> specification of DDLm to suit its purposes for local implementation.
>>
>>
>>>
>>> And, while
>>>
>>> _atom_site_label Cu
>>> _atom_site.fract_x 0.0
>>>
>>> loop_
>>> _atom_site.fract_y 0.0
>>> _atom_site.fract_z 0.0
>>>
>>> is a little strange it is just as clear. The more interesting case is
>>> related to what David suggested, of mixing a single tag value pair with a
>>> loop with more than one row. That seems very useful and echoes some of
>>> what was already done in the mmCIF dictionary.
>>>
>>> If a sensible coercion is clear from the dictionary, why not just do it?
>>>
>>> On Wed, 31 Mar 2010, Nick Spadaccini wrote:
>>>
>>>>
>>>>
>>>>
>>>> On 31/03/10 12:23 AM, "Herbert J. Bernstein"
>>>> <yaya@bernstein-plus-sons.com>
>>>> wrote:
>>>>
>>>>> 1. Existing DDL2 style dictionaries have many looped categories with
>>>>> subcategories.
>>>>
>>>> As do the dictionaries written in the DDLm.
>>>>
>>>>> 2. I do not understand why there is any special need to forbid the
>>>>> presentation of a single row list category as separate tags and
>>>>> values, nor why there is any special need to forbid the presentation
>>>>> of an unlooped category as a single row loop. Any necessary coercion
>>>>> is easily done in either case, and deserves at most a warning, if even
>>>>> that.
>>>>
>>>> The reasoning is that you are enforcing its "type" specification. It is a
>>>> List category object. ALL List category objects MUST be syntactically
>>>> presented by the loop_ keyword, followed by a sequence of tags, followed by
>>>> a list of values whose type then matches the tag type. Anybody reading the
>>>> data in the absence of a dictionary will immediately know it is a List
>>>> object.
>>>>
>>>> I must say Syd's and my reasoning is pretty clear as to why we enforce it
>>>> the way we do. However I can't see the reasoning behind your and David's
>>>> desire to allow for,
>>>>
>>>> _atom_site_label Cu
>>>> _atom_site.fract_x 0.0
>>>>
>>>> loop_
>>>> _atom_site.fract_y 0.0
>>>>
>>>> _atom_site.fract_z 0.0
>>>>
>>>> The above is a logical consequence of what is being suggested. You may not
>>>> intend it, but "when you'se open the can, you'se eat the worms".
>>>>
>>>> Again (and again and again) I repeat IF the IUCr wishes this to be an
>>>> extension in its implementation of DDLm that is for it to decide.
>>>>
>>>>
>>>>> On Tue, 30 Mar 2010, David Brown wrote:
>>>>>
>>>>>> James seems to have summarized matters pretty well.  The implication is
>>>>>> that a list category must be the end of the line - it cannot have a
>>>>>> subcategory.  My real question was whether a list category must be
>>>>>> explicitly included as a loop, or whether the loop structure is
>>>>>> unnecessary if it only contained a single row.  It is easy enough to be
>>>>>> safe by always including the loop, and I will probably arrange to do this
>>>>>> in the dictionaries.  There are likely to be several places where single
>>>>>> row loops appear, e.g., in examples or aliases.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>> James Hester wrote:
>>>>>> Thanks Nick for clarifying this.  We then return to David's
>>>>>> question.  If we assume that a 'Set' category cannot be a child
>>>>>> of a 'List' category (I hope this is written down somewhere if
>>>>>> it is the case) then my originally proposed solution would be
>>>>>> impossible.  Therefore, what David should do is to put the
>>>>>> invariant items into a *parent* 'Set' category and state that
>>>>>> the child 'List' category.  That would solve the immediate issue
>>>>>> of separating out looped and unlooped datanames.  If some
>>>>>> convenience is desired for dREL processing, the child 'List'
>>>>>> category could be made joinable to that parent category, thereby
>>>>>> making both invariant and looped items available in shorthand
>>>>>> form by looping over the parent category.  Of course, even if
>>>>>> the child 'List' category is not explicitly joined to the parent
>>>>>> 'Set' category, the parent category can be explicitly referenced
>>>>>> in any dREL method using the full dataname.
>>>>>>
>>>>>> Nick may wish to confirm that I have correctly understood the
>>>>>> proposed behaviour of DDLm.
>>>>>>
>>>>>> James.
>>>>>>
>>>>>> On Thu, Mar 25, 2010 at 3:18 PM, Nick Spadaccini
>>>>>> <nick@csse.uwa.edu.au> wrote:
>>>>>> I have not had any time to respond to David's
>>>>>> original email or the
>>>>>> subsequent discussions. However I have discussed it
>>>>>> with Syd.
>>>>>>
>>>>>> DDLm defines loopable categories strictly and
>>>>>> sub-categories of those have
>>>>>> strictly enforced outer joins (given we have ONLY
>>>>>> considered sub-categories
>>>>>> that are List). This is how we overcome the split
>>>>>> versus non-split versions
>>>>>> of atom_site and atom_site_aniso loops. List is
>>>>>> strictly looped, Set is
>>>>>> strictly non-looped - contrary to your reading, and
>>>>>> possibly Syd's original
>>>>>> text. He has since re-read what he wrote and
>>>>>> clarified the ambiguity. I will
>>>>>> send you that re-write shortly.
>>>>>>
>>>>>> We had not considered a SET category being a
>>>>>> sub-category of a List
>>>>>> category, but if it is allowed then it would not be
>>>>>> an outer-join as you
>>>>>> suggested in your previous contribution but a
>>>>>> relational Cartesian product
>>>>>> (which is very different).
>>>>>>
>>>>>> The DDLm specification has that List category data
>>>>>> MUST appear in a loop
>>>>>> (irrespective of how many rows there are), and SET
>>>>>> categories are strictly
>>>>>> singular (non-looped data). The formal specification
>>>>>> will formally remain
>>>>>> that way.
>>>>>>
>>>>>> HOWEVER the IUCr is free to "extend" the
>>>>>> specification of the DDLm for its
>>>>>> own internal and private use, so long as you
>>>>>> appreciate that the FORMAL
>>>>>> published specification of what Syd and I have
>>>>>> created can't include it.
>>>>>>
>>>>>> You might explain as to why you feel you have
>>>>>> trouble with
>>>>>>
>>>>>> loop_
>>>>>>  _atom_site.label
>>>>>>  _atom_site.fract_x
>>>>>>  _atom_site.fract_y
>>>>>>  _atom_site.fract_z
>>>>>>  Cu 0 0 0
>>>>>>
>>>>>> And yet
>>>>>>
>>>>>>  _atom_site.label  Cu
>>>>>>  _atom_site.fract_x  0.
>>>>>>  _atom_site.fract_y  0.
>>>>>>  _atom_site.fract_z  0.
>>>>>>
>>>>>> Is so much more obvious? Given that people
>>>>>> understand what the loop is, I
>>>>>> can't see what they would gain from the unrolled
>>>>>> version (apart from
>>>>>> confusion). The real danger is those less
>>>>>> experienced who DON'T read a
>>>>>> dictionary and read the latter form may be
>>>>>> encouraged to replicate it when
>>>>>> there is more than 1 atom (thus corrupting the CIF
>>>>>> structure).
>>>>>>
>>>>>> However these are just personal observations, and if
>>>>>> the IUCr wants to
>>>>>> qualify the use of DDLm with its own tweaks there is
>>>>>> nothing stopping them
>>>>>> from doing so.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>>
>>>>>>>>>> At 10:35 AM -0500 3/11/10, David Brown wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Colleagues,
>>>>>>>>>>>
>>>>>>>>>>> I assume that we are essentially finished in resolving syntax
>>>>>>>>>>> problems, but in that discussion some items were identified as being
>>>>>>>>>>> related to DDLm rather than syntax, so before we settle into serious
>>>>>>>>>>> dictionary writing we need to understand the DDLm rules.
>>>>>>>>>>>
>>>>>>>>>>> One item that I believe was raised under this heading was whether,
>>>>>>>>>>> if a loop contained a single set of items, it was necessary to
>>>>>>>>>>> formally include this in a loop structure.  If this is deemed to be
>>>>>>>>>>> necessary, then there has to be some way of identifying the items
>>>>>>>>>>> that must appear in a loop.  The presence in the dictionary of a
>>>>>>>>>>> _category_key.* item would seem to flag this, but it is applied at
>>>>>>>>>>> the level of the category rather than at the level of an individual
>>>>>>>>>>> item.  If the requirement is that the loop structure must always be
>>>>>>>>>>> used, then all the items in the category must be loopable, i.e., the
>>>>>>>>>>> category cannot include items that would not normally be included in
>>>>>>>>>>> the loop, items for example that apply equally to all the listed
>>>>>>>>>>> items such as a scale factor that is the same for all the structure
>>>>>>>>>>> factors in a loop.  This seems to be workable, but I am not sure how
>>>>>>>>>>> the legacy CIFs would fit in, since categories may include some
>>>>>>>>>>> listable items and some non-listable items, and I am sure the
>>>>>>>>>>> listable items do not always appear in a loop if there is only one
>>>>>>>>>>> set of such items reported in the CIF.
>>>>>>>>>>>
>>>>>>>>>>> Is this something that can be clarified fairly easily?  It has an
>>>>>>>>>>> important bearing on how the CIF dictionaries are written.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> _______________________________________________
>>>>>>>>>>> ddlm-group mailing list
>>>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>>>>
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>> --
>>>>>>>>> T +61 (02) 9717 9907
>>>>>>>>> F +61 (02) 9717 3145
>>>>>>>>> M +61 (04) 0249 4148
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> --------------------------------
>>>>>> Associate Professor N. Spadaccini, PhD
>>>>>> School of Computer Science & Software Engineering
>>>>>>
>>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3:
>>>>>> www.csse.uwa.edu.au/~nick <http://www.csse.uwa.edu.au/%7Enick>
>>>>>> MBDP  M002
>>>>>>
>>>>>> CRICOS Provider Code: 00126G
>>>>>>
>>>>>> e: Nick.Spadaccini@uwa.edu.au
>>>>>>
>>>>
>>>> cheers
>>>>
>>>> Nick
>>
>> cheers
>>
>> Nick

--
******************************************************************
   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320
******************************************************************

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au


_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
