
Re: [ddlm-group] Finalizing DDLm

Dear Colleagues,

   I just checked the RCSB mmCIF of 4INS, and it seems to be following the 
consistent practice of doing all one-row categories as individual tags and 
values, e.g.:

_struct_sheet.id               B
_struct_sheet.type             ?
_struct_sheet.number_strands   2
_struct_sheet.details          ?

   As part of the CIF2 transition, are we going to tell people that the 
very large number of CIFs written this way all need to be rewritten? 
Why?  There is no ambiguity here.  There is no conflict here.  It is 
algorithmically trivial to handle this as a one-row loop.
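
A minimal sketch of that coercion, in Python. It assumes the tag/value
pairs have already been parsed into (tag, value) tuples and that the
category is whatever precedes the first '.' in a DDL2-style tag. The
helper name pairs_to_one_row_loops is hypothetical and not part of any
CIF library.

```python
def pairs_to_one_row_loops(pairs):
    """Group (tag, value) pairs by category and return one-row loops.

    Assumption: tags have the DDL2-style form _category.item, so the
    category is the text before the first '.'.
    """
    grouped = {}
    for tag, value in pairs:
        category = tag.split(".", 1)[0]
        tags, row = grouped.setdefault(category, ([], []))
        tags.append(tag)
        row.append(value)
    # Each category becomes a loop with a tag list and exactly one row.
    return {cat: {"tags": tags, "rows": [row]}
            for cat, (tags, row) in grouped.items()}


pairs = [
    ("_struct_sheet.id", "B"),
    ("_struct_sheet.type", "?"),
    ("_struct_sheet.number_strands", "2"),
    ("_struct_sheet.details", "?"),
]
loops = pairs_to_one_row_loops(pairs)
```

Here loops["_struct_sheet"] holds the four tags and a single row of
values, which is the same content a loop_ with one row would carry.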

This reminds me very much of why Pascal died as a CS language -- being too 
fussy about the wrong things.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 31 Mar 2010, Nick Spadaccini wrote:

>
>
>
> On 31/03/10 10:51 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> I really don't understand why
>>
>> _atom_site_label Cu
>> _atom_site.fract_x 0.0
>> _atom_site.fract_y 0.0
>> _atom_site.fract_z 0.0
>>
>> is a problem.  It looks very clear and is easily coerced to
>>
>> loop_
>> _atom_site_label
>> _atom_site.fract_x
>> _atom_site.fract_y
>> _atom_site.fract_z
>>   Cu 0.0 0.0 0.0
>>
>> when a dictionary is available to tell you that this is supposed to be a
>> looped list.
>
> While I am a big fan of using dictionaries I am repeatedly told there are
> many users who don't and won't. Those people will not know it is a List
> category in the former example.
>
> David's other example below that you refer to is just de-normalising the
> data. Considering that DB courses all over the world spend so much time on
> normalisation, it begs the question of why it is attractive to undo all
> that. But if people want to repeat data unnecessarily, so be it.
>
> But I think what David wants is a consequence of the category-subcategory
> semantics built in to DDLm. In the specification category-subcategory loops
> can appear separately or as an outer/inner join. The former makes mm people
> happy, the latter makes small molecule people happy.
>
> Because we have focused on List categories there is a key on which to join.
> Though Syd and I didn't consider the case, a semantically consistent view
> for a Set subcategory of a List parent category would be a Cartesian
> Product. This would repeat the Set data for every row of the List. In some
> sense this is already catered for within the semantics of
> category-subcategory relationships.
>
> However, the inverse case, unrolling List data into Set data, is not part of
> DDLm. But as I have repeatedly stated, the IUCr is free to extend the
> specification of DDLm to suit its purposes for local implementation.
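
The Cartesian product described above (Set data repeated for every row of
the List) can be sketched in Python as follows. This is only an
illustration under stated assumptions: the tag names _example_set.scale,
_example_list.id and _example_list.f_meas are hypothetical, and
set_times_list is not a function of any DDLm or CIF toolkit.

```python
from itertools import product


def set_times_list(set_items, list_tags, list_rows):
    """Join a Set category (one row) with every row of a List category."""
    set_tags = list(set_items)
    set_row = [set_items[tag] for tag in set_tags]
    # A Set category has exactly one row, so the product simply repeats
    # that row once for each List row.
    joined_rows = [list(list_row) + list(s_row)
                   for list_row, s_row in product(list_rows, [set_row])]
    return list_tags + set_tags, joined_rows


tags, rows = set_times_list(
    {"_example_set.scale": "1.05"},               # hypothetical Set item
    ["_example_list.id", "_example_list.f_meas"],  # hypothetical List items
    [["1", "10.2"], ["2", "7.9"]],
)
```

Each joined row carries the same scale value, which is the repetition of
data that normalisation would otherwise avoid.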
>
>
>>
>> And, while
>>
>> _atom_site_label Cu
>> _atom_site.fract_x 0.0
>>
>> loop_
>> _atom_site.fract_y 0.0
>> _atom_site.fract_z 0.0
>>
>> is a little strange, it is just as clear.  The more interesting case is
>> related to what David suggested, of mixing a single tag value pair with a
>> loop with more than one row.  That seems very useful and echoes some of
>> what was already done in the mmCIF dictionary.
>>
>> If a sensible coercion is clear from the dictionary, why not just do it?
>>
>> On Wed, 31 Mar 2010, Nick Spadaccini wrote:
>>
>>>
>>>
>>>
>>> On 31/03/10 12:23 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>> wrote:
>>>
>>>> 1.  Existing DDL2 style dictionaries have many looped categories with
>>>> subcategories.
>>>
>>> As do the dictionaries written in the DDLm.
>>>
>>>> 2.  I do not understand why there is any special need to forbid the
>>>> presentation of a single row list category as separate tags and
>>>> values, nor why there is any special need to forbid the presentation
>>>> of an unlooped category as a single row loop.  Any necessary coercion
>>>> is easily done in either case, and deserves at most a warning, if even
>>>> that.
>>>
>>> The reasoning is that you are enforcing its "type" specification. It is a
>>> List category object. ALL List category objects MUST be syntactically
>>> presented by the loop_ keyword, followed by a sequence of tags, followed by
>>> a list of values whose type then matches the tag type. Anybody reading the
>>> data in the absence of a dictionary will immediately know it is a List
>>> object.
>>>
>>> I must say Syd's and my reasoning is pretty clear as to why we enforce it the
>>> way we do. However I can't see the reasoning behind your and David's desire
>>> to allow for,
>>>
>>> _atom_site_label Cu
>>> _atom_site.fract_x 0.0
>>>
>>> loop_
>>> _atom_site.fract_y 0.0
>>>
>>> _atom_site.fract_z 0.0
>>>
>>> The above is a logical consequence of what is being suggested. You may not
>>> intend it, but "when you'se open the can, you'se eat the worms".
>>>
>>> Again (and again and again) I repeat IF the IUCr wishes this to be an
>>> extension in its implementation of DDLm that is for it to decide.
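
For comparison, the strict rule quoted above (List items only inside
loop_, Set items only outside) could be checked along these lines. This
is a sketch in Python, assuming the parser records for each item whether
it appeared inside a loop_ and that a dictionary supplies the class of
each category. The function name, the problem strings and the
category_class mapping shown here are illustrative assumptions.

```python
def check_category_classes(items, category_class):
    """Return (tag, problem) tuples for items that break the strict rule.

    items: iterable of (tag, in_loop) pairs from a parsed data block.
    category_class: mapping of category name to "List" or "Set".
    """
    problems = []
    for tag, in_loop in items:
        category = tag.split(".", 1)[0]
        cls = category_class.get(category)  # None if not in the dictionary
        if cls == "List" and not in_loop:
            problems.append((tag, "List item outside loop_"))
        elif cls == "Set" and in_loop:
            problems.append((tag, "Set item inside loop_"))
    return problems


items = [
    ("_atom_site.label", False),
    ("_atom_site.fract_x", False),
    ("_cell.length_a", False),
]
category_class = {"_atom_site": "List", "_cell": "Set"}  # assumed classes
problems = check_category_classes(items, category_class)
```

Whether such a finding is treated as an error or only as a warning
followed by coercion is exactly the policy question in this thread.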
>>>
>>>
>>>>
>>>>
>>>> On Tue, 30 Mar 2010, David Brown wrote:
>>>>
>>>>> James seems to have summarized matters pretty well.  The implication is
>>>>> that a list category must be the end of the line - it cannot have a
>>>>> subcategory.  My real question was whether a list category must be
>>>>> explicitly included as a loop, or whether the loop structure is
>>>>> unnecessary if it contains only a single row.  It is easy enough to be
>>>>> safe by always including the loop, and I will probably arrange to do
>>>>> this in the dictionaries.  There are likely to be several places where
>>>>> single-row loops appear, e.g., in examples or aliases.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> James Hester wrote:
>>>>>       Thanks Nick for clarifying this.  We then return to David's
>>>>>       question.  If we assume that a 'Set' category cannot be a child
>>>>>       of a 'List' category (I hope this is written down somewhere if
>>>>>       it is the case) then my originally proposed solution would be
>>>>>       impossible.  Therefore, what David should do is to put the
>>>>>       invariant items into a *parent* 'Set' category and state that
>>>>>       the child 'List' category.  That would solve the immediate issue
>>>>>       of separating out looped and unlooped datanames.  If some
>>>>>       convenience is desired for dREL processing, the child 'List'
>>>>>       category could be made joinable to that parent category, thereby
>>>>>       making both invariant and looped items available in shorthand
>>>>>       form by looping over the parent category.  Of course, even if
>>>>>       the child 'List' category is not explicitly joined to the parent
>>>>>       'Set' category, the parent category can be explicitly referenced
>>>>>       in any dREL method using the full dataname.
>>>>>
>>>>>       Nick may wish to confirm that I have correctly understood the
>>>>>       proposed behaviour of DDLm.
>>>>>
>>>>>       James.
>>>>>
>>>>>       On Thu, Mar 25, 2010 at 3:18 PM, Nick Spadaccini
>>>>>       <nick@csse.uwa.edu.au> wrote:
>>>>>             I have not had any time to respond to David's
>>>>>             original email or the
>>>>>             subsequent discussions. However I have discussed it
>>>>>             with Syd.
>>>>>
>>>>>             DDLm defines loopable categories strictly and
>>>>>             sub-categories of those have
>>>>>             strictly enforced outer joins (given we have ONLY
>>>>>             considered sub-categories
>>>>>             that are List). This is how we overcome the split
>>>>>             versus non-split versions
>>>>>             of atom_site and atom_site_aniso loops. List is
>>>>>             strictly looped, Set is
>>>>>             strictly non-looped - contrary to your reading, and
>>>>>             possibly Syd's original
>>>>>             text. He has since re-read what he wrote and
>>>>>             clarified the ambiguity. I will
>>>>>             send you that re-write shortly.
>>>>>
>>>>>             We had not considered a SET category being a
>>>>>             sub-category of a List
>>>>>             category, but if it is allowed then it would not be
>>>>>             an outer-join as you
>>>>>             suggested in your previous contribution but a
>>>>>             relational Cartesian product
>>>>>             (which is very different).
>>>>>
>>>>>             The DDLm specification has that List category data
>>>>>             MUST appear in a loop
>>>>>             (irrespective of how many rows there are), and SET
>>>>>             categories are strictly
>>>>>             singular (non-looped data). The formal specification
>>>>>             will formally remain
>>>>>             that way.
>>>>>
>>>>>             HOWEVER the IUCr is free to "extend" the
>>>>>             specification of the DDLm for its
>>>>>             own internal and private use, so long as you
>>>>>             appreciate that the FORMAL
>>>>>             published specification of what Syd and I have
>>>>>             created can't include it.
>>>>>
>>>>>             You might explain as to why you feel you have
>>>>>             trouble with
>>>>>
>>>>>             loop_
>>>>>              _atom_site.label
>>>>>              _atom_site.frac_x
>>>>>              _atom_site.frac_y
>>>>>              _atom_site.frac_z
>>>>>              Cu 0 0 0
>>>>>
>>>>>             And yet
>>>>>
>>>>>              _atom_site.label  Cu
>>>>>              _atom_site.frac_x  0.
>>>>>              _atom_site.frac_y  0.
>>>>>              _atom_site.frac_z  0.
>>>>>
>>>>>             Is so much more obvious? Given that people
>>>>>             understand what the loop is, I
>>>>>             can't see what they would gain from the unrolled
>>>>>             version (apart from
>>>>>             confusion). The real danger is those less
>>>>>             experienced who DON'T read a
>>>>>             dictionary and read the latter form may be
>>>>>             encouraged to replicate it when
>>>>>             there is more than 1 atom (thus corrupting the CIF
>>>>>             structure).
>>>>>
>>>>>             However these are just personal observations, and if
>>>>>             the IUCr wants to
>>>>>             qualify the use of DDLm with its own tweaks there is
>>>>>             nothing stopping them
>>>>>             from doing so.
>>>>>
>>>>>
>>>>>
>>>>>>>>>
>>>>>>>>> At 10:35 AM -0500 3/11/10, David Brown wrote:
>>>>>>>>>>
>>>>>>>>>> Dear Colleagues,
>>>>>>>>>>
>>>>>>>>>> I assume that we are essentially finished in resolving syntax
>>>>>>>>>> problems, but in that discussion some items were identified as being
>>>>>>>>>> related to DDLm rather than syntax, so before we settle into serious
>>>>>>>>>> dictionary writing we need to understand the DDLm rules.
>>>>>>>>>>
>>>>>>>>>> One item that I believe was raised under this heading was whether,
>>>>>>>>>> if a loop contained a single set of items, it was necessary to
>>>>>>>>>> formally include this in a loop structure.  If this is deemed to be
>>>>>>>>>> necessary, then there has to be some way of identifying the items
>>>>>>>>>> that must appear in a loop.  The presence in the dictionary of a
>>>>>>>>>> _category_key.* item would seem to flag this, but it is applied at
>>>>>>>>>> the level of the category rather than at the level of an individual
>>>>>>>>>> item.  If the requirement is that the loop structure must always be
>>>>>>>>>> used, then all the items in the category must be loopable, i.e., the
>>>>>>>>>> category cannot include items that would not normally be included in
>>>>>>>>>> the loop, items for example that apply equally to all the listed
>>>>>>>>>> items such as a scale factor that is the same for all the structure
>>>>>>>>>> factors in a loop.  This seems to be workable, but I am not sure how
>>>>>>>>>> the legacy CIFs would fit in, since categories may include some
>>>>>>>>>> listable items and some non-listable items, and I am sure the
>>>>>>>>>> listable items do not always appear in a loop if there is only one
>>>>>>>>>> set of such items reported in the CIF.
>>>>>>>>>>
>>>>>>>>>> Is this something that can be clarified fairly easily?  It has an
>>>>>>>>>> important bearing on how the CIF dictionaries are written.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ddlm-group mailing list
>>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> T +61 (02) 9717 9907
>>>>>>>> F +61 (02) 9717 3145
>>>>>>>> M +61 (04) 0249 4148
>>>>>
>>>>> cheers
>>>>>
>>>>> Nick
>>>>>
>>>>> --------------------------------
>>>>> Associate Professor N. Spadaccini, PhD
>>>>> School of Computer Science & Software Engineering
>>>>>
>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>>>> MBDP  M002
>>>>>
>>>>> CRICOS Provider Code: 00126G
>>>>>
>>>>> e: Nick.Spadaccini@uwa.edu.au
>>>
>>> cheers
>>>
>>> Nick
>
> cheers
>
> Nick
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
