Discussion List Archives

Re: [ddlm-group] A rationalisation of DDLm

Dear John and group,

Yes, I see you are correct. A synthetic key composed of parts is of no use unless the referenced category and the method referencing the category agree on how to construct the value. So, just to be clear, the construction A[val].B, where val is the value of the synthetic key into category A, will only be usable if the calling dREL has constructed 'val' in exactly the way that category A defines it should be constructed.

In any case, we have added the syntax A[key1=val1, key2=val2].C to access category A using appropriate values of the data names forming a composite key, which takes care of any more complex requirements. So I believe the changes proposed above remain viable.
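
A rough Python sketch of the two access styles may make the point concrete. It is not dREL and not taken from any existing implementation: the category is assumed to be a list of row dictionaries, and the names `CATEGORY_A`, `synthetic_key`, `lookup_by_synthetic_key` and `lookup_by_composite_key` are hypothetical. The synthetic-key lookup succeeds only when the caller builds `val` with the same recipe as category A; the composite-key lookup matches named key components and needs no shared recipe.

```python
from typing import Any

# Hypothetical rows of a category A whose natural key is (key1, key2).
CATEGORY_A = [
    {"key1": "C1", "key2": "1_555", "C": 1.54},
    {"key1": "C2", "key2": "2_565", "C": 1.47},
]


def synthetic_key(row: dict[str, Any]) -> str:
    """Hypothetical recipe that category A uses to build its single-attribute key."""
    return f"{row['key1']}|{row['key2']}"


def lookup_by_synthetic_key(rows: list[dict[str, Any]], val: str, item: str) -> Any:
    """Analogue of A[val].C: finds a row only if `val` was built by synthetic_key()."""
    for row in rows:
        if synthetic_key(row) == val:
            return row[item]
    raise KeyError(val)


def lookup_by_composite_key(rows: list[dict[str, Any]], item: str, **key_values: Any) -> Any:
    """Analogue of A[key1=val1, key2=val2].C: compares each named key component."""
    for row in rows:
        if all(row[name] == value for name, value in key_values.items()):
            return row[item]
    raise KeyError(key_values)


# Both styles find the same row when the caller knows the recipe...
print(lookup_by_synthetic_key(CATEGORY_A, "C2|2_565", "C"))
print(lookup_by_composite_key(CATEGORY_A, "C", key1="C2", key2="2_565"))
# ...but a caller that joins the parts differently finds nothing.
try:
    lookup_by_synthetic_key(CATEGORY_A, "C2_2_565", "C")
except KeyError:
    print("no match for a differently constructed key")
```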

No current dREL methods use synthetic keys (they were replaced a little while ago). Below is a comparison of the new and old ways of writing a dREL method, without and with synthetic keys respectively. An alternative would be to loop over geom_bond and assign the individual values at the top of the loop before using them. I note that the example below doesn't use the key to index into another category, but I could find no examples that do.

Current version:

    With b as geom_bond
    xc = List()
    For [label,symop] in [[b.atom_site_label_1,b.site_symmetry_1],
                          [b.atom_site_label_2,b.site_symmetry_2]] {
        xf   = SymEquiv(symop, _atom_site[label].fract_xyz)
        xc ++= _atom_sites_Cartn_transform.matrix * xf
    }
    _geom_bond.distance = Norm(xc[0] - xc[1])


and this is how it originally read, with 'b.id' being a synthetic key for geom_bond:

    With b as geom_bond
    xc = List()
    For [label,symop] in b.id {
        xf   = SymEquiv(symop, _atom_site[label].fract_xyz)
        xc ++= _atom_sites_Cartn_transform.matrix * xf
    }
    _geom_bond.distance = Norm(xc[0] - xc[1])
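
For comparison outside dREL, the "current version" above corresponds roughly to this Python sketch. It is illustrative only: `sym_equiv` is a stand-in for SymEquiv, symmetry operations are assumed to be supplied as explicit (rotation, translation) pairs keyed by their codes rather than parsed from them, and the row layouts for the bond and atom_site data are hypothetical. As in the current version, the atom labels and symmetry codes are read from separately named items rather than unpacked from a structured key.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def sym_equiv(symop: tuple[Matrix, Vector], xyz: Vector) -> Vector:
    """Stand-in for dREL SymEquiv: apply an explicit (rotation, translation)
    operation to fractional coordinates. The (R, t) form is an assumption."""
    rotation, translation = symop
    return [a + b for a, b in zip(mat_vec(rotation, xyz), translation)]


def bond_distance(bond: dict, atom_site: dict, symops: dict, cartn_matrix: Matrix) -> float:
    """Mirror of the 'current version': key parts are read from named items."""
    xc = []
    for label, symop_id in [
        (bond["atom_site_label_1"], bond["site_symmetry_1"]),
        (bond["atom_site_label_2"], bond["site_symmetry_2"]),
    ]:
        xf = sym_equiv(symops[symop_id], atom_site[label]["fract_xyz"])
        xc.append(mat_vec(cartn_matrix, xf))  # fractional -> Cartesian
    return math.dist(xc[0], xc[1])


# Hypothetical data: cubic cell with a = 10 Angstrom, two atoms on the x axis.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
symops = {"1_555": (IDENTITY, [0.0, 0.0, 0.0]), "1_655": (IDENTITY, [1.0, 0.0, 0.0])}
atom_site = {"C1": {"fract_xyz": [0.0, 0.0, 0.0]}, "C2": {"fract_xyz": [0.1, 0.0, 0.0]}}
cartn = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
bond = {"atom_site_label_1": "C1", "site_symmetry_1": "1_555",
        "atom_site_label_2": "C2", "site_symmetry_2": "1_555"}
print(bond_distance(bond, atom_site, symops, cartn))
```

With these assumed values the two atoms are 1 Å apart; switching site_symmetry_2 to the hypothetical "1_655" operation (a unit translation along a) moves the second atom to x = 1.1.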

all the best,
James.

On Tue, 20 Oct 2020 at 00:48, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and group,

I think I’m still missing something about the synthetic category keys.  For them to be useful in conjunction with the 'category[keyvalue].object' dREL syntax, doesn’t there need to be a way for dREL method authors to know what keyvalue to use?  And if the keys are opaque, then how does one determine which to use without getting it from the object that one wants to obtain?

As a related matter, if a dREL implementation fundamentally changed the form of the synthetic category keys it uses (which it could do if they were opaque, as proposed), would that break any methods in our existing dictionaries?

If the motivation here is to enable getting rid of List, then I don’t think we need to make the synthetic keys opaque to do so.  We could instead use Tables for them, as needed. That would bring the added benefits that key components would be mapped by identifier instead of by position, and, optionally, that we could generate types for these keys algorithmically from the definitions of categories’ natural keys. (I don’t think we can do a proper job of generating keys algorithmically as Lists, because our mechanism for defining the natural key components does not, in principle, establish an order for them.)
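
A minimal Python sketch of the Table-based alternative, under stated assumptions: a key is modelled as an immutable set of (item name, value) pairs, so components are addressed by name rather than position, and `make_key_type` is a hypothetical helper that derives a key constructor from an unordered set of natural-key item names. The four geom_bond key items used below are assumed for illustration.

```python
from typing import Any, Callable


def make_key_type(category: str, key_item_names: set[str]) -> Callable[..., frozenset]:
    """Derive a key constructor from an unordered set of natural-key item names."""
    names = frozenset(key_item_names)

    def make_key(**components: Any) -> frozenset:
        if set(components) != names:
            raise ValueError(f"{category} key needs exactly {sorted(names)}")
        # A frozenset of (name, value) pairs is hashable and compares equal
        # regardless of the order in which the components were supplied.
        return frozenset(components.items())

    return make_key


# Assumed natural key of geom_bond, for illustration only.
geom_bond_key = make_key_type(
    "geom_bond",
    {"atom_site_label_1", "atom_site_label_2", "site_symmetry_1", "site_symmetry_2"},
)
k1 = geom_bond_key(atom_site_label_1="C1", atom_site_label_2="C2",
                   site_symmetry_1="1_555", site_symmetry_2="1_555")
k2 = geom_bond_key(site_symmetry_2="1_555", atom_site_label_2="C2",
                   site_symmetry_1="1_555", atom_site_label_1="C1")
print(k1 == k2, dict(k1)["atom_site_label_2"])
```

A positional List key such as ["C1", "C2", "1_555", "1_555"] would only compare equal if every author agreed on the component order, which the natural-key definitions do not fix.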

 

 

Regards,

John

--
John C. Bollinger, Ph.D., RHCSA
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
(901) 595-3166 [office]
www.stjude.org

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Friday, October 16, 2020 7:36 PM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] A rationalisation of DDLm

Dear John and group:

On Sat, 17 Oct 2020 at 01:40, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and group,

To be sure I understand the nature of these changes, I ask you to confirm some details of my interpretation of them:

The LOOP category: this is being removed because the only attribute it contains is _loop.level, which is unused because we do not avail ourselves of STAR nested loops in CIF and do not anticipate doing so in the future.  This has nothing directly to do with whether items are presented in a (single-level) looped list.  We still have Array and Table data types available for use in place of nested loops if we should discover a need for that.

Save frames: *text* elements of various item definitions are changed to avoid referencing save frames, so as to focus on semantics rather than representation.  The CIF-format representation of DDLm still uses save frames extensively.

Ref-loops: these are defined but not used in the present version of DDLm, they are unused in any of our current DDLm dictionaries, and (it is asserted) we do not anticipate a need or desire for them in the future, either in DDLm itself or in other dictionaries written in DDLm.  This is partly because we have backed away from (but not completely abandoned) the ref-table construct in terms of which ref-loops are defined, and partly because we want DDLm definitions to focus on semantics rather than representation.

List and Array data types: List is presently distinguished from Array in that the type of each element of a List is specified independently of the others and can be any type, including complex ones, whereas all elements of an Array have the same type, which must be numeric.  These two are replaced with a generalized Array type whose elements all share a single type, which can be any type.  There is still Table if we want to define a complex type whose components have different types.
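
To illustrate the distinction, here is a small Python sketch of the two Array rules as described above, with DDLm element types approximated by Python types (an assumption; the two type systems do not map one-to-one). `is_generalized_array` and `is_numeric_array` are hypothetical names. A heterogeneous record, formerly a List, would instead become a Table with named components.

```python
from numbers import Number
from typing import Any


def is_generalized_array(values: list[Any]) -> bool:
    """Proposed rule: every element has the same type, whatever that type is."""
    return len({type(v) for v in values}) <= 1


def is_numeric_array(values: list[Any]) -> bool:
    """Current rule: homogeneous and numeric (bool is excluded explicitly)."""
    return is_generalized_array(values) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )


labels = ["C1", "C2", "O1"]          # accepted only under the proposed rule
coords = [0.25, 0.5, 0.125]          # accepted under both rules
old_list = ["C1", "1_555", 1.54]     # heterogeneous: neither rule accepts it
as_table = {"label": "C1", "symop": "1_555", "distance": 1.54}  # Table form

print(is_generalized_array(labels), is_numeric_array(labels))
print(is_generalized_array(old_list), as_table["distance"])
```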

 

The above assertions are exactly what I was getting at; you have understood my intent correctly.

Supposing that all of the above are confirmed, I have some questions:

1. DDLm uses save frames as an organizational structure, or at least its CIF representation does.  Do we really come away cleanly from removing references to save frames without replacing them with a different, more generic concept?

I would be happy if we could come up with a generic description corresponding to save frames, but I'm not convinced it is essential. Ultimately, save frames are simply the result of filtering tables of ontological information by the defined data name, to avoid repeating that data name; this seems clearest in DDL2, which is rigorous in defining the relational nature of DDL2 dictionaries (not just of the incoming data).
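
As a rough illustration of that view, the following Python sketch stores dictionary attributes as rows of (defined data name, attribute, value) and regroups them by data name, which is essentially what a save frame does: the data name is stated once per frame instead of once per row. The attribute names are DDLm-style but the rows are illustrative, and `group_by_definition` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative rows: (defined data name, attribute, value).
ATTRIBUTE_ROWS = [
    ("_geom_bond.distance", "_type.contents", "Real"),
    ("_geom_bond.distance", "_units.code", "angstroms"),
    ("_atom_site.label", "_type.contents", "Text"),
]


def group_by_definition(rows):
    """Collect each data name's attributes into one mapping, as a save frame does."""
    frames = defaultdict(dict)
    for data_name, attribute, value in rows:
        frames[data_name][attribute] = value
    return dict(frames)


for name, attributes in group_by_definition(ATTRIBUTE_ROWS).items():
    print(name, attributes)
```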

 

2. With regard to synthetic keys previously specified as heterogeneous Lists: if our position is to be that synthetic keys should be opaque, then why are we defining any at all?  The multi-attribute natural keys we also have (right?) should be sufficient for defining the wanted identity, uniqueness, and relationship details for a category.  I am inclined to suppose that the single-attribute composed keys were intended to serve as an implementation aid, and I think we're throwing that out if we don't specify details of how those keys should be constructed.

I think single-attribute keys were provided in concert with the dREL syntax 'category[keyvalue].object'. As we have discussed in previous years, this scheme becomes unwieldy when categories have additional keys added. The core CIF dREL authors went further and included internal structure in these keys. This resulted in the need for a way to describe the types of the resulting arbitrary, possibly nested, heterogeneous lists. There is no actual need for any internal structure from the dREL point of view, even if we posit that single-attribute keys are useful, because good relational hygiene dictates that the only use of keys should be to index into the needed information, not to contain that information within themselves.

So I don't believe that single-attribute keys with internal structure provide any useful aid to implementation. Your question as to why we should have them at all, if there is no implementation benefit, is reasonable. I have been inclined to leave them in as an optional facility in case they are useful to dictionary authors, but I think you are right: logically speaking, there can be no actual benefit. If others agree, we can remove the '_category.key_id' attribute completely and definitely bump the version to 4.0.
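
To illustrate the claim that multi-attribute natural keys are sufficient, here is a Python sketch of two checks a validator might perform using only named key items: uniqueness of each row's natural key, and resolution of a child item against a parent key. The function names and row layouts are hypothetical, and the geom_bond natural key is assumed. Nothing in it needs a separate single-attribute key of the kind '_category.key_id' would name.

```python
from collections import Counter
from typing import Any


def duplicate_keys(rows: list[dict[str, Any]], key_names: set[str]) -> list[tuple]:
    """Natural-key values (in sorted item-name order) occurring more than once."""
    names = sorted(key_names)  # internal ordering only; never exposed as a key item
    counts = Counter(tuple(row[n] for n in names) for row in rows)
    return [key for key, count in counts.items() if count > 1]


def dangling_references(child_rows, child_item, parent_rows, parent_key) -> list[Any]:
    """Child values that match no parent key value."""
    parent_values = {row[parent_key] for row in parent_rows}
    return [row[child_item] for row in child_rows if row[child_item] not in parent_values]


atom_site = [{"label": "C1"}, {"label": "C2"}]
geom_bond = [
    {"atom_site_label_1": "C1", "atom_site_label_2": "C2",
     "site_symmetry_1": "1_555", "site_symmetry_2": "1_555"},
    {"atom_site_label_1": "C1", "atom_site_label_2": "C2",
     "site_symmetry_1": "1_555", "site_symmetry_2": "1_555"},
    {"atom_site_label_1": "C1", "atom_site_label_2": "O9",
     "site_symmetry_1": "1_555", "site_symmetry_2": "1_555"},
]
bond_key = {"atom_site_label_1", "atom_site_label_2", "site_symmetry_1", "site_symmetry_2"}
print(duplicate_keys(geom_bond, bond_key))
print(dangling_references(geom_bond, "atom_site_label_2", atom_site, "label"))
```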

 

 

Regards,

John

--
John C. Bollinger, Ph.D., RHCSA
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital


From: ddlm-group <ddlm-group-bounces@iucr.org> on behalf of James Hester <jamesrhester@gmail.com>
Sent: Thursday, October 15, 2020 11:56 PM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] A rationalisation of DDLm

Dear DDLm group,

In preparation for setting things in stone (famous last words) for the second edition of Vol. G, I have made some edits to the DDLm specification file ddl.dic to remove unused or obsolete tags. You can see the full changes in the pull request at https://github.com/COMCIFS/cif_core/pull/188. In summary (from the pull request):

A number of items are obsolete in DDLm:

  1. Save frames. DDLm aims to be format-agnostic. All references to save frames and ref-loops have been removed, except in import statements, for which a save frame reference is currently unavoidable.
  2. LOOP category. No official dictionary (DDL1/2/m) has ever used more than one level of looping, so there is no need for such attributes, which are also CIF-format-specific.
  3. Complex data types. Data types of the form 'Type1|Type2', indicating that an item could have more than one type, are not used, are complex to implement, and so have been removed. Data types of the form List(Type1,Type2,...), indicating List objects where each element is a tuple of values of potentially different types, have also been removed. These were almost exclusively used for synthetic keys, which should be opaque. Any other current use can be changed to Array type.
  4. Allow arrays of strings. The current DDLm dictionary restricts Arrays to numeric values for no apparent reason.
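
A maintainer checking other dictionaries for the two type forms removed under item 3 might use something like the following Python sketch. It assumes the forms appear literally as type specification strings such as 'Text|Real' and 'List(Text,Real)'; that spelling and the helper name `obsolete_type_form` are assumptions for illustration, not a description of what the pull request does.

```python
import re
from typing import Optional

# Assumed spelling of a typed List, e.g. "List(Text,Real)".
TYPED_LIST = re.compile(r"^\s*List\s*\(.*\)\s*$")


def obsolete_type_form(type_spec: str) -> Optional[str]:
    """Name the obsolete form used by a type specification, or None if neither."""
    if "|" in type_spec:
        return "alternative types (Type1|Type2)"
    if TYPED_LIST.match(type_spec):
        return "typed List (List(Type1,Type2,...))"
    return None


for spec in ["Real", "Text|Real", "List(Text,Real)"]:
    print(spec, "->", obsolete_type_form(spec))
```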

If these changes are considered significant enough, we should change the major version number to 4.0. Please comment on this.

thanks,

James.

--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

International Union of Crystallography
