Discussion List Archives

Re: [ddlm-group] A rationalisation of DDLm

Dear James and group,

I think you mean that you have removed _category.key_id, not _category_key.id.  I am indeed fine with that.

Before this goes to press, however, I want to be sure we're all ok with the fact that that leaves category 'category' without any attributes.  I see no logical inconsistency in that, but it is possible that it would present a practical problem for software not prepared for it. If we want to remove 'category' (retaining 'category_key', of course) then now would be the time to do it.  Alternatively, it might be apropos to add a sentence or two to the description of 'category' to acknowledge that it doesn't have any attributes of its own, serving only as a place to hang a description of the concept of a DDLm "category".


Best regards,

John

--
John C. Bollinger, Ph.D., RHCSA
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital



From: James Hester <jamesrhester@gmail.com>
Sent: Thursday, October 29, 2020 8:23 PM
To: Bollinger, John C <John.Bollinger@STJUDE.ORG>
Cc: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] A rationalisation of DDLm
 
Caution: External Sender

Dear DDLm group,

I have just updated the pull request (https://github.com/COMCIFS/cif_core/pull/188) to include:
(1) Removal of _category_key.id
(2) Update major version to 4.0.0
and given the update to the major version:
(3) Removal of previously deprecated datanames (anything with 'xref' in it)
(4) Removal of deprecated content types ('Count' and 'Index')

I believe that given the current and previous discussions these changes will not be controversial. Please advise of any issues either here or in comments to the pull request. If everything is OK, I'll ask someone who is not me to merge the request.

The next step will be to update the core dictionary to conform with these changes (very minor) after which Brian is ready to publish it on the IUCr website.

thanks,
James.

On Wed, 21 Oct 2020 at 10:01, James Hester <jamesrhester@gmail.com> wrote:
I have no objection to removing synthetic keys; in fact, none are left in cif_core. There is no DDLm attribute that specifically defines a synthetic key; rather, it is a combination of (1) a data name whose dREL "synthesises" the key; (2) _category.key_id, which points to this dataname, i.e. a single dataname that can be used to index into a category; and (3) _type.contents support for the complicated heterogeneous structures that synthetic keys might like to create. As for (1), none of the former synthetic key data names remain in cif_core, so in a way we have already abandoned synthetic keys. Removing _category.key_id would be a further welcome simplification, as the same information is available in _category_key.name, and the current proposal removes support from _type.contents.

So the practical question comes down to, in addition to the changes listed in my first email, shall we also remove _category.key_id (and bump the major version number to 4.0)?

On Wed, 21 Oct 2020 at 08:43, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and group,

 

Ok, then circling back around: if the synthetic keys are to be inappropriate for use with dREL (which we claim is acceptable), and they are not needed for internal DDLm structural purposes either (which I think we agree), then wouldn't it be appropriate to remove them from DDLm altogether?

 

 

Regards,

 

John

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Tuesday, October 20, 2020 1:08 AM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] A rationalisation of DDLm

 

Dear John and group,

 

Yes, I see you are correct. A synthetic key composed of parts is no use unless the referenced category and the method referencing the category agree on how to construct the value. So, just to be clear, the construction A[val].B, where val is the value of the synthetic key into category A, will only be usable if the calling dREL has constructed 'val' in exactly the way that category A has defined that it should be constructed.

 

In any case, we have added the syntax A[key1 = val1, key2=val2].C to access category A using appropriate values of the datanames forming a composite key, which takes care of any more complex requirements. So I believe the changes proposed above remain viable.
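As a rough illustration of the composite-key access A[key1 = val1, key2=val2].C, here is a minimal Python sketch. It is not dREL and not taken from any dREL implementation: the row data, the name `category_a` and the helper `select_row` are all hypothetical, and the category is simply modelled as a list of rows.

```python
# Hypothetical stand-in for a looped category "A": one dict per row.
category_a = [
    {"key1": "x", "key2": 1, "C": "first"},
    {"key1": "x", "key2": 2, "C": "second"},
    {"key1": "y", "key2": 1, "C": "third"},
]


def select_row(rows, **key_values):
    """Return the single row whose key data names all match key_values."""
    matches = [
        row for row in rows
        if all(row.get(name) == value for name, value in key_values.items())
    ]
    if len(matches) != 1:
        # A partial or wrong key does not identify a unique row.
        raise KeyError(
            f"expected exactly one row for {key_values}, found {len(matches)}"
        )
    return matches[0]


# Rough analogue of A[key1 = "x", key2 = 2].C
value_c = select_row(category_a, key1="x", key2=2)["C"]
```

The caller names each key component explicitly, so nothing depends on an agreed way of packing the components into a single synthetic value.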

 

No current dREL methods use synthetic keys (they were replaced a little while ago). Below I've put a comparison of the new vs old way of writing a dREL method with and without synthetic keys. An alternative way to write it would be to loop over geom_bond and assign the individual values at the top of the loop before using them.  I note that the use below doesn't use the key to index into another category, but I could find no such examples.

 

Current version:

 

     With b as geom_bond
     xc = List()
     For [label,symop] in [[b.atom_site_label_1,b.site_symmetry_1],
                           [b.atom_site_label_2,b.site_symmetry_2]] {
         xf = SymEquiv(symop, _atom_site[label].fract_xyz)
         xc ++= _atom_sites_Cartn_transform.matrix * xf
     }
     _geom_bond.distance = Norm(xc[0] - xc[1])

 

and this is how it originally read, with 'b.id' being a synthetic key for geom_bond:

 

 

     With b as geom_bond
     xc = List()
     For [label,symop] in b.id {
         xf = SymEquiv(symop, _atom_site[label].fract_xyz)
         xc ++= _atom_sites_Cartn_transform.matrix * xf
     }
     _geom_bond.distance = Norm(xc[0] - xc[1])



 

 

 

all the best,

James.

 

On Tue, 20 Oct 2020 at 00:48, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and group,

 

I think I'm still missing something about the synthetic category keys.  For them to be useful in conjunction with the 'category[keyvalue].object' dREL syntax, doesn't there need to be a way for dREL method authors to know what keyvalue to use?  And if the keys are opaque, then how does one determine which to use without getting it from the object that one wants to obtain?

 

As a related matter, if a dREL implementation fundamentally changed the form of the synthetic category keys it uses (which it could do if they were opaque, as proposed) then would that break any methods in our existing dictionaries?

 

If the motivation here is to enable getting rid of List, then I don't think we need to make the synthetic keys opaque to do so.  We could instead use Tables for them instead of Lists, as needed. That would bring the added benefits that key components would be mapped by identifier instead of by position, and optionally that we could generate types for these keys algorithmically from the definitions of categories' natural keys. (I don't think we can do a proper job of generating keys algorithmically as Lists, because our mechanism for defining the natural key components does not, in principle, establish an order for them).
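To make the identifier-versus-position point concrete, here is a small Python sketch. It is only an analogy: a Table-style key is modelled as a set of (name, value) pairs and a List-style key as a tuple. The helpers `table_key` and `list_key` and the sample values are hypothetical, not part of DDLm or dREL.

```python
def table_key(**components):
    """Build a hashable key whose components are identified by name."""
    return frozenset(components.items())


def list_key(*components):
    """Build a key whose components are identified by position only."""
    return tuple(components)


# Index one row under a Table-style key.
index = {table_key(atom_site_label_1="C1", site_symmetry_1="1_555"): "row 0"}

# The same components given in a different order still find the row.
same_key = table_key(site_symmetry_1="1_555", atom_site_label_1="C1")
found = index[same_key]

# A List-style key depends on an ordering that both sides must agree on.
positional_match = list_key("C1", "1_555") == list_key("1_555", "C1")
```

Under these assumptions the named key can be generated from an unordered set of natural key components, whereas the positional key cannot.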

 

 

Regards,

 

John

 

--

John C. Bollinger, Ph.D., RHCSA

Computing and X-Ray Scientist

Department of Structural Biology

St. Jude Children's Research Hospital

John.Bollinger@StJude.org

(901) 595-3166 [office]

www.stjude.org

 

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Friday, October 16, 2020 7:36 PM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] A rationalisation of DDLm

 

Dear John and group:

 

On Sat, 17 Oct 2020 at 01:40, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and group,

 

To be sure I understand the nature of these changes, I ask you to confirm some details of my interpretation of the proposed changes:

 

The LOOP category: this is being removed because the only attribute it contains is _loop.level, which is unused because we do not avail ourselves of STAR nested loops in CIF, we do not anticipate doing so in the future, and we do not .  This has nothing directly to do with whether items are presented in a (single-level) looped list.  We still have Array and Table data types available for use in place of nested loops if we should discover a need for that.

 

Save frames: *text* elements of various item definitions are changed to avoid referencing save frames, so as to focus on semantics rather than representation.  The CIF-format representation of DDLm still uses save frames extensively.

 

Ref-loops: these are defined but not used in the present version of DDLm, they are unused in any of our current DDLm dictionaries, and (it is asserted) we do not anticipate a need or desire for them in the future, neither in DDLm itself nor in other dictionaries written in DDLm.  This is in part because we have backed away from (but not completely abandoned) the ref-table construct in terms of which ref-loops are defined, and in part because we want DDLm definitions to focus on semantics rather than representation.

 

List and Array data types: List is presently distinguished from Array in that the type of each element of a List is specified independently of the others, and can be any type, including complex ones, whereas all elements of an Array have the same type, which must be a numerical one.  These two are replaced with a generalized Array type whose elements are all the same type, but which type can be anything.  There is still Table if we want to define a complex type whose components have different types.
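The distinction can be sketched in Python as follows. This is an illustrative check only, under the assumption that "same type" can be modelled with `isinstance`. The function `is_generalized_array` is hypothetical and is not how any DDLm validator is known to work.

```python
def is_generalized_array(values, element_type):
    """True if every element has the given type (any type is allowed)."""
    return all(isinstance(value, element_type) for value in values)


# Under the proposal, arrays of strings are as acceptable as arrays of numbers.
labels_ok = is_generalized_array(["C1", "O1", "N2"], str)
numbers_ok = is_generalized_array([0.25, 0.5, 0.75], float)

# A mixed sequence, as the old List type permitted, is not a generalized Array.
mixed_ok = is_generalized_array(["C1", 1.5], str)

# Heterogeneous components would instead be named, as in a Table.
bond_end = {"label": "C1", "symop": "1_555"}
```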

 

The above assertions are exactly what I was getting at, you have understood my intent correctly. 

 

Supposing that all of the above are confirmed, I have some questions:

 

1. DDLm uses save frames as an organizational structure, or at least its CIF representation does.  Do we really come away cleanly from removing references to save frames without replacing them with a different, more generic concept?

 

I would be happy if we could come up with a generic description corresponding to save frames but I'm not convinced it would be that essential. Ultimately save frames are simply a result of filtering tables of ontological information based on the defined data name, to save repetition of the data name - this seems clearest in DDL2, which is rigorous in defining the relational nature of DDL2 dictionaries (not just the incoming data).

 

2. With regard to synthetic keys previously specified as heterogeneous Lists: if our position is to be that synthetic keys should be opaque, then why are we defining any at all?  The multi-attribute natural keys we also have (right?) should be sufficient for defining the wanted identity, uniqueness, and relationship details for a category.  I am inclined to suppose that the single-attribute composed keys were intended to serve as an implementation aid, and I think we're throwing that out if we don't specify details of how those keys should be constructed.

 

I think single-attribute keys were provided in concert with the dREL syntax 'category[keyvalue].object'. As we have discussed in previous years, this scheme becomes unwieldy when categories have additional keys added. The core CIF dREL authors went further, and included internal structure in these keys. This resulted in the need for a way to describe the types of the resulting arbitrary, possibly nested, heterogeneous lists. There is no actual need for any internal structure from the dREL point of view, even if we posit that single-attribute keys are useful, because good relational hygiene dictates that the only use of keys should be to index into the needed information, not to contain that information within themselves.

 

So I don't believe that there is any useful aid to implementation provided by single-attribute keys with internal structure. Your question as to why have them at all if there is no implementation benefit is reasonable. I have been inclined to leave them in as an optional facility just in case they are useful to dictionary authors, but I think you are right, logically speaking there can be no actual benefit. If others agree we can remove the '_category.key_id' attribute completely and definitely bump the version to 4.0

 

 

Regards,

 

John

 

 

 


From: ddlm-group <ddlm-group-bounces@iucr.org> on behalf of James Hester <jamesrhester@gmail.com>
Sent: Thursday, October 15, 2020 11:56 PM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] A rationalisation of DDLm

 

Dear DDLm group,

 

In preparation for setting things in stone (famous last words) for Vol G second edition I have made some edits to the DDLm specification file ddl.dic to remove unused or obsolete tags. You can see the full changes in the pull request at https://github.com/COMCIFS/cif_core/pull/188 . In summary (from the pull request):

 

A number of items are obsolete in DDLm:

  1. Save frames. DDLm aims to be format-agnostic. All references to save frames and ref-loops have been removed, except for import statements for which a save frame reference is currently unavoidable
  2. LOOP category. No official dictionary (DDL1/2/m) has ever used more than 1 level of looping, so there is no need for such attributes, which are also CIF-format-specific.
  3. Complex data types. Data types of the form 'Type1|Type2' to indicate that an item could have more than one type are not used, are complex to implement, and so have been removed. Data types of the form List(Type2,Type3,...) to indicate List objects where each element is composed of a tuple of values, each of potentially different types, have been removed. These were almost exclusively used for synthetic keys, which should be opaque. Any other current use can be changed to Array type.
  4. Allow arrays of strings. The current DDLm dictionary restricts Arrays to numeric values for no apparent reason.

If these changes are considered significant enough, we should change the major version number to 4.0. Please comment on this.

 

thanks,

James.

--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

 



_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group


 

