Re: [ddlm-group] A rationalisation of DDLm

Dear John and group:

On Sat, 17 Oct 2020 at 01:40, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Dear James and group,

To be sure I understand the nature of these changes, I ask you to confirm some details of my interpretation of the proposed changes:

The LOOP category: this is being removed because the only attribute it contains is _loop.level, which is unused because we do not avail ourselves of STAR nested loops in CIF, we do not anticipate doing so in the future, and we do not .  This has nothing directly to do with whether items are presented in a (single-level) looped list.  We still have Array and Table data types available for use in place of nested loops if we should discover a need for that.

Save frames: *text* elements of various item definitions are changed to avoid referencing save frames, so as to focus on semantics rather than representation.  The CIF-format representation of DDLm still uses save frames extensively.

Ref-loops: these are defined but not used in the present version of DDLm, they are unused in any of our current DDLm dictionaries, and (it is asserted) we do not anticipate a need or desire for them in the future, neither in DDLm itself nor in other dictionaries written in DDLm.  This is in part because we have backed away from (but not completely abandoned) the ref-table construct in terms of which ref-loops are defined, and in part because we want DDLm definitions to focus on semantics rather than representation.

List and Array data types: List is presently distinguished from Array in that the type of each element of a List is specified independently of the others, and can be any type, including complex ones, whereas all elements of an Array have the same type, which must be a numerical one.  These two are replaced with a generalized Array type whose elements are all the same type, but which type can be anything.  There is still Table if we want to define a complex type whose components have different types.

The above assertions are exactly what I was getting at; you have understood my intent correctly.
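As an illustration only, the distinction between the generalized Array (one element type, not necessarily numeric) and Table (components of differing types) can be sketched in Python. The function names `is_array_of` and `is_table_of` are hypothetical, and describing a Table by per-field types is an assumption of this sketch, not something DDLm specifies.

```python
from typing import Any, Mapping, Sequence


def is_array_of(values: Sequence[Any], element_type: type) -> bool:
    """True if every element has the single declared element type."""
    return all(isinstance(v, element_type) for v in values)


def is_table_of(value: Mapping[str, Any], field_types: Mapping[str, type]) -> bool:
    """True if every entry is a declared field holding a value of that field's type."""
    return all(
        key in field_types and isinstance(item, field_types[key])
        for key, item in value.items()
    )


# Homogeneous arrays, including a non-numeric element type.
print(is_array_of([1.0, 2.5, 3.0], float))
print(is_array_of(["x", "y", "z"], str))

# A mixed tuple of the old List(Type2,Type3,...) kind does not fit Array...
print(is_array_of(["C1", 1.0], str))

# ...but could be described as a Table (hypothetical field names).
print(is_table_of({"label": "C1", "occupancy": 1.0},
                  {"label": str, "occupancy": float}))
```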

Supposing that all of the above are confirmed, I have some questions:

1. DDLm uses save frames as an organizational structure, or at least its CIF representation does.  Do we really come away cleanly from removing references to save frames without replacing them with a different, more generic concept?

I would be happy if we could come up with a generic description corresponding to save frames but I'm not convinced it would be that essential. Ultimately save frames are simply a result of filtering tables of ontological information based on the defined data name, to save repetition of the data name - this seems clearest in DDL2, which is rigorous in defining the relational nature of DDL2 dictionaries (not just the incoming data).
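A minimal sketch of that view, assuming the dictionary is held as a flat table of (defined data name, attribute, value) rows: grouping the rows by data name yields one save-frame-like block per definition. The rows and the helper `group_by_data_name` are illustrative and are not taken from any actual dictionary.

```python
from collections import defaultdict

# Flat relational view: one row per (defined data name, attribute, value).
# Values are illustrative only.
ROWS = [
    ("_cell.length_a", "_type.contents", "Real"),
    ("_cell.length_a", "_units.code", "angstroms"),
    ("_atom_site.label", "_type.contents", "Word"),
]


def group_by_data_name(rows):
    """Filter the table by data name so each name is stated once per block."""
    frames = defaultdict(dict)
    for data_name, attribute, value in rows:
        frames[data_name][attribute] = value
    return dict(frames)


frames = group_by_data_name(ROWS)
for data_name, attributes in frames.items():
    print(data_name, attributes)
```

In this picture a save frame is just one value of `frames`: the same information as the flat table, with the repeated data name factored out.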

2. With regard to synthetic keys previously specified as heterogeneous Lists: if our position is to be that synthetic keys should be opaque, then why are we defining any at all?  The multi-attribute natural keys we also have (right?) should be sufficient for defining the wanted identity, uniqueness, and relationship details for a category.  I am inclined to suppose that the single-attribute composed keys were intended to serve as an implementation aid, and I think we're throwing that out if we don't specify details of how those keys should be constructed.

I think single-attribute keys were provided in concert with the dREL syntax 'category[keyvalue].object'. As we have discussed in previous years, this scheme becomes unwieldy when categories have additional keys added. The core CIF dREL authors went further, and included internal structure in these keys. This resulted in the need for a way to describe the types of the resulting arbitrary, possibly nested, heterogeneous lists. There is no actual need for any internal structure from the dREL point of view, even if we posit that single-attribute keys are useful, because good relational hygiene dictates that the only use of keys should be to index into the needed information, not to contain that information within themselves.

So I don't believe that there is any useful aid to implementation provided by single-attribute keys with internal structure. Your question as to why have them at all if there is no implementation benefit is reasonable. I have been inclined to leave them in as an optional facility just in case they are useful to dictionary authors, but I think you are right: logically speaking, there can be no actual benefit. If others agree, we can remove the '_category.key_id' attribute completely and definitely bump the version to 4.0.
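To make the point concrete, here is a rough sketch of indexing a category directly by its multi-attribute natural key, with no structured single-attribute key involved. The category contents, the key names `label_1` and `label_2`, and the helper `build_index` are all hypothetical.

```python
def build_index(rows, key_names):
    """Index a category's rows by the tuple of their natural key values."""
    index = {}
    for row in rows:
        key = tuple(row[name] for name in key_names)
        if key in index:
            raise ValueError(f"duplicate key {key!r}")
        index[key] = row
    return index


# Hypothetical category with a two-attribute natural key.
ROWS = [
    {"label_1": "C1", "label_2": "O1", "distance": 1.21},
    {"label_1": "C1", "label_2": "N1", "distance": 1.34},
]

index = build_index(ROWS, ("label_1", "label_2"))

# Rough analogue of the dREL lookup category[keyvalue].object
print(index[("C1", "O1")]["distance"])
```

The key tuple is used only to locate the row. If the category later gains another key attribute, only `key_names` grows; nothing has to parse information back out of the key.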



John C. Bollinger, Ph.D., RHCSA
Computing and X-ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital

From: ddlm-group <ddlm-group-bounces@iucr.org> on behalf of James Hester <jamesrhester@gmail.com>
Sent: Thursday, October 15, 2020 11:56 PM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] A rationalisation of DDLm

Dear DDLm group,

In preparation for setting things in stone (famous last words) for Vol G second edition I have made some edits to the DDLm specification file ddl.dic to remove unused or obsolete tags. You can see the full changes in the pull request at https://github.com/COMCIFS/cif_core/pull/188 . In summary (from the pull request):

A number of items are obsolete in DDLm:

  1. Save frames. DDLm aims to be format-agnostic. All references to save frames and ref-loops have been removed, except for import statements, for which a save frame reference is currently unavoidable.
  2. LOOP category. No official dictionary (DDL1/2/m) has ever used more than 1 level of looping, so there is no need for such attributes, which are also CIF-format-specific.
  3. Complex data types. Data types of the form 'Type1|Type2' to indicate that an item could have more than one type are not used, are complex to implement, and so have been removed. Data types of the form List(Type2,Type3,...) to indicate List objects where each element is composed of a tuple of values, each of potentially different types, have been removed. These were almost exclusively used for synthetic keys, which should be opaque. Any other current use can be changed to Array type.
  4. Allow arrays of strings. The current DDLm dictionary restricts Arrays to numeric values for no apparent reason.

If these changes are considered significant enough, we should change the major version number to 4.0. Please comment on this.

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

ddlm-group mailing list


International Union of Crystallography
