Discussion List Archives


Re: [ddlm-group] Proposal to enhance the behaviour of a DDLm "Set" category: please consider

Dear Colleagues,

  James has said:  "While this would work prospectively, it still doesn't fix the problem of old software that doesn't know about the new child keys. Adding new keys to a category unequivocally changes the meaning of all of the non-key datanames in that category, and old software will operate using the old meanings.  If this is a point of disagreement, I will write a blog post about it as I will need to use some pictures to explain why I think this."

  One major difference between imgCIF and mmCIF is imgCIF's reliance on the "implicit" value of item.mandatory_code to supply values for otherwise missing keys. This lets us handle categories whose key structure differs from mmCIF's without having to define different categories.  In the years since the creation of imgCIF, "implicit" seems to have dropped out of sight for other dictionaries, but it is heavily used in imgCIF because we are always dealing with information that will eventually have to end up in a database (the PDB), but at a stage at which the values to use for some keys are not yet known, or may have to change.
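A minimal Python sketch of one way a reader might apply "implicit" key values, assuming a category is held as a list of rows (dicts mapping dataname to value) and that, for each implicit key, the dictionary supplies the value to use when a file omits it. The datanames, the default value and the helper `fill_implicit_keys` are all hypothetical; a real reader would take this information from the dictionary rather than hard-coding it.

```python
from typing import Any

# Hypothetical stand-in for dictionary information: key datanames whose
# item.mandatory_code is "implicit", with the value to use when a file
# omits them. Real names and defaults would come from the dictionary.
IMPLICIT_DEFAULTS: dict[str, Any] = {
    "_example_category.parent_id": "DEFAULT_PARENT",
}


def fill_implicit_keys(rows: list[dict[str, Any]],
                       defaults: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of the rows with absent implicit keys filled in."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for name, value in defaults.items():
            # Keep any value given explicitly in the file.
            new_row.setdefault(name, value)
        filled.append(new_row)
    return filled


rows = [
    {"_example_category.name": "a"},
    {"_example_category.name": "b", "_example_category.parent_id": "P2"},
]
print(fill_implicit_keys(rows, IMPLICIT_DEFAULTS))
```

Because the original rows are copied rather than modified, the values actually present in the file remain distinguishable from the ones supplied by default.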

  So in imgCIF we add keys to categories and, in addition, we leave some mmCIF keys without explicit values. As far as I am aware, this has not caused problems for existing software, nor does it seem so far to have changed the meaning of the non-key datanames, so I would very much appreciate the blog post James has offered.  It would be very helpful in a dREL conversion of imgCIF.


On Mon, Jun 6, 2016 at 12:58 AM, James Hester <jamesrhester@gmail.com> wrote:
Dear all,

Brief comments below:

On 2 June 2016 at 13:03, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
Dear Colleagues,

  In the 1970s we had a serious conflict in the database world between people who accepted Codd's relational database model and advocates of a very large number of supposedly more powerful and flexible alternatives.  In the end it turned out that Codd was right: the best way to represent databases of information so as to allow simultaneous readers and writers with reasonable efficiency and reliability was to keep _everything_ in relational tables in which each row is identified by the unique combination of its key values.
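To make the relational picture concrete, here is a small Python sketch, assuming rows are held as dicts. `duplicate_keys` is a hypothetical helper (not part of any CIF library) that reports every combination of key values identifying more than one row, i.e. every violation of the rule that rows are distinguished by their unique key combination. The column names are invented.

```python
from collections import Counter
from typing import Any


def duplicate_keys(rows: list[dict[str, Any]],
                   key_names: list[str]) -> list[tuple]:
    """Return every key-value combination that identifies more than one row."""
    counts = Counter(tuple(row[name] for name in key_names) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical table with a two-part key.
rows = [
    {"crystal_id": "c1", "scan_id": 1},
    {"crystal_id": "c1", "scan_id": 2},
]
print(duplicate_keys(rows, ["crystal_id", "scan_id"]))  # []
print(duplicate_keys(rows, ["crystal_id"]))             # [('c1',)]
```

The same two rows are valid under the two-part key and invalid under a one-part key, which is why the choice of key columns is part of a table's meaning.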

  Ultimately, if we are going to get the most use from the information we are gathering in CIFs, it would be helpful if the CIFs could easily be mapped into relational tables.  That works well with DDL2.  It would be nice if the characteristics of DDL2 that permit such easy use there could be adhered to in the DDLm core dictionary. 

Ease of mapping to an RDB is not the issue - the 'Set' categories are really just one-row tables, don't be fooled by the 'Set' nomenclature. Exactly the same issue exists in mmCIF DDL2 due to the use of entry.id to restrict various categories to single values. The real issue is a live one even for a straight relational database, i.e. if you add a key column to a table, how do you tell all the applications using that table to pay attention to the value of the new key? Note also that we are not actually talking about a single centralised database with perhaps a few front-end applications talking to it that we can adjust at the same time as we change the database schema - we are talking about a large ecosystem of distributed software that must be updated.
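One way to see the problem, as a hedged Python sketch rather than a model of any particular program: `index_by_keys` is a hypothetical legacy reader that indexes rows by the keys it knows about. If it knows of no key (treating the category as single-valued), two rows that differ only in a newly added key column collapse into one entry, and whichever row happens to come last silently wins. The column names are invented.

```python
from typing import Any


def index_by_keys(rows: list[dict[str, Any]],
                  key_names: list[str]) -> dict[tuple, dict[str, Any]]:
    """Map each key tuple to a row, as a hypothetical legacy reader might."""
    index: dict[tuple, dict[str, Any]] = {}
    for row in rows:
        # A later row with the same key tuple silently replaces an earlier one.
        index[tuple(row[name] for name in key_names)] = row
    return index


# A formerly single-row category after a hypothetical key column
# "entry_id" has been added and the category looped.
rows = [
    {"entry_id": "A", "temperature": 100},
    {"entry_id": "B", "temperature": 293},
]

old_view = index_by_keys(rows, [])            # legacy code: knows no key
new_view = index_by_keys(rows, ["entry_id"])  # updated code

print(len(old_view), old_view[()]["temperature"])  # 1 293
print(len(new_view))                               # 2
```

The legacy view raises no error; it simply reports one temperature, with its old meaning, for data that now describe two entries.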

  I don't see that any harm will arise from allowing appropriate keys to be defined for all categories and allowing looping of any category for which keys have been defined.  Even if a key is added to a category for which there are existing datasets without that key, with DDLm and dREL we can easily provide a default value to be used.  If the category has not been looped in a particular dataset, any default value will do.  If it has been looped, the category must already have a key with unique values for each row.  Yes, a set is different from a relational table, but it can be effectively represented in the same way as a relational table.  The distinction is in the semantics in the dictionary, not in the data file.  We invalidate nothing by allowing some CIFs to contain unlooped versions of a category that is looped in other CIFs.  Failing to declare as an error something that has a clear and unambiguous meaning would not be a loss to anybody.
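A minimal Python sketch of the rule described above, assuming an unlooped category arrives as a dict of scalar values and a looped one as a dict of equal-length lists. `as_rows` and `add_default_key` are hypothetical helpers and the datanames are invented. An unlooped category becomes a one-row table that can take any default for a newly defined key, while a looped category that lacks the key is rejected.

```python
from typing import Any


def as_rows(category: dict[str, Any]) -> list[dict[str, Any]]:
    """Represent a category as a list of rows (a relational table).

    Assumed input shape: an unlooped category maps each dataname to one
    value; a looped one maps each dataname to an equal-length list.
    """
    columns = list(category.values())
    if columns and all(isinstance(col, list) for col in columns):
        length = len(columns[0])
        if any(len(col) != length for col in columns):
            raise ValueError("loop columns differ in length")
        # zip(*columns) walks the loop row by row.
        return [dict(zip(category, row)) for row in zip(*columns)]
    # Unlooped: the whole category is a single row.
    return [dict(category)]


def add_default_key(rows: list[dict[str, Any]], key_name: str,
                    default: Any) -> list[dict[str, Any]]:
    """Supply a default for a newly defined key where that is unambiguous."""
    if all(key_name in row for row in rows):
        return [dict(row) for row in rows]
    if len(rows) > 1:
        # A looped category should already carry its key in every row.
        raise ValueError(f"looped category has no values for {key_name!r}")
    # Exactly one row: any default identifies it uniquely.
    return [{**row, key_name: default} for row in rows]


single = add_default_key(as_rows({"_example.value": 5.0}), "_example.id", "1")
print(single)
```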

No harm arises for legacy software dealing with legacy files, or future software dealing with any file. There is potential harm for legacy CIF-reading software dealing with new-style files.  This is something we have to face and find a solution for.
  James' suggestion does not introduce complexity.  It removes some.

I'd be happy if complexity just stayed the same.  Unfortunately my original proposal does introduce further equivalent datanames, which I think must be avoided - the community is annoyed enough by having to check for the 'dotted' datanames for no good reason, without having to also include a variety of other datanames that mean almost the same thing.  Our best hope is therefore likely to be something very similar to John B's proposal, which is really a description of what you do in imgCIF.  As imgCIF warned about this from the start, and doesn't have any mmCIF-style entry.id datanames, current imgCIF software is not affected.

all the best,

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

ddlm-group mailing list


International Union of Crystallography
