Discussion List Archives

Re: [ddlm-group] Proposal to enhance the behaviour of a DDLm "Set" category: please consider

Dear all,

Brief comments below:

On 2 June 2016 at 13:03, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
Dear Colleagues,

  In the 1970s we had a serious conflict in the database world between people who accepted Codd's relational database model and proponents of a very large number of supposedly more powerful and flexible alternatives.  In the end it turned out that Codd was right: the best way to represent databases of information so as to allow simultaneous readers and writers with reasonable efficiency and reliability was to keep _everything_ in relational tables in which particular rows are identified by the unique combination of key values in each row.

  Ultimately, if we are going to get the most use from the information we are gathering in CIFs it would be helpful if the CIFs could easily be mapped into relational tables.  That works well with DDL2.  It would be nice if the characteristics of DDL2 that permit such easy use there could be adhered to in the DDLm core dictionary.

Ease of mapping to an RDB is not the issue - the 'Set' categories are really just one-row tables, don't be fooled by the 'Set' nomenclature. Exactly the same issue exists in mmCIF DDL2 due to the use of entry.id to restrict various categories to have single values. The real issue is a live one even for a straight relational database, i.e. if you add a key column to a table, how do you tell all the applications using that table to pay attention to the value of the new key? Note also that we are not actually talking about a single centralised database with perhaps a few front-end applications talking to it, which we could adjust at the same time as we change the database schema - we are talking about a large ecosystem of distributed software that must be updated.
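
To make the new-key problem concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name cell, the columns length_a and phase_id, and the reader function are hypothetical and chosen only for illustration; they are not taken from any CIF dictionary. The point is that a reader written for the one-row table keeps running after the key is added, but silently ignores it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical one-row, "Set"-style table (names are illustrative only).
conn.execute("CREATE TABLE cell (length_a REAL)")
conn.execute("INSERT INTO cell (length_a) VALUES (5.43)")


def legacy_read_length_a(connection):
    """Legacy reader: assumes the table holds exactly one row."""
    return connection.execute("SELECT length_a FROM cell").fetchone()[0]


before = legacy_read_length_a(conn)

# Schema change: add a key column with a default, then loop the category.
conn.execute("ALTER TABLE cell ADD COLUMN phase_id TEXT DEFAULT 'default'")
conn.execute("INSERT INTO cell (length_a, phase_id) VALUES (4.05, 'phase2')")

# The legacy reader raises no error; it just returns one of the two
# values without knowing which phase_id it belongs to.
after = legacy_read_length_a(conn)

# A key-aware reader sees both rows and can tell them apart.
rows = conn.execute(
    "SELECT phase_id, length_a FROM cell ORDER BY phase_id"
).fetchall()
```

Nothing in the schema change itself forces the legacy reader to fail or to notice the new column, which is why the fix has to reach every application rather than just the table definition.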

  I don't see that any harm will arise from allowing appropriate keys to be defined for all categories and allowing looping of any categories for which keys have been defined.  Even if a key is added to a category for which there are existing datasets without that key, having DDLm and dREL we can easily provide a default value to be used.  If the category has not been looped in a particular dataset any default value will do.  If it has been looped the category must already have a key with unique values for each row.  Yes, a set is different from a relational table, but it can be effectively represented the same way as a relational table.  The distinction is in the semantics in the dictionary, not in the data file.  We invalidate nothing by allowing some CIFs with unlooped versions of the same category that is looped in other CIFs.  Failing to declare as an error something that has a clear and unambiguous meaning would not be a loss to anybody.
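
The default-key idea in the paragraph above can be sketched as follows. This is only an illustration in Python, not dREL and not any existing CIF library's API: the function category_rows, the key name "id", and the default value "default" are all assumptions made for the example.

```python
def category_rows(values, key_name, default_key="default"):
    """Return a category as a list of row dicts, each carrying a key.

    `values` maps item names to a scalar (unlooped category) or to
    equal-length lists (looped category). Hypothetical helper.
    """
    looped = any(isinstance(v, list) for v in values.values())

    if not looped:
        # Unlooped: a single row, so any default key value will do.
        row = dict(values)
        row.setdefault(key_name, default_key)
        return [row]

    # Looped: the key must already be present with unique values.
    if key_name not in values:
        raise ValueError(f"looped category lacks key item {key_name!r}")
    count = len(values[key_name])
    rows = [{name: col[i] for name, col in values.items()} for i in range(count)]
    keys = [row[key_name] for row in rows]
    if len(set(keys)) != len(keys):
        raise ValueError("key values are not unique")
    return rows


# Old-style file: category given once, no key item present.
single = category_rows({"length_a": 5.43}, "id")

# New-style file: same category looped, key supplied per row.
multi = category_rows({"id": ["p1", "p2"], "length_a": [5.43, 4.05]}, "id")
```

Both calls yield rows of the same shape, so the data map onto one relational table either way. What the sketch does not address is software that never asks for the key at all.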

No harm arises for legacy software dealing with legacy files, or future software dealing with any file. There is potential harm for legacy CIF-reading software dealing with new-style files.  This is something we have to face and find a solution for.

  James' suggestion does not introduce complexity.  It removes some.

I'd be happy if complexity just stayed the same.  Unfortunately my original proposal does introduce further equivalent datanames, which I think must be avoided - the community is annoyed enough by having to check for the 'dotted' datanames for no good reason, without having to also include a variety of other datanames that mean almost the same thing.  Our best hope is therefore likely to be something very similar to John B's proposal, which is really a description of what you do in imgCIF.  As imgCIF warned about this from the start, and doesn't have any mmCIF-style entry.id datanames, current imgCIF software is not affected.

all the best,

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148