Discussion List Archives

RE: magCIF - policy advice requested

Dear Herbert,

Is it possible that when you say "normalization" you really mean "the relational model" or something to that effect?  Because that's the only way I've found to make some of your comments make sense to me.

Normalization certainly bears on the question of locking for updates, but it is fundamentally *about* minimizing redundancy.  Redundancies present opportunities for inconsistency, such as the one I described, to arise via various mechanisms.  If an update of one non-key item requires an update of another non-key item to maintain consistency then that is a clear sign of lack of normalization.

In fact, a CIF category that provides an item for a vector and also separate items for elements of that vector fails to comply even with first normal form (because the presence of the per-element items makes it impossible to construe the vector item's values as atomic).  More generally, high normalization certainly *is* in conflict with algorithmically related values in many cases, as functional dependencies other than on superkeys directly violate Boyce-Codd and higher normal forms.  Of course, we have long since passed that threshold in the core and mmCIF dictionaries, and perhaps in others, else we would have no use for CIF consistency checkers, nor indeed for the 'm' in "DDLm".
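To make the redundancy concrete, here is a minimal Python sketch of the kind of inconsistency a CIF consistency checker has to catch. The item names (`moment_vector`, `moment_x`, and so on) are hypothetical, chosen only for illustration and not taken from any actual dictionary; the row stores a vector once as a whole item and once as per-element items.

```python
# Hypothetical row from a category that stores the same vector twice:
# once as a single vector-valued item and once as per-element items.
row = {
    "moment_vector": [0.0, 0.0, 2.5],
    "moment_x": 0.0,
    "moment_y": 0.0,
    "moment_z": 2.5,
}


def is_consistent(row, tol=1e-9):
    """Return True if the per-element items agree with the vector item."""
    components = [row["moment_x"], row["moment_y"], row["moment_z"]]
    return all(
        abs(a - b) <= tol for a, b in zip(row["moment_vector"], components)
    )


consistent_before = is_consistent(row)

# Update one non-key item without touching its redundant counterpart.
row["moment_z"] = 3.0
consistent_after = is_consistent(row)
```

After the single-item update the row contradicts itself, and nothing in the data says which of the two representations is the authoritative one.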

With that being the case, it is eminently reasonable to consider what, specifically, are needful and useful normalization considerations for dictionary design, especially in DDLm dictionaries.  In particular, it is by no means obvious that defining methods both to obtain vector components from a vector and also to obtain the same vector from its components is a good plan, nor any similar arrangement lacking authoritative central data items.  DDLm is a somewhat novel beast, in that in addition to supporting a form of the traditional relational data model, it also provides, via methods, for a separate, semi-independent set of data relationships.  How the latter would best be structured is an open question.

Clearly, these are much broader questions than the one that started the conversation.  Nevertheless, if we don't have and cannot find answers to some of them, then I don't think we can confidently recommend a DDL for magCIF, either, at least not jointly and on sound technical grounds.

I suggest that as a rule of thumb, DDLm dictionaries should not define circular sets of methods, meaning sets of methods whereby, through method composition, any item can be non-trivially defined partially or wholly in terms of itself.  Following that rule naturally leads to some data representations that are preferable to others, which I think is perfectly fine.
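As a rough illustration of how such a rule could be checked mechanically, the following Python sketch treats each method as a set of "item depends on items" edges and reports every item that is reachable from itself. The mapping format, the function name `find_circular_items`, and the item names are all assumptions made for this example; this is not how any existing DDLm tool represents methods, and it makes no attempt to distinguish trivial from non-trivial self-definition.

```python
def find_circular_items(depends_on):
    """Return the set of items that can be reached from themselves by
    following method dependencies, i.e. items defined (directly or
    indirectly) in terms of themselves.

    depends_on maps an item name to the items its method reads.
    """
    circular = set()
    for start in depends_on:
        stack = list(depends_on.get(start, ()))
        seen = set()
        while stack:
            item = stack.pop()
            if item == start:
                circular.add(start)
                break
            if item in seen:
                continue
            seen.add(item)
            stack.extend(depends_on.get(item, ()))
    return circular


# Hypothetical design 1: the vector is computed from its components AND
# each component is computed from the vector.
circular_design = {
    "moment_vector": ["moment_x", "moment_y", "moment_z"],
    "moment_x": ["moment_vector"],
    "moment_y": ["moment_vector"],
    "moment_z": ["moment_vector"],
}

# Hypothetical design 2: the components are authoritative and the vector
# is only ever derived from them.
acyclic_design = {
    "moment_vector": ["moment_x", "moment_y", "moment_z"],
    "moment_x": [],
    "moment_y": [],
    "moment_z": [],
}

circular_found = find_circular_items(circular_design)
acyclic_found = find_circular_items(acyclic_design)
```

Under these assumptions the first design flags all four items, while the second, which has authoritative central data items, flags none.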


comcifs mailing list
