
RE: magCIF - policy advice requested

Dear Herbert,

Is it possible that when you say "normalization" you really mean "the relational model" or something to that effect?  Because that's the only way I've found to make some of your comments make sense to me.

Normalization certainly bears on the question of locking for updates, but it is fundamentally *about* minimizing redundancy.  Redundancies present opportunities for inconsistency, such as the one I described, to arise via various mechanisms.  If an update of one non-key item requires an update of another non-key item to maintain consistency then that is a clear sign of lack of normalization.

In fact, a CIF category that provides an item for a vector and also separate items for elements of that vector fails to comply even with first normal form (because the presence of the per-element items makes it impossible to construe the vector item's values as atomic).  More generally, high normalization certainly *is* in conflict with algorithmically related values in many cases, as functional dependencies other than on superkeys directly violate Boyce-Codd and higher normal forms.  Of course, we have long since passed that threshold in the core and mmCIF dictionaries, and perhaps in others, else we would have no use for CIF consistency checkers, nor indeed for the 'm' in "DDLm".
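To make the redundancy concrete, here is a minimal Python sketch of the kind of check a consistency checker would have to perform on such a category. The item names (`moment.vector`, `moment.x`, and so on) and the helper `find_inconsistencies` are hypothetical, chosen only for illustration; they are not taken from any actual dictionary.

```python
from typing import List, Mapping, Sequence


def find_inconsistencies(
    row: Mapping[str, object],
    vector_item: str,
    element_items: Sequence[str],
) -> List[str]:
    """Return the per-element items whose values disagree with the vector item.

    The row stores the same information twice (once as a vector, once as
    separate elements), so nothing prevents the two copies from diverging.
    """
    vector = row[vector_item]
    return [
        name
        for name, expected in zip(element_items, vector)
        if row[name] != expected
    ]


# Hypothetical row holding both a vector item and its per-element items.
row = {
    "moment.vector": [0.0, 0.0, 2.5],
    "moment.x": 0.0,
    "moment.y": 0.0,
    "moment.z": 2.5,
}
elements = ["moment.x", "moment.y", "moment.z"]

consistent_before = find_inconsistencies(row, "moment.vector", elements)

# Updating one non-key item without updating the other breaks consistency.
row["moment.z"] = 3.0
inconsistent_after = find_inconsistencies(row, "moment.vector", elements)
```

With a single authoritative item there would be nothing to check, which is the point of the normalization argument.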

With that being the case, it is eminently reasonable to consider what, specifically, are needful and useful normalization considerations for dictionary design, especially in DDLm dictionaries.  In particular, it is by no means obvious that defining methods both to obtain vector components from a vector and also to obtain the same vector from its components is a good plan, nor is any similar arrangement lacking authoritative central data items.  DDLm is a somewhat novel beast, in that in addition to supporting a form of the traditional relational data model, it also provides, via methods, for a separate, semi-independent set of data relationships.  How the latter would best be structured is an open question.

Clearly, these are much broader questions than the one that started the conversation.  Nevertheless, if we don't have and cannot find answers to some of them, then I don't think we can confidently recommend a DDL for magCIF, either, at least not jointly and on sound technical grounds.

I suggest that as a rule of thumb, DDLm dictionaries should not define circular sets of methods, meaning sets of methods whereby, through method composition, any item can be non-trivially defined partially or wholly in terms of itself.  Following that rule naturally leads to some data representations that are preferable to others, which I think is perfectly fine.
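As a rough illustration of how that rule of thumb could be checked mechanically, the Python sketch below treats each method as a set of "item depends on items" edges and reports every item that can be reached from itself. The dependency maps and the function name `find_circular_items` are assumptions made up for this example, and the sketch flags any self-reachability, without attempting to judge which cycles are "non-trivial".

```python
from typing import Dict, List, Set


def find_circular_items(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return, sorted, the items defined partially or wholly in terms of themselves.

    depends_on maps an item name to the set of items its method reads.
    """

    def reachable(start: str) -> Set[str]:
        # Iterative depth-first walk over everything the item's method
        # depends on, directly or through composition of other methods.
        seen: Set[str] = set()
        stack = list(depends_on.get(start, ()))
        while stack:
            item = stack.pop()
            if item in seen:
                continue
            seen.add(item)
            stack.extend(depends_on.get(item, ()))
        return seen

    return sorted(item for item in depends_on if item in reachable(item))


# Hypothetical arrangement with methods in both directions:
# the vector is built from its components and each component from the vector.
both_directions = {
    "vector": {"x", "y", "z"},
    "x": {"vector"},
    "y": {"vector"},
    "z": {"vector"},
    "magnitude": {"vector"},
}

# Hypothetical arrangement with one authoritative item and one-way methods.
one_direction = {
    "vector": set(),
    "x": {"vector"},
    "y": {"vector"},
    "z": {"vector"},
    "magnitude": {"vector"},
}

circular = find_circular_items(both_directions)
acyclic = find_circular_items(one_direction)
```

In the first map the vector and its components are all circular, while `magnitude` is not, since nothing depends on it; the second map, with a single authoritative item, yields no circular items.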


John


_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/mailman/listinfo/comcifs
