Discussion List Archives

Re: magCIF - policy advice requested

Dear John,

  By normalization, I mean normalization, which really is a matter of
reducing the grain of necessary locking for updates, so you can
maintain referential integrity without having to lock the entire
database for every transaction.  I lived through the two-decade-long
battle between relational databases and hierarchical, network and other
databases, and the issue that settled that battle was that some
reasonable degree of normalization in a context of relations allowed
for the creation and maintenance of highly efficient
multi-reader-multi-writer DBMSs that simply were not workable with
other approaches.  Pure object-oriented databases then failed because
they tried to ignore this issue, but we now have successful
object-relational databases which do play exactly the extended locking
scheme games I was discussing.

  When you say "If an update of one non-key item requires an update of
another non-key item to maintain consistency then that is a clear sign
of lack of normalization." you are, whether you realize it or not,
saying the same thing I am saying about locking.
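
  As a minimal sketch of that point (not taken from any CIF software;
the table and column names are hypothetical), consider a row that
stores a vector's components together with a redundantly stored
length.  Changing one component forces a change to the length, so the
two writes have to go together, here in a single SQLite UPDATE inside
one transaction:

```python
import math
import sqlite3


def make_db():
    """Create an in-memory table holding x, y and a redundant length."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE moment (id INTEGER PRIMARY KEY, x REAL, y REAL, length REAL)"
    )
    conn.execute("INSERT INTO moment VALUES (1, 3.0, 4.0, 5.0)")
    conn.commit()
    return conn


def set_x(conn, row_id, new_x):
    """Update x and the dependent length together."""
    (y,) = conn.execute(
        "SELECT y FROM moment WHERE id = ?", (row_id,)
    ).fetchone()
    with conn:  # commits the write on success, rolls it back on error
        conn.execute(
            "UPDATE moment SET x = ?, length = ? WHERE id = ?",
            (new_x, math.hypot(new_x, y), row_id),
        )


def is_consistent(conn, row_id):
    """Check that the stored length still matches the stored components."""
    x, y, length = conn.execute(
        "SELECT x, y, length FROM moment WHERE id = ?", (row_id,)
    ).fetchone()
    return math.isclose(length, math.hypot(x, y))
```

  A writer that updates x alone leaves the row inconsistent until
someone repairs the length, which is why both items have to be covered
by the same lock.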

  Normalization is _not_ an end in itself.  It is just a tool to get a
database to work well.  Most large production databases are somewhat
denormalized to find a sweet spot for performance, or actually consist
of two or more coupled databases at different degrees of normalization
for different needs, such as a highly normalized multi-writer backend
database for efficient updates from multiple sources coupled to a
highly denormalized frontend for efficient multi-reader access.
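
  A rough sketch of that backend/frontend arrangement, under the
simplifying assumption that both live in one SQLite database (in
production they would typically be separate, coupled systems).  The
tables element, atom and atom_flat are hypothetical names chosen for
illustration:

```python
import sqlite3


def build():
    """Normalized backend tables plus an empty denormalized frontend table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE element (symbol TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE atom (
            id INTEGER PRIMARY KEY,
            symbol TEXT REFERENCES element(symbol)
        );
        CREATE TABLE atom_flat (id INTEGER PRIMARY KEY, symbol TEXT, name TEXT);
        INSERT INTO element VALUES ('Fe', 'iron'), ('O', 'oxygen');
        INSERT INTO atom VALUES (1, 'Fe'), (2, 'O'), (3, 'O');
        """
    )
    return conn


def refresh_frontend(conn):
    """Rebuild the read-optimized copy from the normalized tables."""
    with conn:  # commit the rebuild on success, roll back on error
        conn.execute("DELETE FROM atom_flat")
        conn.execute(
            "INSERT INTO atom_flat "
            "SELECT a.id, a.symbol, e.name "
            "FROM atom a JOIN element e ON e.symbol = a.symbol"
        )


def read_names(conn):
    """Readers hit the flat table and need no join."""
    return [row[0] for row in conn.execute("SELECT name FROM atom_flat ORDER BY id")]
```

  Writers touch each fact once in the backend (an element name lives in
exactly one row), while readers get pre-joined rows at the cost of a
periodic refresh.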

  DDLm is nice, but it is certainly not novel.  It shows lessons
well-learned from the development of object-relational databases, with
a nice balance between support for relations and support for methods.
We should use it.

  When you say "Clearly, these are much broader questions than the one
that started the conversation.  Nevertheless, if we don't have and
cannot find answers to some of them, then I don't think we can
confidently recommend a DDL for magCIF, either, at least not jointly
and on sound technical grounds," you seem to be saying that there is
some fundamental problem with recommending use of DDLm to anybody for
anything.  If you see a fundamental problem with DDLm, please state it.
If not, then let's get on with the business of both using it ourselves
and recommending it to others, so if some problem is buried in there,
we can unearth that problem and fix it.

  Collegially,
    Herbert

On Mon, Jun 2, 2014 at 6:10 PM, Bollinger, John C
<John.Bollinger@stjude.org> wrote:
> Dear Herbert,
>
> Is it possible that when you say "normalization" you really mean "the relational model" or something to that effect?  Because that's the only way I've found to make some of your comments make sense to me.
>
> Normalization certainly bears on the question of locking for updates, but it is fundamentally *about* minimizing redundancy.  Redundancies present opportunities for inconsistency, such as the one I described, to arise via various mechanisms.  If an update of one non-key item requires an update of another non-key item to maintain consistency then that is a clear sign of lack of normalization.
>
> In fact, a CIF category that provides an item for a vector and also separate items for elements of that vector fails to comply even with first normal form (because the presence of the per-element items makes it impossible to construe the vector item's values as atomic).  More generally, high normalization certainly *is* in conflict with algorithmically related values in many cases, as functional dependencies other than on superkeys directly violate Boyce-Codd and higher normal forms.  Of course, we have long since passed that threshold in the core and mmCIF dictionaries, and perhaps in others, else we would have no use for CIF consistency checkers, nor indeed for the 'm' in "DDLm".
>
> With that being the case, it is eminently reasonable to consider what, specifically, are needful and useful normalization considerations for dictionary design, especially in DDLm dictionaries.  In particular, it is by no means obvious that defining methods both to obtain vector components from a vector and also to obtain the same vector from its components is a good plan, nor any similar arrangement lacking authoritative central data items.  DDLm is a somewhat novel beast, in that in addition to supporting a form of the traditional relational data model, it also provides via methods for a separate, semi-independent, set of data relationships.  How the latter would best be structured is an open question.
>
> Clearly, these are much broader questions than the one that started the conversation.  Nevertheless, if we don't have and cannot find answers to some of them, then I don't think we can confidently recommend a DDL for magCIF, either, at least not jointly and on sound technical grounds.
>
> I suggest that as a rule of thumb, DDLm dictionaries should not define circular sets of methods, meaning sets of methods whereby, through method composition, any item can be non-trivially defined partially or wholly in terms of itself.  Following that rule naturally leads to some data representations that are preferable to others, which I think is perfectly fine.
>
>
> John
>
>
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> Consultation Disclaimer:  www.stjude.org/consultationdisclaimer
> _______________________________________________
> comcifs mailing list
> comcifs@iucr.org
> http://mailman.iucr.org/mailman/listinfo/comcifs
