Discussion List Archives

Re: [ddlm-group] DDLm aliases (subject changed)

Dear David,

   The datablock of the dictionary contains the save frames, so we can put 
a master alias_definition_set loop up in the data block to cover the
entire dictionary, or we can break it up, save frame by save frame.

   If I were going to denormalize, again, I could do it either by 
promoting the alias items from the save frames up to the data block,
or by breaking up a master alias_definition_set loop into the individual 
save-frame-by-save-frame  alias_definition_set loops and doing the
joins there.
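
   As a rough illustration of the two directions just described, here
is a minimal Python sketch.  It is not CBFlib code and not DDLm syntax:
the dict-of-lists model, the function names promote and distribute, and
the sample frame names and set codes are all assumptions made purely
for the example.

```python
from collections import defaultdict

# Hypothetical in-memory model of a dictionary:
# save frame name -> rows of (alias tag, set code) from that frame's loop.
per_frame = {
    "cell.length_a": [("_cell_length_a", "DDL1"), ("_cell.length_a", "DDL2")],
    "cell.length_b": [("_cell_length_b", "DDL1")],
}

def promote(frames):
    """Collect the per-save-frame loops into one master loop of
    (definition_id, alias, set_code) rows at data-block level."""
    return sorted(
        (definition_id, alias, set_code)
        for definition_id, rows in frames.items()
        for alias, set_code in rows
    )

def distribute(master_rows):
    """Break a master loop back up, save frame by save frame."""
    frames = defaultdict(list)
    for definition_id, alias, set_code in master_rows:
        frames[definition_id].append((alias, set_code))
    return dict(frames)

master = promote(per_frame)
round_trip = distribute(master)

# Row order inside a frame is not significant, so compare sorted rows.
same_information = (
    {k: sorted(v) for k, v in round_trip.items()}
    == {k: sorted(v) for k, v in per_frame.items()}
)
```

   In this toy model the round trip loses nothing, which is the sense
in which the two presentations carry the same information.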

   There is no "point" in denormalizing for presentation purposes.  The
normalized and denormalized presentations carry the same information.
In terms of managing the database, it is best never to denormalize
anything, but in terms of efficient searches it often is desirable
to partially or fully denormalize, sometimes as an invisible internal
optimization, sometimes visibly.  I already use denormalized dictionary
tables in CBFlib, constructed on the fly from multiple save frames,
but that does not mean I need to have those denormalized tables as
a formal part of the dictionary.

   This really is just a matter of taste.  John B. is wrong when he
tries to settle it as a technical issue.  If you, in working with
the core, want the alias information in denormalized form, that is
fine.  If you, in working with the core, are more comfortable with
the alias information normalized, that is fine.  We don't need
a uniform answer for all dictionaries.  It is easy to go back
and forth and to combine information from both forms.
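
   To make "combine information from both forms" concrete, here is a
small Python sketch under the same kind of assumptions as any toy
model: rows are (definition_id, alias, set_code) tuples, and the names
combine and disagreements are hypothetical helpers, not part of DDLm or
of any existing library.

```python
def rows_from_frames(frames):
    """Flatten per-save-frame loops into (definition_id, alias, set_code) rows."""
    return {
        (definition_id, alias, set_code)
        for definition_id, rows in frames.items()
        for alias, set_code in rows
    }

def combine(master_rows, frames):
    """Union of the rows given in the master loop and in the per-frame loops."""
    return set(master_rows) | rows_from_frames(frames)

def disagreements(master_rows, frames):
    """Rows that appear in one form but not in the other."""
    return set(master_rows) ^ rows_from_frames(frames)

# Illustrative data: the master loop knows one alias, the save frame knows two.
master_rows = [("cell.length_a", "_cell_length_a", "DDL1")]
per_frame = {
    "cell.length_a": [("_cell_length_a", "DDL1"), ("_cell.length_a", "DDL2")],
}

combined = combine(master_rows, per_frame)
missing = disagreements(master_rows, per_frame)
```

   Here combined holds both alias rows, and missing holds the one row
that only the save frame supplied, so software reading a dictionary
could at least report where the two forms differ.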

   We already have multiple flavors of dictionaries because
we are all different people and we have different work to do.
The important issue is not that the dictionary styles be the
same but that they contain the necessary information in
ways that allow them to be combined in a consistent,
interoperable manner.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Tue, 1 Feb 2011, David Brown wrote:

> Thanks Herbert for your comments.  However, you have not addressed my two main concerns:
> 
> 1. In a CIF dictionary written under DDLm, is there
> a) a single alias_definition_set loop in its own save frame that contains all the aliased
> datanames, i.e., the properties of the individual definition_sets are not attributes of a
> dataname,
> or b) is there an alias_definition_set loop in each data-item save frame, i.e., the
> alias_definition_set properties are explicit data-item attributes?
> 
> If the answer is a), how does one denormalize, since there is a single alias_definition_set
> loop but very many parent alias loops in the dictionary?  Does the information in the
> alias_definition_set loop get distributed as attributes of the different data items? 
> However, as your draft now reads, there would be little point in denormalizing because the
> alias loop contains exactly the same information as the alias_definition_set loop, so
> denormalizing would add no new information.  Collecting all the aliases in one place might
> make programming a little easier, but should we be designing dictionaries in which the same
> information appears in two different places, just organized in a different way?  Can't the
> computer do this for us at run time?  What happens if the two loops disagree with each
> other?
> 
> If the answer to question 1 is b), the alias and alias_definition_set loops contain
> identical information (or at least it should be identical).
> 
> 2. Do you envisage that:
> a) we will have a number of flavours of mmCIF dictionaries depending on how the
> definition_sets are defined?  That is, do we have standard_mmCIF (no definition_sets),
> mmCIF_a_le_Bernstein, mmCIF_a_le_Westbrook etc.?
> 
> or b) the information about the definition_sets will be held in a subdictionary that will
> be imported at run time to create a virtual dictionary?  This would avoid the problems of
> different flavours, but it is not clear to me that the import features as currently defined
> in DDLm would be able to support this.  How do you see such a merge taking place?
> 
> I apologize if these are trivial questions, but at some point you will have to instruct the
> likes of me how the system works, so you might as well do it now :)
> 
> David
> 
> 
> 
> 
> Herbert J. Bernstein wrote:
>       OK, now to the specifics of this item.  I will interpolate my remarks
>       into David's message with "***" flags. -- Herbert
>
>       =====================================================
>        Herbert J. Bernstein, Professor of Computer Science
>          Dowling College, Kramer Science Center, KSC 121
>               Idle Hour Blvd, Oakdale, NY, 11769
>
>                        +1-631-244-3035
>                        yaya@dowling.edu
>       =====================================================
>
>       On Mon, 31 Jan 2011, David Brown wrote:
>
>             Not having done any programming for many years and not being
>             familiar with the current jargon, I find keeping up with the
>             rapid-fire discussion on _aliases requires some effort on my
>             part and the discussion has usually moved on before I have a
>             chance to comment.
>
>             However, being incommunicado on weekends, I took Herbert's draft
>             home where I could sit down without distraction and see exactly
>             what was proposed.  The errors and ambiguities in the draft did
>             not help me get my head around the proposal, but I made some
>             progress, and here are my comments.  First the general comments,
>             then particular comments interleaved in the draft.
>
>             1. _identifier_set:  What does this set identify?  The _id seems
>             to flag arbitrary groups of aliases.  'alias_set' would be a
>             better name for this category, or even just 'set'.  What's in a
>             name?  It sets certain switches in the brain, and if these are
>             misset by the dataname, it may take a lot of work to get them
>             reset.  Not conducive to instant communication.
>
>       *** Yes, for maximal generality, a set is an arbitrary grouping.  We
>       have been through many alternative names for these groupings, and so far
>       "set" is the only one to which there has not (yet) been strong objection
>       on grounds of confusion with other, existing aggregational terminology.
>       The only (weak) point in favor of "_identifier_set" rather than
>       "_alias_set" is that the primary save frame name/tag/identifier
>       can appear in the list.  I would be just as happy with "set", but
>       then I assume somebody will have been deprived of some future use
>       of some other set aggregation scheme, so to avoid appearing to
>       hog a very useful name, it probably should be qualified in some
>       way.  Does "tag_set" appeal to people?
>
>             2. Why do we need both _alias and _alias_identifier_set
>             categories?  They have identical information (if the datanames
>             (Syd's word) or tags (Herbert's word) are any indication).  I
>             suppose (though this is nowhere spelled out) that
>             _identifier_set would have its own save frame in the dictionary
>             and would not be an attribute of a dataname.  If this is a
>             correct interpretation, it would provide a place where all the
>             alias datanames in the dictionary could be listed within a
>             single loop.  This seems redundant, but I cannot speak from
>             programming experience.  If alias_identifier_set does not appear
>             in its own save_ frame, how does it differ from 'alias'?
> 
>
>       *** We do not need both an _alias and an _alias_identifier_set category,
>       any more than we need both an _atom_site and an _atom_site_aniso_U
>       category (well, perhaps a bit more).  It is just an organizational
>       convenience to allow both the flatter, very denormalized DDL1 style
>       of presentation as well as the more normalized DDL2 style of
>       presentation.  It does have the side advantage of allowing
>       one to pull out a separate list of tags by set.
> 
>
>             3. If my supposition in 2. is correct, we would appear to have a
>             problem.  We will now have a variety of flavours of CIF
>             dictionaries, each expressing a particular programmer's
>             preferences for grouping the aliases.  This will make no
>             difference to the CIFs themselves, as these groupings are
>             irrelevant once a CIF has been written, but if, for example, I
>             am given a program written by Herbert and a different program
>             written by either of the Johns, I might need a different mmCIF
>             dictionary for each of these two programs, dictionaries that
>             differ only in the way the aliases are grouped.  When I load my
>             CIF it cannot give instructions on which dictionary to call up
>             because it will have no knowledge of the idiosyncrasies of the
>             program I have chosen to use.  A possible solution would be to
>             use the _import feature to create a virtual dictionary at run
>             time.  Thus the _identifier_set information would be held in a
>             local dictionary that would be imported into the authorised CIF
>             dictionary at run time.  However, there are limitations on what
>             can be imported.  The _identifier_set category could be imported
>             but it would be impossible to import the _identifier_set_id into
>             the alias loop by this mechanism.  Since I am not sure that I
>             understand how Herbert intends to use this feature, I do not
>             feel competent to suggest a way in which this problem could be
>             handled.  Having programs that used different dialects of CIF
>             dictionaries does not seem to be in line with the traditional
>             development of CIF, particularly for a feature that is unlikely
>             to be much used, even if they do not affect the CIFs themselves.
>             We should think carefully about the implications of this move.
>              
> 
>
>       *** The main reason I am trying to bring this to COMCIFS is that we
>       very much seem headed in the direction of multiple, conflicting
>       interpretations of our standards, and dictionaries that will
>       not fit together.  Without this, we seem headed towards having,
>       for example, at least three distinct and possibly conflicting
>       core dictionaries in simultaneous use (the current core, the one
>       in mmCIF and the new DDLm core), and at least three and possibly
>       four mmCIF dictionaries (the official IUCr DDL2 mmCIF, the PDB's
>       DDL2 pdbx mmCIF, and 2 more DDLm versions of each of those).
>
>             4. My detailed comments follow the feature they comment on below:
> 
> 
>
>             save_definition.xref_code
>                    _definition.id             '_definition.xref_code'
>                    _definition.update           2011-01-26
>                    _definition.class            Attribute
>                    _description.text
>             ;
>                     Code identifying the equivalent definition in the dictionary
>                     referenced by the DICTIONARY_XREF attributes.
>
>                     Use of _definition.xref_code is deprecated in favor of
>                     use of _alias.xref_code
>             ;
>                    _name.category_id            definition
>                    _name.object_id              xref_code
>                    _type.purpose                Identify
>                    _type.container              Single
>                    _type.contents               Code
>                     save_
> 
>
>             This item is deprecated.  It should be deleted.  The alias and
>             xref items have never been tested and it is clear from the
>             current DDLm that they are placeholders that are awaiting
>             development.  If there are any programs that make use of the
>             items they can only have been written by members of this group.
>             The current xref definitions are inadequate and unworkable.
>             There is no excuse for leaving this item in if we don't need it.
> 
>
>       *** See my general remarks on communicating with the unknown users
>       of what has been posted since 2007.
> 
>
>             save_ALIAS
>
>                    _definition.id               alias
>                    _definition.scope            Category
>                    _definition.class            List
>                    _definition.update           2011-01-26
>                    _description.text
>             ;
>                     The attributes used to specify the aliased names of definitions.
>                     Every tag has an implicit alias to itself with a null
>                     _alias.xref_code to allow use of the primary tag in
>                     the ALIAS_IDENTIFIER_SET category.
>
>                     The use of _alias.identifier_set_id in the key of
>                     this catgeory is provide a placeholder for the
>                     to conform the key of the parent ALIAS category
>                     to the key of the child ALIAS_IDENTIFIER_SET
>                     for automatic joins.  It is not intended that
>                     _alias.identifier_set_id should be used in the
>                     ALIAS category when no join is being done.
> 
>
>             This last paragraph would be easier to understand if all the
>             words were present and the sentences grammatical.  In any case
>             it should appear under _alias.identifier_set, not here.  If I
>             am right in thinking alias is an attribute and
>             alias_identifier_set is not, how does one join a non-attribute
>             to an attribute?
> 
> 
>
>       *** This stray item has to do with being able to have both the
>       denormalized flat presentation and the normalized presentation
>       via a join.  If we stay with just the normalized presentation,
>       it is not necessary.  I apologize for my typos.  I should have
>       said "is to provide" instead of "is provide".  Discussing the
>       key of a category in the category definition seems appropriate
>       because the key is defined here (see just below).
> 
> 
>
>             ;
>                    _category.parent_id          ddl_attr
>                    _category_key.primitive      ['_alias.definition_id',
>                                                  '_alias.xref_code',
>                                                  '_alias.identifier_set_id']
>                     save_
> 
>
>             save_alias.definition_id
>                    _definition.id             '_alias.definition_id'
>                    _definition.class            Attribute
>                    _definition.update           2006-11-16
>                    _description.text
>             ;
>                     Identifier tag of an aliased definition.
>             ;
>                    _name.category_id            alias
>                    _name.object_id              definition_id
>                    _type.purpose                Key
>                    _type.container              Single
>                    _type.contents               Tag
>                     save_
>
>             save_alias.deprecated
>                    _definition.id             '_alias.deprecated'
>                    _definition.class            Attribute
>                    _definition.update           2006-11-16
>                    _description.text
>             ;
>                     Specifies whether use of the alias is deprecated
>             ;
>                    _name.category_id            alias
>                    _name.object_id              definition_id
> 
>
>             .object_id should be the second part of the _definition.id, i.e.,
>             'deprecated'.  This needs correcting in many places.
> 
>
>       *** thanks for spotting that
>
>                    _type.purpose                STATE
>                    _type.container              Single
>                    _type.contents               YesorNo
>                    _enumeration.default         No
>                     save_
> 
>
>             save_alias.dictionary_uri
>                    _definition.id             '_alias.dictionary_uri'
>                    _definition.update           2011-01-26
>                    _definition.class            Attribute
>                    _description.text
>             ;
>                     Dictionary URI in which the aliased definition belongs.
>                     _alias.dictionary_uri is deprecated in favor if
>                     _alias.xref_code
>             ;
>                    _name.category_id            alias
>                    _name.object_id              dictionary_uri
>                    _type.purpose                Identify
>                    _type.container              Single
>                    _type.contents               Uri
>                     save_
> 
>
>             This item should be moved to the _dictionary_xref category.  The
>             xref.id is a sufficient link.  Again, the fact that it may appear
>             in the draft dictionaries should not prevent it being deleted in
>             DDLm since the draft dictionaries are just drafts and will be
>             changed once we have sorted out how to do the aliases.
> 
>
>       *** I don't disagree on the objective.  I just disagree on whether
>       the ghost of tags past needs to remain here to haunt its old abode.
> 
> 
>
>             save_alias.identifier_set_id
>                    _definition.id   '_alias.identifier_set.id'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-26
>                    _description.text
>             ;
>                    A code identifying an identifier_set of related tags.
>                    This linked item is provided in the ALIAS category to
>                    ensure that the key of the ALIAS category is
>                    conformed to the key of the ALIAS_IDENTIFIER_SET
>                    category.  The alias has not been joined with
>                    ALIAS_IDENTIFIER_SET, _alias.identifier_set_id
>                    it is not intended that  _alias.identifier_set_id
>                    in the ALIAS category.
> 
>
>             I cannot make much sense of the last sentence.  Perhaps there
>             are missing words?
> 
>
>       *** Thanks.  I'll try to clean up the words.
> 
> 
>
>                    This is a pointer to _identifier_set.id
>             ;
>                    _name.category_id alias
>                    _name.object_id   code
> 
>
>             See above.  'code' is not the proper .object_id
>
>       *** thanks.  you are right
> 
>
>                    _name.linked_item_id         '_identifier_set.id'
>                    _type.purpose     Key
>                    _type.container   Single
>                    _type.contents    Code
>                    _enumeration.default  .
>                   save_
> 
>
>             save_alias.xref_code
>                    _definition.id             '_alias.xref_code'
>                    _definition.update           2011-01-26
>                    _definition.class            Attribute
>                    _description.text
>             ;
>                     Code identifying the dictionary containing the primary
>                     definition of the dictionary as given in the
>                     DICTIONARY_XREF category.
>
>             ;
>                    _name.category_id            definition
>                    _name.object_id              xref_code
>                    _name.linked_item_id         '_dictionary_xref.code'
>                    _type.purpose                Key
>                    _type.container              Single
>                    _type.contents               Code
>                     save_
> 
>
>             save_IDENTIFIER_SET
>
>                  _definition.id      identifier_set
>                  _definition.scope   Category
>                  _definition.class   List
>                  _definition.update  2011-01-27
>             ;
>                   Data items used to describe the identifier_set identifiers
>                   used in this dictionary.  Data items in this category
>                   are NOT used directly as attributes of individual data items.
>                   See linked item _alias_identifier_set.identifier_set_id
>                   for such uses.
> 
>
>             ;
>                   _category.parent_id ddl_attr
>                   _category_key.generic  '_identifier_set.id'
>
>                   save_
>
>             save_identifier_set.id
>                    _definition.id   '_identifier_set.id'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-27
>                    _description.text
>             ;
>                    A code identifying an identifier_set of related tags.
>                    The coverage of an identfier_set may conform precisely
>                    to the set of tags in a particular dictionary,
>                    or to tags drawn from multiple dictionaries or
>                    to a subset of tags from a single dictionary.
>
>                    The same tag may belong to multiple identifier
>                    sets, and a given tag may not belong to any
>                    identifier set, in which case the only associated
>                    identifier set is a null value.
> 
>
>             Presumably the second line should read 'need not belong to any'.
>             The wording above is ambiguous.
>             The last line should read 'identifier set id' and its default
>             value should be given explicitly.  What is nul?
>             Possible nul values are 'nul', '0', ' ', '?', '.', etc.
> 
>
>       *** I was thinking of "." and "?"
> 
>
>                    To help ensure that dictionaries can be merged,
>                    each code should either begin with an IUCr-registered
>                    prefix or, if not prefixed, have been approved
>                    by COMCIFS.  The special prefix 'local_' may be
>                    use for purely internal purposes of an organization.
> 
>
>             I assume these are not datanames that appear in the dictionaries
>             but a list of COMCIFS enumerations, some of which might appear
>             in a non-exclusive enumeration list.  What happens if someone
>             chooses 'joeblow' as an id?
> 
>
>       *** If COMCIFS approved joeblow, then so be it, but the idea is that
>       COMCIFS would approve and control the use of set identifiers such
>       as, say, DDL1, DDL2, DDLm, core, and mmCIF, but that the PDB, which
>       has registered the pdbx_ prefix, would control the use of set
>       identifiers such as pdbx_mmCIF or pdbx_EM, and that somebody who
>       is doing something purely local might just use local_joeblow without
>       consulting anybody.
>
>             ;
>                    _name.category_id identifier_set
>                    _name.object_id   code
>                    _type.purpose     Key
>                    _type.container   Single
>                    _type.contents    Code
>
>                   save_
>
>             save_identifier_set.description
>                    _definition.id   '_identifier_set.description'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-27
>                    _description.text
>             ;
>                    A description of the identifier_set
>             ;
>                    _name.category_id identifier_set
>                    _name.object_id   code
>                    _type.purpose     Describe
>                    _type.container   Single
>                    _type.contents    Text
> 
>
>                   save_
>
>             save_ALIAS_IDENTIFIER_SET
>
>                   _definition.id      alias_identifier_set
>                   _definition.scope   Category
>                   _definition.class   List
>                   _definition.update  2011-01-27
>             ;
>                    The attributes used to specify the identifier_set of
>                    tags to which a given tag belong.
>
>                    A given tag may belong to multiple identifier_sets
>                    and may be cited against multiple dictionaries.
>
>                    Note that _alias_identifier_set.identifier_set_id is a
>                    component of the key of ALIAS_IDENTIFIER_SET.  If the
>                    denormalized join presentation is used to bring the object
>                    ids of this child category up into the parent
>                    ALIAS category, then _alias.identifier_set_id will
>                    we used as an implicit addition to the key of
>                    the denormalized ALIAS category.
>
>                    Until DDLm can be formally revised to automatically
>                    handle the necessary promotion of child catgeory keys
>                    in denormalized joins, a place-holder
>                    _alias.identifier_set_id has been defined in the
>                    ALIAS catgeory.
>
>             ;
>                   _category.parent_id  alias
>                   _category.parent_join  Yes
>                   _category_key.primitive
>                       ['_alias_identifier_set.identifier_set_id',
>                        '_alias_identifier_set.definition_id',
>                        '_alias_identifier_set.xref_code']
>                    save_
>
>             save_alias_identifier_set.definition_id
>                    _definition.id   '_alias_identifier_set.definition_id'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-27
>                    _description.text
>             ;
>                    Together with _alias_identifier_set.xref_code, identifies
>                    an alias belonging to an identifier_set.  An alias may
>                    belong to any number of identifier_sets, including zero.
>
>             ;
>                    _name.category_id alias_identifier_set
>                    _name.object_id   definition_id
>                    _name.linked_item_id  '_alias.definition_id'
>                    _type.purpose     Key
>                    _type.container   Single
>                    _type.contents    Tag
>                     save_
>
>             save_alias_identifier_set.identifier_set_id
>                    _definition.id   '_alias_identifier_set.identifier_set_id'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-27
>                    _description.text
>             ;
>                    Identifies an identifier_set to which the alias
>                    identified by _alias_identifier_set.definition_id
>                    and _alias_identifier_set.xref_code ) belongs.
>
>                    A pointer to _identifier_set.id
>             ;
>                    _name.category_id alias_identifier_set
>                    _name.object_id   code
>                    _name.linked_item_id  '_identifier_set.id'
>                    _type.purpose     Key
>                    _type.container   Single
>                    _type.contents    Code
>
>                   save_
> 
>
>             save_alias_identifier_set.xref_code
>                    _definition.id   '_alias_identifier_set.xref_code'
>                    _definition.class  Attribute
>                    _definition.update 2011-01-21
>                    _description.text
>             ;
>                    A code identifying the actual dictionary,
>                    virtual dictionary or other logical grouping
>                    to which the identifier tag belongs.
> 
>
>             What is this identifier tag - '.definition_id' or
>             '.definition_set_id'?
>
>       *** They are just pointers, see the primary definitions at
>       the _name.linked_item_id linked tags.  But for clarity, I
>       will repeat them locally in the next pass.
>             ;
>                    _name.category_id alias_identifier_set
>                    _name.object_id   code
>                    _name.linked_item_id  '_dictionary_xref.code'
>                    _type.purpose     Key
>                    _type.container   Single
>                    _type.contents    Code
>                     save_
> 
> 
> 
>
>             We also need to refine the _dictionary_xref category.  '.uri'
>             should be added, '.format' should be better defined or deleted.
>             Perhaps '.version' should also be added.  Defining the
>             dictionaries is just as important as defining the
>             definition_sets.
> 
>
>       *** I'll try to propose something on the next pass, probably on Wednesday
>       during the next storm.
> 
> 
>
>             David
> 
>
>      _________________________________________________________________________________
> 
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 
> 
> 
>
