Discussion List Archives

Re: [ddlm-group] Further discussion of proposal #2

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: Re: [ddlm-group] Further discussion of proposal #2
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Fri, 17 Jun 2016 12:22:40 +1000
Dear John and others,

Note that I inadvertently removed the subject line from my previous message and have added a subject line back in for those perusing the archive in 10 years' time.

On 17 June 2016 at 02:19, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Dear James and Colleagues,

I’m going to respond to several aspects of this discussion in separate messages to (I hope) make the individual threads easier to follow.  This first response is directed to these comments of James's:

> I don't actually think default values for keys are necessary until multiple-packet loops are set up. Also, I would need to see a clear formulation of how you would propose to convert Set to Loop in order to comment sensibly in light of all of the other constraints we are operating under. I do appreciate that I am favouring a particular application by giving 'Set' categories such significance. Apart from being bound to do this to keep compatibility with legacy applications, there are non-trivial efficiencies available by being able to make certain values 'global', and the DDL1, DDL2 entry.id, and 'Set' behaviour provides a neat way to do this.

With respect to efficiency, it is not clear to me how a Set with a category key would be any more efficient than a Loop, and it is also not clear whether any efficiency gain that did accrue would be sufficient to justify the required change to DDLm.  I would be interested in hearing a fuller explanation.

Sorry, the comment about efficiency simply meant that the concept of a 'Set' category reduces clutter in the dictionaries and file size in datafiles for *dependent* loops. A 'Set' category's single-valued datanames act as global variables for the whole datablock, so loop categories do not have to state explicitly that their non-key dataname values depend on the 'Set' category. No extra key dataname is defined in the dependent category, and the dependent loops therefore do not have to include a column giving the values of that extra key dataname.  For example, the atom_site loop does not have to specify which space group and unit cell the fractional coordinates are expressed relative to. If 'space_group' and 'cell_length' were looped, two more key dataname columns would have to be provided in the atom_site loop to identify the space group / unit cell combination to which the coordinates refer.  The 'Set' designation for a category hides this dependence and thus saves space in the datafile.  Sorry if this is long-winded.  I believe that your default_key proposal is intended to have the same effect.
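
The space saving can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of cif_core or of any CIF library: the in-memory layout, the shortened names, the child key columns `space_group_id` and `cell_id`, and the helper `expand_set_to_loop`. It only shows that looping the two categories adds two key columns to every atom_site packet.

```python
# Hypothetical in-memory model of one data block; all names are illustrative.
set_style_block = {
    "space_group": {"name_h_m_alt": "P 21/c"},   # 'Set': one value per data block
    "cell": {"length_a": 5.43},                   # 'Set': one value per data block
    "atom_site": [                                # 'Loop': one dict per packet
        {"label": "C1", "fract_x": 0.10},
        {"label": "O1", "fract_x": 0.25},
    ],
}


def expand_set_to_loop(block, sg_key="sg1", cell_key="cell1"):
    """Return the same data with space_group and cell treated as loops.

    Every atom_site packet must then name the space_group and cell
    packets it depends on, which costs two extra columns.
    """
    return {
        "space_group": [dict(block["space_group"], id=sg_key)],
        "cell": [dict(block["cell"], id=cell_key)],
        "atom_site": [
            dict(packet, space_group_id=sg_key, cell_id=cell_key)
            for packet in block["atom_site"]
        ],
    }


looped_block = expand_set_to_loop(set_style_block)
extra_columns = sorted(
    set(looped_block["atom_site"][0]) - set(set_style_block["atom_site"][0])
)
print(extra_columns)
```

In the 'Set' form the dependence of atom_site on the space group and cell is implicit; in the looped form it has to be carried by the two extra columns in every row.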

I'm not at all concerned about tweaking DDLm. The proposed update to DDLm is a clarification and an extension, because the semantic interpretation of existing files would be unchanged.  Is there any particular reason you are concerned about such measured changes to DDLm?  From my point of view DDLm is the lowest-impact area of the framework - very few people actually care *how* we express the meaning of a dataname, as long as that meaning doesn't change, and those that do care deeply about DDL in general (in my experience, databases) have not done any work on DDLm yet.


In any case, whether it is classified as a Set or a Loop, if a category has a category key, then surely it is necessary for every instance / packet / row of that category to have a value for that key.  If one does not want to oblige data files to provide an *explicit* value for the category key, then the only alternative is to permit them to rely on a default value.  If we are going to do that -- as we must do to avoid changes such as we are discussing from invalidating existing data files -- then I don't see what is gained by making a special rule allowing the key to be defaulted in certain cases, as opposed to simply defining default values for keys where that is warranted, so that existing dictionary semantics provide for data files that present only one packet to omit the key.  In particular, that does not interfere with requiring the key to be presented explicitly when multiple packets are presented, for if a file that presents multiple packets allows them to all take the key's default value then the resulting duplicate keys make the file invalid.
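
The rule described in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: packets are modelled as plain dicts, `id` stands in for the category key, and the helpers `effective_keys` and `keys_are_valid` are hypothetical names, not functions of any existing CIF software.

```python
def effective_keys(packets, key_name, default=None):
    """Return the key value of each packet, using the default where omitted."""
    keys = []
    for packet in packets:
        if key_name in packet:
            keys.append(packet[key_name])
        elif default is not None:
            keys.append(default)
        else:
            raise ValueError(f"packet lacks {key_name!r} and no default is defined")
    return keys


def keys_are_valid(packets, key_name, default=None):
    """A category is valid only if every packet has a distinct key value."""
    keys = effective_keys(packets, key_name, default)
    return len(keys) == len(set(keys))


# One packet, key omitted: the default supplies the key.
single = [{"name_h_m_alt": "P 21/c"}]
# Two packets, key omitted in both: both take the default, so keys collide.
two_defaulted = [{"name_h_m_alt": "P 21/c"}, {"name_h_m_alt": "P 1"}]
# Two packets with explicit, distinct keys.
two_explicit = [
    {"id": "a", "name_h_m_alt": "P 21/c"},
    {"id": "b", "name_h_m_alt": "P 1"},
]

results = {
    "single": keys_are_valid(single, "id", default=""),
    "two_defaulted": keys_are_valid(two_defaulted, "id", default=""),
    "two_explicit": keys_are_valid(two_explicit, "id", default=""),
}
print(results)
```

With a default defined, a single-packet file may omit the key, while a multi-packet file that omits it is rejected by the ordinary uniqueness check, with no special-case rule.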

I'm not opposed to the concept of a default key value per se, I'm just unclear as to why you are arguing that this needs to be defined in a cif_core 'Set' category as opposed to an add-on dictionary. 
 

Let's consider the SPACE_GROUP category, since it sparked this whole discussion.  I append a cut at what I think we should do with it (only frames containing modifications are presented); I think I have marked all the changes and additions within via CIF comments.  I rarely wrangle dictionaries, so I apologize for any errors I have committed.  The key defaulting presented within formalizes how, when, and why SPACE_GROUP's category key and the associated child key in SPACE_GROUP_SYMOP can be omitted from data files.  To the best of my knowledge, nothing within relies on any DDLm changes.

I think I understand your proposal to be using the existence of a default key value to signal that the key may be omitted in a single-value loop, *and* that child key datanames in other loops that would otherwise contain them may be omitted in this case.  I'm not clear whether you propose that these changes should happen in cif_core, or in an add-on dictionary.  In any case, I agree that this can be made precisely semantically equivalent to the 'Set' proposal, because a default key value makes no sense in general, and so the meaning of a default value for a key may be overloaded as you have done, with no implications elsewhere.  This is still a change to DDLm, because the presence of _enumeration.default in certain definitions now has new implications (not that I'm opposed in principle to changing DDLm).
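
As a rough illustration of this reading of the proposal, the Python sketch below matches child packets to their parent when both the parent key and the child key may be defaulted. The dict layout, the shortened names `id` and `sg_id`, and the helper `symops_for` are assumptions made for the example; the empty-string default mirrors the `_enumeration.default ''` lines in the definitions quoted later in this message.

```python
DEFAULT_KEY = ""  # mirrors the proposed _enumeration.default '' for both keys


def symops_for(space_group_packet, symop_packets):
    """Return the symop packets whose (possibly defaulted) sg_id matches
    the (possibly defaulted) id of the given space_group packet."""
    sg_id = space_group_packet.get("id", DEFAULT_KEY)
    return [
        symop
        for symop in symop_packets
        if symop.get("sg_id", DEFAULT_KEY) == sg_id
    ]


# Existing-style file: one space group, neither key presented.
legacy_space_group = {"name_h_m_alt": "P -1"}
legacy_symops = [
    {"id": 1, "operation_xyz": "x,y,z"},
    {"id": 2, "operation_xyz": "-x,-y,-z"},
]
legacy_match = symops_for(legacy_space_group, legacy_symops)

# Multi-space-group file: both keys presented explicitly.
p1 = {"id": "p1", "name_h_m_alt": "P 1"}
mixed_symops = [
    {"sg_id": "p1", "id": 1, "operation_xyz": "x,y,z"},
    {"sg_id": "pm1", "id": 1, "operation_xyz": "x,y,z"},
    {"sg_id": "pm1", "id": 2, "operation_xyz": "-x,-y,-z"},
]
p1_match = symops_for(p1, mixed_symops)
print(len(legacy_match), len(p1_match))
```

In the single-packet case the defaulted parent and child keys agree automatically, which is what lets both be left out of the file.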

My preference would still be for the 'Set' proposal, because the semantics are wrapped up in a single enumerated value, at category level, rather than arising from an interaction between attributes of a particular dataname inside that category.  I do not see any other distinguishing features.  I believe that for programmers, dictionary authors, and casual dictionary readers, the 'Set' proposal is more accessible, as the particular special behaviour of the category is flagged explicitly and concisely, in the category definition, and described in a single place in the DDLm attribute dictionary.  You will notice there is semantic convenience in referring to a category as a 'Set' category, rather than 'a category that has a default key value defined'. If you propose changing the cif_core dictionary rather than using an add-on dictionary, then the 'Set' proposal involves zero changes, whereas the default_key proposal involves a single extra key definition and an adjustment to the definitions for each 'Set' category.  Neither of these objections is particularly critical, of course.

Ultimately, this is going to be a matter of taste as the semantics can be made identical, and so I don't know quite what else you or I can say to convince each other on this point.  We may have to rely on our colleagues to decide.

Note, by the way, that I think the particular changes presented, or something very like them, are needed regardless of what we choose for the general case, because the DDL1 core and mmCIF are already structured this way.

I was perhaps too diplomatic or long-winded in previous messages. The incorporation of space_group into cif_core as a looped category was a mistake that we must *not* perpetuate. We either correct it by dropping it from DDLm cif_core, which is impossible due to widespread DDL1 usage (as a 'Set' category), or we fix the semantics.  So, in the case of space_group we can feel ourselves bound only by widespread current usage, not by the contradictory semantics of the DDL1 version.

Regarding your space-group example below, I may have missed something in your proposal: you have added a key to space_group_symop pointing to space_group. Why have you not done this for all other loop categories that rely on the value of space_group, for example, 'atom_site', 'refln' etc.?  Note also that my example #1 from yesterday's email was a published program that would fail when presented with a datafile conforming to the definitions below (doesn't check space group loopiness, does loop over symops to get atomic positions), i.e. these changes can only be made after a way of protecting existing software from them is established.
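
To illustrate the kind of breakage I mean, here is a hypothetical Python sketch (not the published program itself, and with made-up dict keys and function names) contrasting a routine that loops over every symop row with one that first selects rows by `sg_id`. It shows one way such legacy logic could go wrong when a file lists symops for more than one space group.

```python
def naive_equivalent_positions(symop_packets):
    """Legacy-style logic: every row of the symop loop is assumed to
    belong to the one and only space group of the data block."""
    return [symop["operation_xyz"] for symop in symop_packets]


def checked_equivalent_positions(symop_packets, sg_id):
    """Select only the rows belonging to the requested space group;
    an omitted sg_id is treated as the empty-string default."""
    return [
        symop["operation_xyz"]
        for symop in symop_packets
        if symop.get("sg_id", "") == sg_id
    ]


# A file conforming to the looped definitions, with two space groups.
symops = [
    {"sg_id": "p1", "id": 1, "operation_xyz": "x,y,z"},
    {"sg_id": "pm1", "id": 1, "operation_xyz": "x,y,z"},
    {"sg_id": "pm1", "id": 2, "operation_xyz": "-x,-y,-z"},
]

naive = naive_equivalent_positions(symops)
checked = checked_equivalent_positions(symops, "pm1")
print(len(naive), len(checked))
```

The naive routine mixes the operations of both space groups, so it would generate too many equivalent positions without reporting any error.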

Best regards,

John

----

save_SPACE_GROUP

_definition.id                          SPACE_GROUP
_definition.scope                       Category
_definition.class                       Loop        # CHANGED
_definition.update                      2016-06-16  # CHANGED
_description.text
;
     The CATEGORY of data items used to specify space group
     information about the crystal used in the diffraction measurements.
;
_name.category_id                       EXPTL
_name.object_id                         SPACE_GROUP

####
# ADDED:

_category.key_id                        '_space_group.key'
loop_
  _category_key.name
         '_space_group.id'

# ... end of additions
####

save_

save__space_group.id

_definition.id                          '_space_group.id'
loop_
  _alias.definition_id
         '_space_group.id'
         '_space_group_id'
_definition.update                      2016-06-16  # CHANGED
_description.text
;
     Code identifying a space group if multiple symmetries.
     See _exptl_crystals.key.
;
_name.category_id                       space_group
_name.object_id                         id
_type.purpose                           Encode
_type.source                            Assigned
_type.container                         Single
_type.contents                          Code

# Take note of this (ADDED):
_enumeration.default                    ''

save_

####
# ADDED:

save__space_group.key

_definition.id                          '_space_group.key'
loop_
  _alias.definition_id
         '_space_group.key'
_definition.update                      2016-06-16
_description.text
;
     Value is a unique key to a set of space_group items
     in a looped list.
;
_name.category_id                       space_group
_name.object_id                         key
_type.purpose                           Key
_type.source                            Related
_type.container                         Single
_type.contents                          Code
loop_
  _method.purpose
  _method.expression
         Evaluation          '_space_group.key = _space_group.id'

save_

# ... end of additions
####

save_SPACE_GROUP_SYMOP

_definition.id                          SPACE_GROUP_SYMOP
_definition.scope                       Category
_definition.class                       Loop
_definition.update                      2013-09-08
_description.text
;
     The CATEGORY of data items used to describe symmetry equivalent sites
     in the crystal unit cell.
;
_name.category_id                       SPACE_GROUP
_name.object_id                         SPACE_GROUP_SYMOP
_category.key_id                        '_space_group_symop.key'
loop_
  _category_key.name
         '_space_group_symop.sg_id'     # ADDED
         '_space_group_symop.id'

save_

save__space_group_symop.key

_definition.id                          '_space_group_symop.key'
loop_
  _alias.definition_id
         '_space_group_symop.key'
_definition.update                      2016-06-16    # CHANGED
_description.text
;
     Value is a unique key to a set of space_group_symop items
     in a looped list.
;
_name.category_id                       space_group_symop
_name.object_id                         key
_type.purpose                           Key
_type.source                            Related
_type.container                         List          # CHANGED
_type.contents                          'Code,Index'  # CHANGED
loop_
  _method.purpose
  _method.expression
         # CHANGED:
         Evaluation          '_space_group_symop.key = [_space_group_symop.sg_id, _space_group_symop.id]'

save_

####
# ADDED:
# Note: this item is needed in any case because mmCIF and
# the DDL1 core define it

save__space_group_symop.sg_id

_definition.id                          '_space_group_symop.sg_id'
loop_
  _alias.definition_id
         '_space_group_symop.sg_id'
         '_space_group_symop_sg_id'
_definition.update                      2016-06-16
_description.text
# copied from mmCIF:
;
   This must match a particular value of _space_group.id, allowing
   the symmetry operation to be identified with a particular space
   group.
;
_name.category_id                       space_group_symop
_name.object_id                         sg_id
_name.linked_item_id                    '_space_group.id'
_type.purpose                           Link
_type.source                            Related
_type.container                         Single
_type.contents                          Code

# Take note of this:
_enumeration.default                    ''

save_

# ... end of additions
####

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148