Discussion List Archives


Re: [ddlm-group] Dictionary versioning

Dear DDLm Group,

1. We should indeed limit this discussion to the 'import' functionality and versioning of DDLm dictionaries. I also agree that we should deprecate the DDL1 dictionary merging protocol, which, as far as I know, was never fully implemented or used in published dictionaries or data files.

2. My original suggestion attempted to cover both 'Contents' and 'Full' imports assuming 'if_dupl' referred only to save frame names, but John's suggestion of extending 'if_dupl' to cover both the case of duplicate definition frames, and duplicate attributes within those frames, seems reasonable.  After a little thought I've convinced myself that 'if_dupl' can always refer to attributes for 'Contents' imports, and save frames for 'Full' imports, as save frames in 'Full' mode always either completely replace other save frames or are completely ignored; there is no 'merge' operation that would descend to attribute level.

I think the ultimate goal of this discussion is to have enough information available in the dictionary version numbering and the import attributes to be able to mechanically decide whether or not a templated definition and importing definition are compatible, erring on the side of caution.  A tool could then be easily written that would detect all such potential incompatibilities and allow authors to bring definitions and templates back into line (e.g. by finding newer versions or bumping the acceptable version number).  I suggest the following scheme, where 'attributes' becomes 'save frame names' for 'Full' imports:

(1) We introduce an optional 'version' key that can be specified for importation, of the form <major>.<minor>.<patch>. Where absent, all versions are acceptable.
(2) If there is a mismatch between the <major> versions of importing and imported dictionaries, the definitions are potentially incompatible
(3) If major versions match, but 'if_dupl' is 'Exit' and duplicate attributes exist, the definitions are potentially incompatible
(4) In all other cases the definitions are considered to be compatible.
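
As a rough illustration, the four rules above could be checked mechanically along the following lines. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, "importing version" is read as the value of the proposed 'version' key on the import statement, and an absent 'version' key is taken to skip only the major-version test, so rule (3) still applies.

```python
def parse_version(text):
    """Split '<major>.<minor>.<patch>' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def potentially_incompatible(requested_version, imported_version, if_dupl,
                             importing_names, imported_names):
    """Return True if the import should be flagged for an author's attention.

    requested_version: value of the proposed 'version' key, or None if absent.
    imported_version:  version declared by the dictionary being imported.
    importing_names / imported_names: attribute names for 'Contents' imports,
    save frame names for 'Full' imports.
    """
    # Rules (1) and (2): with no 'version' key every version is acceptable;
    # otherwise a differing major number is flagged.
    if requested_version is not None:
        if parse_version(requested_version)[0] != parse_version(imported_version)[0]:
            return True
    # Rule (3): 'Exit' combined with any duplicate name is flagged.
    if if_dupl == "Exit" and set(importing_names) & set(imported_names):
        return True
    # Rule (4): everything else is considered compatible.
    return False
```

A checking tool could call this once per import statement and report every case that returns True.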

Note that the major version number for a domain dictionary would never be incremented, as that goes against our undertaking to maintain stable definitions.

As a companion to the above scheme, we need instructions for when to increment the major version number of template dictionaries. As importation could in theory occur from domain dictionaries as well, these rules also describe "things that shouldn't ever change in domain dictionaries".

I suggest that the major version number be incremented in any of the following cases:

(1) If an attribute is removed from a definition (this could be modified to allow some non-mandatory attributes to be excluded e.g. examples)
(2) If a definition is removed (without being replaced by an alias in the case of a domain dictionary)
(3) If an enumerated state is removed from a list within a definition
(4) If the value of an enumerated state for certain important attributes is changed (for discussion)
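
A tool could detect cases (1) to (3) by comparing two releases of a dictionary. The sketch below assumes a hypothetical in-memory layout, not any existing CIF library: each dictionary maps a definition name to a dict with an "attributes" mapping and a "states" set. Case (4) and the alias exception in case (2) are left out, since they are still open for discussion.

```python
def breaking_changes(old, new):
    """List the reasons why `new` would need a major version increment over `old`.

    Both arguments map definition name -> {"attributes": dict, "states": set}.
    This layout is an assumption made for illustration only.
    """
    reasons = []
    for name, old_def in old.items():
        # Case (2): a whole definition has disappeared.
        if name not in new:
            reasons.append(f"definition removed: {name}")
            continue
        new_def = new[name]
        # Case (1): an attribute has disappeared from a surviving definition.
        for attr in old_def["attributes"]:
            if attr not in new_def["attributes"]:
                reasons.append(f"attribute removed: {name} {attr}")
        # Case (3): an enumerated state has disappeared.
        for state in sorted(old_def["states"] - new_def["states"]):
            reasons.append(f"enumerated state removed: {name} {state}")
    return reasons
```

An empty list would mean that none of these three cases applies, so a minor or patch increment is enough as far as these checks are concerned.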

all the best,

James.

On 25 July 2017 at 00:02, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear DDLm group,

 

I completely agree with James’s suggestion to use semantic versioning as the basis for dictionary version numbers.  These principles are probably familiar already to most people on this group, but some may be unaware that there is an actual standard-ish document describing them, published at http://semver.org/.  As presented there, semver is focused on software versioning, but inasmuch as the purpose is determining and managing compatibility between separately-maintained, interdependent software components, I don’t think it’s much of a stretch to apply them to separately-maintained, interdependent data components, such as CIF dictionaries.

 

As for the behavior of combining dictionaries to form a composite, we now have two procedures for that: the dictionary merging protocol, documented in ITvG 3.1.9.1, and DDLm’s importing facility.  I suppose James uses the term “import” specifically in reference to DDLm’s import mechanism, but we should not neglect the dictionary merging protocol.  If we don’t want to give it any other consideration then we should at least say that the rest of this discussion, and in particular the principles for assigning version numbers to dictionaries, are specific to DDLm dictionaries.  If that were the direction we went, then I would furthermore recommend deprecating the dictionary merging protocol for use with DDLm dictionaries.

 

As far as the behavior of DDLm importation when existing and imported attributes collide, we should clarify exactly what situation or situations that would be.  Since the issue is couched in terms of attributes, not whole definitions, I suppose we must be talking about an import defined with _import_details.mode 'Contents', else there is no scope for an attribute-level collision.  Is that correct?  If so, the question seems to assume that the associated value of _import_details.if_dupl is not relevant in that case.  Such an assumption is consistent with the letter of that item’s definition, but seems perhaps contrary to its spirit.  If this is the path we are traversing, then I think we should clarify the meaning and applicability of _import_details.if_dupl as a threshold issue for the rest of this discussion.

 

For reference, I’m looking at the current version of DDLm as published on the IUCr web site, version 3.11.09.

 

 

Regards,

 

John

 

--

John C. Bollinger, Ph.D.

Computing and X-Ray Scientist

Department of Structural Biology

St. Jude Children's Research Hospital

John.Bollinger@StJude.org

(901) 595-3166 [office]

www.stjude.org

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Monday, July 24, 2017 12:59 AM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] Dictionary versioning

 

Dear DDLm group,

The vagueness of dictionary versioning has been raised as an issue (see https://github.com/COMCIFS/cif_core/issues/47). Now that dictionaries can import template dictionaries, it becomes possible that the template dictionary could change in ways that would render the main dictionary incorrect, for example, if a necessary attribute was removed from the template definition.  Such fiddling with template attributes has recently been proposed (see https://github.com/COMCIFS/cif_core/issues/42) as a solution to certain technical issues.

While COMCIFS will obviously endeavour to maintain both template dictionaries and main dictionaries as a compatible whole, we should come up with some principles for versioning to guide authors and editors, as well as authors of dictionary checking software.  I suggest that we use semantic versioning of the form <major>.<minor>.<patch>, where a change in the major version number is required when incompatible changes are introduced.

There are two situations that are important: importing pieces of a definition from a template dictionary, and importing a whole dictionary in order to build on it. 

Versioning in template dictionaries: Firstly, there has been no explicit statement of how importation should treat the presence of the same attribute in both the template and the importing definition - I suggest the simple principle that the value in the importing definition always has precedence over the imported value. Assuming this, a template dictionary will be potentially incompatible with an importing dictionary if attributes are removed from a definition, a definition is itself removed, or the value of an attribute is changed in a way that would change the behaviour of software.  Any of these three changes would require an increase in the major version number of a template dictionary.  Other changes are covered by the rules below for full dictionaries.
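
The precedence principle suggested here can be pictured as a simple dictionary merge. This is a minimal sketch in which attributes are modelled as plain name-to-value mappings; the function name is hypothetical and no existing DDLm software is implied.

```python
def merge_contents(importing_attrs, template_attrs):
    """Combine template attributes with those of the importing definition.

    Where the same attribute appears in both, the value from the importing
    definition takes precedence over the imported (template) value.
    """
    merged = dict(template_attrs)   # start from the imported values
    merged.update(importing_attrs)  # importing definition overrides duplicates
    return merged
```

Under this rule, removing an attribute from the template silently removes it from every importing definition that does not set it locally, which is why such a removal counts as an incompatible change.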

Versioning in full dictionaries: we never make any changes in a domain dictionary that would require a change in major version number as this would undermine our goal of stable, universal data names. We are then left with simple rules for changing the non-major version numbers in both full and template dictionaries:

  1. change the patch version for typo correction, rewording and clarification
  2. increment the minor version for all other changes: additions to enumerations, new definitions, moving data names to aliases of new definitions
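
These rules, together with the major-version rule for incompatible changes, could be sketched as a small helper. The change labels "incompatible" and "wording" are hypothetical, and resetting the lower-order numbers to zero follows the usual semantic versioning convention rather than anything stated above.

```python
def bump(version, change):
    """Return the next '<major>.<minor>.<patch>' string for a kind of change.

    change == "incompatible": attribute or definition removed, or a value
                              changed in a way that alters software behaviour.
    change == "wording":      typo correction, rewording, clarification.
    anything else:            additions to enumerations, new definitions,
                              moving data names to aliases of new definitions.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "incompatible":
        return f"{major + 1}.0.0"
    if change == "wording":
        return f"{major}.{minor}.{patch + 1}"
    return f"{major}.{minor + 1}.0"
```

For a domain dictionary the "incompatible" branch should never be reached, in line with the goal of stable data names.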

Feel free to respond either on the github issue or here.

James.

--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148




_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group




