Discussion List Archives

Re: [ddlm-group] Dictionary versioning

(Apologies for the enormous delay in dealing with this. As you might have noticed, I'm trying to get rid of the backlog of issues.)

Thanks Herbert for pointing in the direction of an alternative to semantic versioning.  As I read it, the GNU approach for version c.r.a would require bumping c whenever an interface changes, which we could interpret to mean whenever a data name is added, changed or removed.  That would seem incompatible with current practice, given that we have essentially said that DDL1 is version 1 and DDL2 is version 2. 

Alternatively, if we view a dictionary as a relational database containing ontological data, then perhaps the interface is the database schema. In this case our interface would only change if we add/change DDLm attributes, but again this is catered for by the DDL compatibility attribute.

I think I would prefer to pursue the semantic versioning path as it seems a simpler fit.

On Sat, 29 Jul 2017 at 01:12, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
Dear Colleagues,

  I would suggest consideration of the versioning scheme used by GNU libtool for libraries (see https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html).  This versioning scheme is not based on major and minor versions, but on how the changes made impact actual use.  The scheme has 3 components:  current, revision, and age.  To distinguish this from the more common versioning schemes, the components are separated by colons.

  You start off with 0:0:0.
  You only change the version information on public releases.
  You change the first component (current), and reset the second component (revision) to 0, when you make any change that adds, removes or changes an "interface" -- i.e. that changes how the dictionary can be used.
  You change the second component (revision) when there is a change in the "code", in this case the text of the dictionary, with the value of current unchanged.
  You change the third component (age) when you add, remove or change any interfaces.  You go up by 1 if you added any interfaces.  You reset age to 0 when you remove or change any interfaces.

  This is not the same as release numbers.  You still need them.  But it has proven very useful in untangling the dependencies of libraries in Unix, and might have similar value in helping to untangle CIF dictionary dependencies.
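
  As an illustration only, the update rules listed in this message could be sketched in Python as follows. The names VersionInfo and next_version and the three keyword flags are hypothetical; they are not part of libtool or of DDLm, and the logic simply follows the rules as stated above for a single public release.

```python
from typing import NamedTuple


class VersionInfo(NamedTuple):
    """A current:revision:age triple; the scheme starts at 0:0:0."""
    current: int = 0
    revision: int = 0
    age: int = 0

    def __str__(self) -> str:
        # Colons distinguish this scheme from dotted release numbers.
        return f"{self.current}:{self.revision}:{self.age}"


def next_version(v, *, text_changed=False, interfaces_added=False,
                 interfaces_removed_or_changed=False):
    """Return the version info for the next public release (hypothetical helper)."""
    current, revision, age = v
    if interfaces_added or interfaces_removed_or_changed:
        # Any interface change bumps current and resets revision.
        current += 1
        revision = 0
        # Removing or changing an interface resets age; a pure addition adds 1.
        age = 0 if interfaces_removed_or_changed else age + 1
    elif text_changed:
        # Text changed but the interfaces, and hence current, did not.
        revision += 1
    return VersionInfo(current, revision, age)
```

  For example, a release that only adds data names would take 0:3:0 to 1:0:1, and a later release that removes one would take that to 2:0:0.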


On Fri, Jul 28, 2017 at 10:35 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Dear DDLm Group,

Comments in-line below.

On Tuesday, July 25, 2017 9:30 PM, James Hester wrote:
> [...]
> 2. My original suggestion attempted to cover both 'Contents' and 'Full' imports assuming 'if_dupl' referred only to save frame names, but John's suggestion of extending 'if_dupl' to cover both the case of duplicate definition frames, and duplicate attributes within those frames, seems reasonable.  After a little thought I've convinced myself that 'if_dupl' can always refer to attributes for 'Contents' imports, and save frames for 'Full' imports, as save frames in 'Full' mode always either completely replace other save frames or are completely ignored; there is no 'merge' operation that would descend to attribute level.

I concur with that analysis, and I am satisfied with that interpretation of how 'if_dupl' would apply in 'Contents' import mode.  Do note, however, that such an interpretation complicates the compatibility analysis, as I discuss below.
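
A minimal sketch of that interpretation, written in Python for concreteness. The function name import_merge and the dict-based representation are hypothetical. Only 'Exit' is named in this thread; the 'Ignore' and 'Replace' states are assumed here to be the other two values of 'if_dupl'. Keys are assumed to be already case-normalised.

```python
def import_merge(existing, incoming, if_dupl="Exit"):
    """Merge `incoming` into a copy of `existing` and return the result.

    For a 'Full' import the keys would be save frame names; for a
    'Contents' import they would be attribute names within one definition.
    Either way a duplicate is replaced or ignored as a whole; nothing
    descends to a finer level.
    """
    if if_dupl not in ("Exit", "Ignore", "Replace"):
        raise ValueError(f"unknown if_dupl value: {if_dupl!r}")
    merged = dict(existing)
    for key, value in incoming.items():
        if key in merged:
            if if_dupl == "Exit":
                raise ValueError(f"duplicate on import: {key}")
            if if_dupl == "Ignore":
                continue  # keep the existing entry untouched
        merged[key] = value  # new key, or 'Replace' on a duplicate
    return merged
```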

> I think the ultimate goal of this discussion is to have enough information available in the dictionary version numbering and the import attributes to be able to mechanically decide whether or not a templated definition and importing definition are compatible, erring on the side of caution.  A tool could then be easily written that would detect all such potential incompatibilities and allow authors to bring definitions and templates back into line (e.g. by finding newer versions or bumping the acceptable version number).  I suggest the following scheme, where 'attributes' becomes 'save frame names' for 'Full' imports:
> (1) We introduce an optional 'version' key that can be specified for importation, of form <major>.<minor>.<patch>. Where absent, all versions are acceptable

I observe that if the version numbers are indeed assigned according to semantic versioning principles, then the <patch> part is irrelevant to import-compatibility analysis.  I suggest, therefore, that it be omitted from the version-requirement metadata (but not from the item expressing a dictionary's version), or at least made optional.
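
To make the suggestion concrete, here is a rough Python sketch of reading a version requirement in which the patch part is optional and, when present, ignored. The function name parse_requirement and the single-string form of the requirement are assumptions for illustration, not an agreed DDLm attribute format.

```python
import re

# <major>.<minor> with an optional .<patch> that is accepted but not used.
_REQUIREMENT = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_requirement(text):
    """Return (major, minor) from '<major>.<minor>[.<patch>]' (hypothetical helper)."""
    match = _REQUIREMENT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a version requirement: {text!r}")
    # The patch component plays no part in import-compatibility analysis.
    return int(match.group(1)), int(match.group(2))
```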

I think we should also consider whether the major and minor numbers required should be expressed as one data item or two.  I don't necessarily object to using just one item, but the two significant parts of the version number can be regarded as having separate significance, thus possibly justifying separate items.

> (2) If there is a mismatch between the <major> versions of importing and imported dictionaries, the definitions are potentially incompatible
> (3) If major versions match, but 'if_dupl' is 'Exit' and duplicate attributes exist, the definitions are potentially incompatible
> (4) In all other cases the definitions are considered to be compatible.

I'd prefer to omit the word "potentially" here.  If semantic version comparison of the required dictionary version to the available dictionary version does not indicate compatibility, then the two should be considered incompatible for that reason alone.  Taking any other position encourages dabbling with "try it and see" approaches, and these are prone to subtle failures.  It will be hard enough to just get the version number assignments right.

Additionally, I'm not certain that the proposed evaluation rules are sufficient, in that they do not consider the minor version number.  At minimum, the imported dictionary should be considered incompatible if its minor version number is less than the one specified by the importing dictionary, because the importing dictionary's choice of minor version number implies that it requires features of the imported dictionary that were added at that version.  If we are supposing that it is possible to update dictionaries such that the minor version increases but the major version remains the same, then we need to consider the minor version number, too.
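
Putting rules (1) to (4) together with the minor-version check, the evaluation might look like the following Python sketch. The function name is_import_compatible, its parameters, and the use of (major, minor) tuples are hypothetical; the treatment of a missing requirement follows rule (1).

```python
def is_import_compatible(required, available, if_dupl="Exit",
                         has_duplicates=False):
    """Decide import compatibility, erring on the side of caution.

    `required` is the (major, minor) pair requested by the importing
    dictionary, or None if no version was specified.  `available` is the
    (major, minor) pair of the dictionary actually being imported.
    """
    if required is not None:
        req_major, req_minor = required
        avail_major, avail_minor = available
        if avail_major != req_major:
            return False  # rule (2): major versions differ
        if avail_minor < req_minor:
            return False  # features added at req_minor may be missing
    if if_dupl == "Exit" and has_duplicates:
        return False  # rule (3): duplicates would abort the import
    return True  # rule (4): everything else is compatible
```

With these rules, a requirement of 2.3 is satisfied by an available 2.5 but not by 2.2 or by 3.0.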

> Note that the major version number for a domain dictionary would never be incremented, as that goes against our undertaking to maintain stable definitions.
> As a companion to the above scheme, we need instructions for when to increment the major version number of template dictionaries. As importation could in theory occur from domain dictionaries as well, these rules also describe "things that shouldn't ever change in domain dictionaries".
> I suggest the following:
> (1) If an attribute is removed from a definition (this could be modified to allow some non-mandatory attributes to be excluded e.g. examples)

I've attempted to go through DDLm to identify more precisely which attributes need to be preserved, and in what sense.  I am by no means certain that I've gotten it all right, but see below.

> (2) If a definition is removed (without being replaced by an alias in the case of a domain dictionary)

I guess there are two levels of issue here.  Since DDLm imports are defined in terms of frame codes rather than the identities of defined items, there are stronger constraints on import compatibility than on compatibility for standalone use.  With respect to imports, then, (2) appears to work out more specifically to these:

(2'.1) If there are frame codes in the old dictionary that do not appear in the new one, taking into account the case-insensitivity of frame codes

(2'.2) If the identity (_definition.id) of the item defined by a definition frame (as identified by its frame code) is changed, taking into account data name case-insensitivity, unless the previous identity is added to the definition as an alias (_alias.definition_id).

(2'.3) If there is any alias (_alias.definition_id) in the old definition that does not also appear in the new one.

For standalone use, on the other hand, I think "If a definition is removed" suffices, provided "definition" is interpreted as discussed below.
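
As a rough illustration of how (2'.1) to (2'.3) could be checked mechanically, consider the Python sketch below. The function name frame_code_breaks and the representation of a dictionary as a mapping from frame code to a dict with "definition_id" and "aliases" entries are assumptions made for the example, and str.casefold stands in for whatever case-insensitive comparison is actually specified.

```python
def frame_code_breaks(old, new):
    """List the ways `new` breaks imports written against `old`.

    Both arguments map frame code -> {"definition_id": str, "aliases": [str]}.
    An empty result means none of (2'.1)-(2'.3) applies.
    """
    fold = str.casefold
    new_by_code = {fold(code): d for code, d in new.items()}
    problems = []
    for code, old_def in old.items():
        new_def = new_by_code.get(fold(code))
        if new_def is None:
            problems.append(f"(2'.1) frame code {code} no longer present")
            continue
        new_aliases = {fold(a) for a in new_def.get("aliases", ())}
        old_id = fold(old_def["definition_id"])
        # A renamed item is acceptable only if the old name survives as an alias.
        if old_id != fold(new_def["definition_id"]) and old_id not in new_aliases:
            problems.append(f"(2'.2) {code}: identity changed without alias")
        for alias in old_def.get("aliases", ()):
            if fold(alias) not in new_aliases:
                problems.append(f"(2'.3) {code}: alias {alias} dropped")
    return problems
```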

> (3) If an enumerated state is removed from a list within a definition
> (4) If the value of an enumerated state for certain important attributes is changed (for discussion)

I offer an alternative, stronger rule for enumerations below.

Subject to my comment above, I agree that all of those changes render the affected dictionary incompatible for import, but it is not an exhaustive list.  Additionally, it seems that all of these would also produce import incompatibility:

 (5) If a definition's category is changed (_name.category_id).
 (6) If a definition's object ID is changed (_name.object_id), except possibly in conjunction with a renaming + aliasing.
 (7) If the class or scope of a definition is changed (_definition.class, _definition.scope).
 (8) If any pre-existing category definition is modified by introduction, modification, or deletion of _category.key_id and/or _category_key.name attributes.
 (9) If any pre-existing definition is modified by adding, removing, or modifying any attribute from the ENUMERATION or ENUMERATION_DEFAULT category.
 (10) If any definition is modified so that it no longer has a method for a given purpose (_method.purpose).
 (11) If the expression of any method (_method.expression) of a definition, as identified by its purpose (_method.purpose) is modified in a way that produces incompatibility (sorry for the circularity).
 (12) If a definition is modified in any of a variety of ways involving modification, addition, or deletion of attributes from the TYPE category, such that the value of any item that is valid with respect to the old definition is invalid or interpreted differently with respect to the new one.
 (13) If the _units.code attribute of a definition is modified or deleted.

Where (5) - (13) refer to a "definition", import compatibility construes definitions as being identified by frame codes, whereas standalone compatibility construes them as being identified by _definition.id and/or _alias.definition_id.  Except with respect to aliases, that distinction is moot for dictionaries that adhere to a convention of using frame codes that are computable from the _definition.id of the item defined within.
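
Some of these rules lend themselves to a simple table-driven check, sketched below in Python. Everything here is an assumption for illustration: the name attribute_breaks, the representation of a definition as a dict from lower-cased attribute name to value, and the premise that attributes of a category can be recognised by a "_<category>." prefix. The sketch covers only (5), (6), (7), (9) and (13); rules (10) to (12) need judgment that a plain comparison cannot supply. It also errs on the side of caution by flagging additions and deletions as well as modifications, and by ignoring the renaming exception in (6).

```python
# Attributes whose value must not change, mapped to the rule they fall under.
FROZEN_ATTRIBUTES = {
    "_name.category_id": 5,
    "_name.object_id": 6,
    "_definition.class": 7,
    "_definition.scope": 7,
    "_units.code": 13,
}
# Categories in which any added, removed, or modified attribute is a break.
FROZEN_PREFIXES = {"_enumeration.": 9, "_enumeration_default.": 9}


def attribute_breaks(old_def, new_def):
    """Return sorted (rule, attribute) pairs for breaking differences."""
    problems = []
    for name in sorted(set(old_def) | set(new_def)):
        if old_def.get(name) == new_def.get(name):
            continue  # unchanged (or absent from both)
        rule = FROZEN_ATTRIBUTES.get(name)
        if rule is None:
            rule = next((r for prefix, r in FROZEN_PREFIXES.items()
                         if name.startswith(prefix)), None)
        if rule is not None:
            problems.append((rule, name))
    return problems
```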



John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]


ddlm-group mailing list

International Union of Crystallography
