Discussion List Archives

[ddlm-group] Making ddl2 <-> ddlm translation a reality

Dear DDLm group,

imgCIF has not yet been incorporated into the DDLm world, a step that I think is essential if we are to take advantage of its excellent raw data descriptors. As we all know, raw data is becoming increasingly important, and even if data are not stored in CIF format, CIF descriptors can be adapted to any format. In order to support a DDLm version of the imgCIF dictionary, the imgCIF maintainers want a DDL2 -> DDLm -> DDL2 dictionary round trip to preserve the important information. This will allow a single version of imgCIF to be maintained and made available both to the macromolecular community and to the communities covered by non-DDL2 dictionaries.

As part of a preliminary investigation I have analysed how the translation would work in both directions by writing (but not testing!) dREL methods that operate on dictionary data. I'll make this document available on GitHub shortly. Most of the fundamental relationships and data name information are simple to transform.

However, this investigation has brought me up against a key problem that was identified long ago, related to DDL2 types. item_type_list is a category that tabulates all possible DDL2 "types" by linking a type name with a regular expression, a primitive type (char/uchar/numb/null) and an explanation. In contrast to DDLm, these types are defined in the domain dictionary rather than being semi-baked into the DDL.

While a mapping from DDL2 types to DDLm types is largely straightforward, a lot of the DDL2 imgCIF/mmCIF information is lost in the process, particularly the highly detailed distinctions between various textual formats in imgCIF/mmCIF that are captured in regular expressions. This means that a sensible translation back to DDL2 is impossible, most fundamentally because the DDL2 names of the types are not preserved: DDLm has no dictionary-definable types.
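To make the loss concrete, here is a minimal Python sketch. The item_type_list entries are illustrative stand-ins rather than copies of the real mmCIF/imgCIF entries, and the mapping function is a deliberately simplified assumption about how a DDL2 -> DDLm translation might assign content types; it is not the dREL method mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemType:
    code: str        # DDL2 type name, defined in the domain dictionary
    primitive: str   # char / uchar / numb / null
    construct: str   # regular expression constraining the value
    detail: str      # human-readable explanation


# Illustrative entries only (assumed, not taken from a real dictionary).
ITEM_TYPE_LIST = [
    ItemType("code", "char", r"[A-Za-z0-9_.-]*", "short identifier"),
    ItemType("line", "char", r"[^\n]*", "single line of text"),
    ItemType("int", "numb", r"-?[0-9]+", "integer"),
]


def to_ddlm_contents(t: ItemType) -> str:
    """Hypothetical, simplified mapping to a DDLm-style content label."""
    if t.primitive == "numb":
        return "Integer" if t.code == "int" else "Real"
    return "Text"


ddlm = {t.code: to_ddlm_contents(t) for t in ITEM_TYPE_LIST}

# Two distinct DDL2 types end up with the same DDLm label, so neither the
# regex nor the DDL2 type name can be recovered from the DDLm side alone.
collapsed = [code for code, contents in ddlm.items() if contents == "Text"]
print(collapsed)
```

Under these assumptions, "code" and "line" both become Text, which is the kind of collapse that makes the return trip to DDL2 ambiguous.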

Here are some options that I see for solving this, and by extension the basic DDLm -> DDL2 translation problem:
(1) When translating DDLm -> DDL2, the _item_type_list found in mmCIF/PDBx is consulted for a matching regex and the corresponding code is used. If none is found, an arbitrary code is generated.
(2) We create an extension dictionary to DDLm which defines a few extra attributes specifically for preserving DDL2 information (e.g. type codes).
(3) We create a new DDLm category for "foreign" attributes, where arbitrary foreign attributes with values can be listed.
(4) Your suggestions?

Option (1) is not that unnatural, as imgCIF (and, as I understand it, any mmCIF extension dictionary) should harmonise its units and item type lists with mmCIF. So the translation is not DDLm -> DDL2, but instead DDLm -> DDL2 -> mmCIF extension dictionary. However, in this case we would be using the regex as a natural key, so the DDL2 -> mmCIF extension step is a bit fragile: there may be multiple ways to express the same text constraints as a regex, and matches could therefore be missed if either the DDL2 or the DDLm side updates a regex.
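A rough Python sketch of the Option (1) lookup, with the regex used as the key, shows where the fragility comes from. The type list contents, the function name ddl2_code_for and the "gen_" naming scheme for generated codes are all assumptions made for illustration.

```python
import hashlib

# Illustrative stand-in for the mmCIF/PDBx item_type_list: regex -> type code.
ITEM_TYPE_LIST = {
    r"-?[0-9]+": "int",
    r"[A-Za-z0-9_.-]*": "code",
}


def ddl2_code_for(regex: str, type_list: dict) -> str:
    """Return the DDL2 type code whose regex matches textually.

    If no entry matches, generate an arbitrary but reproducible code
    (hypothetical scheme: 'gen_' plus a short hash of the regex).
    """
    code = type_list.get(regex)
    if code is not None:
        return code
    digest = hashlib.sha1(regex.encode("utf-8")).hexdigest()[:8]
    return f"gen_{digest}"


# Exact textual match: the existing code is reused.
exact = ddl2_code_for(r"-?[0-9]+", ITEM_TYPE_LIST)

# Same constraint written differently: the lookup misses and a new,
# unharmonised code is generated instead of "int".
rewritten = ddl2_code_for(r"-?[0-9][0-9]*", ITEM_TYPE_LIST)

print(exact, rewritten)
```

The two regexes accept exactly the same strings, but because the comparison is textual, only the first one is recognised as "int".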

Option (2) is easy enough to create, and has the advantage of extensibility if and when we want to preserve more of the things that are dropped in translation. It also serves to "define" the differences in information granularity between DDL2 and DDLm. It allows "pure" DDLm users to work in DDLm, and then if somebody wishes to incorporate that ontology into the mmCIF world, the list of necessary attributes to add to the definitions is available. It does however create (yet) another DDL, although one that could be said to come under the DDLm umbrella. Additionally, it may serve as a model for integrating with other ontologies beyond DDL2. If this group sees merit in this approach, we would probably organise formal approval in COMCIFS.
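As a sketch of how Option (2) might behave in a round trip, the Python below carries the DDL2 type code through DDLm in an extra attribute. The attribute name _ddl2_ext.item_type_code, the data name _example.value and the "primitive_code" key are hypothetical; no such extension dictionary exists yet, and definitions are reduced to plain dictionaries.

```python
# Hypothetical extension attribute for preserving the DDL2 type code.
DDL2_TYPE_CODE = "_ddl2_ext.item_type_code"

# Simplified, assumed mapping from DDL2 primitive to a DDLm content label.
PRIMITIVE_TO_CONTENTS = {"numb": "Real", "char": "Text", "uchar": "Text"}


def to_ddlm(ddl2_def: dict) -> dict:
    """Translate a (much simplified) DDL2 definition to DDLm, keeping the type code."""
    return {
        "_definition.id": ddl2_def["_item.name"],
        "_type.contents": PRIMITIVE_TO_CONTENTS[ddl2_def["primitive_code"]],
        DDL2_TYPE_CODE: ddl2_def["_item_type.code"],
    }


def to_ddl2(ddlm_def: dict) -> dict:
    """Translate back to DDL2, reading the type code from the extension attribute."""
    code = ddlm_def.get(DDL2_TYPE_CODE)
    if code is None:
        # A "pure" DDLm definition: the attribute must be added before
        # the definition can be brought into the mmCIF world.
        raise KeyError(f"{DDL2_TYPE_CODE} missing for {ddlm_def['_definition.id']}")
    return {"_item.name": ddlm_def["_definition.id"], "_item_type.code": code}


original = {
    "_item.name": "_example.value",   # hypothetical data name
    "_item_type.code": "line",
    "primitive_code": "char",
}
round_tripped = to_ddl2(to_ddlm(original))
print(round_tripped)
```

In this sketch the type name survives the round trip without any regex matching, and a definition lacking the extra attribute fails loudly, which is one way of making "the list of necessary attributes to add" explicit.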

As far as I can see, Option (3) would only work in a non-clunky way for non-looped attributes, which is fine for the particular case of item_type but is not extensible.

I would like this group to consider the options above and arrive at a preferred approach.


