

Re: [ddlm-group] audit.schema discussion

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: Re: [ddlm-group] audit.schema discussion
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Fri, 17 Jun 2016 14:42:13 +1000
Dear DDLmers,

Comments at the end, and I have resurrected the subject line.

On 17 June 2016 at 08:47, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Dear Colleagues,

[...] 
I acknowledge that I may have misunderstood proposal #2.  In light of James's subsequent comments I'm now interpreting it to hinge in part on the assertion that undefined categories are necessarily Sets, so that if they are initially defined as Loops or as Sets-with-key then that constitutes a change that may need to be advertised via _audit.schema.  I trust that if I still have it wrong then James will supply further clarification.  Whether I accept that undefined categories are Sets or not, I can accept that P2 provides a more complete solution than I previously gave it credit for, provided we find a suitable specification for which category names should be listed in _audit.schema.

Yes, your description is correct.
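A minimal sketch of the reading of Proposal #2 confirmed above. It assumes that a category's class is either "Set" or "Loop", that an undefined category is passed as `None`, and that a "Set-with-key" is represented by a hypothetical `has_key` flag. The function names are illustrative only and are not part of DDLm.

```python
from typing import Optional


def allows_multiple_packets(category_class: Optional[str], has_key: bool = False) -> bool:
    """Return True if a category of this class may hold more than one packet.

    Assumption (P2 reading): an undefined category (None) is treated as a Set,
    and a plain Set holds exactly one packet.
    """
    if category_class is None:
        category_class = "Set"  # undefined categories are taken to be Sets
    if category_class == "Loop":
        return True
    # The hypothetical 'has_key' flag stands in for a Set-with-key
    # ('default key'), assumed here to permit multiple packets.
    return has_key


def must_advertise(old_class: Optional[str], new_class: str,
                   old_has_key: bool = False, new_has_key: bool = False) -> bool:
    """A category that could hold only one packet but now may hold several is
    a change that may need to be advertised via _audit.schema."""
    return (not allows_multiple_packets(old_class, old_has_key)
            and allows_multiple_packets(new_class, new_has_key))


print(must_advertise(None, "Loop"))                   # True
print(must_advertise(None, "Set"))                    # False
print(must_advertise(None, "Set", new_has_key=True))  # True
```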

James raises some good points about _audit_conform.  My previous comments about dictionary versioning are applicable here, but they do not constitute a rebuttal.

On the other hand, a similar argument applies to the proposed usage model for _audit.schema: if a data file expresses via _audit.schema that a certain category has been modified, and a given piece of software does not have that category in its list of acceptable schema changes, then the software will reject the data, even if it does not use the modified category in any way.  This is analogous to the example of software rejecting a data file that uses _audit_conform.dictionary to express conformance with pd_CIF, even though that does not change the interpretation of the items the software actually uses.  I believe this is the same issue James acknowledged:


> The audit.schema system requires some extra work in non-default cases: software cannot trivially determine that an unknown (at software creation time) value of audit.schema was acceptable for some given datablock contents and use cases. This case can be dealt with if the software is prepared to parse and analyse the dictionaries provided in _audit_conform, or if the software is prepared to run a generic conversion utility to transform to its own schema.
> [...]

Yes, you have understood correctly.
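The allow-list usage model discussed above can be sketched as follows. This is a minimal illustration only: a data block is modelled as a plain Python dict, a missing _audit.schema is taken to mean the default value, and the spellings "Default" and "Hypothetical_pd" are invented for the example, since no enumeration has been agreed.

```python
DEFAULT_SCHEMA = "Default"  # hypothetical spelling of the default value

# Schema values this (hypothetical) program was written to understand.
ACCEPTED_SCHEMAS = {DEFAULT_SCHEMA}


def accepts(block: dict) -> bool:
    """Allow-list check: accept a block only if its _audit.schema value is known.

    A missing _audit.schema is treated as the default value.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    return schema in ACCEPTED_SCHEMAS


# This block changes only a category the program never reads, yet it is
# rejected, because the unknown schema value alone triggers rejection.
block = {"_audit.schema": "Hypothetical_pd", "_cell.length_a": "5.431"}
print(accepts(block))                        # False
print(accepts({"_cell.length_a": "5.431"}))  # True
```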


I considered that it might be possible to solve that problem by inverting the usage model for _audit.schema: instead of applications listing values that they can accept, they could list values that they must reject.  That would not be as easy to use, but I don't think it would be prohibitive.  That isn't sufficient either, however, because it doesn't provide for rejecting some changes to a given category but not others.  As James suggests, the problem could instead be solved by analyzing dictionaries or converting schemas, but if it comes to that then most of the practical utility of _audit.schema has been lost.  An application prepared to do that would do better to start with the dictionary analysis straightaway.
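For comparison, a minimal sketch of the inverted (deny-list) model, under the same assumptions as any allow-list version: blocks are plain dicts, and the value names are hypothetical. The decision is made once per _audit.schema value, so the function cannot distinguish between different changes to the same category.

```python
DEFAULT_SCHEMA = "Default"  # hypothetical spelling of the default value

# Schema values this (hypothetical) program knows it cannot handle.
REJECTED_SCHEMAS = {"Hypothetical_multi_cell"}


def accepts_by_deny_list(block: dict) -> bool:
    """Deny-list check: accept a block unless its _audit.schema value is listed.

    A missing _audit.schema is treated as the default value.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    return schema not in REJECTED_SCHEMAS


print(accepts_by_deny_list({"_audit.schema": "Hypothetical_multi_cell"}))  # False
# A value the program has never seen is accepted; the function sees only the
# value, not which change to which category it stands for.
print(accepts_by_deny_list({"_audit.schema": "Hypothetical_other"}))       # True
```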

Ultimately, I would not oppose adding _audit.schema to the DDLm core dictionary with a definition that makes it advisory, rather than prescriptive.  Inasmuch as we seem to agree that _audit.schema cannot replace consulting the relevant dictionaries, I think that's the most reasonable form for its definition to take if we do define it.  Although I think it would be possible to define items that express schema characteristics in sufficient detail for applications to determine whether they are prepared to understand the file, such items would not have any of the ease of use that _audit.schema enjoys.

That does not, however, constitute agreement to Proposal #2 overall.

I do not entirely agree "that _audit.schema cannot replace consulting the relevant dictionaries [at run time]". I do agree that *if* _audit.schema does *not* have the default value, then an application *must* consult the dictionaries listed in _audit_conform *if* it wants to be guaranteed to interpret the provided file correctly.  If _audit.schema *does* have the default value, then _audit.schema *does* replace consulting the relevant dictionaries at run time. How about we say that "it is mandatory for CIF-reading software to check either _audit.schema or the dictionaries listed in _audit_conform.dictionary. If _audit.schema is non-default, then checking _audit_conform.dictionary is strongly recommended"?
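A minimal sketch of the reading rule proposed above, assuming a data block is a plain dict, that a missing _audit.schema means the default value, and that _audit_conform.dictionary holds either a single name or a list of names. The default spelling "Default" and the function name are hypothetical; "cif_core.dic" is used only as an example dictionary name.

```python
from typing import Optional

DEFAULT_SCHEMA = "Default"  # hypothetical spelling of the default value


def dictionaries_to_consult(block: dict) -> Optional[list]:
    """Return None if no run-time dictionary check is needed; otherwise return
    the dictionaries listed in _audit_conform.dictionary that should be checked.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema == DEFAULT_SCHEMA:
        # Default value: current practice, so checking _audit.schema suffices.
        return None
    listed = block.get("_audit_conform.dictionary", [])
    if isinstance(listed, str):  # a single, unlooped value
        listed = [listed]
    # Non-default value: the listed dictionaries should be consulted. An empty
    # list means the file gives the program nothing to check against.
    return list(listed)


print(dictionaries_to_consult({}))
print(dictionaries_to_consult({"_audit.schema": "Hypothetical_pd",
                               "_audit_conform.dictionary": "cif_core.dic"}))
```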

In any case, the above must become widespread practice before either the 'default key' or 'Set' proposals can be used to produce dictionaries.  I would propose that we publicise this recommendation widely, make web pages, contact individual authors personally, check existing software, liaise with large-scale databases, get an alert into CheckCIF and so forth, and hold off on approving any expanded dictionaries until checking of _audit.schema (or dictionaries) is performed by a large fraction of software in daily use (which is easy to test).

So, what I hope we can now agree to is:

(1) That a new dataname, called '_audit.schema' or similar, is defined. Each enumerated value in _audit.schema corresponds to a list of 'Set' (or 'default key') categories that may have multiple packets, and the default value corresponds to current practice.
(2) That we aim for a situation in which all CIF-reading software checks either _audit.schema or the dictionaries listed in _audit_conform.dictionary, and if _audit.schema is non-default, _audit_conform.dictionary is always checked.
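Item (1) can be pictured as a lookup table from _audit.schema values to categories. The sketch below is hypothetical throughout: the value names and the table contents are invented. A program only knows the table for values that existed when it was written, so an unknown value falls back to the dictionaries, as in item (2).

```python
from typing import Optional

DEFAULT_SCHEMA = "Default"  # hypothetical spelling of the default value

# Hypothetical enumeration: each _audit.schema value names the 'Set' (or
# 'default key') categories that may have multiple packets under it.
SCHEMA_MULTI_PACKET = {
    DEFAULT_SCHEMA: frozenset(),  # current practice: no such categories
    "Hypothetical_multi_cell": frozenset({"cell"}),
}


def multi_packet_categories(schema: str) -> Optional[frozenset]:
    """Categories that may hold several packets, or None if schema is unknown."""
    return SCHEMA_MULTI_PACKET.get(schema)


def safe_for(schema: str, categories_used: set) -> bool:
    """True if none of the categories this program reads is affected."""
    multi = multi_packet_categories(schema)
    if multi is None:
        return False  # unknown value: fall back to the _audit_conform dictionaries
    return multi.isdisjoint(categories_used)


print(safe_for("Hypothetical_multi_cell", {"atom_site"}))  # True
print(safe_for("Hypothetical_multi_cell", {"cell"}))       # False
```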

all the best,
James.


--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
