
Re: Draft namespace recommendations

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Re: Draft namespace recommendations
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 17 Jul 2013 11:46:44 +1000
  • In-Reply-To: <CAM+dB2eWdo5Hn0nMJ98O_dXN72jhmVGCPUjxugK+whEz9-AoRg@mail.gmail.com>
  • References: <CAM+dB2eWdo5Hn0nMJ98O_dXN72jhmVGCPUjxugK+whEz9-AoRg@mail.gmail.com>
Dear COMCIFS members,

There has been little discussion of the two namespace proposals linked
in my original email in February, which leads me to conclude that they
are acceptable (apologies for the delay in following up).  For archival
purposes, I have included the full text of the two proposals in this
email, and I now request the COMCIFS voting members to formally
indicate their agreement.  In the case of disagreement, please note it
briefly in your reply and then follow up on the issue in the namespace
forum.  In accordance with COMCIFS practice, if more than 6 weeks pass
from today's date with no reply, agreement will be assumed, although
explicit and rapid assent is always preferable.

Note that the second namespace proposal below differs slightly from
the one originally linked: I have added a sentence clarifying the
meaning of 'IUCr domain' in the preamble and clarified the meaning of
'adopting' a third party dictionary.


Proposal for a new dataname to support a CIF namespace mechanism


We wish to build some sort of namespace mechanism into CIF so that
other communities can use CIF with minimal, if any, coordination with
COMCIFS. The key requirement is that datanames and the corresponding
dictionary definitions must be unambiguously matchable. Currently,
COMCIFS guarantees the uniqueness and immutability of datanames,
so there is no need for any disambiguation mechanism. If CIF is to be
usable outside COMCIFS, there must be a mechanism so that the readers
and writers of CIF data files from a given community can agree on the
correct definition for a given dataname.

Two partial solutions already exist:
(1) People and organisations register an opaque 'prefix' for their
datanames with the IUCr. This allows users to populate their own
namespaces safely and devolves management of dataname collisions to
the relevant community. From the point of view of an outside
discipline, the remaining annoyance is that the datanames and
dictionaries are cluttered with a redundant prefix.
(2) The _audit tags in a datablock can specify which dictionary the
datanames come from. The problem then becomes one of encouraging
programs to read and write these _audit items: because COMCIFS has
until now guaranteed the stability and uniqueness of datanames, simply
finding a matching dataname in a datafile is already a solid guarantee
that it means what the programmer thought.
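As a rough illustration of approach (1), the Python sketch below finds which registered prefix, if any, a dataname carries. It assumes the usual `_<prefix>_<rest>` layout and uses two invented prefixes, `acme` and `lab`. The real IUCr prefix register is not reproduced here. Datanames are compared case-insensitively, because CIF datanames are case-insensitive.

```python
from typing import Optional, Set


def registered_prefix(dataname: str, prefixes: Set[str]) -> Optional[str]:
    """Return the registered prefix that `dataname` carries, or None.

    Assumes the layout _<prefix>_<rest>. Longer prefixes are tried first
    so that the most specific match wins.
    """
    name = dataname.lower()  # CIF datanames are case-insensitive
    if not name.startswith("_"):
        raise ValueError(f"not a CIF dataname: {dataname!r}")
    for prefix in sorted(prefixes, key=len, reverse=True):
        if name.startswith("_" + prefix.lower() + "_"):
            return prefix
    return None


# Hypothetical prefixes, for illustration only.
PREFIXES = {"acme", "lab"}

print(registered_prefix("_acme_sample_colour", PREFIXES))  # acme
print(registered_prefix("_cell.length_a", PREFIXES))       # None
```

This lookup only works if every item the outside community defines carries the prefix. That is exactly the clutter the approach is criticised for.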

Some discussion has taken place in the namespaces forum and members
are invited to read the comments there as well.

Proposed solution

We define an enumerated dataname, _audit.discipline, which takes
values assigned by COMCIFS and should never be redefined by any
CIF-using organisation; in effect, it becomes part of the CIF
specification. We can formally define a 'discipline' here as a
collection of dictionaries which define datanames that are guaranteed
to always have a constant, unambiguous meaning. This guarantee would
presumably be provided by some organisation using policies chosen by
that organisation. A CIF datafile wishing to explicitly specify which
discipline its datanames are drawn from would set the value of
_audit.discipline inside its datablocks. Likewise, programmers who are
concerned about possible ambiguity in datanames can explicitly check
for the value of this dataname.
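A program concerned about ambiguity might check _audit.discipline before trusting a value, roughly as sketched below. The sketch makes several assumptions. A datablock is modelled as a plain dictionary mapping lower-cased datanames to scalar values. Discipline names are compared exactly. `IUCR = "IUCr"` is only a placeholder, since the value reserved for the IUCr has not been decided.

```python
from typing import Dict, Optional

DISCIPLINE = "_audit.discipline"
# Placeholder only: the proposal has not fixed the IUCr's own value.
IUCR = "IUCr"


def read_item(block: Dict[str, str], dataname: str, expected: str,
              trust_undeclared: bool = True) -> str:
    """Return the value of `dataname` only if the block's declared
    discipline matches `expected`.

    `block` is assumed to map lower-cased datanames to scalar values.
    A block without _audit.discipline is accepted or rejected according
    to `trust_undeclared`, since using the dataname is optional.
    """
    declared: Optional[str] = block.get(DISCIPLINE)
    if declared is None:
        if not trust_undeclared:
            raise LookupError("block does not declare _audit.discipline")
    elif declared != expected:
        raise LookupError(
            f"{dataname} is defined by discipline {declared!r}, not {expected!r}")
    return block[dataname.lower()]  # CIF datanames are case-insensitive


block = {DISCIPLINE: IUCR, "_cell.length_a": "5.431"}
print(read_item(block, "_Cell.Length_a", IUCR))  # 5.431
```

The hypothetical `trust_undeclared` flag lets the caller decide whether an undeclared block is acceptable. The proposal leaves that choice to each program.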

Note the following:

* The IUCr would maintain a registry of accepted disciplines. In
minimal form, this could be the dictionary entries for
_audit.discipline and something like _audit.discipline_URI.
* There is no requirement to use the _audit.discipline dataname, nor
to register disciplines. It is provided as a tool for those wishing to
avoid ambiguity.
* Disciplines not wishing to register their discipline name but still
wishing to use _audit.discipline must never choose 'IUCr' (or
whatever it is we decide) for their discipline name.
* Only minimal checking is required, compared with the current _audit
datanames, yet similar guarantees of uniqueness and correctness are
provided.
* The _audit.discipline dataname should never be looped. Datanames
drawn from multiple disciplines may not have overlapped when a
datafile was produced, but may overlap when it is read, as there is no
coordination between disciplines.
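The notes above suggest a simple validity check, sketched here in Python. The registry contents, including the example.org URI, are invented placeholders. They stand in for the dictionary entries for _audit.discipline and a possible _audit.discipline_URI. A looped item is modelled as a Python list.

```python
from typing import Dict, List, Union

DISCIPLINE = "_audit.discipline"

# Hypothetical stand-in for the IUCr registry: discipline name -> the
# value a dataname like _audit.discipline_URI might hold. Invented data.
REGISTRY: Dict[str, str] = {"IUCr": "https://example.org/disciplines/iucr"}

Value = Union[str, List[str]]  # a list models a looped item


def check_declaration(block: Dict[str, Value], registry: Dict[str, str]) -> str:
    """Classify a block's discipline declaration.

    Returns 'undeclared', 'registered' or 'unregistered'; raises
    ValueError if _audit.discipline is looped.
    """
    value = block.get(DISCIPLINE)
    if value is None:
        return "undeclared"  # using the dataname is optional
    if isinstance(value, list):
        raise ValueError("_audit.discipline must not be looped")
    return "registered" if value in registry else "unregistered"


print(check_declaration({DISCIPLINE: "IUCr"}, REGISTRY))   # registered
print(check_declaration({DISCIPLINE: "astro"}, REGISTRY))  # unregistered
```

An unregistered value is permitted. A reader then has to learn by other means which dictionaries that discipline uses.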

The scope of the _audit.discipline dataname is the entire datablock
and all save frames within that datablock, unless a save frame gives
a different value for _audit.discipline, in which case the new value
applies to that save frame and to all save frames nested within it.
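The scoping rule amounts to inheritance down the save frame tree, with the nearest declaration taking precedence. The sketch below assumes a hypothetical in-memory `Frame` type, not taken from any CIF library. It identifies a save frame by the list of frame names leading to it from the datablock.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DISCIPLINE = "_audit.discipline"


@dataclass
class Frame:
    """A datablock or save frame: scalar items plus nested save frames."""
    name: str
    items: Dict[str, str] = field(default_factory=dict)
    frames: List["Frame"] = field(default_factory=list)


def effective_discipline(block: Frame, path: List[str]) -> Optional[str]:
    """Discipline in force in the save frame reached by following `path`
    (a list of save frame names) down from `block`.

    The datablock's value is inherited by nested save frames until a
    frame declares its own value, which then applies to that frame and
    to every frame inside it.
    """
    current = block
    discipline = block.items.get(DISCIPLINE)
    for name in path:
        matches = [f for f in current.frames if f.name == name]
        if not matches:
            raise KeyError(f"no save frame {name!r} inside {current.name!r}")
        current = matches[0]
        # Keep the inherited value unless this frame overrides it.
        discipline = current.items.get(DISCIPLINE, discipline)
    return discipline


block = Frame("example", {DISCIPLINE: "A"}, [
    Frame("outer", {}, [
        Frame("inner", {DISCIPLINE: "B"}, [Frame("innermost")]),
    ]),
])
print(effective_discipline(block, ["outer"]))                        # A
print(effective_discipline(block, ["outer", "inner", "innermost"]))  # B
```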


Draft COMCIFS dataname and dictionary policy within the IUCr domain

COMCIFS must ensure the uniqueness of all extant datanames within the
IUCr domain. The following policy is designed to maximise the chances
that the status and meaning of any dataname encountered in the IUCr
domain is unambiguous. A dataname is considered to be within the IUCr
domain if the proposed _audit.discipline dataname has the value

(1) Datanames not explicitly approved by COMCIFS and appearing in CIF
datafiles should either contain the string '[local]' or commence with
a prefix handed out by COMCIFS.
(2) COMCIFS makes no undertakings as to the uniqueness of datanames
containing the string '[local]'.
(3) In the register of approved prefixes, COMCIFS may provide
certification that datanames with a given prefix will be unique. In
order to obtain this certification, a prefix assignee should:

(i) publish a publicly available dictionary defining all datanames
with that prefix;
(ii) have an organisational structure judged capable of enforcing
dataname policy (a single person can also satisfy this criterion).

(4) Alternatively, if a prefix assignee provides to the IUCr a
dataname dictionary and advises that the prefix is no longer in use,
the IUCr will archive that dictionary and certify that the prefix is
unique. If later workers wish to re-use such a 'closed' prefix, they
must not define any items that appear in such archived dictionaries.
(5) The IUCr cannot provide any guarantees as to the correctness or
uniqueness of definitions in dictionaries published by third parties.
COMCIFS may choose, on request, to bring such third party dictionaries
into the IUCr domain, in which case datanames and details of
definitions may change.
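Points (1) to (4) of the draft policy can be read as a small classification procedure, sketched below in Python. The labels, the `_<prefix>_<rest>` prefix layout and the example sets are assumptions made for illustration. In practice, the approved datanames, prefixes and certifications would come from COMCIFS and the IUCr registry.

```python
from typing import Set

LOCAL_MARKER = "[local]"


def classify(dataname: str, approved: Set[str], prefixes: Set[str],
             certified: Set[str]) -> str:
    """Label a dataname according to draft policy points (1) to (3).

    All sets are expected to hold lower-case strings; prefixes are
    assumed to appear as _<prefix>_<rest>.
    """
    name = dataname.lower()  # CIF datanames are case-insensitive
    if name in approved:
        return "approved"
    if LOCAL_MARKER in name:
        return "local"  # point (2): no uniqueness undertaking
    for prefix in sorted(prefixes, key=len, reverse=True):
        if name.startswith("_" + prefix + "_"):
            # Point (3): only certified prefixes carry a uniqueness guarantee.
            return "certified-prefix" if prefix in certified else "prefix"
    return "nonconforming"  # violates point (1)


def reuse_conflicts(new_names: Set[str], archived: Set[str]) -> Set[str]:
    """Point (4): archived datanames that a re-user of a closed prefix
    must not define."""
    return {n.lower() for n in new_names} & {a.lower() for a in archived}


# Hypothetical example data.
APPROVED = {"_cell.length_a"}
PREFIXES = {"acme", "lab"}
CERTIFIED = {"acme"}

print(classify("_lab_sample_id", APPROVED, PREFIXES, CERTIFIED))  # prefix
print(sorted(reuse_conflicts({"_old_a", "_old_b"}, {"_OLD_B", "_old_c"})))  # ['_old_b']
```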

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148