Discussion List Archives

Re: Draft namespace recommendations

Then I have further questions -

For an item such as _pdbx_entity_src_syn.entity_id, would I expect to define
this as _PDBX::entity_src_syn.entity_id or _PDBX::pdbx_entity_src_syn.entity_id,
given that I expect usage in the data file to continue to carry the
_pdbx prefix (i.e. _pdbx_entity_src_syn.entity_id)?

I am trying to understand whether the motivation for this is to avoid the use
of organization-specific prefixes in the data names, or whether this is an
organizational tool to formally collect a set of organization extensions
within the namespace container. In other words, what is the intended fate
of existing data items containing organizational prefixes in data names?

On 7/17/13 7:16 AM, yayahjb wrote:
> Just as with the xml and C++ namespace use,
>    _IUCR::atom_site.occupancy
> has exactly the same meaning as
>   _atom_site.occupancy
> in a CIF for which no discipline other than IUCR has been specified.
> If you are in a CIF for which some other discipline has been specified,
> then they could (and most likely would) have different meanings.
> In the IUCR discipline namespace, the two tags would both refer
> to the same ATOM_SITE category occupancy column with nothing
> additional having to have been added to the definitional save
> frames of the category nor of the column.  I would recommend
> explicitly adding the appropriate _audit.discipline... tags to
> the dictionary, but I think it only fair to say that any dictionary
> for which no _audit.discipline... have been added is presumed
> to be under the IUCR discipline.
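To make the equivalence described above concrete, here is a minimal Python sketch of how a reader might compare a qualified and an unqualified dataname. It is only an illustration under stated assumptions: the function name `resolve`, the constant `DEFAULT_DISCIPLINE`, and the choice to compare discipline labels case-insensitively are mine and not part of either proposal.

```python
from typing import Optional, Tuple

# Assumption: a file with no _audit.discipline is presumed to be IUCR,
# as suggested in the message above.
DEFAULT_DISCIPLINE = "IUCR"


def resolve(name: str, block_discipline: Optional[str] = None) -> Tuple[str, str]:
    """Return (discipline, unqualified dataname) for comparison purposes.

    `block_discipline` is the value of _audit.discipline in the enclosing
    datablock, or None if the datablock does not set it.
    """
    if not name.startswith("_"):
        raise ValueError(f"datanames must start with an underscore: {name!r}")
    body = name[1:]
    qualifier, sep, rest = body.partition("::")
    if sep:
        # Explicit "DISCIPLINE::" qualifier wins over the datablock setting.
        discipline, bare = qualifier, rest
    else:
        discipline, bare = block_discipline or DEFAULT_DISCIPLINE, body
    # CIF datanames are case-insensitive; treating discipline labels the
    # same way is an assumption of this sketch.
    return discipline.upper(), "_" + bare.lower()
```

With this sketch, `resolve("_IUCR::atom_site.occupancy")` and `resolve("_atom_site.occupancy")` compare equal when no other discipline is declared, but differ when the datablock declares some other discipline.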
> On 7/17/13 7:04 AM, John Westbrook wrote:
>> Can I get clarification on the interpretation of the namespace for the
>> case in your example -
>> _IUCR::atom_site.occupancy
>> Do you interpret the above item name as a reference to a logically distinct
>> data category named 'atom_site' which exists in the IUCR namespace? This case
>> would formerly have been represented as something like _iucr_atom_site.occupancy.
>> Or, do you interpret this as adding the data item _atom_site.occupancy from
>> the IUCR namespace to the standard data category, atom_site?
>> John
>> On 7/17/13 6:44 AM, yayahjb wrote:
>>> The discipline proposal does not provide a convenient way to mix datanames from multiple
>>> disciplines in the same datablock.  XML DOM does it with a colon separator.   C++ does it
>>> with a "::" separator.  I suggest we adopt the C++ convention, treating a discipline
>>> similarly to a C++ namespace.  I would suggest adding the following:
>>>   " To facilitate use of datanames from multiple disciplines in a single CIF the discipline
>>> from which a dataname is drawn may be specified by using C++ namespace-style
>>> conventions, i.e. by optionally prefixing the dataname with the discipline followed immediately
>>> by a double colon, as in
>>>    _IUCR::atom_site.occupancy
>>> as equivalent to
>>>    _atom_site.occupancy
>>> when the discipline is known to be IUCR.
>>> Note that the leading underscore appears before the prefixed discipline and the
>>> double colon, and that no spaces are permitted in the construct."
>>> Neither the discipline proposal nor the statement of the existing prefix system
>>> talks about the interaction with dotted notation as it is used in DDL2. In DDL2,
>>> the prefix may be used on either dotted component:  the category or the
>>> column.  I suggest adding the following statements in the prefix section.
>>> "When an IUCr CIF dictionary uses formal dotted notation separating a dataname
>>> into category and column components (as in DDL2) the prefix may appear
>>> on either component."
>>> Finally, we should explicitly allow the combination of disciplines and prefixes,
>>> as in
>>> _IUCR::audit_author.PDBX_ordinal
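As a rough illustration of the syntax suggested above, the following Python sketch splits a dataname into an optional discipline, an optional DDL2 category, and an item. The grammar in the regular expression is an assumption (neither proposal gives a formal one), and the names `QualifiedName` and `parse_dataname` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class QualifiedName(NamedTuple):
    discipline: Optional[str]  # None when no "DISCIPLINE::" qualifier is present
    category: Optional[str]    # None for undotted (DDL1-style) names
    item: str


# Assumed grammar: "_" [discipline "::"] [category "."] item, with no
# whitespace anywhere in the construct.
_NAME_RE = re.compile(
    r"^_(?:([A-Za-z][A-Za-z0-9]*)::)?(?:([^\s.:]+)\.)?([^\s.:]+)$"
)


def parse_dataname(name: str) -> QualifiedName:
    """Split a dataname into discipline, category and item components."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a valid (optionally qualified) dataname: {name!r}")
    discipline, category, item = m.groups()
    return QualifiedName(discipline, category, item)
```

Because a prefix may sit on either dotted component, the sketch does not try to interpret prefixes at all: `PDBX_ordinal` is simply returned as the item, and any prefix handling would be layered on top.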
>>> On 7/16/13 9:46 PM, James Hester wrote:
>>>> Dear COMCIFS members,
>>>> There has been little discussion on the two namespace proposals linked
>>>> in my original email in February (apologies for the delay), which
>>>> leads me to conclude that they are acceptable.  For archival purposes,
>>>> I have included the full text of the two proposals in this email, and
>>>> I now request the COMCIFS voting members to formally indicate their
>>>> agreement.  In the case of a disagreement, please note that
>>>> disagreement briefly in your reply and then follow up the issue in the
>>>> namespace forum.  In accordance with COMCIFS practice, if more than 6
>>>> weeks pass from today's date with no reply, agreement will be assumed,
>>>> although explicit and rapid assent is always preferable.
>>>> Note that the second namespace proposal below differs slightly from
>>>> the one originally linked: I have added a sentence clarifying the
>>>> meaning of 'IUCr domain' in the preamble and clarified the meaning of
>>>> 'adopting' a third party dictionary.
>>>> James.
>>>> =========================
>>>> http://forums.iucr.org/viewtopic.php?f=28&t=319
>>>> Proposal for a new dataname to support a CIF namespace mechanism
>>>> Background
>>>> We wish to build some sort of namespace mechanism into CIF so that
>>>> other communities can use CIF with minimal, if any, coordination with
>>>> COMCIFS. The key requirement is that datanames and the corresponding
>>>> dictionary definitions must be unambiguously matchable. Currently,
>>>> COMCIFS guarantees the uniqueness and immutable nature of datanames,
>>>> so there is no need for any disambiguation mechanism. If CIF is to be
>>>> usable outside COMCIFS, there must be a mechanism so that the readers
>>>> and writers of CIF data files from a given community can agree on the
>>>> correct definition for a given dataname.
>>>> Two partial solutions already exist:
>>>> (1) people and organisations register an opaque 'prefix' for a
>>>> dataname with the IUCr. This allows users to populate their own
>>>> namespaces safely and devolves management of dataname collisions to
>>>> the relevant community. From the point of view of the outside
>>>> discipline, there remains the annoyance that the datanames and
>>>> dictionaries are cluttered with a redundant prefix.
>>>> (2) The _audit tags in a datablock can specify which dictionary the
>>>> datanames come from. The problem then becomes one of encouraging
>>>> programs to read and write these _audit items, given that simply
>>>> finding a matching dataname in a datafile is already a pretty solid
>>>> guarantee that it means what the programmer thought, as COMCIFS has up
>>>> until now guaranteed the stability and uniqueness of datanames.
>>>> Some discussion has taken place in the namespaces forum and members
>>>> are invited to read the comments there as well.
>>>> Proposed solution
>>>> We define an enumerated dataname, _audit.discipline, which takes
>>>> values assigned by COMCIFS and should never be redefined by any
>>>> CIF-using organisation - in effect it becomes part of the CIF
>>>> specification. We can formally define a 'discipline' here as a
>>>> collection of dictionaries which define datanames that are guaranteed
>>>> to always have a constant, unambiguous meaning. This guarantee would
>>>> presumably be provided by some organisation using policies chosen by
>>>> that organisation. A CIF datafile wishing to explicitly specify which
>>>> discipline its datanames are drawn from would set the value of
>>>> _audit.discipline inside its datablocks. Likewise, programmers who are
>>>> concerned about possible ambiguity in datanames can explicitly check
>>>> for the value of this dataname.
>>>> Note the following:
>>>> * The IUCr would maintain a registry of accepted disciplines. In
>>>> minimal form this could be the dictionary entries for
>>>> _audit.discipline and something like _audit.discipline_URI
>>>> * There is no requirement to use the _audit.discipline dataname, nor
>>>> to register disciplines. It is provided as a tool for those wishing to
>>>> avoid ambiguity
>>>> * Disciplines not wishing to register their discipline name but still
>>>> wishing to use _audit.discipline, must never choose 'IUCr' (or
>>>> whatever it is we decide) for their discipline name
>>>> * Minimal checking is required compared to the current _audit
>>>> datanames, but similar guarantees of uniqueness and correctness are
>>>> obtainable
>>>> * The _audit.discipline dataname should never be looped. Datanames
>>>> drawn from multiple disciplines may not have overlapped when a
>>>> datafile was produced, but may overlap when it is read, as there is no
>>>> coordination between disciplines.
>>>> The scope of the _audit.discipline dataname is the entire datablock
>>>> and all save frames within that data block, unless a save frame gives
>>>> a different value for _audit.discipline, in which case that new value
>>>> will apply to all nested save frames within that save frame.
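The scoping rule in the paragraph above can be sketched in Python as a walk from the datablock down through nested save frames, where the innermost explicit value wins. The `Frame` class and `effective_discipline` function are a hypothetical in-memory model for illustration only, not an existing CIF library API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A datablock or save frame (hypothetical in-memory model)."""
    discipline: Optional[str] = None  # value of _audit.discipline, if set here
    children: Dict[str, "Frame"] = field(default_factory=dict)


def effective_discipline(
    block: Frame, path: List[str], default: Optional[str] = None
) -> Optional[str]:
    """Return the discipline in force at the save frame reached by `path`.

    `path` lists save-frame names from the datablock downwards; an empty
    path refers to the datablock itself.
    """
    current = block.discipline if block.discipline is not None else default
    node = block
    for name in path:
        node = node.children[name]
        if node.discipline is not None:
            # A save frame that sets its own value overrides the inherited
            # one for itself and everything nested inside it.
            current = node.discipline
    return current
```

For example, a save frame that sets a different value changes the result for its own nested frames, while sibling frames that set nothing keep the datablock's value.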
>>>> ===================================================================
>>>> http://forums.iucr.org/viewtopic.php?f=28&t=315
>>>> Draft COMCIFS dataname and dictionary policy within the IUCr domain
>>>> COMCIFS must ensure the uniqueness of all extant datanames within the
>>>> IUCr domain. The following policy is designed to maximise the chances
>>>> that the status and meaning of any dataname encountered in the IUCr
>>>> domain is unambiguous. A dataname is considered to be within the IUCr
>>>> domain if the proposed _audit.discipline dataname has the value
>>>> 'IUCr'.
>>>> (1) Datanames not explicitly approved by COMCIFS and appearing in CIF
>>>> datafiles should either contain the string '[local]' or commence with
>>>> a prefix handed out by COMCIFS
>>>> (2) COMCIFS makes no undertakings as to the uniqueness of datanames
>>>> containing the string '[local]'.
>>>> (3) In the register of approved prefixes, COMCIFS may provide
>>>> certification that datanames with a given prefix will be unique. In
>>>> order to obtain this certification, a prefix assignee should:
>>>> (i) publish a publicly available dictionary defining all datanames
>>>> with that prefix
>>>> (ii) have an organisational structure judged capable of enforcing
>>>> dataname policy (a single person also suits this criterion)
>>>> (4) Alternatively, if a prefix assignee provides to the IUCr a
>>>> dataname dictionary and advises that the prefix is no longer in use,
>>>> the IUCr will archive that dictionary and certify that the prefix is
>>>> unique. If later workers wish to re-use such a 'closed' prefix, they
>>>> must not define any items that appear in such archived dictionaries.
>>>> (5) The IUCr cannot provide any guarantees as to the correctness or
>>>> uniqueness of definitions in dictionaries published by third parties.
>>>> COMCIFS may choose, on request, to bring such third party dictionaries
>>>> into the IUCr domain, in which case datanames and details of
>>>> definitions may change.
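Points (1) and (2) of the draft policy above could be checked mechanically along the following lines. This is only a sketch: the function name `classify_dataname`, the returned labels, and the assumption that a prefix is followed by an underscore at the start of the dataname are mine, and the sets of approved datanames and registered prefixes would have to come from the real dictionaries and the prefix register.

```python
from typing import Iterable


def classify_dataname(
    name: str, approved: Iterable[str], prefixes: Iterable[str]
) -> str:
    """Classify a dataname against points (1) and (2) of the draft policy.

    Returns one of "approved", "local", "prefixed" or "nonconforming".
    """
    lowered = name.lower()  # CIF datanames are case-insensitive
    if lowered in {a.lower() for a in approved}:
        return "approved"
    if "[local]" in lowered:
        # Point (2): no undertaking is made about uniqueness of these names.
        return "local"
    body = lowered.lstrip("_")
    # Assumption: a prefix appears as "_<prefix>_..." at the start of the name.
    if any(body.startswith(p.lower() + "_") for p in prefixes):
        return "prefixed"
    return "nonconforming"
```

Called with a hypothetical register containing only `pdbx`, the name `_pdbx_entity_src_syn.entity_id` is classified as "prefixed", while an unapproved name with neither a registered prefix nor '[local]' is reported as "nonconforming".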
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>>> _______________________________________________
>>>> comcifs mailing list
>>>> comcifs@iucr.org
>>>> http://mailman.iucr.org/mailman/listinfo/comcifs


John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (848) 445-4290 Fax: (732) 445-4320