Discussion List Archives

Re: Expanding category key for publ_author.name

  • Subject: Re: Expanding category key for publ_author.name
  • From: Simon Westrip <simonwestrip@xxxxxxxxxxxxxx>
  • Date: Fri, 28 Jun 2019 15:11:00 +0000 (UTC)
  • Cc: Forum for CIF Software Developers <cif-developers@xxxxxxxx>
  • In-Reply-To: <CAM+dB2cXofDA7RSmi19uz04tKm5RFG3rAz9EfpL41ZeG+8aOtQ@mail.gmail.com>
  • References: <CAM+dB2cXofDA7RSmi19uz04tKm5RFG3rAz9EfpL41ZeG+8aOtQ@mail.gmail.com>
I think Solution (1) is less disruptive.

The publ_author category has traditionally relied on 'conventions' with respect to representing data
(e.g. _publ_author_name ought to be 'family name, initials')
and overall lacks the level of granularity that is becoming increasingly important
(e.g. _publ_author_address can contain any number of addresses as arbitrary text;
recognizing institutions/countries relies on heuristics...)

That is, the publ_author category does not currently lend itself to more granular data collection
and I don't think it is linked to by other categories,
so rather than redefine it in a way that totally invalidates current practice [Solution (2)],
I would look forward to new categories and items to be defined as alternatives, fully exploiting the
key definitions...

I think there are other (DDL1) categories where the same may apply, e.g.
_audit_author (though I note that in the DDL2 pdbx dic there is _audit_author.pdbx_ordinal...)

So I'm in favour of a solution whereby keys can be inferred from the order in which loop rows appear
in the CIF in cases where the key items are absent from the CIF
(in much the same way as space_group_symop_id is often inferred from the order in which space_group_symop_operation is listed).
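
To make the idea concrete, here is a minimal sketch of inferring a key from row order. It assumes a loop has already been parsed into a list of dicts (one per row); the function name infer_ordinal_keys and the data names in the usage comment are illustrative assumptions, not part of any existing CIF library.

```python
def infer_ordinal_keys(rows, key_name):
    """Return copies of the loop rows with key_name filled in from row order.

    rows     -- list of dicts, one per loop row (assumed parser output)
    key_name -- data name of the key item, e.g. "_space_group_symop.id"
    """
    present = [key_name in row for row in rows]
    if all(present):
        # The key is given explicitly in the CIF: nothing to infer.
        return [dict(row) for row in rows]
    if any(present):
        # A key present in only some rows is ambiguous; refuse to guess.
        raise ValueError(f"{key_name} is present in only some rows")
    # Key item absent from the CIF: use the 1-based position in the loop.
    return [{key_name: str(i), **row} for i, row in enumerate(rows, start=1)]


# Usage (data names are assumptions for illustration):
# symops = [{"_space_group_symop.operation_xyz": "x,y,z"},
#           {"_space_group_symop.operation_xyz": "-x,-y,-z"}]
# infer_ordinal_keys(symops, "_space_group_symop.id")
```

The input rows are left unmodified, so the inferred key never has to be written back to the original CIF.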



On Friday, 28 June 2019, 05:38:12 GMT+1, James Hester <jamesrhester@gmail.com> wrote:

Dear Cif-developers,

The CIF core group has not expressed an opinion on the following proposal, so before going ahead with one of the suggested fixes, I would like your opinion as CIF software developers and database curators on the options presented:

The publ_author category has _publ_author.name as the category key, meaning that _publ_author.name can be used to select a unique row of the loop. However, it has been pointed out that _publ_author.name is insufficient as a key for the _publ_author loop, as some authors have the same name (there are apparently 40 entries in the COD with this feature). 

Solution (1): Add a further disambiguating data name to _publ_author. The suggestion is that something like "_publ_author.id" could be defined, which would contain an arbitrary code and which would be added to the category key.  The lack of this dataname in historical CIFs can be worked around by assuming it has a constant value, and CIF curators can auto-generate it when a situation is encountered with multiple identically-named authors.
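
As a rough illustration of Solution (1), the sketch below treats (_publ_author.name, _publ_author.id) as the composite key, substitutes a constant when the id is missing, and auto-generates ids only for rows whose composite key collides. The constant DEFAULT_ID, the function names, and the list-of-dicts row representation are all assumptions made for this example.

```python
from collections import Counter

DEFAULT_ID = "."  # hypothetical constant assumed when _publ_author.id is absent


def composite_key(row):
    """Composite category key: (name, id), with a constant id for historical CIFs."""
    return (row["_publ_author.name"], row.get("_publ_author.id", DEFAULT_ID))


def disambiguate(rows):
    """Curator step: assign ids only to rows whose composite key is not unique."""
    keys = [composite_key(row) for row in rows]
    counts = Counter(keys)
    used = set(keys)
    out = []
    for row, key in zip(rows, keys):
        new_row = dict(row)
        if counts[key] > 1:
            # Pick the smallest integer code not already used for this name.
            n = 1
            while (key[0], str(n)) in used:
                n += 1
            new_row["_publ_author.id"] = str(n)
            used.add((key[0], str(n)))
        out.append(new_row)
    return out


# Usage:
# authors = [{"_publ_author.name": "Smith, J."},
#            {"_publ_author.name": "Smith, J."},
#            {"_publ_author.name": "Jones, A."}]
# disambiguate(authors)  # only the two "Smith, J." rows receive an id
```

Rows with unique names are returned unchanged, so software that only reads _publ_author.name would see no difference for the vast majority of CIFs.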

Solution (2): Define "_publ_author.id" as the new key of the category that can be auto-generated from each packet. While in a formal sense this makes all current CIFs non-conformant, I can't imagine that it would affect most software, which will continue to work with _publ_author.name.

Please indicate your preference or alternative solutions. Particularly important is the perspective of software authors and database curators who may be impacted.

James. (CIF chair/acting core CIF DMG chair)
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
cif-developers mailing list

International Union of Crystallography
