Re: Expanding category key for publ_author.name

Subject: Re: Expanding category key for publ_author.name
From: James Hester <jamesrhester@xxxxxxxxx>
Date: Mon, 30 Sep 2019 12:15:45 +1000
Cc: Forum for CIF Software Developers <cif-developers@xxxxxxxx>

Dear All,

It has taken me rather a long time (as usual) to get back to this. I am not sure in which direction Simon has landed, as initially it looks like (1) was preferred as being less disruptive, but the final paragraph seems to be a wish for auto-generation of keys as per option (2). Option (2) would be disruptive only insofar as CIF *writers* who were aware of the new key would no longer feel the need to disambiguate author names by some device, and CIF readers who relied on whatever ad-hoc device is used would get confused - but that is true of both options. Software that was unaware of the new data name would continue to be unaware until updated, and function precisely as it has in the past.

Herbert - if we use ORCID ID, we cannot autogenerate it for legacy CIFs and we cannot assume that all authors will have one.

Anyway, thanks for the input. This topic will now be picked up again at https://github.com/COMCIFS/cif_core/issues/103 and then presented to the core dictionary management group, if anybody wants to follow along.

Picking up on Simon's comments, we could separately think about a completely revamped approach to authors, including allowing multiple addresses (email and real world) and multiple roles (another topic that has come up in the past). I will raise that suggestion as an issue on GitHub so as not to lose sight of it.

all the best,

James.

On Sat, 29 Jun 2019 at 01:11, Simon Westrip <[email protected]> wrote:

I think Solution (1) is less disruptive.

The publ_author category has traditionally relied on 'conventions' with respect to representing data
(e.g. _publ_author_name ought to be 'family name, initials')
and overall lacks the level of granularity that is becoming increasingly important
(e.g. _publ_author_address can contain any number of addresses as arbitrary text;
recognizing institutions/countries relies on heuristics...)

That is, the publ_author category does not currently lend itself to more granular data collection
and I don't think it is linked to by other categories,
so rather than redefine it in a way that totally invalidates current practice [Solution (2)],
I would look forward to new categories and items to be defined as alternatives, fully exploiting the
key definitions...

I think there are other (DDL1) categories where the same may apply, e.g.
_audit_author (though I note that in the DDL2 pdbx dic there is _audit_author.pdbx_ordinal...)

So I'm in favour of a solution whereby keys can be inferred from the order in which loop rows appear
in the CIF in cases where the key items are absent from the CIF
(in much the same way as space_group_symop_id is often inferred from the order in which space_group_symop_operation is listed).
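
A minimal sketch of what inferring keys from row order could look like, assuming a loop has already been parsed into a list of dicts (one dict per row). The function name infer_ordinal_key and the dict representation are hypothetical and not taken from any existing CIF library; "_publ_author.id" is the data name proposed in this thread.

```python
def infer_ordinal_key(rows, key_name):
    """Fill in an ordinal key from row order when the key item is absent.

    rows: list of dicts, one per loop row (hypothetical in-memory form).
    key_name: the key data name, e.g. '_publ_author.id'.
    """
    if any(key_name in row for row in rows):
        # The file already supplies the key item: leave the values untouched.
        return [dict(row) for row in rows]
    # Otherwise number the rows 1, 2, 3, ... in the order they appear.
    return [{key_name: str(i), **row} for i, row in enumerate(rows, start=1)]


authors = [
    {"_publ_author.name": "Smith, J."},
    {"_publ_author.name": "Smith, J."},
]
keyed_authors = infer_ordinal_key(authors, "_publ_author.id")
```

Under these assumptions the two identically named authors end up with the ids "1" and "2", while a file that already carries the key item is passed through unchanged.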

Cheers

Simon

On Friday, 28 June 2019, 05:38:12 GMT+1, James Hester <[email protected]> wrote:

Dear Cif-developers,

The CIF core group has not expressed an opinion on the following proposal, so before going ahead with one of the suggested fixes, I would like your opinion as CIF software developers and database curators on the options presented:

The publ_author category has _publ_author.name as the category key, meaning that _publ_author.name can be used to select a unique row of the loop. However, it has been pointed out that _publ_author.name is insufficient as a key for the _publ_author loop, as some authors have the same name (there are apparently 40 entries in the COD with this feature).

Solution (1): Add a further disambiguating data name to _publ_author. The suggestion is that something like "_publ_author.id" could be defined, which would contain an arbitrary code and which would be added to the category key. The lack of this dataname in historical CIFs can be worked around by assuming it has a constant value, and CIF curators can auto-generate it when a situation is encountered with multiple identically-named authors.
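
As a rough illustration of Solution (1), the sketch below treats (_publ_author.name, _publ_author.id) as a composite key. It assumes the loop is held as a list of dicts; the constant DEFAULT_ID and the helper names are hypothetical, and the numbering scheme for duplicates is only one of many a curator might choose. It also assumes that any ids already present in the file do not clash with the generated ones.

```python
from collections import Counter

DEFAULT_ID = "1"  # hypothetical constant assumed when _publ_author.id is absent


def add_author_ids(rows):
    """Return copies of rows in which (name, id) identifies each row uniquely."""
    name_counts = Counter(row["_publ_author.name"] for row in rows)
    seen = Counter()
    result = []
    for row in rows:
        new_row = dict(row)
        name = row["_publ_author.name"]
        if "_publ_author.id" not in new_row:
            if name_counts[name] > 1:
                # Identically named authors: number them in order of appearance.
                seen[name] += 1
                new_row["_publ_author.id"] = str(seen[name])
            else:
                # Unique name: the constant default is enough.
                new_row["_publ_author.id"] = DEFAULT_ID
        result.append(new_row)
    return result


def composite_keys(rows):
    """List the (name, id) pair for every row."""
    return [(r["_publ_author.name"], r["_publ_author.id"]) for r in rows]


legacy_rows = [
    {"_publ_author.name": "Smith, J."},
    {"_publ_author.name": "Smith, J."},
    {"_publ_author.name": "Jones, A."},
]
keys = composite_keys(add_author_ids(legacy_rows))
```

Software that only looks at _publ_author.name sees the same values as before; only the added item distinguishes the two "Smith, J." rows.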

Solution (2): Define "_publ_author.id" as the new key of the category that can be auto-generated from each packet. While in a formal sense this makes all current CIFs non-conformant, I can't imagine that it would affect most software, which will continue to work with _publ_author.name.

Please indicate your preference or alternative solutions. Particularly important is the perspective of software authors and database curators who may be impacted.

thanks,
James. (CIF chair/acting core CIF DMG chair)
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif-developers mailing list
[email protected]
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers
