Discussion List Archives


Re: Expanding category key for publ_author.name

Dear All,

It has taken me rather a long time (as usual) to get back to this. I am not sure in which direction Simon has landed: initially it looks like (1) was preferred as being less disruptive, but the final paragraph seems to be a wish for auto-generation of keys as per option (2). Option (2) would be disruptive only insofar as CIF *writers* who were aware of the new key would no longer feel the need to disambiguate author names by some device, and CIF readers who relied on whatever ad hoc device is used would get confused, but that is true of both options. Software that was unaware of the new data name would continue to be unaware until updated, and would function precisely as it has in the past.

Herbert - if we use ORCID ID, we cannot autogenerate it for legacy CIFs and we cannot assume that all authors will have one.

Anyway, thanks for the input. This topic will now be picked up again at https://github.com/COMCIFS/cif_core/issues/103 and then presented to the core dictionary management group, if anybody wants to follow along.

Picking up on Simon's comments, we could separately think about a completely revamped approach to authors, including allowing multiple addresses (email and real world) and multiple roles (another topic that has come up in the past). I will raise that comment as an issue on Github so as not to lose sight of it.

all the best,

On Sat, 29 Jun 2019 at 01:11, Simon Westrip <simonwestrip@btinternet.com> wrote:
I think Solution (1) is less disruptive.

The publ_author category has traditionally relied on 'conventions' with respect to representing data
(e.g. _publ_author_name ought to be 'family name, initials')
and overall lacks the level of granularity that is becoming increasingly important
(e.g. _publ_author_address can contain any number of addresses as arbitrary text;
recognizing institutions/countries relies on heuristics...)

That is, the publ_author category does not currently lend itself to more granular data collection
and I don't think it is linked to by other categories,
so rather than redefine it in a way that totally invalidates current practice [Solution (2)],
I would look forward to new categories and items to be defined as alternatives, fully exploiting the
key definitions...

I think there are other (DDL1) categories where the same may apply, e.g.
_audit_author (though I note that in the DDL2 pdbx dic there is _audit_author.pdbx_ordinal...)

So I'm in favour of a solution whereby keys can be inferred from the order in which loop rows appear
in the CIF in cases where the key items are absent from the CIF
(in much the same way as space_group_symop_id is often inferred from the order in which space_group_symop_operation is listed).
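
A minimal sketch of that inference, assuming a loop has already been parsed into a list of dicts (one per row) and that the key item would be called "_publ_author.id" as in the proposal. The function name and the choice of 1-based ordinals are illustrative assumptions, not part of any dictionary definition:

```python
def with_inferred_ids(rows, id_name="_publ_author.id"):
    """Return copies of the loop rows, adding a positional key if none is present.

    Assumption: a data name is either present for every row of a loop or
    absent from all of them, so one check covers the whole loop.
    """
    if any(id_name in row for row in rows):
        # The CIF supplied explicit keys: leave them alone.
        return [dict(row) for row in rows]
    # Key item absent: use the 1-based position of each row as its key.
    return [{id_name: str(i), **row} for i, row in enumerate(rows, start=1)]


# Hypothetical legacy loop with two identically named authors.
legacy_rows = [
    {"_publ_author.name": "Wang, Y."},
    {"_publ_author.name": "Wang, Y."},
]
inferred = with_inferred_ids(legacy_rows)
```

Because the key depends only on row order, a reader and a writer that both apply this rule agree on the keys without the CIF having to carry them.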



On Friday, 28 June 2019, 05:38:12 GMT+1, James Hester <jamesrhester@gmail.com> wrote:

Dear Cif-developers,

The CIF core group has not expressed an opinion on the following proposal, so before going ahead with one of the suggested fixes, I would like your opinion as CIF software developers and database curators on the options presented:

The publ_author category has _publ_author.name as the category key, meaning that _publ_author.name can be used to select a unique row of the loop. However, it has been pointed out that _publ_author.name is insufficient as a key for the _publ_author loop, as some authors have the same name (there are apparently 40 entries in the COD with this feature). 
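
To make the problem concrete, here is a rough sketch of how a curator might detect loops in which _publ_author.name fails as a key. It assumes the loop has already been parsed into a list of dicts, one per row; the function name and the sample data are hypothetical:

```python
from collections import Counter


def duplicated_keys(rows, key="_publ_author.name"):
    """Return the key values that select more than one row of the loop."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical publ_author loop: two distinct authors share a name.
publ_author = [
    {"_publ_author.name": "Wang, Y.", "_publ_author.address": "Institute A"},
    {"_publ_author.name": "Wang, Y.", "_publ_author.address": "Institute B"},
    {"_publ_author.name": "Smith, J.", "_publ_author.address": "Institute A"},
]
clashes = duplicated_keys(publ_author)
```

Any name returned by this check cannot be used to select a unique row.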

Solution (1): Add a further disambiguating data name to _publ_author. The suggestion is that something like "_publ_author.id" could be defined, which would contain an arbitrary code and which would be added to the category key. The lack of this data name in historical CIFs can be worked around by assuming it has a constant value, and CIF curators can auto-generate it when a situation is encountered with multiple identically-named authors.
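
As a sketch only, Solution (1) might look like this in curation software, assuming the loop is held as a list of dicts. The constant "." used for a missing "_publ_author.id", the helper names and the ordinal codes are all assumptions made for illustration; the proposal does not fix any of them:

```python
from collections import Counter

NAME = "_publ_author.name"
ID = "_publ_author.id"
DEFAULT_ID = "."  # assumed constant for CIFs that predate the new data name


def composite_key(row):
    """Key under Solution (1): the name plus the (possibly defaulted) id."""
    return (row[NAME], row.get(ID, DEFAULT_ID))


def disambiguate(rows):
    """Return copies of rows, assigning ids only where composite keys collide."""
    counts = Counter(composite_key(row) for row in rows)
    used = set(counts)
    result = []
    for row in rows:
        new_row = dict(row)
        if counts[composite_key(row)] > 1:
            # Pick the first ordinal not already used for this name.
            n = 1
            while (row[NAME], str(n)) in used:
                n += 1
            new_row[ID] = str(n)
            used.add((row[NAME], str(n)))
        result.append(new_row)
    return result


# Hypothetical legacy loop without _publ_author.id.
legacy = [
    {NAME: "Wang, Y."},
    {NAME: "Wang, Y."},
    {NAME: "Smith, J."},
]
fixed = disambiguate(legacy)
```

Rows whose names are already unique keep the default id, so software that only knows _publ_author.name sees no change for them.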

Solution (2): Define "_publ_author.id" as the new key of the category that can be auto-generated from each packet. While in a formal sense this makes all current CIFs non-conformant, I can't imagine that it would affect most software, which will continue to work with _publ_author.name.

Please indicate your preference or alternative solutions. Particularly important is the perspective of software authors and database curators who may be impacted.

James. (CIF chair/acting core CIF DMG chair)
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
cif-developers mailing list


International Union of Crystallography
