Discussion List Archives

Re: Provenance and property rights

At 13:51 20/09/2004 +0100, Brian McMahon wrote:
>A quick initial comment. It's inevitable that CIF data will be repurposed
>in CIF format, so assertions about intellectual property and redistribution
>rights should be
>(a) part of the audit history of the file; and
>(b) verifiable against checksums.

Fully agreed.

>So we might have (without any claim that the suggested data names are
>2002   'W. Plinge'   .   '25219b1586fa67a279ef9fb988d23c19'
>2003   'J. Doe'      ?   '6cd63e9ef1f1e3117f67addfb497bb9c'
>2004   'American Chemical Society'
>              'Transferred when submitted for publication'
>              '6cd63e9ef1f1e3117f67addfb497bb9c'

I think this is looking promising.

>(1) While not relevant to the technical discussion, I'm curious to know the
>circumstances in which Peter envisages Plinge transferring the copyright to
>Doe - or do they jointly own the copyright, but from different dates?

No idea! - and I agree that metadata needs to be structured here. The 
example was meant to represent something like:
Copyright W.Plinge 2002
Copyright J.Doe 2003

I'm guessing that W.Plinge was a graduate student who went off to be an 
actor and that J.Doe was the supervisor who had to write it up....

>(2) The purpose of the checksum is to validate that a file matching that
>checksum is (probably) the identical file to which the associated assertion
>relates. If the file has been changed in any way, there is no way to
>reverse-engineer the changes to reproduce the file corresponding to the
>stated checksum. On the other hand, if there is a dispute and Plinge (let us
>say) can produce an original file with the relevant checksum, that will
>provide evidence to support his intellectual property claims.

Looks fine, but it may be fragile wrt whitespace and operating systems. XML 
has a canonicalisation which addresses these problems by normalising 
whitespace and line-ends, etc.
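
To illustrate the fragility, here is a minimal sketch in Python (not a proposed CIF canonical form). It assumes that normalising line ends and stripping trailing whitespace is an acceptable normalisation, which may not hold inside CIF text fields. The function names and the two sample files are hypothetical, made up for illustration only.

```python
import hashlib


def exact_md5(data: bytes) -> str:
    """MD5 of the file exactly as stored, byte for byte."""
    return hashlib.md5(data).hexdigest()


def normalised_md5(data: bytes) -> str:
    """MD5 after a simple, assumed normalisation.

    Line ends become LF, trailing spaces and tabs are dropped from
    each line, and the text ends with exactly one newline.
    """
    text = data.decode("utf-8")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    lines = [line.rstrip(" \t") for line in lines]
    # Drop empty lines at the end so the final newline count is fixed.
    while lines and lines[-1] == "":
        lines.pop()
    canonical = "\n".join(lines) + "\n"
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Hypothetical sample: the same data written on two operating systems.
unix_copy = b"data_example\n_cell_length_a 5.43\n"
dos_copy = b"data_example\r\n_cell_length_a 5.43  \r\n"

# The exact checksums differ although the data are the same.
print(exact_md5(unix_copy) == exact_md5(dos_copy))
# The normalised checksums agree.
print(normalised_md5(unix_copy) == normalised_md5(dos_copy))
```

A checksum over the exact bytes breaks as soon as the file passes through a system that rewrites line ends, whereas the normalised one survives that. XML canonicalisation does this job far more thoroughly than the sketch.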

>(3) In my example, the checksums for Doe and the ACS are the same (which
>almost certainly wouldn't be the case if a true MD5 checksum were used). Do
>we want a checksum that validates the *exact* content of a file (so that
>you need to preserve OS-dependent line endings, comments etc) or that simply
>in some way validates the "essential contents" of the file, e.g. excluding
>the copyright assertions?

My personal opinion would be to use XML technology to generate the 
checksum.  Otherwise I suspect it will be a fair amount of work.

>(4) Requiring mandatory checksum generation may be too heavy a burden on
>older CIF writers, but perhaps we can aim for a start to generate such
>things for the CIFs redistributed off the IUCr web site.

Seems a good idea.

>(5) Is there a case for including some sort of digital signature (where
>available) into each loop packet to strengthen the associated rights

Yes - again XML has this technology and perhaps it could be borrowed.

>Present practice for Acta C and E papers is that they are submitted as CIFs.
>These enhanced CIFs include the text of the paper; the author transfers
>copyright of this material to us. (By the way that simply is present
>practice - we're happy to make other arrangements if the author wishes to
>retain copyright or if there is a general movement in that direction.)
>Since we may change the text during editing, in practice we carry the
>copyright along into the final version of the paper, and we don't wish to
>expose the early draft to public redistribution. Therefore the CIFs served
>as supplementary materials represent only the data component of the
>submitted CIF - that is, they are a subset.

I agree this needs some thinking. The author creates a complete paper for a 
journal, checks it against CheckCIF and finds it needs editing. CheckCIF is 
(I think) the property of the IUCr (at least its use is an added value 
service). So the journal could claim that this is added value. The CIF 
*may* also contain added value through DOIs, codes, etc. Although it's not 
part of this discussion, my suggestion would be that what is posted is:
- just the data component of the CIF
- after checking and correcting where appropriate
- with addition of DOIs and other bibliographic/code support so that the 
publisher can be located. (OA evangelists like Stevan Harnad assert that 
open publication enhances citation, so CIFs should retain this link)
- and copyright if possible
- and some checksum of provenance and integrity

The copying and reuse of the CIF will depend on the author, publishers, 
etc. In general reuse should include provenance and integrity.

>A general legal question: is a licence to redistribute (according to the NIH
>or BOAI model, say) the sole prerogative of the copyright owner? If so, then
>we would need to think rather carefully how to manage the serving of data
>CIFs from our site that came from different authors who wish to retain
>copyright but license redistribution under various conditions. A controlled
>vocabulary would most certainly help here, so that we could implement a
>policy to redistribute anything tagged with certain approved protocols.

There will certainly be questions we cannot answer at present and that 
might (ultimately) have to be tested in judicial processes. However we 
should try to build a framework that allows all parties to state their 
position within the CIF which at least should make the position clearer.


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
