Discussion List Archives


Re: Dictionary release policy

Dear COMCIFS,

I've initiated our very first release process from Github over at https://github.com/COMCIFS/cif_core/issues/317. Hopefully this first attempt will iron out some kinks and address some of Brian's comments in the process. Feel free to follow along on Github. Those of you on the cif_core or ccp4 mailing lists will be advised that a release is pending at some point.

all the best,
James.

On Thu, 27 Oct 2022 at 23:15, Brian McMahon via comcifs <comcifs@iucr.org> wrote:
James

This seems like a reasonable proposal.

Peter MR responded with a comment on versioning, and there is some
discussion on the github site, which looks fine to me:
https://github.com/COMCIFS/comcifs.github.io/blob/master/accepted/dictionary_development_practices.md

I see some further general discussion on the GitHub page that James
references, and make the following remarks in response to that.

Antanas asked 'One thing that should be further specified is which one
of the multiple dictionary locations (GitHub, IUCr website, Zenodo,
etc.) should be regarded as the original one and reflected in the
dictionary URL', to which James replied 'My idea on URLs is that we
should mint a DOI for the dictionary and use the dx.doi URL for the
URL in the dictionary'.

I think that's probably the right long-term decision, but the idea of
dictionary DOI registration has been hovering around for a while, and
is not necessarily an easy solution to implement, because of the
requirements to characterise the registration with appropriate
metadata. What's 'appropriate' is well defined by CrossRef for
journal articles, but less clear-cut for other digital object types.
Note that, strictly, DOI is an identifier only; the dx.doi.org
resolver is a particular mechanism implemented by CrossRef, and
it's making the resolvers work that requires well-defined metadata
characterising the registered digital object. As a purist I'm not
happy that CrossRef insists that DOIs should be cited with the
resolver URL.

In answer to Antanas' question about the preferred dictionary
location, I would certainly prefer that canonical approved dictionaries
are attributed to the IUCr. But currently the dictionary register
contains URLs that use the ftp scheme, e.g.

   data_validation_dictionaries
     loop_
       _cifdic_dictionary.name
       _cifdic_dictionary.version
       _cifdic_dictionary.DDL_compliance
       _cifdic_dictionary.reserved_prefix
       _cifdic_dictionary.date
       _cifdic_dictionary.URL
       _cifdic_dictionary.description

   ######################################################################
   # COMCIFS approved dictionaries                                      #
   ######################################################################
     cif_core.dic    .    1.4.1   .    .
       ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
       'Core CIF Dictionary'
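
As an illustration of how a program might pull the URL out of such an entry, here is a minimal Python sketch. It assumes the excerpt above is representative and uses the standard-library shlex module as a stand-in tokenizer; that only approximates CIF syntax (no multi-line text fields, and backslashes are treated as escapes), so a real CIF parser would be the proper tool. The function name read_first_loop is hypothetical.

```python
import shlex

# Excerpt copied from the register entry quoted above.
REGISTER_EXCERPT = """
data_validation_dictionaries
  loop_
    _cifdic_dictionary.name
    _cifdic_dictionary.version
    _cifdic_dictionary.DDL_compliance
    _cifdic_dictionary.reserved_prefix
    _cifdic_dictionary.date
    _cifdic_dictionary.URL
    _cifdic_dictionary.description

######################################################################
# COMCIFS approved dictionaries                                      #
######################################################################
  cif_core.dic    .    1.4.1   .    .
    ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
    'Core CIF Dictionary'
"""


def read_first_loop(text):
    """Return the rows of the first loop_ as a list of {tag: value} dicts."""
    # shlex drops '#' comments and honours quotes; an approximation of CIF only.
    tokens = shlex.split(text, comments=True)
    pos = tokens.index("loop_") + 1
    tags = []
    # Data names start with an underscore; everything after them is a value.
    while pos < len(tokens) and tokens[pos].startswith("_"):
        tags.append(tokens[pos])
        pos += 1
    values = tokens[pos:]
    if not tags or len(values) % len(tags) != 0:
        raise ValueError("loop values do not fill whole rows")
    return [dict(zip(tags, values[i:i + len(tags)]))
            for i in range(0, len(values), len(tags))]


for row in read_first_loop(REGISTER_EXCERPT):
    print(row["_cifdic_dictionary.name"], row["_cifdic_dictionary.URL"])
```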


The register itself is advertised in International Tables G 1st edition
at the location ftp://ftp.iucr.org/pub/cifdics/cifdic.register, but
the IUCr CIF dictionaries page links to it at the URL
https://www.iucr.org/__data/iucr/cif/dictionaries/cifdic.register

Now, we have advertised the ftp: address since the early 1990s on the
assumption that ftp was a simple and durable protocol that would
reliably be supported indefinitely. And the register and its
referenced dictionaries are indeed all still available using those
URLs. That is, if you use a software client that interprets those
URLs as requests over the FTP protocol to the cited locations, you
will fetch the desired files.

However, until recently "a software client" would have included all
common browsers, so you could simply click on a link in Chrome
or Firefox to download the file. Browsers no longer support the
ftp protocol, which makes retrieval less convenient for most users.
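
For scripted retrieval a non-browser client is still an option. The following is a minimal sketch, assuming Python's standard-library urllib.request, whose default opener handles ftp: as well as http:/https: URLs. The function name fetch, the timeout value and the inclusion of the file: scheme (for local testing only) are my own illustrative choices.

```python
import urllib.request
from urllib.parse import urlsplit

# Schemes this sketch is willing to pass to urlopen; "file" is for local tests.
SUPPORTED_SCHEMES = {"ftp", "http", "https", "file"}


def fetch(url, timeout=30):
    """Return the bytes served at url, whatever the (supported) scheme."""
    scheme = urlsplit(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # urlopen picks the protocol handler (FTP, HTTP, ...) from the scheme.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Example call (needs network access, so left commented out):
# register = fetch("ftp://ftp.iucr.org/pub/cifdics/cifdic.register")
```

The same call would work unchanged if the register entries were later switched to https URLs, which is one argument for keeping the scheme inside the URL rather than in the client.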

What are this group's thoughts on how best to approach this? Now that
I've retired from the IUCr, we'll need to interact with the Chester
office to see what they can implement, but I'd appreciate a feel for
a preferred solution before I talk to them. A few considerations:

[1] I value long-term stability, so would like to see the resources
still available over ftp, at least as one option, though I guess the
thing to do there would be to look at the ftp logs in Chester to see
if any dictionaries are in fact being downloaded from the ftp server.

[2] I note that in a similar situation the PDB now advertises resources
as https://ftp.wwpdb.org/pub/pdb/data/structures/... (i.e. their
server ftp.wwpdb.org handles both ftp and http/https schemes). For IUCr
to do so would require running a web server on the ftp server, or
configuring the nginx proxy that sits in front of IUCr web services to
translate the request to a different location on our existing website
- something we will have to discuss with the Chester office.

[3] Currently all the dictionaries are available also from the main
IUCr website, but with those slightly ugly /__data/ components in the
URLs. These are an artefact of the content management system currently
in use, but I understand IUCr will move away from that system in the
future, so some effort would need to go into an http/https naming
scheme that would be robust across different webserver platforms.

[4] Should _cifdic_dictionary.URL be replaced by scheme-specific
data items (_cifdic_dictionary.URL_ftp, _cifdic_dictionary.URL_https
etc.)? Or should one allow for a new category CIFDIC_DICTIONARY_LOCATION
so that an arbitrary number of locations can be specified for each
dictionary? This is the most general solution and could of course
include DOI and other persistent URL formulations. One could then easily
list multiple locations at IUCr, GitHub, Zenodo etc., though it would
probably then also be worth thinking about providing MD5 hashes to
confirm that all such mirror copies were identical.

[5] Or should we just focus on a DOI-based solution? However,
I note that the PDB DOIs now resolve to a "landing page" (e.g.
http://dx.doi.org/10.2210/pdb5cro/pdb) whereas in earlier days
such a DOI would immediately download the required PDB file.
I do see some benefit of providing an address that is known to
serve directly the resource you want (i.e. the dictionary file),
so that, for instance, a validation program can automatically load
the current version. So having citable DOIs to reference a
dictionary doesn't entirely remove the need for something
like the existing register.
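
To make the multiple-location idea in [4] concrete, here is a rough Python sketch of how a client might use a list of locations together with an MD5 hash. Everything named here is an assumption for illustration: the function names, the example.org mirror URLs, and the idea that the register would publish one expected hash per dictionary. The download function is passed in, so any client that returns bytes for a URL would do.

```python
import hashlib


def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def first_verified_copy(locations, expected_md5, fetch):
    """Try each location in order; return (url, data) for the first copy
    whose MD5 matches expected_md5.

    fetch is any callable mapping a URL to bytes and raising OSError
    (urllib's URLError is a subclass) when a location is unreachable.
    """
    for url in locations:
        try:
            data = fetch(url)
        except OSError:
            continue  # unreachable mirror: try the next location
        if md5_hex(data) == expected_md5.lower():
            return url, data
        # Hash mismatch: this copy differs from the registered one, skip it.
    raise LookupError("no location served a copy matching the expected MD5")


# Hypothetical usage, with placeholder mirrors and a hash taken from the register:
# url, data = first_verified_copy(
#     ["ftp://mirror-a.example.org/demo.dic",
#      "https://mirror-b.example.org/demo.dic"],
#     expected_md5="...",
#     fetch=my_download_function,
# )
```

MD5 is adequate for spotting mirrors that have drifted apart, which is the purpose suggested in [4]; it would not be a defence against deliberate tampering.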

Brian





On 10/10/2022 06:18, James H via comcifs wrote:
> Dear COMCIFS,
>
> Activity is picking up on the dictionary development front, particularly
> regarding powder and core. This has led me to wonder about developing
> some sort of process for releasing dictionary updates. See the below
> message that I've raised as an issue on Github (see
> https://github.com/COMCIFS/cif_core/issues/307). Please feel free to
> respond there or here. Message follows:
>
> We should develop some sort of dictionary release policy. At the moment
> we commit updates to the master branch of the dictionary on Github, and
> no further release activity happens. The status of the release is
> unclear: is it official once the commit is made? It certainly has an
> internal version number. I suggest we develop a process. Here is a start:
>
>  1. A dictionary becomes official once it has been tagged on Github as a
>     Github release
>  2. A Github release should be simultaneously reflected on the main IUCr
>     website as the latest version of the dictionary
>  3. The machine-readable IUCr dictionary catalogue should be updated at
>     the same time as (2)
>  4. There should be one release at least every 3 months unless a
>     dictionary has not changed in that time.
>  5. A dictionary may be released sooner than every 3 months if there is
>     an urgent need
>  6. Approximately one week before the official release date relevant
>     IUCr mailing lists should be advised of the forthcoming release
>     together with a summary of changes
>  7. A "release manager" is nominated for each dictionary and is
>     responsible for managing the release process.
>
> Thoughts? Experiences?
>
> thanks,
> James.
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs


