Discussion List Archives


Re: Dictionary release policy

James
This seems like a reasonable proposal.
Peter MR responded with a comment on versioning, and there is some discussion on the GitHub site, which looks fine to me: https://github.com/COMCIFS/comcifs.github.io/blob/master/accepted/dictionary_development_practices.md
I see some further general discussion on the GitHub page that James references, and make the following remarks in response to that.
Antanas asked 'One thing that should be further specified is which one of the multiple dictionary locations (GitHub, IUCr website, Zenodo, etc.) should be regarded as the original one and reflected in the dictionary URL', to which James replied 'My idea on URLs is that we should mint a DOI for the dictionary and use the dx.doi URL for the URL in the dictionary'.
I think that's probably the right long-term decision, but the idea of dictionary DOI registration has been hovering around for a while, and is not necessarily an easy solution to implement, because of the requirements to characterise the registration with appropriate metadata. What's 'appropriate' is well defined by CrossRef for journal articles, but less clear-cut for other digital object types. Note that, strictly, DOI is an identifier only; the dx.doi.org resolver is a particular mechanism implemented by CrossRef, and it's making the resolvers work that requires well-defined metadata characterising the registered digital object. As a purist I'm not happy that CrossRef insists that DOIs should be cited with the resolver URL.
In answer to Antanas' question about the preferred dictionary location, I would certainly prefer that canonical approved dictionaries are attributed to the IUCr. But currently the dictionary register contains URLs that use the ftp scheme, e.g.
   data_validation_dictionaries
     loop_
       _cifdic_dictionary.name
       _cifdic_dictionary.version
       _cifdic_dictionary.DDL_compliance
       _cifdic_dictionary.reserved_prefix
       _cifdic_dictionary.date
       _cifdic_dictionary.URL
       _cifdic_dictionary.description

   ######################################################################
   # COMCIFS approved dictionaries                                      #
   ######################################################################

     cif_core.dic    .    1.4.1   .    .
       ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
       'Core CIF Dictionary'
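
As a rough illustration of how a program might read such an entry, here is a minimal Python sketch. The names FIELDS and parse_register_rows are hypothetical, and shlex only approximates CIF quoting rules, so a real validator should use a proper CIF parser instead.

```python
import shlex

# Loop columns in the order shown in the register excerpt above.
FIELDS = [
    "name", "version", "DDL_compliance", "reserved_prefix",
    "date", "URL", "description",
]

def parse_register_rows(loop_body):
    """Split whitespace-separated loop values into one dict per dictionary.

    Assumption: values contain no '#' characters, because shlex treats
    '#' as the start of a comment even in the middle of a word.
    """
    # comments=True drops the '#' banner lines; quoted strings stay whole.
    tokens = shlex.split(loop_body, comments=True)
    if len(tokens) % len(FIELDS) != 0:
        raise ValueError("value count is not a multiple of the loop width")
    return [
        dict(zip(FIELDS, tokens[i:i + len(FIELDS)]))
        for i in range(0, len(tokens), len(FIELDS))
    ]

sample = """
######################################################################
# COMCIFS approved dictionaries                                      #
######################################################################
  cif_core.dic    .    1.4.1   .    .
    ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
    'Core CIF Dictionary'
"""

rows = parse_register_rows(sample)
print(rows[0]["name"], rows[0]["URL"])
```

The URL column is the value a validation program would hand to whatever client it uses to retrieve the dictionary file.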

The register itself is advertised in International Tables G 1st edition at the location ftp://ftp.iucr.org/pub/cifdics/cifdic.register, but the IUCr CIF dictionaries page links to it at the URL https://www.iucr.org/__data/iucr/cif/dictionaries/cifdic.register
Now, we have advertised the ftp: address since the early 1990s on the assumption that ftp was a simple and durable protocol that would reliably be supported indefinitely. And the register and its referenced dictionaries are indeed all still available using those URLs. That is, if you use a software client that interprets those URLs as requests over the FTP protocol to the cited locations, you will fetch the desired files.
However, until recently such a "software client" would have included all common browsers, so you could simply click on a link in Chrome or Firefox to download the file. Browsers no longer support the ftp protocol, which makes the retrieval procedure less convenient for most users.
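
For illustration only, a non-browser client can still follow the ftp: URLs. The sketch below uses Python's standard urllib, which handles the ftp, http and https schemes; the function name fetch_dictionary and the ALLOWED_SCHEMES whitelist are my own assumptions, not part of any IUCr tooling.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical whitelist: only the schemes discussed in this thread.
ALLOWED_SCHEMES = {"ftp", "http", "https"}

def fetch_dictionary(url, timeout=30):
    """Return the raw bytes served at a register URL."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # urlopen dispatches on the scheme, so ftp: and https: URLs
    # are retrieved through the same call.
    with urlopen(url, timeout=timeout) as response:
        return response.read()

# Example call (needs network access, so it is left commented out):
# content = fetch_dictionary("ftp://ftp.iucr.org/pub/cifdics/cif_core.dic")
```

A script like this is no substitute for a clickable link, which is why the browser change matters for most users.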
What are this group's thoughts on how best to approach this? Now that I've retired from the IUCr, we'll need to interact with the Chester office to see what they can implement, but I'd appreciate a feel for a preferred solution before I talk to them. A few considerations:
[1] I value long-term stability, so would like to see the resources still available over ftp, at least as one option, though I guess the thing to do there would be to look at the ftp logs in Chester to see if any dictionaries are in fact being downloaded from the ftp server.
[2] I note that in a similar situation the PDB now advertises resources as https://ftp.wwpdb.org/pub/pdb/data/structures/... (i.e. their server ftp.wwpdb.org handles both ftp and http/https schemes). For IUCr to do so would require running a web server on the ftp server, or configuring the nginx proxy that sits in front of IUCr web services to translate the request to a different location on our existing website, something we will have to discuss with the Chester office.
[3] Currently all the dictionaries are available also from the main IUCr website, but with those slightly ugly /__data/ components in the URLs. These are an artefact of the content management system currently in use, but I understand IUCr will move away from that system in the future, so some effort would need to go into an http/https naming scheme that would be robust across different webserver platforms.
[4] Should _cifdic_dictionary.URL be replaced by scheme-specific data items (_cifdic_dictionary.URL_ftp, _cifdic_dictionary.URL_https etc.)? Or should one allow for a new category CIFDIC_DICTIONARY_LOCATION so that an arbitrary number of locations can be specified for each dictionary? The latter is the most general solution and could of course include DOI and other persistent URL formulations. One could then easily list multiple locations at IUCr, GitHub, Zenodo etc., though it would probably then also be worth thinking about providing MD5 hashes to confirm that all such mirror copies were identical.
[5] Or should we just focus on a DOI-based solution? However, I note that the PDB DOIs now resolve to a "landing page" (e.g. http://dx.doi.org/10.2210/pdb5cro/pdb) whereas in earlier days such a DOI would immediately download the required PDB file. I do see some benefit in providing an address that is known to serve directly the resource you want (i.e. the dictionary file), so that for instance a validation program can automatically load the current version. So having citable DOIs to reference a dictionary doesn't entirely remove the need for something like the existing register.
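
To make the MD5 idea in [4] concrete, here is a minimal Python sketch of a mirror consistency check. The function names and the mirror labels are hypothetical, and the byte strings stand in for files that would in practice be downloaded from each location.

```python
import hashlib

def md5_hex(content):
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(content).hexdigest()

def mismatched_mirrors(copies, reference):
    """Return the labels whose content differs from the reference copy.

    copies maps a location label to the bytes retrieved from it;
    reference names the copy treated as canonical.
    """
    expected = md5_hex(copies[reference])
    return sorted(
        label for label, content in copies.items()
        if md5_hex(content) != expected
    )

# Stand-in data: the third copy differs only in its line ending.
copies = {
    "iucr": b"data_CORE_DIC\n",
    "github": b"data_CORE_DIC\n",
    "zenodo": b"data_CORE_DIC\r\n",
}
print(mismatched_mirrors(copies, reference="iucr"))
```

Because the digest is taken over raw bytes, a copy that differs only in line endings is reported as a mismatch, so any published hash would need to state which byte form it refers to.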
Brian




On 10/10/2022 06:18, James H via comcifs wrote:
> Dear COMCIFS,
>
> Activity is picking up on the dictionary development front, particularly
> regarding powder and core. This has led me to wonder about developing
> some sort of process for releasing dictionary updates. See the below
> message that I've raised as an issue on Github (see
> https://github.com/COMCIFS/cif_core/issues/307). Please feel free to
> respond there or here. Message follows:
>
> We should develop some sort of dictionary release policy. At the moment
> we commit updates to the master branch of the dictionary on Github, and
> no further release activity happens. The status of the release is
> unclear: is it official once the commit is made? It certainly has an
> internal version number. I suggest we develop a process. Here is a start:
>
>  1. A dictionary becomes official once it has been tagged on Github as a
>     Github release
>  2. A Github release should be simultaneously reflected on the main IUCr
>     website as the latest version of the dictionary
>  3. The machine-readable IUCr dictionary catalogue should be updated at
>     the same time as (2)
>  4. There should be one release at least every 3 months unless a
>     dictionary has not changed in that time.
>  5. A dictionary may be released sooner than every 3 months if there is
>     an urgent need
>  6. Approximately one week before the official release date relevant
>     IUCr mailing lists should be advised of the forthcoming release
>     together with a summary of changes
>  7. A "release manager" is nominated for each dictionary and is
>     responsible for managing the release process.
>
> Thoughts? Experiences?
>
> thanks,
> James.
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
