Re: Additional update to core dictionary
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Additional update to core dictionary
- From: Brian McMahon <bm@iucr.org>
- Date: Mon, 28 Mar 2011 23:16:20 +0100
- In-Reply-To: <472620FF2D2FBB4BB62FD1285C58A04F92441645AE@mail01.ccdc.cam.ac.uk>
- References: <20110324102821.GB3581@emerald.iucr.org><1893558218.22554.1300989799369.JavaMail.open-xchange@oxapp1.inap.sea.dotster.net><472620FF2D2FBB4BB62FD1285C58A04F92441645AE@mail01.ccdc.cam.ac.uk>

JK> I think the idea of a definition for a document DOI is uncontroversial.
JK> It's pretty common these days to access a journal article by its DOI.
JK> Given the increasing interest in curation of raw data, might this
JK> idea get more complex? Would the article DOI also include the raw
JK> data, or would multiple DOIs need to be accommodated? Say one for
JK> the paper and another for the raw data?

Just so.

MT> AFAIK some journals are now using separate DOI for the supplementary
MT> data, that said many use a single DOI for the article then the
MT> data are accessed as sub-pages from this. If multiple DOI for a
MT> single article were supported I suggest having a single DOI
MT> item for the article then a separate loop for any associated data e.g.
MT>
MT> _journal_article_doi ABC1234
MT> loop_
MT> _journal_article_related_doi
MT> ABC5678
MT> ABC8970

OK, since there's some interest in this, let me share with you our
thinking so far. The following notes are from our internal discussion
document at the Acta offices.

==============================================================================
PROPOSAL FOR INCORPORATING DOI IDENTIFIERS IN CIF
-------------------------------------------------

(1) "Journal housekeeping, citation and indexing entries"

The _journal_ items (not usually modified by an author) are currently
used to record the bibliographic information about the article
published from the current CIF. The simplest extension would be:

    _journal_paper_doi    '10.1107/S010876739101067X'

For supplementary materials, the category currently includes

    _journal_suppl_publ_number
    _journal_suppl_publ_pages

designed to record the old SUP number and number of pages, both of
which were traditionally published in the deposition footnote. These
items cannot however be looped.
So one might introduce a new loop allowing greater characterization of
"supporting" documents (hence _sup_ instead of _suppl_):

    loop_
    _journal_sup_material_id
    _journal_sup_material_role
    _journal_sup_material_mime_type
    _journal_sup_material_doi
    1  cif    chemical/x-cif   10.1107/S0108768110051050/wh5011sup1.cif
    2  hkl    text/plain       10.1107/S0108768110051050/wh5011Pbar1sup2.hkl
    3  hkl    text/plain       10.1107/S0108768110051050/wh5011P21csup3.hkl
    4  rtv    text/plain       10.1107/S0108768110051050/wh5011Pbar1sup4.rtv
    5  rtv    text/plain       10.1107/S0108768110051050/wh5011P21csup5.rtv
    6  extra  application/pdf  10.1107/S0108768110051050/wh5011sup6.pdf

Possible enumerations for _journal_sup_material_role:

    cif    'structural data model in CIF format'
    mmcif  'structural data model in mmCIF format'
           (or mcf if we want to promote standard filename extensions)
    hkl    'structure factors'
    rtv    'Rietveld powder data'
    extra  'additional article content (e.g. figures, tables, appendices)'
    data   'supporting data in a machine-parseable format'

QUESTION: Is it OK to assume that it is not necessary to make an
explicit connection with _journal_paper_doi - i.e. that all of these
items are implicitly associated with the one publication derived from
this CIF? (I think it is.)

(2) "Contents of a publication"

The _publ_ items concern the content of a publication and are
created/edited by the author. We introduce a category that allows the
listing of links to related materials.

    loop_
    _publ_related_id
    _publ_related_citation_id
    _publ_related_publisher
    _publ_related_link_identifier
    _publ_related_link_identifier_type
    _publ_related_role
    _publ_related_details
     1  .  pdb        2zse                    refcode  struct     .
     2  .  pdb        10.2210/pdb2zse/pdb     doi      struct     .
     3  .  uniprotkb  P63810                  refcode  seq        .
     4  .  pdb        r2zsesf                 refcode  relsfac    .
     5  .  pdb        2zs7                    refcode  relstruct  'citrate complex'
     6  .  icsd       161730                  refcode  relstruct  .
     7  .  csd        ADENTP01                refcode  relstruct  .
     8  1  ?          10.1074/jbc.C500044200  doi      relpub     .
     9  1  ?          15795230                pmid     relpub     .
    10  2  ?          123456                  casreg   relchem    .
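
To make the intended use of such a loop concrete, here is a minimal
Python sketch of how a reader program might turn the rows into records
and pick out the identifiers that can be resolved directly. It is only
an illustration under stated assumptions: the _publ_related_ items are
the proposed (not adopted) names from these notes, parse_loop and
resolver_url are hypothetical helpers, the quoting rules are borrowed
from the shell rather than from the CIF specification, and the
http://dx.doi.org/ prefix is assumed as the resolver.

```python
import shlex

# Sample rows copied from the proposed loop in these notes.
SAMPLE = """
loop_
_publ_related_id
_publ_related_citation_id
_publ_related_publisher
_publ_related_link_identifier
_publ_related_link_identifier_type
_publ_related_role
_publ_related_details
2  .  pdb  10.2210/pdb2zse/pdb     doi      struct     .
5  .  pdb  2zs7                    refcode  relstruct  'citrate complex'
8  1  ?    10.1074/jbc.C500044200  doi      relpub     .
9  1  ?    15795230                pmid     relpub     .
"""


def parse_loop(text):
    """Return the rows of a single CIF loop_ as a list of dicts.

    Simplified: assumes one loop, whitespace-separated values and
    shell-like quoting. This is not a full CIF parser.
    """
    tags, values = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_") and not values:
            tags.append(line)          # still reading the loop header
        else:
            values.extend(shlex.split(line))
    if not tags or len(values) % len(tags):
        raise ValueError("value count is not a multiple of the tag count")
    n = len(tags)
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


def resolver_url(row):
    """Return a resolver URL for rows typed 'doi', otherwise None."""
    if row["_publ_related_link_identifier_type"] == "doi":
        # Assumed resolver prefix; other identifier types need
        # database-specific handling that is not sketched here.
        return "http://dx.doi.org/" + row["_publ_related_link_identifier"]
    return None


rows = parse_loop(SAMPLE)
doi_urls = [url for url in map(resolver_url, rows) if url]
```

Only the rows typed "doi" yield a URL here; the "refcode", "pmid" and
"casreg" rows would each need a lookup specific to the database named
in _publ_related_publisher.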

_publ_related_citation_id allows one of these links to be
cross-referenced to a structured entry in the reference list (CITATION
family of CIF categories).

_publ_related_publisher is possibly not well named: all these examples
are databases, and authors might not know the publisher of a journal
(or a journal publisher could change over the lifetime of a journal).

Examples here of _publ_related_link_identifier_type include "refcode"
(meaning any accession code that is local to a particular database,
not just a CSD 'refcode'), PubMed ID and CAS registry number (but are
these inherently different from "refcode"?) and DOI. The notion behind
"doi" is that you can figure out how to use it directly
(e.g. http://dx.doi.org/blah....). Maybe a better scheme would be
doi, url, uri, urn, refcode?

Examples of the values permitted for _publ_related_role might be:

    relchem    'related chemical compound'
    relpub     'related publication'
    relseq     'peptide or nucleotide sequence of related structure'
    relsfac    'structure factors of related structure'
    relstruct  'related structural model'
    seq        'peptide or nucleotide sequence for a structure in this publication'
    sfac       'structure factors for a structure in this publication'
    struct     'structural model in this publication'

==============================================================================

Comments welcome.

So, it's a more complex scheme than Matt suggested for the supporting
materials for the published article, in order to express the
relationships between the document components; and we started to work
on a parallel scheme for derivative or related data sets. However, the
reason we didn't work this up further was the sense that locating and
annotating such information was likely to be something that authors
would not do reliably (or at all).
Further, in many cases an identifier for a related data set might not
be available when the article is submitted, or published; but it's not
feasible for the journal to locate such information subsequently. In
the end, we believed that so rich an annotation would be unworkable;
but we're willing to revisit the topic when we're convinced that we
could capture a significant amount of such information at a
sufficiently early stage.

However, we would be interested in your thoughts as to the
categorisation of relationships between related and derivative data
sets as suggested above. In particular, if you're aware of any
widespread declarative ontologies (such as the "CiTO" ontology for
expressing relationships between cited publications) that could be
mined for such relationships, I'd be very interested.

Regards
Brian