Re: Additional update to core dictionary


> I worry about including the mime_type in the CIF.  I feel that the
> DOI is a permanent reference to some associated data, but that the
> format could change.  As an example DOI might reference a pdf this
> year, but might move to some other document format in future.

That's a very good point. One way of looking at the suggested
_journal_sup_material_ items is as a way of cataloguing known
supplementary documents according to various known attributes; exactly
which items are populated in such a loop could vary according to (a)
what information you happen to know and (b) what you (think you) want
to do with it. So some entries might have a _mime_type, some a _doi
(raising the interesting question of whether one wants DDL attributes
to prevent certain combinations of populated items in loops).
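
As a rough sketch of what such a partially populated loop might look
like: the _id item, the full spellings of the _doi and _mime_type names,
and all the values below are invented for illustration and are not
proposed definitions.

```cif
# Illustrative only: item names and values are hypothetical.
loop_
_journal_sup_material_id
_journal_sup_material_doi
_journal_sup_material_mime_type
 1  10.1234/example.sup1  application/pdf
 2  10.1234/example.sup2  ?
 3  ?                     chemical/x-cif
```

Here '?' is the usual CIF placeholder for an unknown value, so row 2
records a DOI without committing to a format, and row 3 records a format
for a document that has no DOI.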

The whole question of what a DOI, especially for a data set, should
actually point to remains under active discussion, which is partly
why we think that current practice is still neither consistent nor
mature enough to warrant working this up as a full proposal as yet.

> Some of the later presented data appear to me to belong more to
> a publication system than a general format for crystal structure
> experimental output, and that this might be extending things too
> far.  I can see the benefit within a particular journal system.

That's also a fair point, and it does bring up the general topic
of the extent to which a CIF should be considered just a crystal
structure, and to what extent it can be considered one way of
packaging a compound collection of information, complete with all
the items that would normally be considered 'metadata' rather
than 'data'. The current buzzword for such a package is 'research
object'. From the outset, the core CIF dictionary has been far
richer than most scientific data formats precisely because it
includes information about publications, provenance, audit trails
and, in a limited way, relationships with other data sets.

As we see, it can be very difficult (if not practically impossible)
to capture the desired level of detail in the scientific, and even
in the publishing, workflow - but I think it is useful to develop
the dictionaries in a way that tracks useful "metadata" standards
in the wider world.

This is all very hot stuff at the moment. Peter Murray-Rust's recent
"Scholarly HTML" initiatives depend upon being able to capture and
express relationships between linked data sets. After attending the
JISC Workshop on Managing Research Data (http://bit.ly/dYdUtx) I can
now answer the question I posed in my last post - is there an analogue
for data of the "CiTO" ontology for relationships between cited
publications? Yes, an extension to CiTO is being worked on by David
Shotton's group at Oxford in association with the DRYAD project (see also
http://purl.org/spar). CrossRef is looking at making DOIs more
suitable for direct use in linked-data applications (http://bit.ly/e2mvya),
and so on.

All of which contributes to our sense that we should wait for this
to settle down a bit before revisiting the issue of recording DOIs
within CIFs.

