Dear John
On 12/4/2015 4:10 AM, John Helliwell wrote:
Dear Wladek,
Thank you for the exciting news that your Big Data Portal is "As
of today, starting to assign doi." I am curious to know some
practical details:
(i) how did you assume formal ownership of the datasets in order
to be able to apply for those DOIs, i.e. presumably under a
signature from the original experimenters 'transferring
ownership' to your Big Data Portal?
Our plan is that they will remain the owners of the data. We will
only be the depository, i.e. the place that stores the data,
extracts metadata and organizes the entire system.
This is the reason why I wrote 'starting'.
(ii) Is the permanently funded agency applying for those DOIs
the University of Virginia?
Yes, we are working with the University of Virginia library. They
are interested in our system because it can be applied to other
fields too. At this moment we can generate enough DOIs to cover the
entire PDB.
Greetings,
John
PS I think Simon and Andy have answered your
technical/timing concerns re DOIs. Many thanks to them both.
Thank you. I was wrong: data that have a DOI can stay hidden for
some time. Anyway, there are many practical problems that have to be
solved when you are dealing with a large amount of diverse data.
W.
From: dddwg [dddwg-bounces@iucr.org] on behalf of Wladek Minor [wladek@iwonka.med.virginia.edu]
Sent: 03 December 2015 18:40
To: IUCr Working Group on Diffraction data Deposition
Cc: Marek Grabowski
Subject: Re: [dddwg] Initiation of formal proposal resulting from discussions at the DDDWG satellite meeting of the ECM Croatia
Dear All,
Some clarifications
1. Frame formats:
As I wrote to Herbert, there is no single CBF format -
beamlines create various modifications. Some companies that
distribute detectors have also created their own frame formats.
Tom Terwilliger and I are working on a minimum metadata
header that would allow datasets to be processed. If this
header were added to every frame format, such data could
usually be processed easily, regardless of the information
in the other parts of the header.
2. DOI
We now have over 2800 publicly available diffraction
experiments (around 300 are in the pipeline). As of today, we
are starting to assign DOIs. There are several problems
related to this. For example:
Once a DOI is assigned, it cannot be removed or modified. What
should one do when a PDB depositor changes the title? We
cannot change the DOI. We will follow the PDB approach to this.
3. DOI citations
People will not use DOI citations in their original paper
because data with a DOI have to be public, and this can happen
only when the paper is already published. I do believe that we
have a catch-22 here.
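
To make the idea in point 1 concrete, here is a minimal Python sketch of what a shared minimum metadata header might look like. The class name, the field names (wavelength, detector distance, beam centre, pixel size, oscillation start and range) and the KEY=VALUE text layout are all assumptions chosen for illustration; they are not the header that is actually being defined. The point shown is only that a reader which knows the common keys can ignore whatever vendor-specific entries sit elsewhere in the header.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class MinimalFrameHeader:
    """Hypothetical minimum metadata needed to process one frame."""
    wavelength_angstrom: float
    detector_distance_mm: float
    beam_center_x_px: float
    beam_center_y_px: float
    pixel_size_mm: float
    oscillation_start_deg: float
    oscillation_range_deg: float


def header_to_text(header: MinimalFrameHeader) -> str:
    """Serialize the header as KEY=VALUE lines (illustrative layout only)."""
    return "\n".join(f"{key}={value!r}" for key, value in asdict(header).items())


def header_from_text(text: str) -> MinimalFrameHeader:
    """Parse KEY=VALUE lines, skipping any keys outside the common set."""
    known = {f.name for f in fields(MinimalFrameHeader)}
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in known:
            values[key] = float(value)
    # Raises TypeError if a required common key is missing from the text.
    return MinimalFrameHeader(**values)
```

A vendor header that carries extra lines would still parse, because only the common keys are read; a header that lacks one of them fails loudly instead of being processed with a guessed value.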
Best regards
Wladek
On 12/3/2015 12:00 PM, John Helliwell wrote:
Dear Mike,
Many thanks for bringing your proposal about area
detector raw data image formats, that you aired in
Rovinj, forward.
You mention imgcif and HDF5/NeXus explicitly
and so we invite Herbert Bernstein, as chair of that
work, to respond directly to your proposal and its
possible practical implementation.
Thank you,
John and Brian
PS Just one, admittedly very specific, detail: whilst the
bulk of MX data is collected at the synchrotron
(estimated at around 90%), we believe that about 95% of
'small molecule' single crystal area detector data is
measured on home lab setups.
Dear All,
following a lively and entertaining discussion
at this year's DDDWG satellite meeting in
Croatia, I feel that we should attempt to
formalise some of the thoughts discussed.
Therefore I enclose a starting point for
discussion in a proposal at the bottom of this
email. I feel very strongly about the need for
advancement in this area and that the time is
right to initiate this. It has
recently been pointed out that some institutions
are already archiving raw data, and defining
sensible protocols for this seems highly
advisable, if not an absolute necessity, for the
longevity of such projects.
Please feel free to comment on the outline
below - I would hope that we could come to some
agreed position that could then be taken forward
by the group leaders as representative of our
collective feelings on the issue of data
storage, usefulness and to a certain extent
future proofing.
I hope that I have managed to convey my ideas
clearly and that the proposal makes sense. I am
certain that there are aspects that need
clarification and am equally certain that a
large degree of finessing may be required before
this can be taken to the next step. However we
must start somewhere and condensing ideas from
the meeting seems a good place to start.
Many thanks for your time, bye for now
Mike
The need for fully archived data is becoming more apparent and the
volume of said data is becoming ever greater. One of the larger
hurdles to this process is that for the data archived to be useful it
must be stored in a format that allows other users the ability to
interact with it. Some years ago the idea of imgCIF was created, but
for various reasons instrument manufacturers were reluctant to adapt
to this format. Since then with the advent of newer detector
technologies there has been a small explosion in the number and
variety of frame formats that are currently in use. It now seems a
daunting uphill task to convince all developers to rewrite their
firmware to output a common image format, therefore an alternative
must be found. As a community we currently archive data (positions and
structure factors) in a common format - CIF. There is no reason why
this philosophy would not work for the raw data as well. Users
currently convert all of their processed data into CIF format for
publication, therefore I put it to the DDDWG that one sensible way
forward would be to have users archive their raw data in a common format
(be that imgCIF or HDF5/NeXus) at the point of submission. There are
currently image conversion utilities available for some image formats
and it would not take a large investment of time to generate these for
all users; indeed, I am sure nearly all of these are written in various
places around the world. If the conversion is lossless and all
information on the experimental setup is maintained then there is no
reason for any degradation of data, but there is the huge advantage
that this information would then be of use to everyone for
reinvestigation or authentication protocols. I believe this results in
one moderately sized problem in deciding which format is the best to
use for archiving. This problem can be approached in different ways
although there is, I believe, a simple and pragmatic answer; the
majority of raw data is now produced at synchrotrons due to the
technologies employed - therefore we should take the direction from
them as they are mostly working towards something common in format.
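
The "lossless" condition in the proposal can be checked mechanically. What follows is a rough Python sketch, not an existing utility: it assumes that both the original frame and the converted frame have already been decoded into a flat sequence of integer pixel values by some reader (the readers themselves are outside this sketch), and the function names `pixel_digest` and `conversion_is_lossless` are hypothetical. It covers pixel values only; the experimental setup metadata would need a separate comparison.

```python
import hashlib
import struct
from typing import Sequence


def pixel_digest(pixels: Sequence[int], width: int, height: int) -> str:
    """SHA-256 over image dimensions and pixel values, independent of file format."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match image dimensions")
    digest = hashlib.sha256()
    digest.update(struct.pack("<II", width, height))
    # Pack every pixel as a little-endian signed 64-bit integer so the digest
    # does not depend on the integer width either format uses on disk.
    digest.update(struct.pack(f"<{len(pixels)}q", *pixels))
    return digest.hexdigest()


def conversion_is_lossless(
    original: Sequence[int],
    converted: Sequence[int],
    width: int,
    height: int,
) -> bool:
    """True if both decoded frames hold identical pixel values and shape."""
    return pixel_digest(original, width, height) == pixel_digest(
        converted, width, height
    )
```

Storing such a digest alongside the archived frame would also let a later user confirm that the file they downloaded still decodes to the same pixel values.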
Dr Michael R. Probert
Head of Crystallography
Lecturer in Inorganic Chemistry
School of Chemistry
Newcastle University
Bedson Building
Newcastle upon Tyne
NE1 7RU
tel: +44(0) 191 208 6641
fax: +44(0) 191 208 6929
_______________________________________________
dddwg mailing list
dddwg@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/dddwg
--
Dr. Wladek Minor
Professor of Molecular Physiology and Biological Physics
Phone: 434-243-6865
Fax: 434-982-1616
http://krzys.med.virginia.edu/CrystUVa/wladek.htm
US-mail address:
Department of Molecular Physiology and Biological Physics
University of Virginia
PO Box 800736, Charlottesville, VA 22908-0736
Fed-Ex address:
Department of Molecular Physiology and Biological Physics
1340 Jefferson Park Avenue
University of Virginia
Charlottesville, VA 22908
----