Discussion List Archives

Image sizes


Hello Everyone,

   Nick Spadaccini, who is a co-developer of the STAR format, has been
sent a copy of the DRAFT CBF proposal, and has asked me a number of questions
regarding image sizes and data compression.

I include here some of the text which I sent him regarding the sizes
of images and image data-sets, since I feel that it is the "size of
the problem" which makes binary storage essential.

-------------------------------------------------------------------------------

I feel that the argument for binary has perhaps not been made in great
detail, but nevertheless I feel that the fact that people who work
with 2-D detector data agree unanimously that ASCII encoding is unsuitable
is very significant (this is probably the only point where unanimous
agreement is possible).

Why binary ?

The size of data is overwhelming, and the desire for speed means that any
extra conversion steps are highly unwelcome. If a "slow" image format
were introduced it would almost certainly be ignored.

All existing image formats are binary. I don't think this is an accident
nor due to lack of imagination. I see to a large extent the initial aim 
of CBF as "standardising" existing practices.

Unfortunately there are too many image formats rather than too few (but 
still there are no ideal standard formats suitable for scientific image
data). If CBF is to be accepted it must not incur big disadvantages, e.g.
taking more disk space, or causing slower access times.
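
As a rough illustration of the disk-space point, the sketch below writes the same 16-bit values once as raw binary and once as space-separated decimal text. The pixel values are synthetic (generated by an arbitrary formula, not taken from any detector), and the text layout is just one plausible ASCII encoding, not anything defined by the CBF proposal.

```python
import struct

n = 10_000
# Synthetic 16-bit values spread over the full range; NOT real detector data.
pixels = [(i * 7919) % 65536 for i in range(n)]

# Raw binary: exactly 2 bytes per pixel (little-endian unsigned 16-bit).
binary = struct.pack(f"<{n}H", *pixels)

# One possible ASCII encoding: decimal numbers separated by single spaces.
text = " ".join(str(p) for p in pixels).encode("ascii")

print(len(binary), len(text), len(text) / len(binary))

# Reading the text back also needs a parsing step that binary avoids.
assert [int(t) for t in text.split()] == pixels
assert list(struct.unpack(f"<{n}H", binary)) == pixels
```

The ratio depends on how many digits the values have, so real images will differ, but any decimal encoding of values above 9 needs more than the 2 bytes the binary form uses.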

Integer representations do not present a practical problem for inter-
computer system transport (or at least the level at which they represent
a problem has been solved time and time again), and IEEE floating point 
number representation means that real numbers are much less of a problem
than they used to be, and this is a trend which will doubtless continue.
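
A minimal sketch of why byte order is a solved problem: the same 16-bit integers are written in both byte orders and recovered correctly once the reader knows which was used. The pixel values are made up for illustration, and the use of Python's `struct` module is my choice here, not part of the CBF draft.

```python
import struct

pixels = (0, 1, 258, 65535)  # illustrative 16-bit pixel values

# The same values written in big-endian and little-endian byte order.
big = struct.pack(">4H", *pixels)
little = struct.pack("<4H", *pixels)

# The raw bytes differ, but each decodes to the same integers once the
# reader knows which byte order the writer used.
assert big != little
assert struct.unpack(">4H", big) == pixels
assert struct.unpack("<4H", little) == pixels

# On machines that follow IEEE 754, a single-precision value has the same
# bit pattern; only the byte order has to be agreed.
assert struct.pack(">f", 1.5) == bytes.fromhex("3fc00000")
assert struct.pack("<f", 1.5) == bytes.fromhex("0000c03f")
```

So a format only has to record the byte order (and the number type) for the data to be portable.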

You ask for some "typical" sizes of images and data-sets. I'll give you
some insight into the ESRF situation, but particularly for protein
xtallography. This is the area with which I'm most familiar, but for
small molecule xtallography using 2-D detectors the situation is not
very much different, and I work with groups using 2-D detectors for
powder diffraction.

> (1) How big ?

A very popular on-line imaging plate detector used world-wide is the
MarResearch image plate scanner. This has 2000x2000 pixels which are
normally stored in 16-bit integers (a compressed format is also available
but little software exists which works directly with the compressed 
format). (A new bigger system, i.e. more pixels, is under development.)
So each image is 8 million bytes in size. 
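
The arithmetic behind that figure, as a small sketch (the function name is only illustrative):

```python
def image_bytes(width, height, bytes_per_pixel=2):
    """Uncompressed size of one image in bytes."""
    return width * height * bytes_per_pixel

# Mar image plate: 2000x2000 pixels, 16-bit integers (2 bytes each).
mar_image = image_bytes(2000, 2000)
print(mar_image)  # 8000000
```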

> (2) How many ?

Typically 1 degree oscillation images are taken. The total oscillation
range will depend on the crystal symmetry, but 135 - 180 degrees may
be considered real-life examples. For a single wavelength experiment
that's it, although there might be a native plus a number of derivative
data-sets to collect. For a multi-wavelength anomalous dispersion
experiment 3 to 5 data-sets are required. The Mar has a slow read-out
so probably few experiments have enough time to take 5 wavelengths, but
if this was not a limit many would.

So a slightly "pessimistic" Mar monochromatic data-set might be
8 Mbytes * 180 images = 1.4 Gbytes (forgive me confusing millions with
megas)

A similar 3 wavelength MAD data-set would be 4.3 Gbytes.
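
These data-set sizes follow directly from the image size and the number of images. A sketch of the calculation, using decimal units (1 Gbyte = 10^9 bytes) as in the text; the function and parameter names are hypothetical:

```python
IMAGE_BYTES = 2000 * 2000 * 2  # one Mar image, 16-bit pixels


def dataset_bytes(range_deg, step_deg, wavelengths=1, image_bytes=IMAGE_BYTES):
    """Total uncompressed size of a data-set in bytes."""
    n_images = round(range_deg / step_deg)
    return n_images * wavelengths * image_bytes


mono = dataset_bytes(180, 1.0)                 # single wavelength
mad = dataset_bytes(180, 1.0, wavelengths=3)   # 3-wavelength MAD

print(mono / 1e9)  # 1.44
print(mad / 1e9)   # 4.32
```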

This example corresponds to experiments which can be carried out at
any synchrotron world-wide, and the monochromatic example in home
labs.

The Mar takes a long time to read-out so the average data rate is not
so great, and here at the ESRF the read-out time is a limiting factor in
the rate of data collection.

On-line CCD read-out systems are starting to be used in synchrotrons as
they have much faster read-out although the number of resolution elements
is generally smaller.

At the ESRF we are presently using CCD systems with 1242x1152 pixels
with 16-bit ADCs. These systems can be read-out in less than 4 seconds.
These systems are used routinely on the beam-lines. The size of data-sets
does not increase owing to these systems, but the data rate increases
owing to the fast read-out. (Images are smaller, but often 0.5 degree
oscillations are used.)

A 1024x1024 pixel system with a 14-bit dynamic range, which can presently
be read-out 3 times a second, is being tested. This system is designed
to be read-out with 4 parallel outputs, and should soon produce ~12
images a second, and later 20 images a second. These rates are for
time-resolved experiments such as polymerisation. At full speed this
is 40 Mbytes per second if no compression is used. This is at the limit
of present bus rates, and well beyond simple SCSI disks. With up-market
systems and RAID arrays, and with compression, this becomes tractable.
For xtallography such a system might be used with much smaller oscillation
ranges, so many more images would be in a data-set, e.g. if 0.1 degree
oscillations are used the data-set becomes 10x larger than for 1 degree
oscillations. So single wavelength data-sets of ~4 Gbytes are to be foreseen.
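
A sketch of where these two figures come from. Two assumptions are mine and not stated above: that each 14-bit pixel is stored in 2 bytes, and that the total oscillation range is 180 degrees as in the Mar example.

```python
BYTES_PER_PIXEL = 2            # assumption: 14-bit values stored in 16 bits
IMAGE_BYTES = 1024 * 1024 * BYTES_PER_PIXEL


def data_rate(frames_per_second, image_bytes=IMAGE_BYTES):
    """Uncompressed data rate in bytes per second."""
    return image_bytes * frames_per_second


full_speed = data_rate(20)
print(full_speed / 1e6)        # roughly 42 Mbytes per second

# Assumed 180 degree total range taken in 0.1 degree oscillations.
n_images = round(180 / 0.1)
dataset = n_images * IMAGE_BYTES
print(n_images, dataset / 1e9)  # 1800 images, a little under 4 Gbytes
```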


Some other examples of "large" detector systems:

We presently work with a Molecular Dynamics imaging plate scanner which
produces images of 3500x4500x2-byte pixels. Here though a group may
only collect 10-30 images in an experiment. A new scanner which can
produce a 130 Mbyte image should have arrived a month ago. [This has
in fact now arrived.] For the APS in Argonne a CCD detector is being
developed with 3076x3076x2-byte pixels and the read-out time is presently
1.8 seconds. It is aimed to go 4x faster. 8-bit CCD detectors exist which
can be read-out 1000 times per second.
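
For comparison, the image sizes and read-out rates implied by these pixel counts can be worked out as below. This is only a sketch: the dictionary keys are my own labels, and "4x faster" is interpreted as a read-out time one quarter of the present 1.8 seconds.

```python
# (width, height, bytes per pixel) as quoted in the text.
detectors = {
    "Molecular Dynamics imaging plate": (3500, 4500, 2),
    "APS CCD (under development)": (3076, 3076, 2),
}

sizes = {name: w * h * bpp for name, (w, h, bpp) in detectors.items()}
for name, nbytes in sizes.items():
    print(f"{name}: {nbytes / 1e6:.1f} Mbytes per image")

# APS CCD: present read-out time, and the target of going 4x faster.
aps_bytes = sizes["APS CCD (under development)"]
present_rate = aps_bytes / 1.8          # bytes per second
target_rate = aps_bytes / (1.8 / 4)     # assumes read-out time is quartered
print(f"{present_rate / 1e6:.1f} -> {target_rate / 1e6:.1f} Mbytes per second")
```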

At the ESRF there is a central storage of user data. Due to the problem
of volume the data is stored for 100 days after which it is automatically
deleted. My guess is that the figure of 100 days is likely to be reduced
to perhaps 30 days as more of the fast read-out detector systems are
installed, and the total data rate increases. [The 100 days has now been
reduced to 50 days.] At present about 150 Gbytes are generated per month,
but the rate is increasing. With the new detectors full integrated data
rates of about 10 Gbytes per day per beam-line end-station are foreseeable
(although some quote figures about 10x higher).
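
To see why the retention period comes under pressure, note that the volume held in central storage is roughly the data rate multiplied by the retention time. The sketch below assumes a steady rate and a 30-day month (both simplifications of mine), and the function name is hypothetical.

```python
def stored_gbytes(gbytes_per_day, retention_days):
    """Approximate steady-state volume held in central storage."""
    return gbytes_per_day * retention_days


# Present rate: about 150 Gbytes per month, taken as 30 days.
present_rate = 150 / 30

for days in (100, 50, 30):
    print(days, "days:", stored_gbytes(present_rate, days), "Gbytes")

# A single end-station at the foreseeable 10 Gbytes per day, kept 50 days.
print(stored_gbytes(10, 50), "Gbytes per end-station")
```

At the present rate a 100 day retention means holding about 500 Gbytes, and a single end-station at 10 Gbytes per day would need the same volume even with a 50 day retention.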

Ideally data processing software would work at the same speed as 
data collection, or in many cases much faster. For the Mar example a
system is under development which aims to process each image in 5
seconds. This includes all over-heads of disk access etc.
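
A quick sketch of what the 5 second target means in terms of throughput, using the Mar image size and the 180-image data-set from the earlier example:

```python
IMAGE_BYTES = 2000 * 2000 * 2   # one Mar image, 16-bit pixels
SECONDS_PER_IMAGE = 5           # processing target quoted above
N_IMAGES = 180                  # "pessimistic" monochromatic data-set

throughput = IMAGE_BYTES / SECONDS_PER_IMAGE   # bytes per second
total_seconds = N_IMAGES * SECONDS_PER_IMAGE

print(throughput / 1e6)     # 1.6 (Mbytes per second, sustained)
print(total_seconds / 60)   # 15.0 (minutes for the whole data-set)
```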

-------------------------------------------------------------------------------

More following-on soon.


Best Regards,


           Andy






International Union of Crystallography
