Image sizes
- To: Multiple recipients of list <imgcif-l@bnl.gov>
- Subject: Image sizes
- From: Andy Hammersley <hammersl@esrf.fr>
- Date: Thu, 6 Jun 1996 12:30:33 -0400 (EDT)
Hello Everyone,

Nick Spadaccini, who is a co-developer of the STAR format, has been sent a copy of the DRAFT CBF proposal, and has asked me a number of questions regarding image sizes and data compression. I include here some of the text which I sent him regarding the sizes of images and image data-sets, since I feel that it is the "size of the problem" which makes binary storage essential.

-------------------------------------------------------------------------------

I feel that the argument for binary has perhaps not been made in great detail, but nevertheless I feel that the fact that people who work with 2-D detector data agree unanimously that ASCII encoding is unsuitable is very significant (this is probably the only point where unanimous agreement is possible).

Why binary?

The size of the data is overwhelming, and the desire for speed means that any extra conversion steps are highly unwelcome. If a "slow" image format was introduced it would almost certainly be ignored.

All existing image formats are binary. I don't think this is an accident, nor due to lack of imagination. I see to a large extent the initial aim of CBF as "standardising" existing practices. Unfortunately there are too many image formats rather than too few (but still there are no ideal standard formats suitable for scientific image data). If CBF is to be accepted it must not incur big disadvantages, e.g. taking more disk space, or causing slower access times.

Integer representations do not present a practical problem for inter-computer system transport (or at least the level at which they represent a problem has been solved time and time again), and IEEE floating point number representation means that real numbers are much less of a problem than they used to be, and this is a trend which will doubtless continue.

You ask for some "typical" sizes of images and data-sets. I'll give you some insight into the ESRF situation, but particularly for protein xtallography.
This is the area with which I'm most familiar, but for small molecule xtallography using 2-D detectors the situation is not very much different, and I work with groups using 2-D detectors for powder diffraction.

> (1) How big ?

A very popular on-line imaging plate detector used world-wide is the MarResearch image plate scanner. This has 2000x2000 pixels which are normally stored in 16-bit integers (a compressed format is also available, but little software exists which works directly with the compressed format). (A new bigger system, i.e. more pixels, is under development.) So each image is 8 million bytes in size.

> (2) How many ?

Typically 1 degree oscillation images are taken. The total oscillation range will depend on the crystal symmetry, but 135 - 180 degrees may be considered real-life examples. For a single wavelength experiment that's it, although there might be a native plus a number of derivative data-sets to collect. For a multi-wavelength anomalous dispersion (MAD) experiment 3 to 5 data-sets are required. The Mar has a slow read-out, so probably few experiments have enough time to take 5 wavelengths, but if this was not a limit many would.

So a slightly "pessimistic" Mar monochromatic data-set might be

   8 Mbytes * 180 images = 1.4 Gbytes

(forgive me confusing millions with megas). A similar 3 wavelength MAD data-set would be 4.3 Gbytes.

This example corresponds to experiments which can be carried out at any synchrotron world-wide, and the monochromatic example in home labs.

The Mar takes a long time to read-out, so the average data rate is not so great, and here at the ESRF the read-out is a limiting factor in the rate of data collection.

On-line CCD read-out systems are starting to be used in synchrotrons as they have much faster read-out, although the number of resolution elements is generally smaller. At the ESRF we are presently using CCD systems with 1242x1152 pixels with 16-bit ADCs. These systems can be read-out in less than 4 seconds.
These systems are used routinely on the beam-lines. The size of data-sets does not increase owing to these systems, but the data rate increases owing to the fast read-out. (Images are smaller, but often 0.5 degree oscillations are used.)

A 1024x1024 pixel system with a 14-bit dynamic range, which can presently be read-out 3 times a second, is being tested. This system is designed to be read-out with 4 parallel outputs, and should soon produce ~12 images a second, and later 20 images a second. These rates are for time-resolved experiments such as polymerisation. At full speed this is 40 Mbytes per second if no compression is used. This is at the limit of present bus rates, and well beyond simple SCSI disks. With up-market systems and RAID arrays, and with compression, this becomes tractable.

For xtallography such a system might be used with much smaller oscillation ranges, so many more images would be in a data-set, e.g. if 0.1 degree oscillations are used the data-set becomes 10x larger than for 1 degree oscillations. So single wavelength data-sets of ~4 Gbytes are to be foreseen.

Some other examples of "large" detector systems:

We presently work with a Molecular Dynamics imaging plate scanner which produces images of 3500x4500x2-byte pixels. Here though a group may only collect 10-30 images in an experiment. A new scanner should have arrived a month ago which can produce a 130 Mbyte image. [This has in fact now arrived.]

For the APS in Argonne a CCD detector is being developed with 3076x3076x2-byte pixels, and the read-out time is presently 1.8 seconds. It is aimed to go 4x faster.

8-bit CCD detectors exist which can be read-out 1000 times a second.

At the ESRF there is a central storage of user data. Due to the problem of volume the data is stored for 100 days, after which it is automatically deleted. My guess is that the figure of 100 days is likely to be reduced to perhaps 30 days as more of the fast read-out detector systems are installed, and the total data rate increases.
[The 100 days has now been reduced to 50 days.]

At present about 150 Gbytes are generated per month, but the rate is increasing. With the new detectors, full integrated data rates of about 10 Gbytes per day per beam-line end-station are foreseeable (although some quote figures about 10x higher).

Ideally data processing software would work at the same speed as data collection, or in many cases much faster. For the Mar example a system is under development which aims to process each image in 5 seconds. This includes all overheads of disk access etc.

-------------------------------------------------------------------------------

More following-on soon.

Best Regards,

Andy
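
The sizes and rates quoted in the message above can be checked with a short calculation. The following is only a rough sketch: the function names are made up for illustration and are not part of CBF or any existing software. It assumes that the 14-bit pixels of the 1024x1024 system are stored in 2 bytes each (the message does not say so, but this reproduces the quoted 40 Mbytes per second), and that the 0.1 degree example covers a 180 degree range.

```python
# Back-of-envelope check of the figures quoted in the message.
# All names below are illustrative only.

def image_bytes(width, height, bytes_per_pixel=2):
    """Uncompressed size of one image, in bytes."""
    return width * height * bytes_per_pixel


def dataset_bytes(width, height, total_degrees, degrees_per_image,
                  wavelengths=1, bytes_per_pixel=2):
    """Uncompressed size of a rotation data-set, in bytes."""
    n_images = round(total_degrees / degrees_per_image)
    return image_bytes(width, height, bytes_per_pixel) * n_images * wavelengths


def data_rate(width, height, images_per_second, bytes_per_pixel=2):
    """Sustained uncompressed data rate, in bytes per second."""
    return image_bytes(width, height, bytes_per_pixel) * images_per_second


# Mar image plate: 2000x2000 pixels, 16-bit integers.
mar_image = image_bytes(2000, 2000)                      # 8 million bytes
mar_mono = dataset_bytes(2000, 2000, 180, 1.0)           # "1.4 Gbytes"
mar_mad = dataset_bytes(2000, 2000, 180, 1.0, wavelengths=3)  # "4.3 Gbytes"

# 1024x1024 system at 20 images a second.
# ASSUMPTION: 14-bit values are stored in 2 bytes per pixel.
fast_rate = data_rate(1024, 1024, 20)                    # "40 Mbytes per second"

# Same detector with 0.1 degree oscillations.
# ASSUMPTION: a 180 degree total range.
fine_sliced = dataset_bytes(1024, 1024, 180, 0.1)        # "~4 Gbytes"

print(f"Mar image:           {mar_image / 1e6:.1f} million bytes")
print(f"Mar monochromatic:   {mar_mono / 1e9:.2f} thousand million bytes")
print(f"Mar 3-wavelength:    {mar_mad / 1e9:.2f} thousand million bytes")
print(f"Fast CCD data rate:  {fast_rate / 2**20:.1f} Mbytes (2^20) per second")
print(f"0.1 degree data-set: {fine_sliced / 1e9:.2f} thousand million bytes")
```

The same functions can be applied to the other detectors mentioned, e.g. image_bytes(3500, 4500) for the Molecular Dynamics scanner or image_bytes(3076, 3076) for the APS CCD.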