This is an archive copy of the IUCr web site dating from 2008. For current content please visit
https://www.iucr.org.
Crystallographic Binary File: Final Discussions
Below is a summary of the final discussions at the ImageNCIF meeting held at
the Biology Department of Brookhaven National Laboratory on Monday-Wednesday,
20-22 October 1997. These notes are intended to clarify the handouts
the attendees assembled just before the end of the meeting.
Tasks to Accomplish
- Capability of getting and putting single data items:
probably accomplished by existing CIFLIB or CIFPARSE APIs,
or by minor modifications thereof:
errcode = cbf$get_element("array_structure_list.dimension", &detwidth);
. . . assuming the dictionary items already exist, of course.
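A minimal sketch of the single-item get/put idea, assuming a small in-memory table of name/value strings. The names cbf_table, cbf_put_element and cbf_get_int are hypothetical and are not the CIFLIB or CIFPARSE API; dictionary checking is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CBF_MAX_ITEMS 64
#define CBF_MAX_LEN   80

/* Hypothetical in-memory store of header items (name/value pairs). */
typedef struct {
    char names[CBF_MAX_ITEMS][CBF_MAX_LEN];
    char values[CBF_MAX_ITEMS][CBF_MAX_LEN];
    int  count;
} cbf_table;

/* Return the index of `name`, or -1 if it is not present. */
static int cbf_find(const cbf_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;
    return -1;
}

/* Store or overwrite one item. Returns 0 on success, nonzero on error. */
int cbf_put_element(cbf_table *t, const char *name, const char *value)
{
    if (strlen(name) >= CBF_MAX_LEN || strlen(value) >= CBF_MAX_LEN)
        return 1;                       /* name or value too long */
    int i = cbf_find(t, name);
    if (i < 0) {
        if (t->count >= CBF_MAX_ITEMS)
            return 2;                   /* table is full */
        i = t->count++;
        strcpy(t->names[i], name);
    }
    strcpy(t->values[i], value);
    return 0;
}

/* Fetch one item as an integer. Returns 0 on success, nonzero on error. */
int cbf_get_int(const cbf_table *t, const char *name, long *out)
{
    int i = cbf_find(t, name);
    if (i < 0)
        return 1;                       /* item not present */
    char *end;
    long v = strtol(t->values[i], &end, 10);
    if (end == t->values[i] || *end != '\0')
        return 2;                       /* value is not an integer */
    *out = v;
    return 0;
}
```

A caller would start from a table with count set to 0, put "array_structure_list.dimension" with a value such as "2048", and then get it back as a long, checking the error code each time.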
- Multi-element gets and puts
- Do we defer this capability till later?
- Do we return explicit pointers to internal structures? No.
- Set up (or co-opt) facility for returning an enumerated list
of already-vetted data items.
- Arrays
How do we do these? Probably existing codes in CIFPARSE will do it.
- Qualitative and non-machine-generated information
GUIs and other I/O software need to provide easy ways of generating
this kind of information.
Mechanisms for requiring the user to input some of it will be useful,
so the archive won't be missing important information later.
Something equivalent to what an HTML forms interface does would help:
user's data-run information won't be accepted by GUI unless the
necessary items have been explicitly filled in.
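The forms-style check could be sketched roughly as follows: before accepting the data-run information, the GUI counts how many required items are missing or blank. The type form_item and the function count_missing are hypothetical names, and which items are required is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One item as the user entered it; value may be NULL or empty. */
typedef struct {
    const char *name;
    const char *value;
} form_item;

/* Count the required names that are absent from `items` or left blank.
   The first such name (or NULL if none) is stored in *first_missing so
   the GUI can point the user at it. */
size_t count_missing(const form_item *items, size_t n_items,
                     const char *const *required, size_t n_required,
                     const char **first_missing)
{
    assert(first_missing != NULL);
    size_t missing = 0;
    *first_missing = NULL;
    for (size_t r = 0; r < n_required; r++) {
        int filled = 0;
        for (size_t i = 0; i < n_items; i++) {
            if (strcmp(items[i].name, required[r]) == 0 &&
                items[i].value != NULL && items[i].value[0] != '\0') {
                filled = 1;
                break;
            }
        }
        if (!filled) {
            if (missing == 0)
                *first_missing = required[r];
            missing++;
        }
    }
    return missing;
}
```

The GUI would refuse to proceed while count_missing returns a nonzero value.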
- What code do we ourselves need to write?
- minor modifications or extensions of CIFLIB or CIFPARSE
so that the CIF-like portions of the CBF can be read without error
and with some level of dictionary-checking.
Can the existing codes work on files that contain binary info?
We don't know... ask John Westbrook.
- Code to interpret that header information so that we
can read the first binary segment.
- Code to interpret that header information so that we
can read the nth binary segment.
- Code to read a binary segment into memory.
How smart should this particular API be?
General consensus: not very. It should limit its activity to
the actual I/O, correcting the endian-ness of the data, and performing
any necessary decompression.
Thus not very many header items will need to be accessible to this
routine: total size of disc file to read, minimal understanding
of organizational structure of those data (width vs. height),
type of compression done, type of data present in file.
Anything beyond this structure should be the responsibility of
a downstream API that looks at this memory buffer and pulls
relevant items out of it.
This routine should be general enough to handle arbitrary
(two-dimensional?) binary data.
- Code to write a binary memory buffer onto disc.
Again, this one shouldn't be terribly smart: it should
handle endian-ness and compression, and not much else.
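The endian-ness handling that both the read and write routines need might look something like this sketch. It assumes the header carries some indication of the file's byte order; the flag file_is_little_endian and all function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if this machine stores the low-order byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverse the byte order of each 16-bit element in place. */
void swap16_buffer(uint16_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint16_t)((buf[i] << 8) | (buf[i] >> 8));
}

/* Reverse the byte order of each 32-bit element in place. */
void swap32_buffer(uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t v = buf[i];
        buf[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                 ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}

/* Bring a 16-bit buffer read from disc into host byte order.
   Swaps only when the file and the host disagree. */
void fix_endianness16(uint16_t *buf, size_t n, int file_is_little_endian)
{
    if ((file_is_little_endian != 0) != (host_is_little_endian() != 0))
        swap16_buffer(buf, n);
}
```

The same comparison works in the write direction: swap just before writing whenever the target byte order differs from the host's.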
- Populate appropriate arrays or objects with contents
of the binary buffer(s).
This is the more difficult and application-specific step.
- Memory-freeing routines.
This is trivial if we're in C++, not quite so trivial
in C. But it needs to be done in either case.
- Read defined subsection of the image.
A typical application of this concept arises if one
wishes to extract a shoebox in (X,Y,Z) from a small group of images.
This is relatively efficient if the data aren't compressed;
if the data are compressed, it'll probably be slow.
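For the uncompressed case, the shoebox extraction could be sketched as below. The sketch assumes each image is already in memory as a row-major array of 16-bit pixels; extract_window and extract_shoebox are hypothetical names, and nothing here addresses compressed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w-by-h window whose first pixel is (x0, y0) out of a row-major
   image that is `width` pixels wide and `height` pixels tall.
   Returns 0 on success, 1 if the window does not fit inside the image. */
int extract_window(const uint16_t *image, size_t width, size_t height,
                   size_t x0, size_t y0, size_t w, size_t h,
                   uint16_t *out)
{
    assert(image != NULL && out != NULL);
    if (x0 > width || w > width - x0 || y0 > height || h > height - y0)
        return 1;
    for (size_t row = 0; row < h; row++)
        memcpy(out + row * w,
               image + (y0 + row) * width + x0,
               w * sizeof *out);
    return 0;
}

/* Shoebox: the same window taken from n_images consecutive images.
   `out` must have room for n_images * w * h pixels. */
int extract_shoebox(const uint16_t *const *images, size_t n_images,
                    size_t width, size_t height,
                    size_t x0, size_t y0, size_t w, size_t h,
                    uint16_t *out)
{
    for (size_t z = 0; z < n_images; z++)
        if (extract_window(images[z], width, height, x0, y0, w, h,
                           out + z * w * h) != 0)
            return 1;
    return 0;
}
```

Only w pixels per row are touched, which is why this is cheap when the data aren't compressed; a compressed image would normally have to be decompressed first.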
- Data-item ordering:
The question here is whether we wish to specify that
the dictionary-entry items in the header will appear in some
specific order. Jim Pflugrath reports that his users urged
him to put the items in alphabetic order so they could find
them more easily.
- Unspecified on read.
- Specified on writing?
The consensus is that we won't specify it in the CBF
standard (whatever that is!). The organization of the data
will be the responsibility of the code that manages the ascii
buffer, not the code that does file input-output operations.
- How many binary data types will we support in V0.1?
- data type of input == data type of output
... where input might be 16-bit integers (signed or unsigned),
32-bit integers, 64-bit integers, IEEE 32-bit floating point?
Consensus appears to be to delay support of floating-point
input till later.
- Compressed disc data could be uncompressed into memory.
- 32-bit to unsigned 16-bit conversions. In this case the marker for
16-bit overflows would be 65535.
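A sketch of the 32-bit to unsigned 16-bit conversion with 65535 as the overflow marker. Two points are assumptions not settled in the discussion: negative inputs are flagged as overflows too, and a true value of exactly 65535 is counted as flagged because it cannot be told apart from the marker. The name pack32_to_u16 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OVERFLOW_MARKER 65535u

/* Convert n 32-bit values to unsigned 16-bit values.
   Anything that does not fit below the marker is replaced by the marker.
   Returns the number of pixels that were flagged. */
size_t pack32_to_u16(const int32_t *in, uint16_t *out, size_t n)
{
    assert((in != NULL && out != NULL) || n == 0);
    size_t flagged = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0 || in[i] >= (int32_t)OVERFLOW_MARKER) {
            out[i] = (uint16_t)OVERFLOW_MARKER;   /* assumption: negatives flagged too */
            flagged++;
        } else {
            out[i] = (uint16_t)in[i];
        }
    }
    return flagged;
}
```

A downstream reader that meets 65535 in the 16-bit data then knows the true value has to be found elsewhere.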
- Human reading of header
- If (see above) we specify an order for the dictionary-data
items, then when we run `more' on the file, the output will
come out in a defined order.
- The alternative is to leave the data in whatever order the
header-generating code wishes to produce, and then produce a
tool called cbf_beautifier that reformats the ascii of the header
into a format the user wants. We could even make that pretty flexible:
the user could produce a .cbfrc file that contains a list of
mmCIF dictionary items that he/she wants to see, in the user's
desired order; the code would then extract only those items from
the header, sort them into the user's ordering, and print them.
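The core of such a cbf_beautifier might be sketched as follows: take the header items in whatever order they arrive, plus the user's ordered list of wanted names, and emit only those items in the user's order. Reading the .cbfrc file and parsing the header are not shown; header_item, beautify, and the one-line "name value" output format are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One header item as already parsed from the ascii header. */
typedef struct {
    const char *name;
    const char *value;
} header_item;

/* Write the items named in `wanted`, in that order, to `out` as
   "name value" lines. Wanted names that are absent from the header are
   skipped. Returns the number of items written, or -1 if `out` is too
   small. */
int beautify(const header_item *items, size_t n_items,
             const char *const *wanted, size_t n_wanted,
             char *out, size_t out_size)
{
    assert(out != NULL);
    size_t used = 0;
    int written = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t w = 0; w < n_wanted; w++) {
        for (size_t i = 0; i < n_items; i++) {
            if (strcmp(items[i].name, wanted[w]) != 0)
                continue;
            int len = snprintf(out + used, out_size - used, "%s %s\n",
                               items[i].name, items[i].value);
            if (len < 0 || (size_t)len >= out_size - used)
                return -1;              /* output would be truncated */
            used += (size_t)len;
            written++;
            break;
        }
    }
    return written;
}
```

Because the ordering lives entirely in the wanted list, the file-writing code never needs to know about it, which matches the consensus above.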
- Preserving a history of how the image is manipulated
- We need to remember whether the image has been
dark-current-subtracted, spatially corrected, sensitivity-corrected,
dezingered, . . .
- mmCIF already provides for audit records; these will help a lot.
But unless we're going to have multiple data blocks within a single
header (undesirable!?) it'll be hard to preserve the whole record
with the standard mmCIF audit formalism.
Therefore, we should include data names that indicate specific types
of manipulations of the image. The overall list isn't all that
long; the cases mentioned above cover nearly the entire list.
- Spectra and other 2-D plottable data
- This could be done with binary data blocks containing
2-D plot coordinate values (X,Y).
- This could also be done in ascii, wherein
the mmCIF dictionary itself would include data names for
ascii (X,Y) pairs of data, along with control items like
label_X, label_Y, range_X, range_Y, graph_title, log/linear.
Consensus is that this should be an mmCIF issue, not
a CBF issue--Jim Fait will discuss this with John Westbrook
and the mmCIF community.
- Data names for experimental controls
Specifying these names is associated with Bob Sweet's
goal of having the header contain all the information needed
to characterize the experiment.
The aliasing mechanisms of DDL 2.1 allow us to fully populate
specialized experimental control data categories even if
some of those names already exist in other categories of
the mmCIF dictionary.
Assignments of Tasks for CBF V0.1
N.B. I've re-ordered these tasks relative to
the grotty-looking overhead we produced into an ordering
that makes a bit more sense.
- Overall shepherding of the project:
Bob Sweet, Andy Hammersley.
- Report on what we did at this meeting:
Bob Sweet, perhaps by Thanksgiving.
- Maintaining a CBF homepage at NDB:
John Westbrook, with contributions from many folks.
- Coordinate systems: Jim Pflugrath, via overall
report from meeting.
- Further honing of file-structure syntax:
Andy Hammersley.
- Assembling a list of data names needed for processing
steps: Bob Sweet.
- Adding Dictionary Names: Andy Howard
and Paula Fitzgerald, remembering to send the results
quickly to Paul Ellis so he can use them!
- Maintaining integrity of dictionary additions
according to DDL 2.1 syntax and current mmCIF names:
John Westbrook.
- First coding of header data: Paul Ellis,
to handle his MAR system.
- Data compression/decompression:
Paul Ellis, Andy Hammersley.
- Publicizing what we're doing:
IUCr via Brian McMahon; John Westbrook for NDB website;
IUCr news; Synchrotron Radiation News.
- Subsequent code-writing: Yves Epelboin,
Jim Fait, Andy Howard, many others.
- Testing of code: Yves Epelboin, Paul Ellis,
Chris Nielsen, Andy Hammersley, Jim Fait, John Skinner.
- COMCIFS comments: Brian McMahon, David Brown.
- D*Trek implementation: Jim Pflugrath.
- Bruker software implementation: Jim Fait,
Bruker employees.
- ADSC implementation: Chris Nielsen.
- MOSFLM interface: Paul Ellis.
- X-GEN interface: Andy Howard.
- Powder-diffraction interface:
Andy Hammersley, Brian Toby, perhaps Carlo Segre.
- CBF-to-CIF translation tools: Herb Bernstein.
- DDL1.4 aliases: to be determined.
- Recorder of final discussions: Andy Howard