imageNCIF
- To: Multiple recipients of list <imgcif-l@bnl.gov>
- Subject: imageNCIF
- From: Andy Hammersley <hammersl@esrf.fr>
- Date: Tue, 13 Feb 1996 06:11:26 -0500 (EST)
Hello,

Sorry, I've been distracted by other things like writing analysis software, so I haven't been in communication for a while...

So, coming back to my last e-mail, which raised a few questions, and to David's reply. These were my questions:

> 1. Do we have consensus on the binary nature of storage for the "image"
>    data? (As opposed to ASCII encoding of "image" data.)
> 2. Do we have consensus on holding header information and binary "image"
>    information together in the same file? (The main alternative to
>    this would be to have a separate header file, which could therefore be
>    a pure text file, and a binary file for the "image" data.)
> 3. Within the COMCIFS framework could a "Crystallographic Binary File"
>    (or similarly named) format be defined? With a "CIF-compatible"
>    header section, and a tool to convert the "CIF-compatible" header
>    sections to "CIF-compliant" files.

My points 1 and 2 seem to be accepted. David writes:

> Andy has summarised very well the consensus that is developing
> in the group, ...

whereas point 3 gets a less clear answer. David continues:

> but it is important to point out that, however much
> the ascii part of the binary file may look like a cif, it will not be a
> cif as long as it is included in a file that contains binary
> information.

but goes on to say:

> Still, it would make sense to keep the ascii part in a form that,
> when extracted, constituted a legitimate cif. ...

I take this to mean a "technical" NO, but more or less a practical YES.

I think that if a binary CIF-compatible format is developed, then ultimately it would be highly desirable for the whole format to be "owned" by the IUCr and maintained in the same or a similar manner to the way CIF is presently maintained. The simplest and best method to do this, I believe, would be to extend the function of COMCIFS to cover both CIF and the binary format. A parallel committee could be imagined, but I would see that as something best avoided.
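As a rough illustration of the "tool to convert the CIF-compatible header sections to CIF-compliant files" from question 3 (and of David's point that the ASCII part should, once extracted, be a legitimate CIF), here is a minimal Python sketch. It assumes a hypothetical layout in which the header is plain ASCII running up to an end-of-header marker line; the marker string and the function name are illustrative assumptions only, not part of any agreed format.

```python
import re

# Hypothetical end-of-header marker: no such marker has been agreed yet.
END_OF_HEADER = b"--- END OF CBF HEADER ---"


def extract_cif_header(data: bytes) -> str:
    """Return the ASCII header of a CBF-style byte string as plain CIF text.

    Assumes the header is everything before END_OF_HEADER and that any of
    CR LF, CR or LF may have been used as the line separator.
    """
    end = data.find(END_OF_HEADER)
    if end < 0:
        raise ValueError("no end-of-header marker found")
    header = data[:end].decode("ascii")
    # Normalise whatever line separators were used to plain newlines, so the
    # extracted text can be written out as an ordinary text (CIF) file.
    lines = re.split(r"\r\n|\r|\n", header)
    # Drop the empty element left by the separator just before the marker.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n"
```

The binary part never has to be parsed: the tool only needs to find where the header stops, which is one argument for a very clear end-of-header identifier.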
Still, I get the impression that this is a problem of technicality rather than necessarily a real practical problem, and at present it is somewhat hypothetical. I think it's time to start defining details of the format...

For the present I'll continue calling the format "Crystallographic Binary File" (CBF). This is similar to "CIF", but clearly has the word "binary" inserted. But maybe other people have better ideas ... BIF??

-------------------------------------------------------------------------------
Here's my attempt to outline the format.

1. CBF is a binary file, containing self-describing "image" and auxiliary data.

2. It is an exact number of blocks of **** bytes in length.

   Jim suggested a 512-byte block size, for efficiency reasons on OpenVMS, but there was objection to this. I think that we need to support the concept of a "record length" for Fortran direct-access I/O and for certain operating systems. For other operating systems, which don't have file structures, all this necessarily means is that the files are some exact multiple of some number of bytes. A program written in "C" or a similar language would pad out the end of the file to the right number of bytes. Such a concept may also be generally useful for efficiency reasons.

   In our choice of this number, we should not especially favour VMS, but then again, ideally we want the reading and writing of the files to be as efficient as possible on ALL possible operating systems. If we can, we should avoid building in inefficiency, and if possible leave the opportunity for memory mapping and similar techniques. I suggest either a 512- or a 1024-byte block size, but maybe other numbers make more sense for other operating systems.

3. The very start of the file has an identification item.

   I would like some simple method of checking whether the file really is a CBF or not. Ideally this would be right at the start of the file. Thus, a program only needs to read in n bytes and should then know immediately whether the file is of the right type or not.
   I think this identifier should be some straightforward and clear ASCII string.

4. Somewhere near the start of the file is the CBF version or level (cf. PostScript Level I and II).

   Initially, a restricted format is probably the most practical to define and implement, e.g. only one header and binary section per file. However, later on we may want to extend the format to cover multiple header/binary sections. Such an important change could be communicated to a program through this version/level number. This could be combined with the identification item, e.g.

   ### CRYSTALLOGRAPHIC BINARY FILE FORMAT: VERSION 1.0

   (Such an identifier should be long enough that it is highly unlikely to occur randomly, and if it is ASCII text, it should be very slightly obscure, again to reduce the chance that it is found accidentally. Hence I added the three hashes, but some other form may be equally valid.)

5. Header section: describing the following binary section, and containing other auxiliary information.

   Defined as for CIF, with the exception of the line separators, e.g.

   _image_size_dimensions 2 # Or equivalent

   (Clearly much more detail remains to be defined.)

   [At the ESRF a data format was developed which used the keyword "IMAGE", which turned out to mean any binary data section; hence I had reservations about using the word "image". However, if we are definitely referring to images (in some sense) and will use other keywords for other types of binary data, my previous objection disappears.]

6. Some clear identifier signalling the end of the header section and where the binary section begins, or some equivalent method of achieving the same.

   I favour a very clear identifier; Jim, some time ago, seemed to favour a byte-count keyword.

7. The binary data.

   Starting at a new block? If normal computer data, e.g. 2-byte integers or IEEE reals, are being stored in essentially native format, then word boundaries should be respected.
   Given that higher "quadruple" precision data types and complex data types may potentially be wanted, I suggest that at least 32-byte boundaries are respected, but maybe for efficiency or simplicity reasons it's desirable to use the full block boundaries.

   (Data types, possible compression, etc. to be defined.)

8. Recommended file extension (restricted to three characters), e.g. cbf

   This allows users to recognise file types easily, and gives programs a chance to "know" the file type without having to prompt the user.

---------------------
I guess that those are the main features I would like to see in the format. The precise syntax is not too important (to me), although it is important that it is precisely defined. (Precise definition, I feel, is a strong point of the existing CIF dictionary.)

-------------------------------------------------------------------------------
I'll make a few points on other matters which have been raised:

A. Jim objects to words like "horizontal", "vertical", "X-direction", and "Y-direction", which I tend to use. I understand his objections, but we do need to be able to relate an abstract byte stream first to some regular array form, and then to relate the array to an experimental set-up and to a computer screen. I think we also want a simple language in which to be able to do this (at least for the simple cases).

   [I guess I sit too much in front of a computer screen, so all images I work with have an up and a down. Whilst clearly a 2-D detector does not have to be vertically mounted (in the lab frame), in practice almost all are. So usually the detector has a clear sense of up and down. Unfortunately, by the time the image has been stored and displayed, the two are often not the same! The same is true for left and right, but with the added complication that it needs to be defined whether the image is viewed from the sample looking at the detector, or vice versa.]

B.
   I think that it is best to avoid words like "short", which has been used ("usi"), and I guess "long", which hasn't. These mean particular things to particular language/compiler implementations and may well change in the future. Some equivalent wording which is less open to (mis)interpretation is preferable, e.g. 2_byte, 4_byte.

-------------------------------------------------------------------------------
Lastly: whilst my point 4 was not directly answered (being at least in part dependent on point 3), I have been asked to present a short talk on "imageNCIF" at the CIF workshop, which takes place during the IUCr congress in August. I see this as encouraging.

Andy Hammersley
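To make the outline concrete (points 2 to 7, plus the explicit-size wording of point B), here is a small Python sketch of writing and reading a toy CBF. Everything specific in it is an assumption for illustration only: the 512-byte block size (one of the two sizes suggested), the identifier line from point 4, a hypothetical end-of-header marker for point 6, CR LF header line separators, little-endian "2_byte" signed integers, and starting the binary section on a fresh block.

```python
import struct

BLOCK_SIZE = 512  # assumption: one of the two block sizes suggested in point 2
PREFIX = b"### CRYSTALLOGRAPHIC BINARY FILE FORMAT: VERSION "
IDENTIFIER = PREFIX + b"1.0\r\n"                 # point 4 example identifier
END_OF_HEADER = b"--- END OF CBF HEADER ---\r\n"  # hypothetical marker (point 6)


def pad_to_block(buf: bytes, block: int = BLOCK_SIZE) -> bytes:
    """Pad with zero bytes so len(buf) is an exact multiple of block (point 2)."""
    remainder = len(buf) % block
    return buf if remainder == 0 else buf + b"\x00" * (block - remainder)


def write_cbf(header_lines, pixels) -> bytes:
    """Build a toy CBF: identifier, CIF-like header, marker, then 2_byte data."""
    header = IDENTIFIER
    header += b"".join(line.encode("ascii") + b"\r\n" for line in header_lines)
    header += END_OF_HEADER
    # Point 7: start the binary section on a fresh block boundary.
    head = pad_to_block(header)
    # Point B: explicit "2_byte" signed little-endian integers, not "short".
    body = struct.pack(f"<{len(pixels)}h", *pixels)
    return pad_to_block(head + body)


def read_cbf(data: bytes, count: int):
    """Check the identifier, locate the binary section, unpack count values.

    In a real format count would come from the header (e.g. the image
    dimensions); here the caller supplies it to keep the sketch short.
    """
    # Point 3: only the first line is needed to decide whether this is a CBF.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    if not first_line.startswith(PREFIX):
        raise ValueError("not a CBF file")
    version = first_line[len(PREFIX):].decode("ascii")
    end = data.index(END_OF_HEADER) + len(END_OF_HEADER)
    # Binary data begins at the first block boundary at or after the marker.
    start = -(-end // BLOCK_SIZE) * BLOCK_SIZE
    values = struct.unpack_from(f"<{count}h", data, start)
    return version, list(values)
```

Because every section starts on a block boundary and the file length is a whole number of blocks, a Fortran direct-access reader with a 512-byte record length, or a memory-mapping reader, could locate the binary data without scanning byte by byte.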