Re: ImageNCIF/CBF
- To: Multiple recipients of list <imgcif-l@bnl.gov>
- Subject: Re: ImageNCIF/CBF
- From: Andy Hammersley <hammersl@esrf.fr>
- Date: Mon, 17 Jun 1996 08:34:03 -0400 (EDT)
Dear Nick,

(I've posted your proposal to the imageNCIF discussion group, and I'm sending this to them as well.)

I appreciate your interest in, and input to, the question of storing large image data-sets in a CIF or CIF-like manner. However, I feel that the discussion should take place in an appropriate forum, e.g. the imageNCIF group or COMCIFS, or both, or ...

I also feel that some of your comments are somewhat unfair and out of context. The document which you have seen is not a "submission to COMCIFS". It is only a draft proposal from within the imageNCIF working group. Many questions, such as allowing multiple images of different sizes, are still being discussed.

(Incidentally, "imageNCIF" and "CBF" are two names for the same thing, or if you prefer, imageNCIF is the discussion group and CBF is the present name for the draft proposal.)

I disagree with some of the points which you make:

> 1. A CBF cannot be transferred from site to site without modification
> (eg. encoding, packing or zipping) and is not at all portable.

Not true: ftp in binary mode works. WWW/Netscape etc. has no problems with playboy pictures; diffraction data doesn't have to be any different.

> 2. A CBF is not extensible in the same way as STAR. Whereas new data can be
> inserted or appended to an existing STAR File, as can extensions to
> a dictionary without requiring software to be re-written, this
> cannot be the case with a CBF.

The present DRAFT proposal is not extensible in the same sense as a CIF. However, the possibility to append images etc. could be included, at the cost of extra complication.

> Here we quote from Andy Hammersley's document, from 2.0 OVERVIEW OF
> THE FORMAT, "A change in the major version may well mean that a
> program for the previous version cannot input the new version as some
> major change has occurred to CBF."

The sorts of changes I was thinking about are far less important than the ones which you are proposing for CIF, i.e. removing the 80 byte line limit!
I was thinking of things like adding extra compression algorithms (minor version change), or maybe the later addition of multiple binary sections (major version change), but NOT changes which alter the structure of the format.

> 3. Hard-coded block sizes, line lengths, dataname sizes etc are no
> longer part of the CIF standard, or soon will not be, according to
> discussions in late 1995.

The 512 byte blocking size is only a proposal. It could be removed completely (but with disadvantages), made a different number, made a variable, etc. However, if CIF changes fundamentally it may be more appropriate to do things differently. (My personal view is that CIF should not be changed fundamentally; backwards compatibility is very important.)

The present CBF document and extra dictionary have been "cast" in the version one DDL. There is no reason why they should not be re-defined in DDL-2 terms prior to being presented to COMCIFS etc. (In fact I've so far tried and failed 4 times to copy the PostScript document describing the DDL-2. The transfer just stops after about 20 pages.)

> 4. A CBF cannot be handled by the substantial library of STAR conformant
> software currently available and being developed. The duplication of
> such tools for specialised applications is very wasteful.

This is (would be) true, hence the extraction tool. However, it would make sense to design the format and/or software so that as much software as possible could be shared. I don't think that this would be difficult.

> 5. The CBF format looks like a CIF and there is a significant danger that
> it could be mistaken for a CIF (if there is an editor that will
> handle a binary file). This is potentially confusing and may retard
> rather than accelerate the acceptance of CIF as a standard
> crystallographic data exchange approach.

This is a potential danger. Suggestions to reduce this danger are welcome.
> It is worth stressing that although the header section of a CBF
> looks like a CIF, the data is not attached to a dataname in a
> convenient or easily usable form, and multiple "images" cannot
> be looped or contained within the file.

A data name for the binary data could be added in place of the end-of-header identifier; however, the exact byte position where the binary data starts is crucial.

The looping mechanism could be used to allow multiple binary "images" of different sizes. This has been suggested by David Brown. I, however, do not favour this suggestion.

David Brown asks:

DB> Is it clear how one array is terminated and a second begun when
DB> the cif contains multiple arrays? What is the normal method of
DB> terminating a binary array? Are there separators in the binary string
DB> that can be used for this purpose?

The simple answer to this (I think) is that there aren't any, not without external knowledge such as that defined within the header section. Knowledge of the number of pixels in each binary string could be used to calculate the byte position at which a particular array starts; unless, of course, data compression has been used.

I would prefer a format with multiple header/binary section pairs. This would need the number of bytes or blocks in each binary section to be stored in the header sections, so that a program would know how to "jump over" a binary section and find the start of the next header section. Such a mechanism could be defined, but I feel that, for the sake of simplicity, it would be better to start with a format which has only one header section and one binary section. CBF at present could store multiple images, but all of the same size, e.g. a time sequence.

> 6. Finally, mutant forms of CIF such as CBF will tend to be a catalyst
> for others....based on the often mistaken belief that there is
> always a better mousetrap, and that it's more efficient to adapt a
> standard than work within it!
> Such enhancements eventually lead to
> the complete collapse of the standard....as has been the case for
> a number of computer languages. The STAR File is a LONG-TERM archival
> and exchange approach and therefore its syntax must be considered
> sacrosanct.

This is why it's very important that whatever imageNCIF (the working group) does is done within COMCIFS and coordinated with the CIF people. This seems to be happening.

> problems, and appreciate that the standard "text" image approach MAY
> not work for massive data files - which may be terabytes in size.

Terabytes seems a bit of an exaggeration at present, but who knows what will happen in the next 10 years ...

> (i) If the descriptive parameters of a binary file could be easily
> "linked" to that file, why can't these be in a separate text file?

Yes, it's possible; I know of two file formats which do this. However, your example shows a huge problem with this "solution": your file pointers are wrong as soon as the files are renamed and, in your example, as soon as they are copied to another directory. This is a REAL problem. The two-file solution was raised in the imageNCIF discussions, but nobody favoured it.

Even if you manage to overcome the file pointer problem through naming conventions etc., you are still left with one "logical" data-set being stored in two separate places. This leaves open the possibility that the two become separated. And if the possibility exists then it WILL happen.

> (ii) Because binary data is machine-specific (and, therefore, so is
> the encompassing file), is this file suitable for anything
> other than "transitory local" use (in other words, it is unsuitable
> for portable or archival purposes)?

Binary images are portable and are routinely transferred between different computer systems. With integer data only byte swapping is necessary, and IEEE reals are becoming standard. (The format could be restricted to hold only integers, if this was felt to be very important.)
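
To illustrate how little work the byte swapping amounts to, here is a minimal Python sketch, assuming 16-bit unsigned pixel values. The function name `swap_uint16` and the sample pixel values are purely illustrative and are not part of the CBF draft.

```python
import struct

def swap_uint16(raw: bytes) -> bytes:
    """Reverse the byte order of every 16-bit value in raw."""
    if len(raw) % 2:
        raise ValueError("expected a whole number of 16-bit values")
    count = len(raw) // 2
    # Interpret the bytes as big endian, then write them back little endian.
    values = struct.unpack(f">{count}H", raw)
    return struct.pack(f"<{count}H", *values)

# Hypothetical pixel values written on a big-endian machine ...
pixels = [0, 1, 258, 65535]
big = struct.pack(">4H", *pixels)

# ... and read on a little-endian machine after swapping.
little = swap_uint16(big)
decoded = list(struct.unpack("<4H", little))
```

The same swap applied twice returns the original bytes, so a reader only needs to know which of the two byte orders the writer used.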
> [We doubt binary files will ever form part of the IUCr archives
> but such files may be retained inhouse until the appropriate
> information is extracted in a more archival form. None of us need
> to be reminded of the inadequacy of machine-specific data in an
> age when the half-life of a chip or an OS is about 12 months!]

"Archiving" is probably not the real aim, but transferability and portability are. Hence the aims are largely the same as for CIF.

I see no reason whatsoever to believe that ASCII encoding has a greater longevity than computer integer representations, nor probably than the IEEE floating point representation. In fact, at present text data is much less standardised than integer data types. We (imageNCIF) have identified four commonly used and different ways in which ASCII text data are defined on different operating systems, whereas multi-byte integer data are stored in only two different forms, commonly known as big endian and little endian.

Changes in the future seem more likely to affect character data than integer and floating point data. Multi-byte character sets are presently being developed.

Whilst I appreciate your efforts in understanding the problem of mass data storage and transport, I'm afraid that I, and I think the large majority of the imageNCIF group, will reject your proposal for the reasons given above. Nevertheless, I would welcome your continued involvement and constructive criticism of the proposals.

Best Regards,

Andy Hammersley
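
As an illustration of the multiple header/binary section idea discussed under point 5, here is a minimal Python sketch of how a program could "jump over" binary sections. Everything about the layout is an assumption made for the example: the header is taken to be ASCII lines ended by a blank line, and the item names `_image_name` and `_binary_size` are hypothetical, not names from the CBF draft.

```python
import io

def read_sections(stream):
    """Return a list of (header, binary_offset, size), one per section pair."""
    sections = []
    while True:
        line = stream.readline()
        if not line:
            break  # end of file: no further header sections
        header = {}
        while line.strip():  # a blank line ends the header (assumption)
            key, _, value = line.decode("ascii").strip().partition(" ")
            header[key] = value
            line = stream.readline()
        size = int(header["_binary_size"])  # hypothetical item name
        offset = stream.tell()              # where the binary data starts
        stream.seek(size, io.SEEK_CUR)      # jump over the binary section
        sections.append((header, offset, size))
    return sections

def make_section(name, payload):
    """Build one header/binary pair in the assumed layout."""
    header = f"_image_name {name}\n_binary_size {len(payload)}\n\n"
    return header.encode("ascii") + payload

# Two "images" of different sizes; the first contains a newline byte.
data = make_section("first", b"\x00\x01\n\x02") + make_section("second", b"\xff" * 6)
stream = io.BytesIO(data)
sections = read_sections(stream)

# Fetch only the second image, without ever decoding the first.
stream.seek(sections[1][1])
second_payload = stream.read(sections[1][2])
```

Because the reader relies on the stored byte count rather than on any separator, the binary data may contain arbitrary byte values, and compressed sections are skipped just as easily as uncompressed ones.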