Re: Line Separators
- To: Multiple recipients of list <imgcif-l@bnl.gov>
- Subject: Re: Line Separators
- From: Peter Keller <bsspak@bath.ac.uk>
- Date: Thu, 29 Feb 1996 12:02:36 -0500 (EST)
On Thu, 29 Feb 1996, Andy Hammersley wrote:

> (Please remember that the file is a BINARY file, and should not be confused
> with an ASCII text file. The following is only an attempt to make conversion
> to an ASCII text file, or viewing with some editors, of the header section as
> simple as possible.)
>
> I know of three basic ways in which different operating systems store ASCII
> text files (there are probably variants of at least some of these methods):
>
> 1. Variable-length records, where typically the first two bytes of each
>    record specify the length of the record. This is how VMS used to store
>    ASCII text, and I guess other "old" operating systems. (This seems the
>    most elegant to me, as programs do not need to examine bytes in a line,
>    once they know that they want to jump to another line)

Used to? This is still one of the common file structures under VMS. If you
ftp a file in ASCII mode from a Unix system, the file structure will be
converted to this format.

> 2. "Stream-LF" The ASCII text is one long byte stream, and the line-feed
>    character (ASCII byte value 10) is used to signal the end of a "line".
>    This is the normal Un*x method, but VMS can recognise such files.
>    (This seems the simplest, but inefficient method to me.)

It isn't strictly true that VMS can always recognise this structure, but
given that a tool to extract the header in native text format for any
system is available, this gets my vote.

In any case, it can be made both simple and efficient - under virtually all
Unix, use mmap() to memory-map the file, and bypass all the inefficient
stream or file i/o stuff. (I don't think that Linux has mmap yet, but it
probably soon will.) Under VMS, use the $CRMPSC call to do the same thing -
this also bypasses all information about the file structure stored in the
header, and allows you to see the data exactly as it is on disk.
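To make the memory-mapping idea concrete, here is a minimal sketch. It is written in Python rather than C purely for brevity; Python's mmap module wraps the underlying system facility (mmap() on Unix). The function name lf_line_offsets is made up for illustration, and the sketch assumes the Stream-LF convention, i.e. that a single line-feed byte ends each line.

```python
import mmap
import os
import tempfile


def lf_line_offsets(path):
    """Return the byte offset at which each Stream-LF line starts.

    Hypothetical helper for illustration only: the file is memory-mapped
    read-only and scanned as a plain byte buffer, with no stream i/o.
    """
    offsets = []
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return offsets  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            pos = 0
            while pos < len(buf):
                offsets.append(pos)
                nl = buf.find(b"\n", pos)  # next line-feed (byte value 10)
                if nl == -1:
                    break  # last line has no terminator
                pos = nl + 1
    return offsets


if __name__ == "__main__":
    # Small demonstration on a throwaway file: two text lines, then binary.
    fd, name = tempfile.mkstemp()
    os.write(fd, b"HEADER\nline two\n\x00\x01binary")
    os.close(fd)
    print(lf_line_offsets(name))
    os.remove(name)
```

Only the pages of the file that are actually touched get read in, so a program that wants just the header does not pay for the binary data that follows it.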
In both cases, you get a memory address which corresponds to the beginning
of the file, and you treat its contents as a buffer. This also has the
advantage of being memory-efficient - using mmap doesn't increase the size
of the program's process space appreciably.

> 3. On DOS ASCII text is stored in a manner similar to "Stream-LF", but
>    has an additional carriage-return character (ASCII byte value 13)
>    before the line-feed. (If this has a name, I'm sorry, but I don't
>    know it.)

4. You didn't mention text format on a Mac ;-). In this case, it is
Stream-CR.

> i. Use ONLY Stream-lf

I think this one, as long as a tool to extract the header into an editable
form for non-Unix systems is available from the start.

> ii. Use ONLY the DOS approach
>
> iii. Allow either Stream-lf or the DOS approach to be used.
>      Do something slightly different presumably jumping over the CR

[snip]

> "A DOS text file can be viewed as a stream-lf file on a Un*x system, and
> the extra carriage-returns just make it look slightly messy (^M's I think).
> However, if a DOS editor looks at a file without the carriage-returns the
> result is far worse." (I'm not sure what happens, but this what I am told.)

There are several public-domain or shareware PC-Windows editors around
which handle Unix text perfectly well. Linux is likely to become more used
in a scientific context, as the operating system for PC's, anyway. In
crystallography, both X-plor and CCP4 can now be run under Linux, and data
processing software can't be far behind, as PC's become more powerful.

My 2p worth,
Peter.

========================================================================
Peter Keller.                  \ "The self-respect which other men enjoy
Dept. of Biology and           \  in rising early I feel due to me for
  Biochemistry,                \  waking up at all."
University of Bath,            \
Bath, BA2 7AY, UK.             \                  --- William Gerhardie
------------------------------\-----------------------------------------
Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
Fax. (+44/0)1225 826449        |        P.A.Keller%bath.ac.uk@UKACRL (BITNET)
========================================================================
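
P.S. For anyone weighing up option iii, the "jump over the CR" logic is small. The following is only a rough sketch in Python, not a proposal for the format itself: the name split_header_lines is invented for illustration, and it assumes the header bytes have already been separated from the binary data. It accepts Stream-LF, the DOS CR+LF pair, and the Mac Stream-CR form.

```python
LF = 0x0A  # line-feed, ASCII byte value 10
CR = 0x0D  # carriage-return, ASCII byte value 13


def split_header_lines(data):
    """Split header bytes into lines, accepting LF, CR+LF or a lone CR.

    Hypothetical helper for illustration; separators are not included
    in the returned lines.
    """
    lines = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == LF:
            lines.append(data[start:i])
            i += 1
            start = i
        elif byte == CR:
            lines.append(data[start:i])
            i += 1
            if i < n and data[i] == LF:
                i += 1  # DOS style: jump over the LF that follows the CR
            start = i
        else:
            i += 1
    if start < n:
        lines.append(data[start:])  # final line without a terminator
    return lines


if __name__ == "__main__":
    print(split_header_lines(b"unix\ndos\r\nmac\rlast"))
```

Python's built-in bytes.splitlines() recognises the same three separators, so in practice a reader written in Python would not need the loop at all; it is spelled out here only to show how little work tolerating the CR involves.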