Discussion List Archives


Re: Line Separators

On Thu, 29 Feb 1996, Andy Hammersley wrote:

> (Please remember that the file is a BINARY file, and should not be confused
> with an ASCII text file. The following is only an attempt to make conversion
> to an ASCII text file, or viewing with some editors, of the header section as
> simple as possible.)
> 
> I know of three basic ways in which different operating systems store ASCII
> text files (there are probably variants of at least some of these methods):
> 
> 1. Variable-length records, where typically the first two bytes of each
>    record specify the length of the record. This is how VMS used to store
>    ASCII text, and I guess other "old" operating systems. (This seems the
>    most elegant to me, as programs do not need to examine bytes in a line, 
>    once they know that they want to jump to another line)

Used to? This is still one of the common file structures under VMS. If 
you ftp a file in ASCII mode from a Unix system, the file structure will 
be converted to this format.
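
To make the record layout concrete, here is a rough Python sketch of
walking such a buffer. It assumes a two-byte little-endian length in front
of each record and a pad byte after odd-length records; both are
assumptions made for illustration, the names read_records and
write_records are made up, and the real on-disk details under VMS are
more involved than this.

```python
import struct


def read_records(buf):
    """Split a buffer of length-prefixed records into a list of records."""
    records = []
    pos = 0
    while pos + 2 <= len(buf):
        # Two-byte little-endian record length (assumed layout).
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        records.append(buf[pos:pos + length])
        pos += length
        # Skip one pad byte after odd-length records (assumed layout).
        pos += length & 1
    return records


def write_records(lines):
    """Inverse of read_records: pack lines as length-prefixed records."""
    out = bytearray()
    for line in lines:
        out += struct.pack("<H", len(line)) + line
        if len(line) & 1:
            out.append(0)
    return bytes(out)
```

Note that read_records never looks at the bytes inside a record, which is
the point made in the quoted text: jumping to the next line only needs the
length field.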

> 2. "Stream-LF" The ASCII text is one long byte stream, and the line-feed 
>     character (ASCII byte value 10) is used to signal the end of a "line".
>     This is the normal Un*x method, but VMS can recognise such files.
>     (This seems the simplest, but inefficient method to me.)

It isn't strictly true that VMS can always recognise this structure, but 
given that a tool to extract the header in native text format for any 
system is available, this gets my vote.

In any case, it can be made both simple and efficient - under virtually
all Unix systems, use mmap() to memory-map the file, and bypass all the
inefficient stream or file i/o stuff. (I don't think that Linux has mmap
yet, but it probably soon will.) Under VMS, use the $CRMPSC call to do the
same thing - this also bypasses all information about the file structure
stored in the header, and allows you to see the data exactly as it is on
disk. In both cases, you get a memory address which corresponds to the
beginning of the file, and you treat its contents as a buffer. This also
has the advantage of being memory-efficient - using mmap doesn't increase
the size of the program's process space appreciably. 
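
As a minimal sketch of the Unix side of this, the following uses Python's
standard mmap module (a wrapper around the same facility) to treat a
Stream-LF file as one buffer and find where each line starts. The function
names line_offsets and demo are made up for illustration, and the sample
file contents are invented.

```python
import mmap
import os
import tempfile


def line_offsets(path):
    """Return the byte offset at which each LF-terminated line starts."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        # Note that Python refuses to map an empty file (ValueError).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = 0
            while pos < len(m):
                offsets.append(pos)
                nl = m.find(b"\n", pos)  # next line feed (byte value 10)
                if nl == -1:
                    break  # last line has no terminator
                pos = nl + 1
    return offsets


def demo():
    """Write a small Stream-LF file and return its line offsets."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
    try:
        return line_offsets(tmp.name)
    finally:
        os.unlink(tmp.name)
```

The search for line feeds runs directly over the mapped pages, so no
explicit read calls or per-line copies are needed.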

> 3. On DOS ASCII text is stored in a manner similar to "Stream-LF", but 
>    has an additional carriage-return character (ASCII byte value 13)
>    before the line-feed. (If this has a name, I'm sorry, but I don't
>    know it.)

4. You didn't mention the text format on a Mac ;-). In that case, it is
Stream-CR: a lone carriage return (ASCII byte value 13) ends each line.
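
For the three byte-stream conventions (2, 3 and 4), a small Python sketch
like the one below can tell them apart by counting terminators in a
buffer. The function name count_terminators and the dictionary keys are
made up for illustration; this is only one way of doing it.

```python
def count_terminators(data):
    """Count CRLF pairs, lone LFs and lone CRs in a byte buffer."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == 13:  # carriage return
            if i + 1 < n and data[i + 1] == 10:
                counts["CRLF"] += 1  # DOS style: CR followed by LF
                i += 2
                continue
            counts["CR"] += 1  # Mac style: CR on its own
        elif b == 10:  # line feed
            counts["LF"] += 1  # Unix style: LF on its own
        i += 1
    return counts
```

A buffer where only one of the three counts is non-zero follows a single
convention; anything else is a mixture.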

> i. Use ONLY Stream-lf

I would choose this one, as long as a tool to extract the header into an
editable form for non-Unix systems is available from the start.

> ii. Use ONLY the DOS approach 
> 
> iii. Allow either Stream-lf or the DOS approach to be used.
>    Do something slightly different presumably jumping over the CR
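
For what it is worth, "jumping over the CR" in option iii need not cost
much. A possible Python sketch, assuming the header has already been read
into a byte buffer (the name split_lines is made up for illustration):

```python
def split_lines(data):
    """Split on LF and drop one trailing CR per line, if present."""
    lines = data.split(b"\n")
    # A buffer ending in LF yields an empty last element; discard it.
    if lines and lines[-1] == b"":
        lines.pop()
    return [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]
```

This accepts Stream-LF and the DOS form alike, but not Stream-CR, since a
lone CR is never treated as a terminator here.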

[snip]

> "A DOS text file can be viewed as a stream-lf file on a Un*x system, and
> the extra carriage-returns just make it look slightly messy (^M's I think).
> However, if a DOS editor looks at a file without the carriage-returns the
> result is far worse." (I'm not sure what happens, but this is what I am told.)

There are several public-domain or shareware PC-Windows editors around
which handle Unix text perfectly well. Linux is likely to become more
widely used in a scientific context as an operating system for PCs,
anyway.  In crystallography, both X-plor and CCP4 can now be run under
Linux, and data processing software can't be far behind, as PCs become
more powerful.

My 2p worth,
Peter.

========================================================================
Peter Keller.            \ "The self-respect which other men enjoy
Dept. of Biology and      \  in rising early I feel due to me for
    Biochemistry,          \  waking up at all."
University of Bath,         \ 
Bath, BA2 7AY, UK.           \        --- William Gerhardie
------------------------------\-----------------------------------------
Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
Fax. (+44/0)1225 826449        |   P.A.Keller%bath.ac.uk@UKACRL (BITNET)
========================================================================

