Discussion List Archives

Line Separators

  • To: Multiple recipients of list <imgcif-l@bnl.gov>
  • Subject: Line Separators
  • From: Andy Hammersley <hammersl@esrf.fr>
  • Date: Thu, 29 Feb 1996 08:20:50 -0500 (EST)

Hello,

   I think that we're almost at a stage when the best way forward might be 
to prepare a detailed specification document of the proposed data format and
centre discussion around the specification.

However, I think that the manner in which "lines" are separated in the 
header section deserves some careful consideration.

(Please remember that the file is a BINARY file, and should not be confused
with an ASCII text file. The following is only an attempt to make conversion
to an ASCII text file, or viewing with some editors, of the header section as
simple as possible.)

I know of three basic ways in which different operating systems store ASCII
text files (there are probably variants of at least some of these methods):

1. Variable-length records, where typically the first two bytes of each
   record specify the length of the record. This is how VMS used to store
   ASCII text, and I guess other "old" operating systems did the same.
   (This seems the most elegant to me, as programs do not need to examine
   the bytes in a line once they know that they want to jump to another
   line.)

2. "Stream-LF": the ASCII text is one long byte stream, and the line-feed
   character (ASCII byte value 10) is used to signal the end of a "line".
   This is the normal Un*x method, but VMS can recognise such files.
   (This seems to me the simplest method, but an inefficient one.)

3. On DOS, ASCII text is stored in a manner similar to "Stream-LF", but
   with an additional carriage-return character (ASCII byte value 13)
   before each line-feed. (If this has a name, I'm sorry, but I don't
   know it.)
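
To make the three conventions concrete, here is a minimal Python sketch
that writes the same two "lines" in each layout. The function names, the
sample lines, and the little-endian, unpadded two-byte length prefix are
my own assumptions for illustration; real VMS record files may differ in
detail.

```python
import struct

# Two sample "lines" (hypothetical content, for illustration only).
LINES = [b"first line", b"second"]

def encode_records(lines):
    # Assumed layout: 2-byte little-endian length, then the record bytes.
    return b"".join(struct.pack("<H", len(line)) + line for line in lines)

def encode_stream_lf(lines):
    # Each line is terminated by a line-feed (byte value 10).
    return b"".join(line + b"\n" for line in lines)

def encode_dos(lines):
    # Each line is terminated by carriage-return (13) then line-feed (10).
    return b"".join(line + b"\r\n" for line in lines)

def skip_to_record(data, index):
    """Return record number `index`, reading only the length prefixes
    of the records before it (no byte-by-byte scan of their contents)."""
    pos = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2 + length
    (length,) = struct.unpack_from("<H", data, pos)
    return data[pos + 2 : pos + 2 + length]
```

With the length-prefixed layout, skip_to_record() hops from prefix to
prefix, which is the property described in point 1. With the other two
layouts, a reader has to examine every byte to find the terminators.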

So far we have talked about a header section which would follow the
"Stream-LF" approach, but maybe it would be better to follow the DOS
approach. Variable-length records, despite their elegance, seem to be
gradually disappearing, and are very different to either the Un*x or DOS
approaches. In this "modern" (?) world I don't see the variable-length
record approach as being a viable choice.

I see three reasonable alternatives:

i. Use ONLY Stream-lf

ii. Use ONLY the DOS approach 

iii. Allow either Stream-lf or the DOS approach to be used.

-----------------

Option iii. is possible, but would complicate the task of parsing a file, e.g.

if (no_carriage_return) then

   Do something

else

   Do something slightly different presumably jumping over the CR

end if

As it makes the format more complicated, I think iii. is best avoided.
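
For comparison, this is roughly what a reader that accepts either
terminator might look like in Python. It is only a sketch: the function
name is hypothetical, and it assumes the header bytes have already been
separated from the binary data.

```python
from typing import List

def split_header_lines(header: bytes) -> List[bytes]:
    """Split header bytes into lines, accepting LF or CR LF terminators."""
    lines = []
    for raw in header.split(b"\n"):
        # The extra branch needed for option iii: drop a CR before the LF.
        if raw.endswith(b"\r"):
            raw = raw[:-1]
        lines.append(raw)
    # A trailing terminator leaves one empty final element; discard it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines
```

The extra branch is small, but every reader of the format would have to
carry it. The sketch also cannot tell a line that really ends in a CR
byte from a DOS terminator, which is part of the added complication.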

At the ESRF, this point was discussed and the following conclusion drawn:

"A DOS text file can be viewed as a stream-lf file on a Un*x system, and
the extra carriage-returns just make it look slightly messy (^M's I think).
However, if a DOS editor looks at a file without the carriage-returns the
result is far worse." (I'm not sure what happens, but this is what I am told.)

Thus it was concluded that taking the DOS approach was better. (Of
course you can also wonder about which O.S. will be dominant in 10
years' time.)
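
The asymmetry can be illustrated with a small Python sketch. The sample
header text and the helper name are made up, and the second case assumes
a reader that recognises only CR LF as a terminator, which is just one
plausible way for a DOS tool to behave.

```python
# Hypothetical header text in the two conventions.
dos_text = b"KEY1 = 1\r\nKEY2 = 2\r\n"
lf_text = b"KEY1 = 1\nKEY2 = 2\n"

def split_on(data, terminator):
    """Split on one fixed terminator, dropping the empty final element."""
    parts = data.split(terminator)
    if parts[-1] == b"":
        parts.pop()
    return parts

# A reader expecting LF still finds both lines in a DOS file; each one
# merely keeps a stray CR at the end (what an editor shows as ^M).
dos_seen_by_lf_reader = split_on(dos_text, b"\n")

# A reader expecting CR LF finds no terminator at all in a Stream-LF
# file, so the whole header comes back as one long "line".
lf_seen_by_crlf_reader = split_on(lf_text, b"\r\n")
```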

Any other comments or suggestions?

        Andy Hammersley



