The standard approach to the computer archiving of text and numerical data is to use fixed-format files. Such files employ a fixed data structure based on current data requirements. Subsequent changes in archival requirements are difficult to accommodate with this approach, because changes to the data structure will often make previously archived files inaccessible.
Upward compatibility and flexibility are therefore two very desirable properties of any archival format. They are especially important when there is a wide diversity of data types, or when the data requirements vary depending on the point of application. Archival files must be portable so that commonly used data are accessible independent of the data structure or origin. It is also essential that an archive file format be flexible enough to incorporate future data requirements without the need to modify existing files. The Self-defining Text Archive and Retrieval (STAR) file structure is designed to meet these requirements. The STAR format is intended for archiving text and numerical data of any type and in any order. It is particularly suitable for electronic publication purposes.
Data Structure of a STAR File
A STAR file is a formatted sequential file composed of text lines. These lines contain standard visible ASCII characters and may be viewed or edited with a standard text editor. A STAR file is subdivided into any number of separate data blocks. Each data block starts with the string "data_xxxx" where "xxxx" is the block name.
The text following the "data_" string specifies both the "structure" (i.e. layout) of the data and the data items contained therein. All information is machine-readable, as well as being intelligible as text.
The identity of each data item within a data block is specified with a unique "data name". A data name is a single character string starting with an underline "_". Each data item must be preceded by a data name. In addition to the data names and data items, there is a command to indicate if data items are repeated or "looped". This is the "loop_" command. Repeated data is terminated when another data name is encountered or the block is finished.
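The looping rule described above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes the file has already been split into text strings, the function name parse_loop and the data names in the usage example are hypothetical, and treating a following "loop_" or "data_" string as the end of the repeated data is an assumption based on the statement that a loop ends at another data name or at the end of the block.

```python
def parse_loop(tokens, start=0):
    """Read one looped list starting at tokens[start], which must be "loop_".

    Returns (rows, next_index): rows is a list of dicts keyed by data name,
    and next_index is the position of the first token after the loop.
    """
    if tokens[start] != "loop_":
        raise ValueError("expected 'loop_' at position %d" % start)
    i = start + 1

    # The data names of the loop come first.
    names = []
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    if not names:
        raise ValueError("loop_ must be followed by at least one data name")

    # Data items repeat until another data name, a new loop, a new data
    # block, or the end of the input (the last three are assumptions).
    values = []
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("_") or tok.startswith("data_") or tok == "loop_":
            break
        values.append(tok)
        i += 1

    width = len(names)
    if len(values) % width != 0:
        raise ValueError("number of data items is not a multiple of the "
                         "number of data names")

    # Group the flat list of items into one dict per repetition.
    rows = [dict(zip(names, values[k:k + width]))
            for k in range(0, len(values), width)]
    return rows, i


# Hypothetical usage: two data names looped over two sets of items.
tokens = ["loop_", "_atom_label", "_atom_x",
          "C1", "0.10", "O1", "0.25",
          "_cell_a", "5.4"]
rows, next_index = parse_loop(tokens)
```

Each repetition is returned as a dictionary keyed by data name, so the order in which the names were declared determines which item is assigned to which name.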
The syntax of a STAR file is straightforward.
A text string is defined as either a sequence of non-blank characters, a sequence of characters bounded by matching single or double quotes, or a sequence of lines bounded by a semicolon (;) as the first character of a line. A text string must not span more than one line, unless bounded by semicolons.
A data name is a text string starting with an underline "_".
A data item is a text string not starting with an underline "_", and preceded by the identifying data name.
A data loop is a list of data names, followed by a repeated list of data items, and preceded by the text string "loop_".
A save frame is a sequence of data names, data items and data loops preceded by the text string "save_framecode" where "framecode" is a unique identifying code within a data block. A save frame sequence is closed by another save frame command, by the text string "stop_" or by a data block command.
A data block is a sequence of data names, data items, data loops and save frames preceded by the text string "data_blockcode" where "blockcode" is a unique identifying code within the STAR file. The data block sequence is closed by another data block or the end of the STAR file.
A data name must be unique within each save frame sequence and within each data block sequence. A save frame declaration must be unique within a data block sequence. The save frame code may be referred to within a data block as the data item "$framecode".
Except if contained within a text string, a sequence of blank or tab characters is used only to separate text strings.
Except if contained within a text string, a single hash character "#" signals that the characters following on a line are used for comment only.
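The text string, blank and comment rules above can be sketched as a small tokenizer. This is an illustration under stated simplifications, not a complete STAR reader: the function name tokenize is hypothetical, a quoted string is assumed to end at the first matching quote on the same line, and anything following the closing semicolon on its line is ignored.

```python
def tokenize(text):
    """Split STAR-style text into a flat list of text strings."""
    tokens = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]

        # A semicolon as the first character opens a multi-line text string
        # that runs until the next line starting with a semicolon.
        if line.startswith(";"):
            body = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            if i >= len(lines):
                raise ValueError("unterminated semicolon-bounded text string")
            tokens.append("\n".join(body))
            i += 1  # skip the closing semicolon line
            continue

        pos = 0
        while pos < len(line):
            ch = line[pos]
            if ch in " \t":
                # Blanks and tabs only separate text strings.
                pos += 1
            elif ch == "#":
                # The rest of the line is a comment.
                break
            elif ch in "'\"":
                # Quoted text string: simplified to end at the first
                # matching quote on the same line.
                end = line.find(ch, pos + 1)
                if end == -1:
                    raise ValueError("unterminated quoted text string")
                tokens.append(line[pos + 1:end])
                pos = end + 1
            else:
                # Sequence of non-blank characters.
                begin = pos
                while pos < len(line) and line[pos] not in " \t":
                    pos += 1
                tokens.append(line[begin:pos])
        i += 1
    return tokens


# Hypothetical input exercising each kind of text string.
sample = (
    "data_demo  # a comment\n"
    "_name 'two words'\n"
    "_note\n"
    ";line one\n"
    "line two\n"
    ";\n"
)
tokens = tokenize(sample)
```

Because the sketch returns plain strings, it does not record whether a string was quoted; a fuller reader would keep that information so that a quoted string beginning with an underline is not mistaken for a data name.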