SB(1)                                                                    SB(1)

NAME
       sb - data base management system of the STAR_Base Project

SYNOPSIS
       sb [ -o output_file ] [ -e error_file ] [ -d dictionary_file ]
          [ -r request ] [ -R request_file ] [ -v verbosity_level ] [ -f ]
          [ input_file... ]

AVAILABILITY
       This system may be available on Unix, IBM PCs or Macintoshes; that is
       to say, almost anything with an ANSI C compiler.

DESCRIPTION
       sb parses the input_files (default is standard input) according to the
       STAR language, validates them against a given dictionary_file, and
       writes the result of selection requests as a complete STAR file to the
       output_file (default is standard output).

OPTIONS
       -o output_file
              Redirect the system's normal output to the named output_file.
              The default is to write to standard output.

       -e error_file
              Redirect any diagnostic messages to the named error_file.  The
              default is to write them to standard error.

       -d dictionary_file
              Read the Dictionary Specification from the named
              dictionary_file and validate the input against it.  The default
              is not to validate the input.

       -r request
              Select the data from the input which satisfies the given
              request, and write it as a valid STAR file to the output.  See
              USAGE/Request Language for more detail on the form of a
              request.  The default is to perform no selection requests.

       -R request_file
              Take the requests from the named request_file.

              Multiple -r and -R options are allowed.  They are executed in
              the order they occur on the command line.

       -v verbosity_level
              Reset the extent to which the user is informed of the system's
              progress.  There are four possible settings for the
              verbosity_level:

              none    Suppress all diagnostic messages.

              error   Give only those messages which result from the system
                      detecting a fatal error.

              inform  Provide any error messages plus brief messages as to
                      the system's progress through its various stages.

              debug   Provide any error or inform messages plus a verbose
                      listing of the system's step by step progress.
              The default is to report messages to standard error at inform
              level.  See DIAGNOSTICS for more information.

       -f     Filter mode: ignore any -r or -R option.  Once the input_file
              has been parsed and validated, write it straight to the
              output_file.  The default is to write only the result of
              selection requests to the output.

       input_file
              Read the input from the named input_files.  Multiple files are
              concatenated in the order they appear on the command line.  The
              default is to read from standard input.

USAGE
   STAR Language
       The STAR language is the means by which the data stored in a STAR file
       is defined.

       a)  A STAR file is a sequence of data_blocks and global_blocks.

       b)  A data_block is written as:

                  data_name sequence_of_data_items

           where data_name is unique within a STAR file.

       c)  A name is an identifier made up of non-blank ASCII characters.  It
           must be at least one character long, and is terminated by a blank.

       d)  A sequence_of_data_items is a sequence of data in any of the
           following forms:

                  i    _name text_string
                  ii   loop_structure
                  iii  save_block

           where _name is unique within the sequence_of_data_items.

       e)  A text_string is a sequence of ASCII characters bounded by
           matching spaces (ASCII(9), ASCII(32), end of line), " (ASCII(34))
           or a line whose first character is a semi-colon (ASCII(59)).  A
           frame_code is a text_string whose first character is a dollar sign
           (ASCII(36)).

       f)  A loop_structure is the mechanism by which one piece of data may
           have more than one value.  It is written as:

                  data_loop_definition sequence_of_text_strings

       g)  A data_loop_definition is written as:

                  loop_ sequence_of_loop_fields

       h)  A sequence_of_loop_fields is a sequence of:

                  i    _name
                  ii   data_loop_definition stop_

       i)  A sequence_of_text_strings is a sequence of:

                  i    text_string
                  ii   sequence_of_text_strings stop_

           whose number is a multiple of the number of fields in the
           corresponding data_loop_definition.  Each multiple is known as a
           loop_packet.

       j)  A save_block is used to group related data.
           It is written as:

                  save_name sequence_of_data_items save_

           The save_name which names a save_block must be unique within the
           sequence_of_data_items in which it appears.  A save_block can be
           referenced by another piece of data in the same data_block by a
           frame_code of the form: $name.

       k)  A global_block is the means by which data can be defined to exist
           in all following data_blocks in a file.  It is written in the
           form:

                  global_ sequence_of_data_items

           If a _name or save_name exists in a global_block then it is
           assumed to exist in all following data_blocks.  If it appears in
           any subsequent global_blocks or data_blocks, then the later
           appearance takes precedence over the former.

   Dictionary Specification
       A dictionary specification is a STAR file where each data_block
       specifies the limits placed upon a piece of data.

   Request Language
       Data can be requested from the input by means of a request language.
       This request language consists of a sequence of requests, each of
       which is considered in turn.  The data is selected from the input file
       and appended to the output file.  Any ambiguities or conflicts that
       arise from the appending of data are resolved by giving precedence to
       the existing contents of the output file.

       a)  A request may be a data_request, a conditional_request or a
           branching_request.

       b)  A data_request is a space terminated string.  It is used to select
           data by matching the data_names, save_headings, _names or
           global_names present within the input file.  This string may
           contain the wild card characters * and ?.  * matches any sequence
           of characters and ? matches any single character.

       c)  A conditional_request selects all the data from the input file
           that satisfies the given condition.  The condition can also have a
           value associated with it.
           That value can be TRUE, where the condition is satisfied; FALSE,
           where the condition is not satisfied; or UNKNOWN, where the
           condition is not satisfied because the data requested cannot be
           found.  The conditional_request can take any of the following
           forms:

           data_request
                  As per the previous description of a data_request.

           conditional_request operator text_string
                  Allows the operation to act against the given constant.

           conditional_request & conditional_request
                  Selects that data which is in both of the
                  conditional_requests.  This is comparable to the
                  intersection of the two conditional_requests.

           conditional_request | conditional_request
                  Selects all the data from both of the conditional_requests.
                  This is comparable to the union of the two
                  conditional_requests.  If necessary, a backslash can be
                  used instead of |.

           ! conditional_request
                  Selects all the data that does not make up the
                  conditional_request.  This is comparable to the complement
                  of the conditional_request.

           assume_true_ conditional_request
                  This operator forces the value of the condition to be TRUE
                  when the value of the given conditional_request is UNKNOWN.
                  In any other case the value of the condition is the same as
                  the value of the conditional_request.

       d)  An operator can be any of the following:

           ~=     Selects data from the conditional_request that is identical
                  to the text_string.

           ?=     Selects data from the conditional_request that contains the
                  text_string as a sub-string.

           ~<     Selects data from the conditional_request that is less than
                  the text_string in ASCII order.

           ~>     Selects data from the conditional_request that is greater
                  than the text_string in ASCII order.

           The negations of these operators also exist, as: ~!=, ?!=, ~<= and
           ~>=.

           =      Selects data from the conditional_request that is
                  numerically equal to the numerical value of the
                  text_string.

           <      Selects data from the conditional_request that is
                  numerically less than the text_string.

           >      Selects data from the conditional_request that is
                  numerically greater than the text_string.

           The negations of these operators also exist, as: !=, <= and >=.

       e)  A branching_request allows requests to be made depending on the
           result of a condition.  It takes the form:

                  if_ condition branch_request
                  [ else_ branch_request ]
                  [ unknown_ branch_request ]
                  endif_

       f)  A condition is the same as a conditional_request except that the
           data requested is not appended to the output file, but rather is
           used to determine an alternative input file for use in the
           subsequent branch_requests.  The value of the condition determines
           which branch_request is made:

           TRUE     The first branch_request is made.

           FALSE    The else_ branch_request is made.

           UNKNOWN  The unknown_ branch_request is made.  Where there is no
                    unknown_ branch_request, the else_ branch_request is made
                    instead.

           Other than in this last case, if a branch is omitted and a
           condition leads to it, then no branch_request is performed and
           processing continues with the next request following the endif_.

       g)  A branch_request can be a sequence of any of:

                  conditional_request
                  branching_request
                  scope_setting branch_request endscope_

       h)  A scope_setting uses the alternative input file generated from the
           condition to temporarily change the input file from which requests
           are made.  A scope_setting can be any of the following:

           scope_data_item_
                  Restricts the scope of the input file to be only that data
                  which satisfies the condition.

           scope_loop_packet_
                  Expands the data which satisfies the condition to include
                  that data which is in the same loop packet in the current
                  input file.  The input file is then restricted to this
                  expanded data.

           scope_loop_structure_
                  Expands the data which satisfies the condition to include
                  all the data which occurs in the same loop definition in
                  the input_file.
           scope_save_frame_
                  Expands all the save frames which satisfy the condition to
                  their full extent, as in the original input_file.

           scope_data_block_
                  Expands all the data_blocks which satisfy the condition to
                  their full extent, as in the original input_file.

           scope_file_
                  Reverts the current input file back to the original input
                  file.

           Where no explicit scope_setting ... endscope_ is made, the current
           scope setting is inherited from the previous one.  Where there is
           no previous scope setting, the original input file is used.

DIAGNOSTICS
       Diagnostic messages come in three levels of significance.  From most
       significant to least significant they are: error, inform and debug.

       Error messages occur whenever sb detects that something has gone
       wrong.  The cause is usually a mistake in the user's input.  If the
       error is an internal one, then the name of the function detecting the
       error is given.  All such internal errors should be reported.  sb will
       attempt to proceed with processing whenever it can.

       Inform messages occur at strategic points throughout the progress of
       sb.  They exist to keep the user updated as to the progress of the
       system.  They are particularly useful in maintaining the user's
       confidence when processing large files.

       Debug messages are extremely verbose and are intended for the use of
       the maintainer.  They may be used to determine what is going wrong if
       an error message is not sufficiently informative.

FILES
       /usr/local/lib/CIF.dic
              CIF standard dictionary

SEE ALSO
       Any other manuals or user documentation we may write; possibly the
       Specs and Design Documentation.

       Hall, S.R.  The STAR File: A New Format for Electronic Data Transfer
       and Archiving.  J. Chem. Inf. Comput. Sci., Vol. 31, No. 2, 1991,
       326-333.

CAVEATS
       This manual entry is associated with the beta release.

BUGS
       No bugs are known to date.

1/27/93                                                                  SB(1)
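The wild card matching used by a data_request (USAGE/Request Language, item b) can be illustrated with a short ANSI C routine. This is only a sketch and is not taken from the sb sources: the function name wild_match is hypothetical, and it assumes that * may also match an empty sequence of characters, which the description of a data_request does not state.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of sb.
 * Return 1 if name matches pattern, 0 otherwise.
 *   '*' matches any sequence of characters (assumed to include none),
 *   '?' matches exactly one character.
 */
int wild_match(const char *pattern, const char *name)
{
    const char *star = NULL;   /* most recent '*' seen in pattern       */
    const char *retry = NULL;  /* where in name that '*' began matching */

    while (*name != '\0') {
        if (*pattern == '*') {
            /* Remember the '*' and first try matching it against nothing. */
            star = pattern++;
            retry = name;
        } else if (*pattern == '?' || *pattern == *name) {
            /* Single character match: advance both strings. */
            pattern++;
            name++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more character. */
            pattern = star + 1;
            name = ++retry;
        } else {
            return 0;
        }
    }

    /* Name is exhausted; only trailing '*' may remain in the pattern. */
    while (*pattern == '*')
        pattern++;
    return *pattern == '\0';
}
```

Under these assumptions, wild_match("_cell_*", "_cell_length_a") returns 1, while wild_match("_atom_site_?", "_atom_site_xy") returns 0 because ? stands for exactly one character.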