Re: Westbrook's draft dictionary
- To: Multiple recipients of list <imgcif-l@bnl.gov>
- Subject: Re: Westbrook's draft dictionary
- From: John Westbrook <jwest@ndbdev.Rutgers.EDU>
- Date: Thu, 30 Jan 1997 13:41:36 -0500 (EST)
Greetings,

Here is some follow-up to David Brown's comments...

I. David Brown wrote:
>
> I had a look at John's draft dictionary. I may be missing
> something but it seems to me that this draft presents a solution
> that we discussed earlier and rejected.
>
> He treats the binary file as a piece of STAR text set between
> semicolon delimiters. There are a number of difficulties with this
> seemingly simple solution. Firstly the STAR definitions requires all
> fields to contain only ascii characters. Secondly carriage returns are
> used to terminate lines even within text strings without themselves being
> part of the text. Finally there is no guarantee that a binary string will
> not contain the code for 'CR ;' thereby terminating the string in the
> middle. Any binary sequence cannot, in the nature of things, be
> self-terminating, its length has to be specified externally, and this is
> contrary to all the principles of STAR. I would be delighted to discover
> that this problem is overcome in DDL2, but it seems to me insurmountable.

These observations are quite correct, and I apologize for leaving out some
important implementation details of the embedded binary data item approach
that Andy and I had discussed off-line. My suggestion for overcoming the
parsing problem is to treat the binary data items like variable-length
network packets. A binary data item might look something like the following:

    {data_length, [chksum or some other kind of signatures ...], data}

The advantage here is that you have a rather simple mechanism that would
permit the integration of binary data into CIF-like files. The disadvantage
is that it breaks all of the STAR and CIF conventions.

> That is why we have been leaning towards a fully binary file with an
> extractable ascii header that when extracted is cif compatible.
>

#
# Pure ascii block ...
data_experiment1

_entry.id  experiment1

loop_
_entry_link.id
_entry_link.entry_id
_entry_link.details
 binary_block_1  experiment1  'First binary data set'
 binary_block_2  experiment1  'Second binary data set'

#
# Define encoding details
#
loop_
_array_structure.id
_array_structure.byte_order
_array_structure.encoding_type
 dataset1  big_endian  64_bit_real_ieee
 dataset2  big_endian  64_bit_real_ieee

#
# Define the organization
#
loop_
_array_structure.array_id
_array_structure.index
_array_structure.dimension
_array_structure.precedence
_array_structure.direction
 dataset1  1   10  1  increasing
 dataset1  2  100  2  decreasing
 dataset2  1   20  1  increasing
 dataset2  2   20  2  increasing

#
# First binary block ...
#
data_binary_block_1

_entry.id  binary_block_1

_entry_link.id        experiment1
_entry_link.entry_id  binary_block_1
_entry_link.details   'Contains the description of my binary data'

_array_data.array_id
_array_data.data
 dataset1
{1000,FFAD00A,0FFFFA82774688299A9A9A9A99A9ADFA897255377377 ....
 .........}

#
# Second binary block
#
data_binary_block_2

_entry_link.id        experiment1
_entry_link.entry_id  binary_block_2
_entry_link.details   'Contains the description of my binary data'

_array_data.array_id
_array_data.data
 dataset2
{400,FFAD00A,0FFFFA82774688299A9A9A9A99A9ADFA897255377377 .....
 .........}

#--------------------------------------------------------------------------

Following David's suggestion, the binary data is segregated in separate
data blocks. In the pure ascii data block, the entry_link items identify the
blocks containing the binary items. CIF provides no mechanism for
referencing data items between data blocks. Hence, it is not possible to
specify that 'dataset1' resides in the data block named 'binary_block_1'.
However, in this example we stretch the significance of the entry_link items
to mean that the indicated data blocks are required to resolve all of the
data items referenced in the current block.
In this way you are essentially specifying a search list of data blocks in
the current file. In the present example, _array_data.array_id is the child
of _array_structure.id; correspondingly, a link is specified in each binary
block pointing to the ascii block where the parent item is defined. In the
ascii block there is no formal necessity for the entry_link specification,
as the binary blocks that are referenced contain only child data items. The
inclusion of these items does seem a convenience from an organizational
point of view.

Given the above example, one could simply separate the file on data block
boundaries and be left with one ascii CIF and two other files that would
require a bit of special treatment.

--
******************************************************************
* John Westbrook                      Ph:  (908) 445-4290        *
* Department of Chemistry             Fax: (908) 445-4320        *
* Rutgers University                                             *
* PO Box 939                    e-mail: jwest@ndb.rutgers.edu    *
* Piscataway, NJ 08855-0939                                      *
******************************************************************
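
Illustration of the variable-length packet idea from the message above,
{data_length, [chksum ...], data}. This is only a minimal sketch in Python
and not a format proposed in the message. The header layout (a 4-byte
big-endian length followed by a 4-byte CRC-32) and the names HEADER,
pack_item and unpack_item are assumptions made for the example. The point it
shows is that a reader who knows the length up front never scans the payload
for a terminator, so bytes such as a newline followed by ';' inside the data
are harmless.

```python
import struct
import zlib

# Assumed header layout: payload length and CRC-32, both big-endian uint32.
HEADER = struct.Struct(">II")


def pack_item(payload: bytes) -> bytes:
    """Frame a binary payload as header (length, checksum) + raw data."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unpack_item(buf: bytes, offset: int = 0):
    """Read one framed item starting at offset.

    Returns (payload, next_offset). The payload is taken by length only,
    so its contents are never inspected for delimiters.
    """
    length, checksum = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    payload = buf[start:end]
    if len(payload) != length:
        raise ValueError("truncated binary data item")
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch in binary data item")
    return payload, end
```

A reader would call unpack_item at the position where the binary item
begins, then resume ordinary text parsing at the returned offset.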