Re: Important CIF items for discussion
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Important CIF items for discussion
- From: Joe Krahn <krahn@niehs.nih.gov>
- Date: Thu, 17 Jul 2008 13:12:05 -0400
- In-Reply-To: <279aad2a0807162343r2fa9b0cby1b31a2845c273f69@mail.gmail.com>
- References: <48777D55.6050606@mcmaster.ca><279aad2a0807162343r2fa9b0cby1b31a2845c273f69@mail.gmail.com>
I'm not a database developer. My interest is more towards making CIF
more useful in local software before it gets deposited to a standard
database.

James Hester wrote:
> My thoughts on the three issues raised by David.
>
> 1. Virtual dictionaries
>
> I favour option D (the dictionary is contained within the datablock as
> the text value of a dataitem). This ensures that the dictionary
> remains as close as possible to the dataitems it is concerned with.
> Note that creating a CIF with such a built-in definition in no way
> forces the IUCr to condone the content of that definition: if someone
> were to define betas in their CIF and then submit a CIF with betas
> rather than UIJs, the IUCr remains free to act as it has in the past
> (especially if the beta definition boils down to a description along
> the lines of "beta i j as conventionally defined").

I suggested something like this recently on this list, but got no
interest. The argument against it was that non-standard data just
complicates database management. I favor it because not all data is
database oriented, and because this would allow easy addition of
non-standard data, which is often needed when experimenting with new
methods.

The contained virtual dictionary can specify the parent dictionary. The
same syntax could be used to just specify the associated dictionary that
the data block conforms to. The parent dictionary referenced could also
be a partial dictionary which references another higher-level dictionary
to establish a dictionary hierarchy.

>
> The technical issues boil down to being able to escape the
> "<EOL><semicolon>" digraphs in the embedded dictionary text, which
> would otherwise prematurely end the datavalue. Some suggestions:

This is definitely a problem. A single text block should be able to hold
an entire CIF dictionary without having to indent the text.
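
To make the "<EOL><semicolon>" problem concrete, here is a minimal Python sketch. It is not taken from any CIF library: `read_text_field`, the data name `data_mini_dict` and the sample content are hypothetical, and the reader is deliberately simplified (it only knows that a ';' in column 1 closes the field). It shows how an embedded dictionary that itself contains a semicolon-delimited text field gets cut short.

```python
def read_text_field(lines):
    """Read one semicolon-delimited text field (simplified sketch).

    lines[0] must be the line that opens the field with ';' in column 1.
    Returns (value, index of the first line after the closing ';').
    """
    if not lines or not lines[0].startswith(";"):
        raise ValueError("text field must open with ';' in column 1")
    collected = [lines[0][1:]]  # content may start right after the ';'
    for index in range(1, len(lines)):
        if lines[index].startswith(";"):
            # The first ';' in column 1 ends the field, wanted or not.
            return "\n".join(collected), index + 1
        collected.append(lines[index])
    raise ValueError("unterminated text field")


# Hypothetical embedded dictionary text that contains its own text field.
EMBEDDED = "\n".join([
    "data_mini_dict",
    "_item.description",
    ";",
    "Title line.",
    ";",
])

# Wrap it, unescaped, as the text value of an outer data item.
outer_lines = (";" + EMBEDDED + "\n;").split("\n")
value, next_index = read_text_field(outer_lines)

print(repr(value))           # only the part before the inner ';'
print(value == EMBEDDED)     # the embedded text did not survive
```

The outer field ends at the inner ';' line, so everything after it would be parsed as ordinary CIF content of the outer data block.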
>
> (1) Define "<backslash><semicolon>" as being an escaped semicolon (and
> this would require defining "<backslash><backslash>" as an escaped
> backslash in order to cope with those situations in which you actually
> want the text that comes out to be "<backslash><semicolon>").
> Obviously the escaping character doesn't have to be backslash.
>
> (1a) Define "<EOL><backslash><semicolon>" to be an escaped "<EOL><semicolon>".

I made a similar suggestion over a year ago. Nobody wants to define
special escape sequences, due to interference with the existing set of
codes for special characters.

>
> (2) Substitute <EOL><hash> for <EOL> in the entire text field. This
> immediately signals to the human reader that the entire text field is
> a single block, rather than bits of a CIF file (same as quoting in
> emails), and is easily reversible. And nestable, but let's not go
> there...

Another alternative is to just use MIME encapsulation, already defined
for Binary CIF. As it is implemented there, the data block is all
base-64 encoded, which contains no semicolons. But MIME can also
encapsulate unencoded text, by defining the end-of-data marker. This
requires the low-level parser to understand the MIME format.

I should also mention that the newer STAR format has a bracketed list
format that includes backslash escape sequences to allow contained items
to have a bracket. If this is adopted by CIF, that may be enough of a
precedent to support backslash-escapes for beginning-of-line semicolons
as well.

Here is a snippet of my idea for storing dictionary data within a
datablock, using a SAVE frame, so that it is part of the actual CIF
syntax rather than a text block, which would mean the data has to be
parsed twice. By using save frames, it is easy to avoid conflicts,
because CIF currently restricts their use to dictionaries. I would also
support nested SAVE frames so that the whole dictionary syntax can fit
inside one parent SAVE frame, rather than having to split it.

data_cns_mtf

save__dictionary
_item.name          '_cns_mtf.title'
_item.category_id   cns_mtf
_item_type.code     text
save_

_cns_mtf.title
; FILENAME="10.mtf"
Written by O version 9.0.3 Thu Feb 22 16:54:17 2007
DATE:23-Feb-07 10:46:23 created by user: krahn
VERSION:1.1
;
...
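
For comparison with the SAVE-frame approach, suggestion (2) from the quoted message (substitute <EOL><hash> for <EOL>) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash_prefix`, `hash_unprefix` and `has_line_initial_semicolon` are hypothetical names, not part of any CIF library, and the sample dictionary text is made up. The sketch assumes the first line of the value follows the opening ';' on the same line, so only later lines can collide with the closing delimiter.

```python
def hash_prefix(text):
    """Suggestion (2): replace every <EOL> with <EOL><hash>."""
    return text.replace("\n", "\n#")


def hash_unprefix(encoded):
    """Reverse hash_prefix: drop the one '#' that follows each <EOL>."""
    return encoded.replace("\n#", "\n")


def has_line_initial_semicolon(text):
    """True if any line after the first starts with ';'.

    Such a line would close a semicolon-delimited text field early.
    """
    return any(line.startswith(";") for line in text.split("\n")[1:])


# Hypothetical embedded dictionary text containing its own text field.
EMBEDDED = "\n".join([
    "data_mini_dict",
    "_item.description",
    ";",
    "Title line.",
    ";",
])

encoded = hash_prefix(EMBEDDED)

print(has_line_initial_semicolon(EMBEDDED))  # raw text is unsafe to embed
print(has_line_initial_semicolon(encoded))   # prefixed text is safe
print(hash_unprefix(encoded) == EMBEDDED)    # and the change is reversible
```

Because every continuation line then starts with '#', no line can start with ';', and text that already contains "<EOL><hash>" still round-trips, since exactly one '#' is added and removed per line break. The cost is the one noted in the message above: the stored value is no longer the literal dictionary text, so a reader has to undo the prefixing before parsing it.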