Re: Important CIF items for discussion
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Important CIF items for discussion
- From: "James Hester" <jamesrhester@gmail.com>
- Date: Tue, 22 Jul 2008 12:04:22 +1000
- In-Reply-To: <20080717081900.A94274@epsilon.pair.com>
- References: <48777D55.6050606@mcmaster.ca><279aad2a0807162343r2fa9b0cby1b31a2845c273f69@mail.gmail.com><20080717081900.A94274@epsilon.pair.com>
Herbert wrote:

> Allowing a dictionary to be the value of a tag is a reasonable
> extension of the range of possible URI's for dictionaries, but
> to avoid ambiguities we need to include such tag-located dictionaries
> within the DDLm import paradigm, so that we know precisely where
> within the composite virtual dictionary the tag-located dictionaries
> should be placed.

Yes, we should come up with a way of referring to such embedded dictionaries in an import statement, or else we could use the dictionary URI defined within the dictionary itself. For the latter to work, any data items whose values are dictionaries need to be pre-parsed before full dictionary processing begins, which I think is a reasonable requirement. Programs wishing to be dictionary-aware would then follow these steps:

1. Parse the CIF file.
2. Pre-process (i.e. remove escapes from) the data item(s) containing dictionaries.
3. Parse these data item(s).
4. Add the parsed dictionary URIs to the list of known URIs.
5. Locate dictionaries based on the contents of the _audit_conform data items and proceed as usual.

> With respect to the use of \; to allow embedded text fields within
> text fields, we need to deal with the long-standing use of \; as
> ogonek, and other existing uses of \. Rather than continue flagging
> special treatment of text fields in context, I would suggest
> adding the syntax and semantics of a dictionary text field as
> a DDLm data type.

A data point: the "\;" digraph occurs 319 times in the 32377 files in the IUCr CIF archive as of June 2008, overwhelmingly inside authors' names. There are currently no files containing <EOL><backslash><semicolon>.

I would fiddle with Howard's suggestion here of defining a (presumably single) DDLm data type for the text field. In general, a domain dictionary should be free to define the content of text fields, e.g. LaTeX, MIME, or a set of possible escape characters.
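The five steps above, with the <EOL>\; escape as the only pre-processing, might look roughly like the Python sketch below. Everything beyond the steps themselves is an assumption made for illustration: the tag _audit_embedded_dictionary is hypothetical, _dictionary.uri is assumed to be where an embedded dictionary declares its own URI, only a single _audit_conform_dict_location value is handled, and parse_simple_cif is a toy that understands only "_tag value" pairs and semicolon-delimited text fields (no loops, quoting or multiple data blocks). It also assumes the embedded text never already contains a line beginning with \;, which the archive data point suggests but does not guarantee.

```python
import re

# Hypothetical tag holding an escaped, embedded dictionary (not an official CIF name).
EMBEDDED_DICT_TAGS = ["_audit_embedded_dictionary"]
# Assumed DDLm data name through which a dictionary declares its own URI.
DICT_URI_TAG = "_dictionary.uri"


def escape_semicolons(text):
    # Prefix every line that starts with ';' with a backslash (<EOL>; -> <EOL>\;)
    # so the text cannot terminate the enclosing CIF text field.
    return re.sub(r"(?m)^;", r"\\;", text)


def unescape_semicolons(text):
    # Step 2: undo the single escape (<EOL>\; -> <EOL>;); nothing else is touched.
    return re.sub(r"(?m)^\\;", ";", text)


def parse_simple_cif(text):
    # Toy parser: '_tag value' pairs and ';'-delimited text fields only.
    # Loops, quoted values and multiple data blocks are not handled.
    items = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.startswith("_"):
            i += 1  # skip data_ headers, comments and blank lines
            continue
        parts = line.split(None, 1)
        tag = parts[0]
        i += 1
        if len(parts) == 2:
            items[tag] = parts[1].strip()
            continue
        # The value is a text field opened and closed by ';' at line start.
        if i >= len(lines) or not lines[i].startswith(";"):
            raise ValueError(f"no value for {tag}")
        first = lines[i][1:]
        body = [first] if first else []
        i += 1
        while i < len(lines) and not lines[i].startswith(";"):
            body.append(lines[i])
            i += 1
        if i >= len(lines):
            raise ValueError(f"unterminated text field for {tag}")
        items[tag] = "\n".join(body)
        i += 1  # step past the closing ';'
    return items


def locate_dictionaries(cif_text, known_uris):
    # Step 1: parse the CIF file.
    items = parse_simple_cif(cif_text)
    for tag in EMBEDDED_DICT_TAGS:
        if tag in items:
            dict_text = unescape_semicolons(items[tag])  # step 2
            dict_items = parse_simple_cif(dict_text)     # step 3
            # Step 4: register the dictionary under its self-declared URI.
            known_uris[dict_items[DICT_URI_TAG]] = dict_text
    # Step 5: resolve the conformance location, preferring embedded copies.
    found = []
    location = items.get("_audit_conform_dict_location")
    if location is not None:
        source = "embedded" if location in known_uris else "external"
        found.append((location, source))
    return found
```

If the embedded dictionary's URI matches _audit_conform_dict_location, locate_dictionaries reports it as "embedded"; any other location is left for the usual external lookup. The escape is deliberately limited to this one rule, in line with the point below that parser authors, not a general text template, supply the de-escaping code.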
What DDLm can provide is a few templates for inclusion in the definitions of those data items which have plain-text values. Such a template would simply consist of a description of the allowed escape characters and any other special conventions. As I see it, the role of such definitions would not be to aid machine interpretation, but to serve human readers, helping them produce conformant text that will be understood by downstream interpreters of text delivered by a CIF parser. Note that I am not proposing that the contents of such text fields could or would be validated by CIF software, but downstream recipients of the content are free to do so. For example, the definition text for the IUCr publication_ item names would contain a potentially long description (or URI reference) of all the escapes understood by the Chester software. The DDLm dictionary itself could define some conventions for writing domain dictionary text, which would allow automated typesetting.

In the present case (embedded dictionaries) the downstream application is the CIF parser itself. Assuming we don't plan to apply a dREL method to the escaped text to produce the pure text, the human authors of a given CIF parser will be the source of the de-escaping code. Therefore, it is sufficient to include a statement in the descriptive text of the _audit domain dictionary stating the method of escaping <EOL><semicolon> digraphs.

Note that it is not desirable to add this <EOL><semicolon> escape to the general text template described above and then simply include that template in the _audit dictionary, because the general template would potentially bring in all sorts of other escapes that we don't want in this special case (e.g. we don't want to re-escape already escaped text in the embedded definitions). In this particular case, we want this one escape only.

Best wishes,
James.
T +61 (02) 9717 9907 F +61 (02) 9717 3145 M +61 (04) 0249 4148
- References:
- Important CIF items for discussion (David Brown)
- Re: Important CIF items for discussion (James Hester)
- Re: Important CIF items for discussion (Herbert J. Bernstein)