
Re: Important CIF items for discussion

To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
Subject: Re: Important CIF items for discussion
From: "James Hester" <[email protected]>
Date: Tue, 22 Jul 2008 12:04:22 +1000
In-Reply-To: <[email protected]>
References: <[email protected]><[email protected]><[email protected]>

Herbert wrote:

>   Allowing a dictionary to be the value of a tag is a reasonable
> extension of the range of possible URI's for dictionaries, but
> to avoid ambiguities we need to include such tag-located dictionaries
> within the DDLm import paradigm, so that we know precisely where
> within the composite virtual dictionary the tag-located dictionaries
> should be placed.

Yes, we should come up with a way of referring to such embedded
dictionaries in an import statement, or else we could use the
dictionary URI defined within the dictionary itself.  For the latter
to work, any data items whose values are dictionaries need to be
pre-parsed before the full dictionary processing step begins, which
I think is a reasonable requirement.  Programs wishing to be
dictionary-aware would take the following steps:

1. Parse the CIF file
2. Pre-process (i.e. remove escapes from) the data item(s) containing
dictionaries
3. Parse these data item(s)
4. Add the parsed dictionary URIs to the list of known URIs
5. Locate dictionaries based on the contents of the _audit_conform
data items and proceed as usual
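
As a rough illustration only, steps 2 to 5 might look like the sketch
below.  It assumes step 1 has already produced a plain mapping of item
names to values.  The item name _audit_conform_dict_text is
hypothetical, the use of _dictionary.uri as the self-declared URI is
an assumption, and the regular-expression "parse" merely stands in for
a real dictionary parser.

```python
import re

# Hypothetical item name holding a dictionary embedded as a text value.
EMBEDDED_ITEM = "_audit_conform_dict_text"


def remove_escapes(text):
    """Step 2: turn a line-initial backslash-semicolon back into a semicolon."""
    return re.sub(r"(^|\n)\\;", r"\1;", text)


def declared_uri(dict_text):
    """Step 3 (toy parse): find the URI the embedded dictionary gives itself.

    Assumes the dictionary declares it with a _dictionary.uri item.
    """
    match = re.search(r"^_dictionary\.uri\s+(\S+)", dict_text, re.MULTILINE)
    return match.group(1) if match else None


def locate_dictionaries(block):
    """Steps 2-5 for one already-parsed data block (item name -> value)."""
    known = {}  # URI -> dictionary text supplied inside the CIF itself
    embedded = block.get(EMBEDDED_ITEM)
    if embedded is not None:
        text = remove_escapes(embedded)
        uri = declared_uri(text)
        if uri is not None:
            known[uri] = text  # step 4: add to the list of known URIs

    # Step 5: walk the requested locations, preferring embedded copies.
    # A value of None means "fetch this dictionary externally, as usual".
    return {
        uri: known.get(uri)
        for uri in block.get("_audit_conform_dict_location", [])
    }
```

A caller would then hand any non-None entries straight to the normal
dictionary processing code and fetch the rest in the usual way.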

>   With respect to the use of \; to allow embedded text fields within
> text fields, we need to deal with the long-standing use of \; as
> ogonek, and other existing uses of \.  Rather than continue flagging
> special treatment of text fields in context, I would suggest
> adding the syntax and semantics of a dictionary text field as
> a DDLm data type.

A data point: the "\;" digraph occurs 319 times in the 32377 files in
the IUCr CIF archive as of June 08, overwhelmingly inside authors'
names.  There are currently no files containing
<EOL><backslash><semicolon>.
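
For anyone wanting to repeat this kind of check on their own files,
a minimal sketch follows.  It is not the script used for the figures
above, and the function name is made up for illustration.  It counts
every backslash-semicolon digraph in a piece of text and, separately,
those that sit at the start of a line.

```python
import re

DIGRAPH = re.compile(r"\\;")                      # backslash-semicolon anywhere
LINE_INITIAL = re.compile(r"^\\;", re.MULTILINE)  # ... at the start of a line


def count_digraphs(text):
    """Return (total occurrences, line-initial occurrences) for one file's text."""
    return len(DIGRAPH.findall(text)), len(LINE_INITIAL.findall(text))
```

Note that this treats the very first line of the text as line-initial
too, which is slightly broader than a strict <EOL> test.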

I would fiddle with Howard's suggestion here of defining a (presumably
single) DDLm data type for the text field.  In general, a domain
dictionary should be free to define the content of text fields e.g.
LaTeX, or MIME, or a set of possible escape characters, etc.  What
DDLm can provide is a few templates for inclusion in dictionary
definitions of those data items which have plain text values.
This/these template(s) would simply consist of a description of the
allowed escape characters and any other special conventions.  As I see
it, the role of such definitions would not be to aid machine
interpretation, but to guide human authors in producing conformant
text that will be understood by downstream interpreters of text
delivered by a CIF parser.  Note that I am not proposing that the
contents of such text fields could or would be validated by CIF
software, but downstream recipients of the content are free to do so.

For example, the definition text for the IUCr publication_ item names
would contain a potentially long description (or URI reference)
describing all of the escapes available which would be understood by
the Chester software.  The DDLm dictionary itself could define some
conventions for writing domain dictionary text, which would allow
automated typesetting.

In the present case (embedded dictionaries) the downstream application
is the CIF parser itself.  Assuming we don't plan to apply a dREL
method to the escaped text to produce the pure text, the human authors
of a given CIF parser will be the source of the de-escaping code.
Therefore, it is sufficient to include a statement in the descriptive
text of the _audit domain dictionary stating the method of escaping
<EOL><semicolon> digraphs.  Note that it is not desirable to add this
<EOL><semicolon> escape to our general text template described in the
previous paragraphs and then just include that template in the _audit
dictionary, because including this general template will also
potentially include all sorts of other escapes which we don't want to
use in this special case (e.g. we don't want to re-escape already
escaped text in the embedded definitions).  In this particular case,
we want this one escape only.
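
To make the "one escape only" idea concrete, here is a minimal sketch
of an escape/de-escape pair.  The function names are hypothetical, and
the decision to reject input that already contains a line-initial
backslash-semicolon is my own assumption, made so that the round trip
stays unambiguous.  Everything else in the text, including any other
backslash sequences, is passed through untouched.

```python
def escape_embedded(text):
    """Escape only line-initial semicolons so the text can sit in a text field."""
    if text.startswith("\\;") or "\n\\;" in text:
        # Assumption: refuse rather than produce text that cannot be
        # restored faithfully by the single-escape rule.
        raise ValueError("input already contains a line-initial backslash-semicolon")
    escaped = text.replace("\n;", "\n\\;")
    if escaped.startswith(";"):
        escaped = "\\" + escaped
    return escaped


def unescape_embedded(text):
    """Reverse escape_embedded; no other backslash sequence is interpreted."""
    restored = text.replace("\n\\;", "\n;")
    if restored.startswith("\\;"):
        restored = restored[1:]
    return restored
```

Because nothing but the line-initial semicolon is rewritten, escapes
already present in the embedded definitions come back out exactly as
they went in.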

Best wishes,
James.

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
