Re: Conventions for item and category descriptions?

Brian McMahon (bm@iucr.org)
Tue, 13 Jan 1998 10:10:11 +0000 (GMT)


Dale Tronrud asks some good questions about the presentation of information
in the mmCIF dictionary. The conventions employed are for the most part
common to the core and other CIF dictionaries, so I feel I might chip in my
two cents worth from that broader context.
 
> While I realize that descriptions are free format text and
> the DDL does not restrict their content I am curious as to any
> conventions that have been or might be adopted for the
> descriptions in the official mmCIF dictionary.
> 
> For example, if someone submitted a tag definition in Spanish
> would it be accepted or must mmCIF descriptions be in English?
> If English is required is there a preference in spelling
> convention (US vs UK)?

There's no rule that restricts the text part of the definitions to be
English, though one would prefer for consistency that the master dictionary
were in English - for better or worse, the current lingua franca of science
- throughout. The US/UK English divide is rarely a matter of heated
argument, though there are some differences in style between the core and
mmCIF dictionaries (and these are of course apparent in the core "image"
that is embedded in mmCIF). Definition writers are encouraged to adopt as
mid-Atlantic a style as possible. Notice, however, that components of data
names are constrained to some extent, and new data names are best selected
in accordance with existing conventions: components relating to colour
should appear as "_colour_" and not "_color_", for example. COMCIFS has put
together a list of abbreviations found in data name components (in a file
available from the IUCr CIF home page, http://www.iucr.org/iucr-top/cif/). Would
it be useful to have a similar file of approved word components?
 
> A similar issue is the (un)desirability of HTML formatting tokens.
> I received a tag definition which contained "&nbsp;"'s.  Clearly
> the person had simply cut the definition out of a web page.  My
> question is: do the mmCIF to HTML converters pass through these
> tokens or "escape" them out and make them visible to the reader?
> Should they be avoided or used?  While a non-breaking space is
> rather boring, there are other characters, such as a proper Angstrom
> symbol, that could be incorporated if HTML were allowed in mmCIF
> descriptions.
> 
> My last question has to do with embedded mathematics.  I find
> it rather difficult to read typewriter math.  While I can figure it
> out, usually, I find a nicely typeset equation much easier.  If
> the mmCIF to HTML converter was to incorporate some of the code
> from LaTeX2HTML one could enter equations into the description in
> LaTeX and, when viewed in a browser, see a GIF image of a nicely
> formatted equation.  The down side to this is that LaTeX does require
> some study and practice to write while typewriter math can be banged
> out pretty easily, and the raw mmCIF dictionary would be less
> accessible because the unprocessed math would be harder to read.  Is
> there a place for LaTeX in mmCIF?

These two questions address essentially the same point. The decision was
made at the beginning of CIF to have minimal coding for non-ASCII
characters (the actual codings permitted are listed in the IUCr Guide to CIF
for Authors, also available through the CIF Home Page). This permits the
angstrom symbol to be coded as \%A, for example, but the result is not very
attractive to the eye, and the scheme can't handle complex maths at all.

We've been thinking for some time about how to address this. The cleanest way
at present is probably to have multiple renderings of each definition (in
text with typewriter maths, in HTML and in TeX, say), with each rendering
selected for the display it was designed for - HTML for a web page,
TeX for the typeset version, ASCII for a "glass teletype".
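
As a rough illustration of the multiple-renderings idea, here is a minimal
Python sketch. It is not an existing IUCr tool. Only the \%A coding comes from
the discussion above; the table name, the function name, the sample definition
text and the target spellings (&Aring;, \AA{}, "Angstrom") are assumptions made
for the example.

```python
# Hypothetical table: one CIF text code mapped to a form for each display
# target. Only \%A (angstrom) is taken from the message; the target
# spellings are this sketch's choices, not an approved list.
RENDERINGS = {
    r"\%A": {"html": "&Aring;", "tex": r"\AA{}", "ascii": "Angstrom"},
}


def render(text: str, target: str) -> str:
    """Replace each known CIF code in `text` with its form for `target`."""
    for code, forms in RENDERINGS.items():
        text = text.replace(code, forms[target])
    return text


# Hypothetical definition text containing the CIF angstrom code.
definition = r"Unit-cell length a in \%A."

for target in ("html", "tex", "ascii"):
    print(target, "->", render(definition, target))
```

The point of keeping a single coded source and deriving each rendering from
it is that the three displays cannot drift apart; storing three hand-written
copies of every definition would be the alternative.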

It's not quite as straightforward as it may seem - HTML and LaTeX are,
strictly, structured document markup schemes, and to be used entirely
properly, the entire dictionary would need to be in HTML or LaTeX. Of course
the intention is just to use the relevant subset of the markup that makes it
easy to render an Angstrom or an integral sign; but there would need to be a
proper and properly maintained concordance of the codings permitted in each
scheme. Because SGML (of which HTML is an application) and TeX (on which LaTeX
is built) are both metalanguages, the meanings associated with any coding
string (&nbsp;, \int, whatever) could in principle change with a different
declaration - it's even possible to run TeX with an instruction set that
recognises codes of the HTML-like form <BODY> as meaningful codes.
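
One way to picture a "properly maintained concordance" is a table that can be
checked mechanically for gaps. The Python sketch below is only an assumption
about how such a check might look: the table contents, the symbol names and the
function name are invented for illustration, and the missing HTML entry for the
integral sign is deliberate, to show what the check reports.

```python
# Hypothetical concordance: symbol -> markup scheme -> permitted coding.
# The entries are illustrative only, not an agreed list.
CONCORDANCE = {
    "angstrom": {"cif": r"\%A", "html": "&Aring;", "tex": r"\AA"},
    "integral": {"tex": r"\int"},  # deliberately incomplete
}

# Schemes that every symbol is expected to have a coding in.
SCHEMES = ("html", "tex")


def missing_codings(concordance, schemes):
    """Return (symbol, scheme) pairs for which no coding is recorded."""
    return [
        (symbol, scheme)
        for symbol, forms in concordance.items()
        for scheme in schemes
        if scheme not in forms
    ]


print(missing_codings(CONCORDANCE, SCHEMES))
# prints [('integral', 'html')]
```

A check like this says nothing about whether a coding still means what the
table claims under a different declaration; it only catches symbols that have
been added to one scheme and forgotten in another.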

Well, you've probably dropped off to sleep by now. Suffice it to say that if
this discussion raises significant interest, I'd be very happy to help
explore the questions of maintaining multiple formats further.

_______________________________________________________________________________
Brian McMahon                                             tel: +44 1244 342878
Research and Development Officer                          fax: +44 1244 314888
International Union of Crystallography                  e-mail:  bm@iucr.ac.uk
5 Abbey Square, Chester CH1 2HU, England                         bm@iucr.org
(Coordinating Secretary, COMCIFS)