Treatment of Greek characters in CIF2
- Subject: Treatment of Greek characters in CIF2
- From: James Hester <jamesrhester@xxxxxxxxx>
- Date: Thu, 20 Apr 2017 15:38:12 +1000
According to paragraphs 2.2.7.4.13-17 of International Tables Vol. G, Greek and some other non-ASCII characters can by default be represented in text datavalues using a backslash notation, <backslash><ASCII character>; for example, \a is alpha. Different markup conventions are possible on a per-dictionary or per-definition basis. In CIF2 these characters can be represented natively, but legacy CIF applications presented with a datavalue containing non-ASCII characters may not be prepared to typeset or display them appropriately. On the other hand, it would seem inefficient to define separate Unicode-aware datanames for every text value simply to avoid legacy problems.
I view this as a secondary issue, in that very few, if any, machine-interpretable datavalues rely on the backslash convention (perhaps a few enumerated values for X-ray tube radiation type?).
The problem arises only with CIF2 (obviously). As a starting suggestion, what do you think of the following as a notional additional paragraph for the above-mentioned section 2.2.7.4?
(2.2.7.4.18) Whenever an application is required to convert a datavalue from a CIF2 datafile containing code points outside the ASCII range to a datavalue containing only ASCII code points, the appropriate markup as per paragraphs 2.2.7.4.13-16 should be substituted, provided that the relevant definition meets the requirements of paragraph 2.2.7.4.13. If no markup is defined for the Unicode code point, no CIF1 equivalent value exists and application behaviour is undefined.
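To make the proposed paragraph concrete, here is a minimal Python sketch of the conversion it describes. The names GREEK_TO_MARKUP and to_ascii_markup are hypothetical, the table is deliberately partial (only \a and \U are taken from this message; the beta and gamma entries are my assumption of the usual convention and should be checked against Vol. G), and raising an error for unmapped code points is just one possible choice, since the paragraph leaves that behaviour undefined. It also does not check whether the relevant definition permits the default markup.

```python
# Partial, illustrative table: Unicode character -> CIF1 backslash markup.
# Only a few entries are shown; a real table would follow Vol. G in full.
GREEK_TO_MARKUP = {
    "\u03b1": r"\a",  # Greek small letter alpha
    "\u03b2": r"\b",  # Greek small letter beta (assumed convention)
    "\u03b3": r"\g",  # Greek small letter gamma (assumed convention)
    "\u03a5": r"\U",  # Greek capital letter upsilon
}


def to_ascii_markup(value: str) -> str:
    """Return an ASCII-only copy of a CIF2 text datavalue.

    ASCII characters pass through unchanged, mapped characters are
    replaced by their backslash markup, and anything else raises
    ValueError (an assumption: the proposal leaves this case undefined).
    """
    out = []
    for ch in value:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in GREEK_TO_MARKUP:
            out.append(GREEK_TO_MARKUP[ch])
        else:
            raise ValueError(
                f"no CIF1 markup defined for U+{ord(ch):04X}"
            )
    return "".join(out)


if __name__ == "__main__":
    # Prints the ASCII-only form of "Cu K" followed by a native alpha.
    print(to_ascii_markup("Cu K\u03b1"))
```

An application writing CIF1 output could call something like this on each text datavalue and decide for itself how to handle the error case.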
I toyed with the idea of allowing '\Uxxxxxx' for arbitrary Unicode code points, but (i) this would clash with '\U' for capital upsilon, and (ii) legacy applications do not expect it and would therefore need to be updated, in which case adapting them to ingest Unicode directly would be more straightforward.
Thoughts?
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
- Follow-Ups:
- RE: Treatment of Greek characters in CIF2 (Bollinger, John C)
- Re: Treatment of Greek characters in CIF2 (Andrius Merkys)