
RE: Treatment of Greek characters in CIF2

Subject: RE: Treatment of Greek characters in CIF2
From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
Date: Thu, 20 Apr 2017 14:10:13 +0000
In-Reply-To: <CAF_YUvX+3ptR1e18wCqH3BKQ9A2=B90mVQ8MVtXePzpKa2=Vbg@mail.gmail.com>
References: <CAM+dB2d5NCbCb1Zc_QS3KkjscDH7Sk9NQVbQxhLn0nPtO6E+zA@mail.gmail.com><MWHPR04MB0512FE67D8266ED57567F119E01B0@MWHPR04MB0512.namprd04.prod.outlook.com><CAF_YUvX+3ptR1e18wCqH3BKQ9A2=B90mVQ8MVtXePzpKa2=Vbg@mail.gmail.com>

I’m all for using CIF2 over CIF1 where we can, and I favor recommending that software authors recognize and correctly handle CIF2. But as long as there is software that we want to use but that doesn’t handle CIF2, we do need some way to deal with that. What I meant to convey in my previous message is that I don’t think we need or want to formulate additional, centralized rules for interoperation purposes. Doing so presents a semantic problem, and it doesn’t really solve the problem of producing CIF1 data in a form that any specific program will interpret as desired.

John

From: cif-developers [mailto:[email protected]] On Behalf Of Robert Hanson
Sent: Thursday, April 20, 2017 8:59 AM
To: Forum for CIF software developers <[email protected]>
Subject: Re: Treatment of Greek characters in CIF2

I strongly recommend that all non-CIF2 data be transformed into CIF2 format. We even have an opportunity to help with the CIF1 -> CIF2 transition via CIF1 -> CIF2-JSON -> CIF2. The metadata should record that this was done. Let's be progressive, not regressive. If need be, software can convert the result back to CIF1.
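As a rough illustration of the CIF1 -> Unicode step of such a conversion, here is a minimal Python sketch that replaces the single-letter Greek markup codes with the corresponding Unicode characters. The letter table is an assumption reconstructed from the ITVG convention (\a = alpha, the upper-case Latin letter for the capital) and should be checked against 2.2.7.4 before use; `decode_greek` is a hypothetical name, and all other markup (accents, codes introduced by a doubled backslash) is deliberately left untouched. Whether a given item may be decoded at all still depends on its definition.

```python
import re
import unicodedata

# Assumed CIF1 Greek-letter table (Latin letter -> Unicode letter name);
# verify against ITVG 2.2.7.4 before relying on it.
_NAMES = {
    "a": "ALPHA", "b": "BETA", "c": "CHI", "d": "DELTA", "e": "EPSILON",
    "f": "PHI", "g": "GAMMA", "h": "ETA", "i": "IOTA", "k": "KAPPA",
    "l": "LAMDA", "m": "MU", "n": "NU", "o": "OMICRON", "p": "PI",
    "q": "THETA", "r": "RHO", "s": "SIGMA", "t": "TAU", "u": "UPSILON",
    "w": "OMEGA", "x": "XI", "y": "PSI", "z": "ZETA",
}

# Lower-case code -> small Greek letter, upper-case code -> capital letter.
GREEK = {}
for latin, name in _NAMES.items():
    GREEK[latin] = unicodedata.lookup("GREEK SMALL LETTER " + name)
    GREEK[latin.upper()] = unicodedata.lookup("GREEK CAPITAL LETTER " + name)

# A backslash followed by one ASCII letter, where the backslash is not itself
# preceded by a backslash, so doubled-backslash codes are skipped.
_CODE = re.compile(r"(?<!\\)\\([A-Za-z])")


def decode_greek(text: str) -> str:
    """Replace CIF1 Greek markup codes in text with Unicode characters.

    Codes whose letter is not in the table are returned unchanged.
    """
    return _CODE.sub(lambda m: GREEK.get(m.group(1), m.group(0)), text)
```

For example, `decode_greek(r"2\q")` gives `"2θ"`, while an unknown code such as `\j` passes through as-is.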

Is Brian McMahon listening? If so, Brian, can you chime in here?

Bob

On Thu, Apr 20, 2017 at 8:42 AM, Bollinger, John C <[email protected]> wrote:

On Thursday, April 20, 2017 12:38 AM, James Hester wrote:
> According to section 2.2.7.4.13 - 17 of International Tables Vol G, by default Greek and some other non-ASCII characters can be represented in text datavalues using a backslash notation <backslash><ascii character>, e.g. \a is alpha. Different markup conventions are possible on a per-dictionary or per-definition basis. In CIF2, these characters can be represented natively, but legacy CIF applications presented with a datavalue containing non-ASCII characters may not be prepared to typeset or present them appropriately. On the other hand, it would seem inefficient to define separate Unicode-aware datanames for every text value simply to avoid legacy problems.
>
> I view this as a secondary issue in that very few if any machine-interpretable datavalues rely on the backslash convention (perhaps a few enumerated values for X-ray tube radiation type?).
> The problem arises only with CIF2 (obviously). As a starting suggestion, what do you think of the following as a notional additional paragraph for the above-mentioned section 2.2.7.4?
> (2.2.7.4.18) Whenever an application is required to convert a datavalue from a CIF2 datafile containing code points outside the ASCII range to a datavalue containing only ASCII code points, the appropriate markup as per paragraphs 2.2.7.4.13-16 should be substituted, provided that the relevant definition meets the requirements of paragraph 2.2.7.4.13. If no markup is defined for the Unicode code point, no CIF1 equivalent value exists and application behaviour is undefined.
>
> I toyed with the idea of allowing '\Uxxxxxx' for arbitrary Unicode code points, but (i) this would clash with '\U' for capital upsilon and (ii) it is not expected by legacy applications and would therefore require that they be updated, in which case adapting them simply to ingest Unicode would be more straightforward.
>
>
> Thoughts?
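A minimal Python sketch of what the proposed paragraph 2.2.7.4.18 would ask of an application, assuming the same kind of Greek-letter table (an assumption to check against 2.2.7.4) and a caller-supplied `markup_permitted` flag standing in for "the relevant definition meets the requirements of paragraph 2.2.7.4.13". `to_cif1`, `MARKUP` and `NoCif1Equivalent` are hypothetical names; raising an exception is just one way to handle the case the proposal leaves undefined.

```python
import unicodedata

# Assumed CIF1 Greek-letter table (Latin letter -> Unicode letter name);
# verify against ITVG 2.2.7.4 before relying on it.
_NAMES = {
    "a": "ALPHA", "b": "BETA", "c": "CHI", "d": "DELTA", "e": "EPSILON",
    "f": "PHI", "g": "GAMMA", "h": "ETA", "i": "IOTA", "k": "KAPPA",
    "l": "LAMDA", "m": "MU", "n": "NU", "o": "OMICRON", "p": "PI",
    "q": "THETA", "r": "RHO", "s": "SIGMA", "t": "TAU", "u": "UPSILON",
    "w": "OMEGA", "x": "XI", "y": "PSI", "z": "ZETA",
}

# Unicode Greek letter -> CIF1 markup code (capitals use upper-case Latin).
MARKUP = {}
for latin, name in _NAMES.items():
    MARKUP[unicodedata.lookup("GREEK SMALL LETTER " + name)] = "\\" + latin
    MARKUP[unicodedata.lookup("GREEK CAPITAL LETTER " + name)] = "\\" + latin.upper()


class NoCif1Equivalent(ValueError):
    """No CIF1 value exists (behaviour is 'undefined' in the proposal)."""


def to_cif1(value: str, markup_permitted: bool) -> str:
    """Convert a CIF2 text value to ASCII using markup codes."""
    if value.isascii():
        return value  # nothing to convert
    if not markup_permitted:
        # The item's definition does not allow the markup conventions.
        raise NoCif1Equivalent("definition does not permit markup codes")
    out = []
    for ch in value:
        if ch.isascii():
            out.append(ch)
        elif ch in MARKUP:
            out.append(MARKUP[ch])
        else:
            raise NoCif1Equivalent(f"no markup for U+{ord(ch):04X}")
    return "".join(out)
```

With this sketch, `to_cif1("2θ", True)` yields `2\q`, whereas the same value with `markup_permitted=False`, or any character outside the table, raises `NoCif1Equivalent` instead of guessing.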

I think that to whatever extent and in whatever context the proposed specifications are natural, they do not require any special force or support behind them. On the other hand, where they are _not_ natural, they _should not have_ any force or support behind them. Consider in particular that ITVG 2.2.7.4.13 conditions use of the markup codes on permission from the relevant dictionary; therefore, any rule that directs conversions to those coded forms without regard to items' definitions is inappropriate.

I'm inclined to say that the best available approach is for software to use (some encoding of) Unicode as the internal representation of CIF data, and to convert that as needed to appropriate external forms, where which external form is most appropriate depends on context. CIF 1.1 with traditional CIF markup codes (with or without some convention for general Unicode characters) is a widely recognized and widely used external form, so it may often be the best choice, but that's no justification for asserting that it should be the only choice, especially given its well-known limitations.
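A minimal sketch of that arrangement, assuming Python 3.7+: text is held internally as a Python `str` (Unicode) and only converted when written out. `ExternalForm`, `cif2_form`, `cif11_form` and `export` are hypothetical names. The CIF2 form writes UTF-8 unchanged; the CIF 1.1 form takes a caller-supplied `to_ascii` function, because which ASCII rendering is right depends on context rather than on a single central rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExternalForm:
    magic: str                          # version comment for the first line
    encode_text: Callable[[str], bytes]  # internal Unicode -> external bytes


def cif2_form() -> ExternalForm:
    # CIF2 is UTF-8, so Unicode text is written out unchanged.
    return ExternalForm("#\\#CIF_2.0", lambda s: s.encode("utf-8"))


def cif11_form(to_ascii: Callable[[str], str]) -> ExternalForm:
    # CIF 1.1 is ASCII-only. The caller decides how non-ASCII text is
    # rendered (markup codes, a consumer-specific convention, ...);
    # encode("ascii") raises UnicodeEncodeError if anything non-ASCII remains.
    return ExternalForm("#\\#CIF_1.1", lambda s: to_ascii(s).encode("ascii"))


def export(text: str, form: ExternalForm) -> bytes:
    """Write the form's version comment followed by the converted text."""
    return (form.magic + "\n").encode("ascii") + form.encode_text(text)
```

Keeping the ASCII conversion as a parameter means a program-specific CIF1 flavour can be produced without baking any one convention into the internal data model.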

I acknowledge that having several external forms poses a clear risk of data corruption and/or misinterpretation if they cannot easily be distinguished from one another, but I don't think that's a problem that we can reasonably hope to solve by fiat.
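One partial aid already exists: CIF2 files must begin with the magic code `#\#CIF_2.0`, and CIF 1.1 files may carry the version comment `#\#CIF_1.1`. A minimal Python sketch of sniffing those codes follows; `detect_form` is a hypothetical name, and the fallback for files without a version comment is only a guess, which is exactly the ambiguity described above.

```python
def detect_form(data: bytes) -> str:
    """Guess a CIF byte stream's external form from its first line."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # drop a UTF-8 byte-order mark, if present
    first_line = data.split(b"\n", 1)[0]
    tokens = first_line.split()  # also strips a trailing "\r"
    magic = tokens[0] if tokens else b""
    if magic == b"#\\#CIF_2.0":
        return "CIF2"
    if magic == b"#\\#CIF_1.1":
        return "CIF1.1"
    # No recognised version comment: at best CIF1, and only if pure ASCII.
    # Nothing here says which markup convention the values follow.
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "unknown"
    return "CIF1 (assumed)"
```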

John


_______________________________________________
cif-developers mailing list
[email protected]
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers

Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want,
it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

