Re: [ddlm-group] Treatment of CIF2 unicode characters with CIF1 equivalents
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Treatment of CIF2 unicode characters with CIF1 equivalents
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Mon, 24 Apr 2017 14:40:16 +0000
- In-Reply-To: <CAM+dB2fj5b9wvBk2JZU4ATX-4qjJkKJfY1p8zst5k8jrR_XiWQ@mail.gmail.com>
- References: <CAM+dB2fj5b9wvBk2JZU4ATX-4qjJkKJfY1p8zst5k8jrR_XiWQ@mail.gmail.com>
Dear DDLm-group,

Although the kind of transformation described by James’s proposed rule is appropriate under the conditions the rule describes, I am a bit apprehensive about adding the specific proposed text to Vol G. In particular,

(1) I am inclined to think it inappropriate to assert rules for *applications* in a chapter presenting specifications for the CIF *format*, except to the extent that such rules can be construed as de facto specifications for details of the format or its intended interpretation.

(2) The proposed rule comes awfully close to commingling syntax with semantics, and I think the more separation we can maintain between those, the better.

(3) I’m not convinced that such a provision is needed at all, inasmuch as it seems to follow directly from the principle that CIF data values will be interpreted according to their items’ definitions.

Evidently we’re talking about the next edition of Vol G, and it is unclear to me exactly how that edition will need to be changed to cover CIF 2.0. Inasmuch as I imagine it will incorporate at least some of the content of the CIF 2.0 specification paper, however, I observe that appendix A.2 of that paper already addresses exactly this area, in the context of a more complete discussion of format conversions such as the current proposal considers. Since we already have applicable text, I suggest we use that as context and starting point, instead of drafting something new from scratch.

(Hyperlink for the paper: http://journals.iucr.org/j/issues/2016/01/00/aj5269/index.html)

John

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org]
On Behalf Of James Hester

Dear DDLm-group (aka COMCIFS technical committee)

There has been some lively discussion on the cif-developers mailing list of late, which you may review at http://www.iucr.org/__data/iucr/lists/cif-developers/ . One issue raised was what to do about CIF2 datavalues that contain Unicode characters that have equivalent ASCII sequences described by the CIF markup conventions (e.g. Greek characters).

According to sections 2.2.7.4.13-17 of International Tables Vol G, by default Greek and some other non-ASCII characters can be represented in text datavalues using a backslash notation <backslash><ascii character>, e.g. \a is alpha. Different markup conventions are possible on a per-dictionary or per-definition basis. In CIF2, these characters can be represented natively, but legacy CIF applications presented with a datavalue containing non-ASCII values may not be prepared to typeset or present them appropriately. On the other hand, it would seem inefficient to define separate Unicode-aware datanames for every text value simply to avoid legacy problems.

Proposal: add the following paragraph to Vol G section 2.2.7.4. Note that "meets the requirements of paragraph 2.2.7.4.13" means that this paragraph only applies in those cases for which the CIF1 markup conventions would apply.

(2.2.7.4.18) Whenever an application is required to convert a datavalue from a CIF2 datafile containing code points outside the ASCII range to a datavalue containing only ASCII codepoints, the appropriate markup as per paragraphs 2.2.7.4.13-16 should be substituted, provided that the relevant definition meets the requirements of paragraph 2.2.7.4.13. If no markup is defined for the Unicode code point, no CIF1 equivalent value exists and application behaviour is undefined.

Please comment.

James.
--
T +61 (02) 9717 9907
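
For illustration only, the substitution described in the proposed paragraph might be sketched as follows. This is not taken from Vol G or from any existing CIF library: the function name `to_cif1_text`, the `markup_applies` flag (standing in for "the relevant definition meets the requirements of paragraph 2.2.7.4.13") and the exception class are all hypothetical. Only the `\a` entry for alpha comes from the message above; the other table entries are assumptions to be checked against Vol G. Because the proposal leaves behaviour undefined when no markup exists, raising an error here is just one possible choice.

```python
# Illustrative subset of a Unicode -> CIF1 markup table.
# "\a" for alpha is the example quoted in the message; the beta and gamma
# entries are ASSUMED to follow the same convention and should be verified
# against Vol G section 2.2.7.4 before any real use.
CIF1_MARKUP = {
    "\u03b1": "\\a",  # Greek small alpha (from the message)
    "\u03b2": "\\b",  # Greek small beta (assumed)
    "\u03b3": "\\g",  # Greek small gamma (assumed)
}


class NoCif1Equivalent(ValueError):
    """Hypothetical error: a code point has no CIF1 markup equivalent."""


def to_cif1_text(value: str, markup_applies: bool = True) -> str:
    """Return an ASCII-only rendering of a CIF2 text datavalue.

    markup_applies is a hypothetical stand-in for the condition that the
    item's definition permits the default CIF1 markup conventions.
    """
    out = []
    for ch in value:
        if ord(ch) < 128:
            # ASCII code points pass through unchanged.
            out.append(ch)
        elif markup_applies and ch in CIF1_MARKUP:
            # Substitute the backslash markup for the Unicode character.
            out.append(CIF1_MARKUP[ch])
        else:
            # The proposal leaves this case undefined; this sketch refuses
            # to convert rather than silently dropping the character.
            raise NoCif1Equivalent(f"no CIF1 markup for U+{ord(ch):04X}")
    return "".join(out)
```

With this sketch, `to_cif1_text("\u03b1-quartz")` gives the ASCII string `\a-quartz`, while a value containing a code point absent from the table raises `NoCif1Equivalent`. The sketch does not consider how backslashes already present in the CIF2 value should be treated.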