
Re: [ddlm-group] Case sensitivity

To: "'Group finalising DDLm and associated dictionaries'" <[email protected]>
Subject: Re: [ddlm-group] Case sensitivity
From: "Bollinger, John C" <[email protected]>
Date: Mon, 24 May 2010 12:13:57 -0500
Accept-Language: en-US
In-Reply-To: <[email protected]>
References: <8F77913624F7524AACD2A92EAF3BFA54165DF337E7@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><[email protected]><[email protected]><[email protected]>


On Sunday, May 23, 2010 11:33 PM, James Hester wrote:

>I'm glad we agree.  Yes, I think NFKC normalisation is the appropriate normalisation for our purposes.
>
>Does anybody disagree with this proposal?

Not I, but RFC 3454 offers several options among which we must select, and about which it behooves us to be explicit.  Here's a go at it: two CIF data names will be considered equivalent if they map to the same string under

() a character-by-character mapping according to section 3 and appendices *B.1 and B.2* of RFC 3454

() *followed by normalization* according to Unicode normalization form KC, per section 4.

() There are *no prohibited output characters* (atypical for a Stringprep profile, but required by the current form of CIF2 change 3).

() Unassigned characters are *allowed*, and

() bidirectional text is *not* subject to the optional analysis in section 6.

I think that captures the proposed algorithm, and it's ok with me.
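
For concreteness, the comparison described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the function names dataname_key and datanames_equivalent are hypothetical, the B.1 and B.2 tables come from the standard-library 'stringprep' module, and unicodedata.normalize applies the Unicode version bundled with the interpreter rather than the Unicode 3.2 data that RFC 3454 is defined against.

```python
import stringprep
import unicodedata


def dataname_key(name: str) -> str:
    """Map a data name to a comparison key (hypothetical helper).

    Steps: drop characters listed in RFC 3454 table B.1, apply the
    table B.2 case-folding map, then normalize to form KC.
    No prohibited-output, unassigned-character, or bidi checks are made.
    """
    mapped = []
    for ch in name:
        if stringprep.in_table_b1(ch):
            # Table B.1: characters commonly mapped to nothing.
            continue
        # Table B.2: case folding intended for use with NFKC.
        mapped.append(stringprep.map_table_b2(ch))
    # Assumption: the interpreter's current Unicode data is used here,
    # not the Unicode 3.2 tables that RFC 3454 specifies.
    return unicodedata.normalize("NFKC", "".join(mapped))


def datanames_equivalent(a: str, b: str) -> bool:
    """Two data names are equivalent if they map to the same key."""
    return dataname_key(a) == dataname_key(b)
```

With this sketch, datanames_equivalent("_Cell_Length_A", "_cell_length_a") holds, and the mapping does not depend on the process locale, which is the point of specifying the tables rather than a particular tolower().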

>On Mon, May 24, 2010 at 1:32 PM, Herbert J. Bernstein <[email protected]> wrote:
>Dear James,
>
> I think we are saying the same thing conceptually.  You say to
>use normalization and case folding.  Nic says to use tolower.  I say specify
>precisely what the "tolower" we are to use is to do, and using the RFC3454
>normalization and casefolding sounds reasonable to me.
>To avoid any misunderstanding, I believe you are proposing to do
>an NFKC normalization, i.e. decompose and then recombine.

[...]

>On Mon, 24 May 2010, James Hester wrote:
>>Dear All,
>>
>>I believe that Nick and Herbert's preference for choosing a particular
>>tolower() implementation disregards the complexities of the situation as
>>outlined in the document that John references, particularly the
>>locale-dependence of the Turkish/Azeri mappings for the letter 'I' and the need
>>for normalisation.
>>
>>My reading of TR21 indicates that the most correct way to proceed would be to
>>specify that two datanames are equivalent if their normalised, case-folded
>>equivalents are identical, as John B suggests.  In addition to TR21, RFC3454
>>gives a solid specification as to how comparison should be done
>>(http://tools.ietf.org/html/rfc3454.html) and has been implemented in the
>>Python 'stringprep' module.

[...]

Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital




