
Re: [ddlm-group] Recommended character set and use restrictions. .

The draft I have contains the following 'disclaimer', which basically allows Unicode in data names unless they're defined in a DDLm dictionary?

"Important restriction: In the case where the contents of a CIF2 data file are defined in the
new DDLm dictionary there is an imposed restriction on the character set of a data name..."

See Brian's email of 1/3/2010

As I read it, you can use Unicode in your data names, but you shouldn't 'officially' be able to define those names in your own DDLm dictionary?

Cheers

Simon



From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Monday, 21 June, 2010 16:58:07
Subject: Re: [ddlm-group] Recommended character set and use restrictions. .


On Monday, June 21, 2010 10:28 AM, David Brown wrote:
>I can see the advantages of using Unicode in data values where one may
>wish to render text in some non-ASCII format, but is there any reason
>why data names should not be restricted (at least for the foreseeable
>future) to ASCII characters?  These names are assigned by COMCIFS and
>we are in no real danger of running out of ASCII data names.  One day
>we may need to write our dictionaries in Arabic, but I doubt that any
>of us will be around when that happens.  If we only allowed non-ASCII
>characters in delimited strings we would meet all the needs of the
>community for many years to come, and save ourselves a lot of grief
>trying to sort out which code points to allow.

That's a fair point.  I observe, though, that COMCIFS controls data names only in the official dictionaries it maintains, not in local dictionaries or other third-party dictionaries.  It appears to be parties maintaining such dictionaries that have the most potential benefit from an expanded character repertoire for data names.  Additionally, general users might receive a small benefit from having a larger character repertoire available for use in data block codes.

Having come late to the party, I hadn't previously considered whether there was a real use case for general Unicode data names, etc.  It was already in the first spec draft I saw.  If there is no persuasive use case for it then I don't have any objection to restricting use of non-ASCII characters to within the bounds of one of the multitude of quoted string syntaxes.  That would be the conservative choice, suitable to be relaxed later if need be.
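
For concreteness, here is a rough sketch of what the conservative rule might look like as a check on data names.  It is only an illustration, not anything taken from the draft: the function name is made up, and I am assuming that a data name begins with an underscore and otherwise consists of non-blank printable ASCII characters (code points 0x21 through 0x7E).

```python
def is_ascii_data_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed ASCII-only data name rule.

    Assumptions (not taken from the draft spec): a data name starts with
    an underscore, has at least one further character, and contains only
    non-blank printable ASCII characters (0x21 to 0x7E).
    """
    if len(name) < 2 or not name.startswith("_"):
        return False
    # Reject whitespace, control characters and anything outside ASCII.
    return all(0x21 <= ord(ch) <= 0x7E for ch in name)
```

Under this rule a name such as _cell_length_a would pass, while a name containing an accented letter would be rejected and its text would have to appear inside a quoted string as a data value instead.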


John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


Email Disclaimer:  www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group