Discussion List Archives


Re: [ddlm-group] Recommended character set and use restrictions. .

The meaning of "non-printing" is tricky.  The structure of joining and
non-joining characters in Unicode is intrinsic to its approach to
printing Arabic.  It is similar in concept to dead-key accents: it is
the sequence of code points that determines what will be printed.
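A short Python sketch, using only the standard-library unicodedata module, may make the point concrete. It assumes Python's str.isprintable() as a stand-in for "non-printing". That is an assumption: neither this thread nor the CIF2 draft defines the term that way. The tag_is_acceptable filter, its allowlist and the Persian-script sample name are all hypothetical. The sketch shows three things. ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER count as non-printing even though they change how Arabic-script text is shaped. A letter followed by a combining accent behaves like a dead-key sequence. A blanket "no non-printing characters in tags" rule, as proposed further down the thread, would need case-by-case exceptions to admit such names.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Both are format characters (general category "Cf"): they have no glyph of
# their own, yet they change how neighbouring Arabic-script letters are shaped.
for ch in (ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), "printable:", ch.isprintable())

# Dead-key analogy: 'e' followed by COMBINING ACUTE ACCENT renders as one
# accented letter; NFC normalization composes the pair into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), composed == "\u00e9")

# Hypothetical tag filter: reject any character str.isprintable() treats as
# non-printing, unless it has been explicitly allowed case by case.
ALLOWED_NONPRINTING = frozenset()  # hypothetical: empty allowlist by default

def tag_is_acceptable(tag: str, allowed=ALLOWED_NONPRINTING) -> bool:
    return all(ch.isprintable() or ch in allowed for ch in tag)

# Hypothetical Persian-script data name containing a ZWNJ between letters.
persian_tag = "_\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"

print(tag_is_acceptable("_cell.length_a"))                # True  (plain ASCII tag)
print(tag_is_acceptable(persian_tag))                     # False (ZWNJ is non-printing)
print(tag_is_acceptable(persian_tag, frozenset({ZWNJ})))  # True  (allowed case by case)
```

Passing a non-empty allowlist corresponds to including specific characters "on a case-by-case basis"; which characters belong on such a list is the open question.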


=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Tue, 22 Jun 2010, James Hester wrote:

> Yes, I think the correct approach for COMCIFS would be to restrict the
> character set of definitions in DDLm dictionaries, but not overly
> restrict the character set in the format specifications.  This would
> allow third parties to use generic CIF parsers to construct data files
> and dictionaries in other languages.
>
> I'm beginning to think that the original proposal to simply exclude
> all non-printing characters from tags might be the simplest approach,
> and then specific characters could be included on a case-by-case
> basis.
>
> On Tue, Jun 22, 2010 at 6:26 AM, SIMON WESTRIP
> <simonwestrip@btinternet.com> wrote:
>> The draft I have has the following 'disclaimer', which basically allows
>> Unicode for data names unless they're defined in a DDLm dictionary?
>>
>> "Important restriction: In the case where the contents of a CIF2 data file
>> are defined in the new DDLm dictionary there is an imposed restriction on
>> the character set of a data name..."
>>
>> See Brian's email of 1/3/2010
>>
>> As I read it, you can use Unicode in your data names, but you shouldn't
>> 'officially' be able to define those names in your own DDLm dictionary?
>>
>> Cheers
>>
>> Simon
>>
>>
>> ________________________________
>> From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>> Sent: Monday, 21 June, 2010 16:58:07
>> Subject: Re: [ddlm-group] Recommended character set and use restrictions. .
>>
>>
>> On Monday, June 21, 2010 10:28 AM, David Brown wrote:
>>> I can see the advantages of using Unicode in data values where one may
>>> wish to render text in some non-ASCII format, but is there any reason
>>> why data names should not be restricted (at least for the foreseeable
>>> future) to ASCII characters?  These names are assigned by COMCIFS and
>>> we are in no real danger of running out of ASCII data names.  One day
>>> we may need to write our dictionaries in Arabic, but I doubt that any
>>> of us will be around when that happens.  If we only allowed non-ASCII
>>> characters in delimited strings we would meet all the needs of the
>>> community for many years to come, and save ourselves a lot of grief
>>> trying to sort out which code points to allow.
>>
>> That's a fair point.  I observe, though, that COMCIFS controls data names
>> only in the official dictionaries it maintains, not in local dictionaries or
>> other third-party dictionaries.  It appears to be parties maintaining such
>> dictionaries that have the most potential benefit from an expanded character
>> repertoire for data names.  Additionally, general users might receive a
>> small benefit from having a larger character repertoire available for use in
>> data block codes.
>>
>> Having come late to the party, I hadn't before considered whether there was
>> a real use case for general Unicode data names, etc.  It was already in the
>> first spec draft I saw.  If there is no persuasive use case for it then I
>> don't have any objection to restricting use of non-ASCII characters to
>> within the bounds of one of the multitude of quoted string syntaxes.  That
>> would be the conservative choice, suitable to be relaxed later if need be.
>>
>>
>> John
>> --
>> John C. Bollinger, Ph.D.
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148

International Union of Crystallography
