Monday, June 21, 2010 3:26 PM, SIMON WESTRIP wrote:
>The draft I have has the following 'disclaimer', which basically allows unicode for data names unless they're defined in a DDLm dictionary?
>"Important restriction: In the case where the contents of a CIF2 data file are defined in the
>new DDLm dictionary there is an imposed restriction on the character set of a data name..."
>See Brian's email of 1/3/2010
>As I read it, you can use unicode in your datanames, but you shouldnt 'officially' be able to define those names in your own DDLm dictionary?

I'm reading from the draft dated February 18, 2010.  It explicitly says that "data names are comprised of characters from the allowed UTF-8 set above, excluding a whitespace [...]."  Thus, according to that draft, CIF data names may contain almost any Unicode character.

As far as I can tell, the "restriction" is a lengthy diversion into the details of data names that will be allowed in DDLm dictionaries.  The comments are relevant principally to authors of DDLm dictionaries.  They say that only data names composed of characters drawn exclusively from the specified 64-character subset of ASCII can be defined in a DDLm dictionary.  As I read it, a dictionary that purports to define a name not meeting those criteria is thereby not compliant with DDLm.

None of the DDLm comments affect the CIF format specification at all.  There is nothing to prevent people from using ad hoc, non-ASCII data names in CIF2 files, nor from defining such data names in DDL1 or DDL2 dictionaries written in CIF2 format.  The DDLm comments are not relevant to those situations, hence they do not speak to the character set for CIF2 data names in general.  They simply say that if you happen to write a formally valid CIF, relying entirely on DDLm dictionaries, then none of your data names will contain non-ASCII characters.  (Nor will they contain any of the roughly 30 ASCII punctuation characters that are allowed, but rarely used, in DDL1 and DDL2 data names.)
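
To make the distinction concrete, here is a rough sketch of the two checks as I understand them.  It is illustrative only: the function names are my own, the CIF2 check is deliberately loose (leading underscore, no whitespace) rather than the draft's full allowed-character rule, and the character set standing in for the 64-character DDLm subset (ASCII letters, digits, underscore, period) is an assumption on my part, so substitute the draft's actual list before relying on it.

```python
import string

# ASSUMPTION: stand-in for the draft's 64-character ASCII subset.
# Replace with the exact list from the specification.
ASSUMED_DDLM_CHARS = frozenset(string.ascii_letters + string.digits + "_.")


def is_plausible_cif2_name(name: str) -> bool:
    """Loose CIF2 check: leading underscore, something after it, no whitespace.

    This simplifies the draft, which also limits names to its allowed
    character set; that set is not reproduced here.
    """
    return (
        len(name) > 1
        and name.startswith("_")
        and not any(ch.isspace() for ch in name)
    )


def is_ddlm_definable(name: str, allowed=ASSUMED_DDLM_CHARS) -> bool:
    """Stricter check: every character must come from the restricted subset."""
    return is_plausible_cif2_name(name) and all(ch in allowed for ch in name)


if __name__ == "__main__":
    for name in ("_cell.length_a", "_température", "_my$name"):
        print(name, is_plausible_cif2_name(name), is_ddlm_definable(name))
```

Under these assumptions a name such as "_température" passes the first check but fails the second, which is the situation described above: usable in a CIF2 file, but not definable in a DDLm dictionary.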


