RE: Treatment of Greek characters in CIF2



On Thursday, April 20, 2017 9:26 AM, Robert Hanson wrote:
> Now that John is with us, let's summarize where we are. Feel free to disagree!
> - CIF-JSON is a great idea

One of my concerns is that it is *several* great ideas.

> - COD is already using something like this
> - Jmol is already creating something like this for internal use or export (undocumented)
> - we're talking about the future, not the present
> - no programs as of yet are implementing CIF-JSON, including COD and Jmol
> - as long as we don't cause an incompatibility, we can do whatever we want
> Agreed to so far -  maybe?

Agreed so far.

> - all CIF keys will be made lower case, since in the CIF format it doesn't matter, and in JSON it does
>  and this also allows us to

I'm fine with that, provided that strings being "lower case" is understood as shorthand for them being in a form that is reproduced unchanged by converting to Unicode normalization form NFD, applying the Unicode case-folding algorithm to the result, and converting the case-folded result to Unicode normalization form NFD (case folding does not necessarily preserve normalization).  The resulting form is the basis for Unicode canonical caseless matching, on which CIF2 relies for data name, block code, and frame code matching.  It will indeed require all Latin letters to be presented in lower case, but it will put certain letters in other scripts in upper case, and it has additional effects on characters that have canonical decompositions.
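
As a rough illustration of the normalization just described, here is a minimal Python sketch. The function name `canonical_caseless` is my own invention, and the sketch assumes that Python's `str.casefold()` is an acceptable stand-in for the Unicode case-folding algorithm; it is not taken from any CIF specification or existing library.

```python
import unicodedata


def canonical_caseless(name: str) -> str:
    """Return NFD(casefold(NFD(name))), the form used for canonical caseless matching."""
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()
    # Case folding does not necessarily preserve normalization, so normalize again.
    return unicodedata.normalize("NFD", folded)


def is_normalized_key(name: str) -> bool:
    """True if the name is already in the 'lower case' form discussed above."""
    return canonical_caseless(name) == name


# Three spellings of the same letter: precomposed, ANGSTROM SIGN, and decomposed.
variants = ["\u00c5", "\u212b", "A\u030a"]
keys = {canonical_caseless(v) for v in variants}
```

All three spellings collapse to a single key, which is the effect on characters with canonical decompositions mentioned above. Cherokee small letters are one example of letters that case folding maps to their upper-case forms.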

> - upper-case keys will be non-CIF metadata or other application-specific or translation-specific keys,
>   including CIF1/2 compatibility information

I can accept that.

> - UTF-8 character encoding; \uFFFF for CIF <?> and JSON standard null for <.>

That's OK with me, if indeed we agree that we want CIF-JSON to preserve the distinction.  However, I offer for consideration the proposition that JSON null fits CIF <?> better than it fits CIF <.>, so perhaps we want to flip those assignments.
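
To make the two candidate assignments concrete, here is a small Python sketch. The sentinel objects, the `to_json_value` helper, and its `null_means` parameter are all hypothetical names chosen for illustration; nothing here reflects an agreed CIF-JSON design.

```python
import json


class _CifSpecial:
    """Hypothetical sentinel type for the two CIF special values."""

    def __init__(self, token):
        self.token = token

    def __repr__(self):
        return f"<CIF {self.token}>"


UNKNOWN = _CifSpecial("?")       # CIF <?>
INAPPLICABLE = _CifSpecial(".")  # CIF <.>
PLACEHOLDER = "\uffff"           # the string proposed in the quoted summary


def to_json_value(value, null_means=INAPPLICABLE):
    """Map one CIF value to a JSON-serializable Python value.

    null_means selects which special value becomes JSON null;
    the other special value becomes the placeholder string.
    """
    if value is UNKNOWN or value is INAPPLICABLE:
        return None if value is null_means else PLACEHOLDER
    return value


row = ["5.43", UNKNOWN, INAPPLICABLE]
# Assignment as summarized in the quoted text: <?> is \uFFFF, <.> is null.
as_proposed = json.dumps([to_json_value(v) for v in row])
# Flipped assignment: <?> is null, <.> is \uFFFF.
flipped = json.dumps([to_json_value(v, null_means=UNKNOWN) for v in row])
```

Either way the distinction survives a round trip; the only question is which of the two values gets JSON's native null.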

> - some question about whether top level should be [] or {}

I agree that consensus has not been reached on that question.

Personally, I'm not much swayed by arguments that CIF-JSON must be able to encode invalid CIF constructs (e.g. duplicate block codes), or to preserve details of the native CIF serialization format that are not actually significant in CIF (e.g. data block order).  I'm not thinking in terms of transforming CIF *files* to JSON, but rather in terms of serializing data that are structured according to the CIF data model.
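
For what it's worth, here is a sketch of what serializing from the data model could look like if the top level were an object keyed by block code. This is only one of the two options under discussion, and the helper names `caseless_key` and `blocks_to_object` are made up for the example.

```python
import json
import unicodedata


def caseless_key(name):
    """NFD, then case fold, then NFD again (canonical caseless form)."""
    nfd = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", nfd.casefold())


def blocks_to_object(blocks):
    """Build a top-level mapping from (block_code, items) pairs.

    Duplicate block codes (under caseless matching) are rejected,
    and block order is not treated as significant.
    """
    top = {}
    for code, items in blocks:
        key = caseless_key(code)
        if key in top:
            raise ValueError(f"duplicate block code: {code!r}")
        top[key] = {caseless_key(name): value for name, value in items.items()}
    return top


doc = blocks_to_object([
    ("Quartz", {"_cell_length_a": "4.913"}),
    ("NaCl", {"_Cell_Length_A": "5.640"}),
])
# sort_keys makes the output independent of the order the blocks arrived in.
text = json.dumps(doc, sort_keys=True)
```

With this shape, a duplicate block code simply cannot be represented, which is the point: the serialization mirrors the data model rather than the file.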

> - some question about what to do with CIF1 non-latin characters

I wasn't sure that was part of the same conversation, but OK.  It bears discussion either way.


John


_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/cif-developers
