Discussion List Archives


Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .. .... .. .. .

>I don’t understand.  How is it worse to provide authors an opportunity to specify the encoding they have used, even though they may specify wrongly, than it is to deny them an opportunity to specify the encoding at all?

I don't think it is worse to provide them with an opportunity to specify their encoding - I just don't think they should need to.

>How is it a worse or more impactful mistake for an author to include an incorrect encoding tag than it is for them to use an encoding different from some small set that you are prepared to accept?

I am not saying that it is a worse or more impactful mistake. Rather, if these signatures are to be part of the standard, then I can foresee errors being raised by an incorrect flag even when the rest of the CIF is encoded according to the specification. In my experience, authors already find CIF slightly annoying in that they have to adhere to seemingly pedantic rules (e.g. 'Monoclinic' should be 'monoclinic' because the dictionary enumeration is case sensitive, or <0.001 is not a number type). Requiring manually edited encoding signatures, which will then have to be checked, is of no real help to anyone (they are no more than a 'hint'). Again, I feel that we have to respect that in the world of CIF, users have been required to edit raw CIF. This is rarely the case with XML, where end users are rightly unaware of the encoding they are using, as they invariably work with tools that shield them from the raw XML. In the short/medium term at least, I do not see this situation changing.

The reason I am prepared to accept 'some small set' is that I would like that set to be unambiguously identifiable, so that authors do not have to worry about such things. I also hope that non-CIF-aware software might still do a good job of decoding the text without employing heuristics, thereby minimizing the impact on current practice of specifying an encoding at all in the new spec.
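To illustrate what 'unambiguously identifiable' could mean in practice, here is a minimal sketch of BOM-based detection in Python. It is only an illustration, not anything proposed for the spec: the function names `sniff_bom` and `decode_cif_text`, the set of accepted encodings, and the ASCII fallback for files without a BOM are all assumptions made for the example.

```python
import codecs

# Candidate signatures (an assumed set, for illustration only).
# Order matters: the UTF-32 LE BOM starts with the same two bytes as
# the UTF-16 LE BOM, so the longer signatures are tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_cif_text(data: bytes, default: str = "ascii") -> str:
    """Decode using the BOM if present, else the (assumed) default encoding."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or default)
```

No heuristics are involved: the first few bytes either match one of the known signatures or they do not, and the author never has to type an encoding name.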

You might note that I often refer to CIF users as authors - this is my experience, I'm afraid. It would be nice if the IUCr could exert as much first-hand control over CIF content as, say, the PDB, whose online data collection tools are used to populate mmCIFs, and whose users seem quite happy for them to do that. So I stress, my views on this are based only on experience with CIFs submitted to IUCr journals by authors.

>>We're also further restricting the number of non-CIF-aware programs that can be used to read the text.

>Can you expand on that?  I don't follow you.

I was referring to the practice of editing CIFs with any available text editor - however I concede that having an encoding flag makes no difference to non-CIF-aware programs - they will simply save the CIF in whatever is their default encoding if that is how they work.

Cheers

Simon




From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Friday, 25 June, 2010 19:59:56
Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .. .. .. .. .. .

On Friday, June 25, 2010 12:41 PM, SIMON WESTRIP wrote:
>It's using a field for specifying the encoding that worries me.
>Who is to make such a declaration in the CIF - an author who may be blissfully unaware of the encoding they're using?
>Or an author who is preparing a new CIF by editing an old one, again unaware that the text editor they are using is about to save
>the CIF in some other encoding? At least with UTF BOMs we have a fighting chance - I'd rather only accept these.

I don’t understand.  How is it worse to provide authors an opportunity to specify the encoding they have used, even though they may specify wrongly, than it is to deny them an opportunity to specify the encoding at all?

How is it a worse or more impactful mistake for an author to include an incorrect encoding tag than it is for them to use an encoding different from some small set that you are prepared to accept?

>We're also further restricting the number of non-CIF-aware programs that can be used to read the text.

Can you expand on that?  I don't follow you.

>You've also mentioned that we should learn from HTML - just because HTML has an encoding declaration does not mean it is correct,
>which is why browsers seem to apply their own heuristics to determine the encoding.

I see no way to write the specification that can eliminate all possibility of encoding-related errors.  None.  All we can do is choose which errors are possible.  In so doing, there are a lot of competing factors to consider, such as the likelihood of various errors being committed, coverage and robustness of the resulting spec, implied responsibilities of various parties, user convenience, and cultural sensitivity.  I think when James's summary is ready it will help us sort through all that.


Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
