Discussion List Archives


Re: [ddlm-group] Community consultation regarding CIF2 encoding

Hi Herbert and others,

Let us define more carefully what we want from this survey.  Firstly, informed comment is always welcome, so to that end I will include a link to our discussion and an invitation to provide a comment, as you suggest.  However, in the unlikely case that these comments turn out to be variations on the themes that we have already covered, and so provide no clearer path forward than we already have, we need to include the simple survey questions, which are designed to give approximate answers to the following questions:

1. What proportion of crystallographers would be significantly inconvenienced by having to manipulate CIF files in UTF-8 (this is a variation of the 'respect' argument but using less loaded words)? (Q1-5)

2. What is most important: the 'text' nature of CIF files, or fidelity of file transfer and retrieval? (Q8)

I have rejigged the questions to avoid any appearance of seeking a predetermined answer (if that's what Herbert meant by 'push-polling'), so that we now have:

Herbert's introductory paragraph, followed by...
"The following questions relate to editing text files.  Examples of such files include CIF files, Windows .INI files, and programming source code."
1. How often do you edit text files?
2. What scripts or languages do you use when editing text?
3. What operating systems do you use when editing text?
4. What text editors do you usually use for editing text?
5. How difficult would it be for you to input and output text files in UTF-8 encoding? (Very difficult/difficult/neutral/easy/very easy)
6. What is your preferred method (if any) of transmitting text files containing non-ASCII characters to colleagues or organisations in other countries?
7. How often has the method described in (6) led to incorrect display of the transmitted file?
8. How important do you think the following things are when designing changes to the CIF standard (rank 1-4, presented in random order)?
  (a) Reliable transmission and retrieval of CIF file contents
  (b) Ease of use in text tools such as editors and text-based search applications
  (c) Backwards compatibility
  (d) Availability of non-ASCII characters
9. Please comment (if you wish) on your rankings in question 8
10. Please comment (if you wish) on this encoding issue, and include a name and email address if you are prepared to discuss your comments further.

No doubt I have not expressed these questions as well as some of you might have, so I welcome improvements.  Note that Q8 includes some gratuitous market research which we could also use as a baseline to determine the relative importance of our discussions.

Let's try to agree ahead of time how we will interpret the results.  A small number (<10%?) answering 'difficult' or 'very difficult' to question 5, when restricted to those users who often deal with non-ASCII text (based on answers to (1) and (2)), would suggest to me that restricting encoding to UTF-8 would not involve much inconvenience.  We can use the OS and editor choices as well to cross-check difficulty.  I would interpret a higher ranking for option (8.a) than for option (8.b) as being in favour of the UTF-8 only option.  Questions 6 and 7 are designed to gather information about the prevalence of problems in text file transfer, and solutions that others have found.
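
To make the proposed interpretation concrete, here is a minimal Python sketch of how the two headline numbers might be computed from the returned responses.  The record layout, the field names and the way answers to (1) and (2) are reduced to yes/no flags are all assumptions made for illustration, as is the tentative 10% cut-off, which is only floated as a possibility above.  It is not a description of how any survey tool exports its data.

```python
from dataclasses import dataclass

# Hypothetical record for one respondent; field names are illustrative only.
@dataclass
class Response:
    edits_often: bool            # assumed coding of the answer to question 1
    uses_non_ascii_script: bool  # assumed coding of the answer to question 2
    utf8_difficulty: str         # answer to question 5, e.g. "easy"
    rank_reliable_transfer: int  # rank given to reliable transmission (1 = top)
    rank_text_tool_ease: int     # rank given to ease of use in text tools

HARD_ANSWERS = {"difficult", "very difficult"}
TENTATIVE_THRESHOLD = 0.10  # the "<10%?" figure; still open to discussion


def inconvenienced_fraction(responses):
    """Fraction of frequent non-ASCII users who find UTF-8 hard to handle."""
    frequent = [r for r in responses
                if r.edits_often and r.uses_non_ascii_script]
    if not frequent:
        return None  # nothing can be concluded from an empty subgroup
    hard = sum(r.utf8_difficulty in HARD_ANSWERS for r in frequent)
    return hard / len(frequent)


def prefers_reliability_fraction(responses):
    """Fraction ranking reliable transmission above ease of use in text tools."""
    if not responses:
        return None
    better = sum(r.rank_reliable_transfer < r.rank_text_tool_ease
                 for r in responses)
    return better / len(responses)
```

A first fraction below the tentative threshold, together with a second fraction above one half, would then be read as support for the UTF-8 only option.  Anything else would need a closer look at the free-text comments.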
 
I'd like to get this done early next week, and I will post a link to the survey to this group for feedback before sending it further afield.  I would then post requests to national crystallographic associations (perhaps via the European Crystallographic Association, AsCA and the ACA?) and wait a month for responses.  In the meantime we will attempt to clear up some non-UTF-8 business.



On Fri, Jul 2, 2010 at 12:15 AM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
Dear James,

 Please point not just to your survey, but to the entire discussion.
Please recall that the entire discussion is already public on the
web.  This is just a matter of calling community attention to it.

 While I find the survey somewhat slanted if presented without the
background, essentially a push-poll, if we can get the community
involved in the discussion, and not just presented with structured
questions, I have faith that we will get a good sense of what is
workable in the current context.  Those who just want to respond to
your survey questions can do that.  Those who wish to delve into the
issue more deeply can do that.  It is up to them, not to us.

 In any case, I recommend putting something out to some assortment
of lists very soon, so we can have the discussion well started before
the ACA meeting.

 Regards,
   Herbert

P.S.  To some extent the poll has a bit of the flavor of voting on
the value of PI.  What matters is not whether a huge majority likes
solution A or B.  What matters is whether choosing one solution or
the other will impede some significant chunk of science from getting
done, and, as with the value of PI, that is hopefully a factual
determination, not a matter of opinion.  We need informed commentary
much more than we need to count votes on preferences.

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
       Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Thu, 1 Jul 2010, James Hester wrote:

Hi Herbert: the idea would be to distribute an email with a pointer to the survey.  Your suggested paragraph would be a reasonable text for that email, acting as an introduction to the questionnaire, although the mention of XML and HTML I think is slanting the question somewhat. And indeed we should include an open-ended question in the survey asking for their thoughts.  The point of the short series of questions is that those who have no time to spend familiarising themselves with our discussion and formulating a thoughtful reply are still able to spend a few minutes and provide important information on which we can base our decision.

On Thu, Jul 1, 2010 at 8:37 PM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
     Dear Colleagues,

      Unless we are assuming that the CIF2 transition is not actually
     going to happen, that transition is going to involve a wide range
     of both software developers and users of crystallographic software
     throughout the community.  Either we have the discussion with them
     on a UTF-8-only standard now, or we will have to have the discussion
     with them later, when it is much harder and more expensive to
     revise what we will have done.

      If James is reluctant to post his own summary to the lists, then
     how about the following:

      COMCIFS, the IUCr Committee for the Maintenance of the CIF Standard,
     is considering some important improvements and extensions to CIF.
     Among the extensions being considered is enlarging the character
     set allowed from simple ASCII to the full UNICODE character set
     (the same set of characters used in web browsers with HTML and
     in XML).  There is strong disagreement on COMCIFS as to whether
     this would best be done by mandating just a single UNICODE encoding,
     UTF-8, or whether it would be best to follow the practices of HTML
     and XML in allowing alternate encodings.  The full thread of the
     discussion thus far can be seen at:

      http://www.iucr.org/__data/iucr/lists/ddlm-group/

     Comments from interested members of the community would be appreciated.

      Regards,
         Herbert

     =====================================================
      Herbert J. Bernstein, Professor of Computer Science
       Dowling College, Kramer Science Center, KSC 121
            Idle Hour Blvd, Oakdale, NY, 11769

                     +1-631-244-3035
                     yaya@dowling.edu
     =====================================================

     On Thu, 1 Jul 2010, SIMON WESTRIP wrote:

     I agree this would probably be more productive.

     Perhaps the IUCr could point its authors at such a survey - via its CIF
     author services pages (printCIF, checkCIF...)?

     Cheers

     Simon


     ________________________________________________________________
     From: James Hester <jamesrhester@gmail.com>
     To: ddlm-group <ddlm-group@iucr.org>
     Sent: Thursday, 1 July, 2010 6:51:47
     Subject: [ddlm-group] Community consultation regarding CIF2 encoding

     Dear DDLm-ers,

     I think Herbert's suggestion of sending a version of my summary out is
     unlikely to produce a great deal of enlightenment, because I expect the
     range of responses to simply mirror that which we have already seen in this
     group, with no ultimate resolution.  I would like to propose instead a
     simple questionnaire that we can use to inform our decision.  The questions
     I would like to see answered are:
      1. Do you regularly use non-ASCII characters when editing text?  Examples
         of such characters include accented ASCII characters, and the characters
         from Arabic, Japanese, Chinese, Cyrillic etc.  (Yes/No/Don't know)
      2. What languages do you usually deal with when editing text?
      3. What text editing programs do you usually use?
      4. Can the text editors that you usually use read and write files in UTF-8
         format? (yes/no/don't know)
      5. Which non-ASCII encoding do you think would result in the least problems
         when transferring your text files across the internet?
      6. Would you object to a new CIF standard which allowed only UTF-8 encoded
         files? If so, why?
      7. Do you have any comments regarding suitable choice of encoding(s) for
         the new CIF standard?
     Once we have fine-tuned the questions, I would suggest creating the survey
     using www.surveymonkey.com, then posting requests for responses wherever
     crystallographers are to be found, but especially in groups where non ASCII
     scripts are likely to be found (European Crystallographic Society, Japanese
     Crystallographic Society, Computing Commission etc.).

James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148


_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group




