Discussion List Archives

Re: [ddlm-group] options/text vs binary/end-of-line

Dear Colleagues,

   If this discussion makes any sense, then why not allow the
entire community to participate?  It could save us a lot of
time.  If the community is ready to switch entirely to UTF-8,
we are done.  If the community wants something closer to the
way XML and HTML work with multiple encodings, then we
at least have some guidance to get this finished.  It is
now nearly three full years since the DDLm posting on the IUCr
web site.  If we start the broader discussion now, we have a
chance of getting some real feedback at the ACA meeting.

   Please, let us involve now the people who will have to pay
the price for whatever is decided.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Wed, 30 Jun 2010, James Hester wrote:

> Herbert: while I'm mindful of your wish to discuss this with the wider
> community, John B's comments over the last few days have, I think, been
> helpful in moving the discussion along (or at least the one going on
> in my head), so I'd like to explore this a bit before attempting to
> engage the wider community.  And John B's suggested improvements to my
> summary should be incorporated in any case.
>
> James.
>
> On Wed, Jun 30, 2010 at 1:16 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> I would like to see this matter brought to the community as a whole
>> to discuss and decide.  -- Herbert
>>
>> On Tue, 29 Jun 2010, Bollinger, John C wrote:
>>
>>>
>>>
>>> On Monday, June 28, 2010 6:00 PM, SIMON WESTRIP wrote:
>>>
>>>> John suggests "the goal of CIF being compatible with general-purpose text tools"
>>>>
>>>> This is possibly the crux of the matter.
>>>
>>> It is right at the heart of the matter, I agree, and it comes with an historical impetus.  As I composed these comments, I distilled what I think are the essences of the two main positions into two short statements that capture, for me, the alternatives before us.  Please forgive the somewhat didactic discussion leading up to these, and skip straight to the *** if you wish to ignore my long-windedness altogether.
>>>
>>>> Unless a general-purpose text tool is capable of determining the text encoding system, it ain't going to be much use
>>>> for a CIF that was encoded on a different system and uses non-ASCII chars?
>>>
>>> Forgive me if I am reading too much into the question, but I think it highlights a central difference of understanding: some parties to this discussion seem to hold that text vs. binary is an inherent characteristic of a file, but I maintain that a stream of bytes divorced from any explicit or implicit metadata about its encoding is binary, not text.  This complication of electronic text handling is not new, but it has assumed much more prominence as internationalization issues have gained importance.
>>>
>>> Implicit encoding metadata commonly takes the form of the text in question being encoded according to the default scheme for the system or tool.  It could, in one sense, also take the form of a requirement in the format specification, but that is meaningful only for tools specific to the format, which rather moots the text vs. binary question.  It could also take the form of local policy, such as "all CIFs in this archive are encoded in CESU-8," which would be useful to tools configured for the relevant environment (e.g. a web server).
>>>
>>> Explicit metadata can be carried by the file itself or conveyed out-of-band.  XML's encoding attribute is an example of the former, and HTTP's content-type header is an example of the latter.  These are useful only to certain tools, specific to a particular format, environment, or exchange mechanism.
>>>
>>> One of the upshots of all this is that transcoding must in general be a routine aspect of text file exchange, as that can make explicit encoding metadata implicit.  As Simon has shown, transcoding is not automatic in many contexts, so it may require extra work on the receiving end.  To the extent that there is a current assumption and practice of CIFs being stored and forwarded byte-for-byte as received (i.e. without transcoding or explicit metadata), CIF is already being treated as a binary format.  In a sense, perhaps, it is being treated simultaneously as several distinct binary formats.
>>>
>>>
>>> ***
>>>
>>>> By extending the character set beyond ASCII, we have to accept that not all general-purpose text tools are going to
>>>> be applicable as CIF editors/viewers.
>>>
>>> That's a valid perspective, but I would sharpen it: as part of extending the character set beyond ASCII, we abandon the premise that CIF is a text format, though under some circumstances it may still be possible to manipulate CIFs with tools designed for text.
>>>
>>> Alternatively, I have been advocating essentially this: by extending the character set beyond ASCII, we magnify the importance of exchanging and storing CIFs according to text conventions, including correctly communicating encodings as necessary and transcoding as appropriate.
>>>
>>> I hope the latter position adequately encompasses Herb's view as well.  Each position carries additional baggage, which I have omitted to focus on the essential ideas.  If wider comment is sought, then I submit that these alternatives provide a suitable basis for soliciting such.
>>>
>>>
>>> Whichever position prevails, I should like to see something substantially similar to the corresponding position statement above inserted into the spec.
>>>
>>>
>>> Regards,
>>>
>>> John
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148

International Union of Crystallography