Re: [Cif2-encoding] Addressing Brian's concerns

Thanks to James for taking the time to describe the text-encoding
functionality already available through Qt. It helps me to
understand how a CIF-authoring application will work in diverse
locales. I'll also test the robustness of juffed if I can locate
one of our problematic PDFs on my SMB drive :-) As you say, the
discussion doesn't greatly help to tip the scales, though what
does help is the comment about file import being more reliable
if a CIF has a known unique encoding (or small set of automatically
distinguishable encodings).

Thanks also to John for his reply. I want to respond briefly to
just one point here.

>> I put option 5 at the bottom because of the non-portability of
>> a "local" encoding.
> 
> This is the part I understand least.  "Text" is at least roughly
> equivalent to "local", and entirely as non-portable.  Merely tagging
> CIFs with encoding information doesn't fix that very well, as we
> covered in the course of our discussion, particularly when doing
> so is optional.

You're right - in my mind "text" and "local" are essentially
the same thing, but I have understood "local" to imply that no
information about the actual encoding algorithms is available
if such a file is transported elsewhere. Maybe that's not so. I
do see some value in embedding an encoding declaration as a hint,
albeit the hint cannot be completely relied upon. By making it
optional, I hope that an *application* that writes the hint has
a reasonable possibility of getting it right (whereas mandating
it might encourage a user to invent a random encoding declaration
if one is absent).
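
To make the "hint" idea concrete, here is a minimal sketch of how a
reader might treat an embedded declaration as advisory rather than
authoritative. The declaration syntax (a first-line comment such as
"# coding: latin-1"), the function name decode_with_hint, and the
UTF-8 fallback are all assumptions made for illustration; none of
them is part of any CIF2 proposal under discussion.

```python
import codecs
import re

# Hypothetical declaration syntax: "coding: NAME" somewhere on the first line.
HINT = re.compile(rb"coding[:=]\s*([-\w.]+)")


def decode_with_hint(data, fallback="utf-8"):
    """Return (text, encoding_used), treating any declared encoding as a hint only."""
    first_line = data.split(b"\n", 1)[0]
    candidates = []
    match = HINT.search(first_line)
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # is this a codec name Python knows at all?
            candidates.append(name)
        except LookupError:
            pass  # unknown or invented name: ignore the hint
    candidates.append(fallback)  # assumed default when the hint is absent or fails
    for name in candidates:
        try:
            return data.decode(name), name
        except (UnicodeError, LookupError):
            continue  # the bytes are not valid in this encoding; try the next one
    raise ValueError("could not decode with the hint or the fallback")
```

Note that a successful decode does not prove the hint was correct:
a single-byte encoding such as latin-1 will accept any byte sequence,
so the result is still only as good as the declaration.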

Best wishes
Brian


On Tue, Sep 28, 2010 at 05:25:51PM -0500, Bollinger, John C wrote:
> 
> On Sunday, September 26, 2010 6:47 PM, James Hester wrote:
> 
> >I am however
> >unhappy that both Brian and Simon introduced new concerns and nobody
> >has had a chance to comment on how the various proposals under
> >consideration might affect those concerns.  I would therefore like to
> >suggest that the voting period continues until the end of this week,
> >and that we all endeavour to express any concerns or comments that we
> >need to make in a timely fashion.
> 
> I have responded to Simon's new concerns, I think, but not to Brian's.  Supplemental to James's well-reasoned comments, then:
> 
> On Friday, September 24, 2010 4:24 AM, Brian McMahon wrote:
> 
> >I still feel this argument is at heart a "binary/text"
> >dichotomy, where "binary" implies that one can prescribe specific byte-level representations of every distinct character; "text"
> >implies that you're at the mercy of external libraries and mappings between encoding conventions - and those mappings are not always explicit or easy to identify.
> 
> That characterization of "text" sounds suspiciously similar to the "local" part of option 5 -- as it should, because the two attempt to describe the same (I think) concept.  I am open to alternative definitions, but I do not comprehend the apparent aversion to defining these terms.  If they are so obvious as to not require definition, then providing definitions anyway will be simple and harmless.  If not, then how else do we expect consumers of the spec to come to the same conclusion about what it means?
> 
> >I sympathise greatly with James's desire for a prescriptive, "binary"
> >approach, but its corollary is that a CIF application must take full responsibility for expressing any supported extended character set (I mean accented Latin letters, Greek characters, Cyrillic or Chinese alphabets).
> 
> I do not follow this logic, inasmuch as it seems to be about the CIF2 character repertoire, rather than about the encodings with which characters from that repertoire may be encoded.  The character repertoire is not the subject of this debate.
> 
> Relying on "text" to define allowed characters would mean that some reasonable content expressed in conformant CIF form on one system cannot be expressed in any conformant CIF form on another.  For example, a CIF-format, Chinese-language journal article encoded in EUC-CN might be perfectly valid CIF in the journal office, but there would be no CIF-conformant way to represent it at all on a system whose definition of "text" does not accommodate Chinese characters.
> 
> [...]
> 
> >I put option 5 at the bottom because of the non-portability of a "local" encoding.
> 
> This is the part I understand least.  "Text" is at least roughly equivalent to "local", and entirely as non-portable.  Merely tagging CIFs with encoding information doesn't fix that very well, as we covered in the course of our discussion, particularly when doing so is optional.  Moreover, even optional tagging is a feature only of choice 2, not choice 1.
> 
> 
> Regards,
> 
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
> 
> 
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> 
> _______________________________________________
> cif2-encoding mailing list
> cif2-encoding@iucr.org
> http://scripts.iucr.org/mailman/listinfo/cif2-encoding