Discussion List Archives


Re: [Cif2-encoding] How we wrap this up

>The only way I can see that being true is if "text file" or "text" 
>is intended to be interpreted, at least in part, as "containing only 
>ASCII characters."  Is that your intended meaning, Herb?  Otherwise, 
>CIF2's expansion to the full (more or less) Unicode character set 
>opens the door for users to insert literal characters into their 
>(conformant) CIFs in place of or in addition to elides, and none of 
>the alternatives on the table change that.  What the various 
>alternatives *do* affect is which byte-sequence representations of 
>those characters will conform to CIF2, under which circumstances.

Ah, now I begin to understand the difference in our view.  I view CIF for
journal use and PDB deposition as having a controlled vocabulary, via
combinations of dictionaries, advice to authors, deposition standards,
etc.  You seem to view CIF as allowing completely arbitrary, uncontrolled
text.  That may be so in some contexts, e.g. in private use inside
laboratories as part of a lab notebook or experiment data harvest, but
in those contexts, people have been stretching CIF for years, and the
change from ASCII to UTF8 is no change at all.

Please note that proposals 1 and 2 do _not_ affect "which 
byte-sequence representations of those characters will conform to 
CIF2, under which circumstances" because they are not rigidly 
prescriptive about any
particular byte sequences.  It is only when we get to 3, 4 and 5
that we are trying to rigidly prescribe what users may write.  Until
and unless we have made certain that those prescriptions indeed do
fit with the workflows of the IUCr and the PDB and are properly
supported with software, I think it to be premature to promulgate
those prescriptions.
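
For concreteness, "byte-sequence representations" refers to the fact that
one and the same character is stored as different bytes under different
encodings.  A minimal Python sketch follows; the character and the three
encodings are assumptions picked only for illustration and are not taken
from any of the numbered proposals:

```python
# Illustration only: the character and the encodings below are assumptions
# chosen for this sketch, not part of any CIF2 encoding proposal.
char = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS

encodings = ["utf-8", "utf-16-le", "latin-1"]

# Map each encoding name to the bytes it produces for the same character.
byte_forms = {name: char.encode(name) for name in encodings}

for name, raw in byte_forms.items():
    print(f"{name:10s} {raw.hex(' ')}")

# Reading bytes with an encoding other than the one used to write them
# does not necessarily fail; it can silently produce different text.
misread = byte_forms["utf-8"].decode("latin-1")
print(repr(misread))
```

A reader that assumes the wrong encoding gets two unrelated characters
back instead of one, with no error raised, which is why it matters what
a specification says (or declines to say) about permitted byte sequences.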

Please look at your own words:

"that the only viable opportunity to leave this area for further
development is to be explicitly restrictive now"

To me, that is precisely backwards.  We have an existing system called
CIF.  CIF2 was promised to the community as something that they could
work with _without_ having to make major changes in what
they do, i.e. as a revision and extension of CIF1.  We are now
on the verge of suddenly telling them -- no we were just kidding,
we want you to change all your editors and software to conform
to this new "explicitly restrictive" model of CIF, but we don't
have the editors and software ready.  We're just telling you what
to do, but are going to leave you up the creek without a paddle
for a year or two while we figure out what we meant.  That does not
leave CIF2 open for further development, it leaves it open to be
completely replaced by something else.

Modern, conservative software engineering practice for incremental
development is to completely specify the user externals of the
change you are making and to understand all its interactions
with the existing system before you put it in place.  The time
to take your deep breath is _before_ making some maximally disruptive
restrictive change.  Proposals 1 and 2 leave what is working in
the existing system in place until we have done our job on the
encoding issue properly.

This is really getting out of hand.  We need a meeting.  If
everyone will send me their Skype IDs, I will volunteer to
set up a Skype conference call at some time that works for
everybody (which I suspect will be 4 am EDT).  My guess is that
1-2 hours of polite discussion will resolve this.  What
do we have to lose?

Regards,
   Herbert




At 9:45 AM -0500 9/27/10, Bollinger, John C wrote:
>Dear Colleagues,
>
>On Monday, September 27, 2010 5:49 AM, Herbert Bernstein wrote:
>>    Under the CIF2 specification with UTF8 in place of ASCII there is
>>_no_ change in the use of elided ASCII sequences to represent non-ASCII
>>characters until and unless the IUCr publications office decides that,
>>for that particular application, they are ready to accept something
>>new.
>
>Absolutely correct.  The character elides of CIF1 are among its 
>"common semantic features", which are expressly *not* part of the 
>CIF1 format standard.  CIF2 explicitly omits them as well, leaving 
>them in exactly the same place they are now.  None of this is at all 
>affected by which encoding option we choose.
>
>>    It is _only_ if you go forward with options 3, 4 or 5 that you
>>are giving the green light to users to do precisely what you are
>>concerned about -- using the unicode characters instead
>>in possibly strange admixtures that nobody is ready to process.
>
>The only way I can see that being true is if "text file" or "text" 
>is intended to be interpreted, at least in part, as "containing only 
>ASCII characters."  Is that your intended meaning, Herb?  Otherwise, 
>CIF2's expansion to the full (more or less) Unicode character set 
>opens the door for users to insert literal characters into their 
>(conformant) CIFs in place of or in addition to elides, and none of 
>the alternatives on the table change that.  What the various 
>alternatives *do* affect is which byte-sequence representations of 
>those characters will conform to CIF2, under which circumstances.
>
>Independent of this particular issue, my greatest problem with 
>options (1) and (2) is the imprecision of describing CIF2 simply as 
>"text".  That this served well enough for CIF1 is irrelevant; CIF2's 
>character set lends much more importance and impact to the 
>interpretation of this aspect of the spec.  I see two, maybe even 
>three, viable and functionally distinct possible definitions.  Would 
>any of the proponents of that wording care to advance a definition 
>of that term as it is intended to be interpreted in a CIF2 context? 
>This is substantially equivalent to James's open question, so no 
>need to answer both.
>
>[...]
>
>>    My apologies to James, who I know is trying to do what he believes
>>to be right, but I believe James has things backwards -- the "deep
>>breath" is provided by my proposal -- taking the time to properly engineer
>>the use of the extra characters UTF8 allows us to discuss clearly,
>>while James' push for an immediate prescriptive use of UTF8 with
>>prescriptions that differ drastically from what has been adopted
>>by all other frameworks (HTML, XML, python, etc.) in ways that
>>are untested and unsupported by most existing software is
>>the untimely rush to judgement.
>
>[...]
>
>I doubt any of us could disagree that there is an engineering 
>challenge here, but I have to agree with James that the only viable 
>opportunity to leave this area for further development is to be 
>explicitly restrictive now, a la option (3) or (4).  Not even my most 
>preferred option (5) allows sufficient latitude for future extension 
>without potentially invalidating some CIF2 CIFs and programs.
>
>Furthermore, I don't think that "all other frameworks" adopt an 
>entirely uniform approach, nor one that is necessarily equivalent to 
>option (1) or (2).  For example, Sun's various implementations of 
>the Java compiler seem to use "local" (in my sense of the term) 
>unless the user passes an option to tell it otherwise.  XML and 
>XHTML use UTF-8 unless a different encoding is explicitly named in 
>the file, identified via a byte-order mark, or otherwise 
>communicated at a higher level.  HTML tends to rely on a 
>higher-level protocol to communicate encoding, but provides a 
>mechanism for communicating it in-line.  ALL of the CIF2 options 
>currently on the table share some characteristics with one or more 
>of those.
>
>
>Regards,
>
>John
>--
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>
>
>Email Disclaimer:  www.stjude.org/emaildisclaimer
>
>_______________________________________________
>cif2-encoding mailing list
>cif2-encoding@iucr.org
>http://scripts.iucr.org/mailman/listinfo/cif2-encoding


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

International Union of Crystallography