Discussion List Archives

Re: [Cif2-encoding] How we wrap this up

Dear Herbert

Not for the first time, I find your argument persuasive. Brian's vote and explanation have also raised some
questions that I would like to look into.

I will confirm or otherwise my vote as soon as possible, assuming that is OK with James and assuming that
this round of votes might wrap this up.

Cheers

Simon


From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@iucr.org>
Sent: Friday, 24 September, 2010 13:17:14
Subject: Re: [Cif2-encoding] How we wrap this up

If he ignores the standard, in most cases all he has to do to comply with CIF2 is to run whatever applications he currently runs to produce CIF1 and, perhaps, in some cases, run a minor edit pass at the end to convert for the minor syntactic differences and/or changed tags required to comply with CIF2 and the new dictionaries. He is unlikely to have to do anything to deal with the messy business of whether his encoding is really a proper UTF8 encoding or not.

The punishment if he tries to comply is that he has to totally uproot and reconfigure the environment in which he produces CIFs, moving from whatever he is currently doing to an environment in which he can reliably create and, more importantly, transmit compliant UTF8 files.  This can be very tricky if he does only a partial job, say fudging in one special application (yet to be written), because if he stays with his old system, all kinds of tools will keep trying to transcode whatever he has produced back to whatever his system considers a standard. Those of us who have files, applications and tools that have lived through several generations of Macs are living proof of the problem. Macs now have excellent UTF8/16 Unicode support, but every once in a while in working with a Unicode file I find it has been strangely and unexpectedly converted to something else, and it can be really tricky to spot when the unaccented roman text part has been left untouched but just a few accented letters have gotten different accents.
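
To illustrate the effect described above, here is a minimal Python sketch (it is not part of the original message). It assumes, purely for illustration, a file written as ISO 8859-1 and later reread by a tool that expects Mac OS Roman; the helper names `is_valid_utf8` and `reread_as` and the sample string are hypothetical. It shows that the plain ASCII part survives untouched while an accented letter quietly becomes a different accented letter, and that such bytes also fail a strict UTF8 check.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reread_as(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding, then decode the same bytes in another.

    This mimics a tool that silently assumes the wrong local encoding.
    """
    return text.encode(written_as).decode(read_as)


# Hypothetical sample: an ASCII tag followed by a value with one accented letter.
original = "_publ_author_name 'caf\u00e9'"

# Assumed scenario: written as ISO 8859-1, reread as Mac OS Roman.
garbled = reread_as(original, "latin-1", "mac_roman")

print(original)
print(garbled)

# The ASCII prefix is unchanged, so the damage is easy to overlook.
print(garbled[:18] == original[:18])

# Pure ASCII bytes are already valid UTF-8; the ISO 8859-1 bytes are not.
print(is_valid_utf8(b"_cell_length_a 10.5"))
print(is_valid_utf8(original.encode("latin-1")))
```

Other pairs of legacy encodings would damage different characters; the specific pair used here is only an assumption chosen to make the effect visible.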

Mandating UTF8 is simply trying to shift a serious software problem from the central handlers of CIF (IUCr, PDB, etc.) to the external users. Most users will probably have the good sense to simply ignore the demand and leave the burden just where it is now.  A few sophisticated users will probably adapt with no trouble, but if we mandate UTF8 before we have a complete multiplatform supporting infrastructure in place, the punishment for those users who blindly follow orders is severe, expensive and undeserved.  Until and unless we have developed solid support, we will just be alienating people from CIF.  I will continue to oppose such a move.

Simon, I beg you to change your vote.  Once we have the rest of CIF2 in
place and supported, I will be happy to cooperate in trying to develop
the software support we would need to make UTF8/UTF16 work well for
users on Mac, Linux and Windows, but it is a big job that I do not
believe can be done soon enough and well enough for options 3 through
5 to make sense right now.  Please do not "make the perfect the
enemy of the good".

=====================================================
Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Fri, 24 Sep 2010, SIMON WESTRIP wrote:

> I do not understand why a user who adheres to a CIF2 standard
> that specifies an encoding will be 'punished'?
> What worries me about a specification that allows any encoding
> is that users who ignore any recommendations regarding
> a preferred encoding might experience difficulties when e.g.
> submitting their CIF to a journal/archive, even though they
> have adhered to the standard (unjustly punished).
>
> With regard to the lack of CIF2 software support, surely CIF2
> in general is of little use to users, not just its encoding requirements.
> But perhaps you already have CIF2 software that can be dropped into existing
> workflows save for the fact that it would require modification to work
> with 'specified encodings'?
>
>
>
> ____________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding@iucr.org>
> Sent: Friday, 24 September, 2010 2:03:50
> Subject: Re: [Cif2-encoding] How we wrap this up
>
> I see no point in a final specification that users will
> ignore, and that will actually punish users who
> pay attention to it.  That is not a useful standard,
> and very damaging to the CIF brand.  We should be
> promulgating reasonable standards that we expect will
> in fact be adhered to, not ignored.  In the present
> state of lack of software support and clear guidance,
> all the prescriptive UTF8 recommendations are unhelpful
> to users who read and pay attention to what the standard
> says.
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Thu, 23 Sep 2010, SIMON WESTRIP wrote:
>
> > I agree to some extent with what you say, Herbert, but I'm
> > a bit more optimistic (for once) that the IUCr at least will be able to
> > adapt to a 'specified encoding' system relatively quickly, and in the
> > interim certainly not reject non-UTFx CIFs. I'm not convinced that whatever
> > appears in the specification will have any influence on user practice,
> > especially in the non-IUCr world; rather I think the success (or otherwise)
> > of CIF2 will result from the software that implements it (as you suggest).
> > I don't share your pessimism about the potential confusion of specifying
> > UTF8 etc., and certainly don't think that a restricted encoding will be
> > any more confusing than 'any encoding', given, as you say, "people may
> > not understand what they are doing..."
> >
> > I suppose much of the difference in our views lies in our perception of
> > user interest - I suspect there may even be overlap in this respect - but
> > I'm perhaps less inclined to think that the final specification will have
> > a marked influence on users: "they can keep doing whatever they are
> > currently doing that is currently working for them"
> >
> > Anyway, it's not me you have to convince :-), and it's time I went to bed!
> >
> > Cheers
> >
> > Simon
> >
> > ____________________________________________________________________________
> > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > To: Group for discussing encoding and content validation schemes for CIF2
> > <cif2-encoding@iucr.org>
> > Sent: Thursday, 23 September, 2010 22:39:24
> > Subject: Re: [Cif2-encoding] How we wrap this up
> >
> > Dear Simon,
> >
> >   That is precisely the point -- there is a serious and growing
> > problem with encodings.  The strict UTF8 proposal then makes
> > it a universal problem for everybody using CIF, and we do _not_
> > have a coherent means set up to deal with it.  The substitution
> > of UTF8 for ASCII in the CIF1 spec does not, in and of itself,
> > make anything worse for anybody currently receiving 128 character
> > ASCII -- it is identical, and it does not force users working
> > in other systems that the IUCr journals are currently coping
> > with to jump into the boiling water, they can keep doing whatever
> > they are currently doing that is currently working for them
> > and the IUCr.  All the journals have to do until something that
> > actually supports not-lower-128-ASCII is ready is to tell people that
> > for the journals they will still have to use Brian's reverse solidus
> > escape codes for anything else -- nothing major changes for most
> > people.  If and when there really is a coherent scheme to support
> > more native Unicode code points for journal submission with tested
> > software, then we can do something more.  Right now, proposals
> > 3,4 and 5 will make things worse for large numbers of users
> > and not really make anything better for the IUCr.  It is too
> > early in the UTF8 conversion process.
> >
> > =====================================================
> > Herbert J. Bernstein, Professor of Computer Science
> >   Dowling College, Kramer Science Center, KSC 121
> >         Idle Hour Blvd, Oakdale, NY, 11769
> >
> >                 +1-631-244-3035
> >                 yaya@dowling.edu
> > =====================================================
> >
> > On Thu, 23 Sep 2010, SIMON WESTRIP wrote:
> >
> > > Just because I'm still at my desk - and despite the fact that I told
> > > myself I would not contribute further beyond my vote - it might be
> > > worth mentioning that the IUCr are already experiencing problems
> > > related to encoding issues (in their web services), and the occurrence
> > > of such problems is most likely to increase when CIFs can contain
> > > non-ASCII text.
> > >
> > > Cheers
> > >
> > > Simon
> > >
> > > ____________________________________________________________________________
> > > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > > To: Group for discussing encoding and content validation schemes for CIF2
> > > <cif2-encoding@iucr.org>
> > > Sent: Thursday, 23 September, 2010 21:31:24
> > > Subject: Re: [Cif2-encoding] How we wrap this up
> > >
> > > Votes:
> > >
> > > In terms of the requested preference voting, I vote in declining order
> > > of preference
> > >
> > > 1, then 2, then (big gap) 5, then 4, then 3.
> > >
> > > On absolute voting up or down in COMCIFS, I will accept 1 or 2, but will
> > > lobby against and vote strongly against 3, 4, and 5.
> > >
> > > Explanation:
> > >
> > > I am not opposed to Brian's recommendations.  The only reason I would vote
> > > for 1 over 2 is that I fear Brian's recommendation would generate yet
> > > more debate over the precise details and I believe we have more than
> > > run out of time to get something concrete ready for the IUCr meeting.
> > >
> > > I am very strongly opposed to 3, 4 and 5 because I believe they will
> > > cause confusion and delay in adoption of CIF2, while choices
> > > 1 and 2 keep the practices the community and the IUCr have lived
> > > with successfully for many years, simply applying them to UTF8
> > > instead of ASCII.  People may not understand what they are doing
> > > in that mode, but they manage to successfully submit CIFs to the
> > > IUCr that way, and we don't have software ready to support anything
> > > else.
> > >
> > >   -- Herbert
> > >
> > > At 8:13 PM +0000 9/23/10, SIMON WESTRIP wrote:
> > > >Faced with the options:
> > > >
> > > >1. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII'
> > > >recently posted here and to COMCIFS.
> > > >2. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII',
> > > >together with Brian's *recommendations*
> > > >3. UTF8-only as in the original draft
> > > >4. UTF8 + UTF16
> > > >5. UTF8, UTF16 + "local"
> > > >
> > > >I have to vote for (4).
> > > >
> > > >When it comes down to it, I believe that the specification of a
> > > >'standard' should not be based on uncertainty,
> > > >and as 'any encoding' presents uncertainty, it should not be in the
> > > >standard.
> > > >
> > > >I might be accused of changing my position (I have recently
> > > >expressed support for flexibility and even a qualified
> > > >acceptance of the 'as for CIF1 proposal with UTF8 in place of
> > > >ASCII'), but part of the value of these discussions
> > > >is to question your own views in the light of others' perspectives.
> > > >Indeed, I have found these discussions
> > > >extremely informative and am now in a far better position to handle
> > > >the realities of introducing non-ASCII CIFs,
> > > >whatever the final COMCIFS decision.
> > > >
> > > >Cheers
> > > >
> > > >Simon
> > > >
> > > >
> > > >
> > > >From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
> > > >To: Group for discussing encoding and content validation schemes for
> > > >CIF2 <cif2-encoding@iucr.org>
> > > >Sent: Thursday, 23 September, 2010 15:02:25
> > > >Subject: Re: [Cif2-encoding] How we wrap this up
> > > >
> > > >On Thursday, September 23, 2010 5:46 AM, SIMON WESTRIP wrote:
> > > >
> > > >>1. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII'
> > > >>recently posted here and to COMCIFS.
> > > >>2. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII',
> > > >>together with Brian's *recommendations*
> > > >>3. UTF8-only as in the original draft
> > > >>4. UTF8 + UTF16
> > > >>5. UTF8, UTF16 + "local"
> > > >>
> > > >>These can be broken down to:
> > > >>
> > > >>'any encoding' (1, 2, and 5)
> > > >>
> > > >>'specified encoding' (3 and 4)
> > > >>
> > > >>Note I put 5 in the 'any encoding' category as I think 'local'
> > > >>could be interpreted as any encoding.
> > > >
> > > >I agree that 'local' could be interpreted as "any encoding", but I
> > > >choose to view it as "context-dependent".  Thus a file that is
> > > >CIF-conformant on one computer might not be CIF-conformant on
> > > >another.  Some will find that unsatisfactory.  In my view, however,
> > > >it is the best interpretation of CIF1's provisions; its purpose is
> > > >thus to ensure that *all* well-formed CIF1 files are also
> > > >well-formed CIF2 files (a context-dependent question).  Lest I
> > > >appear to overstate the case, I acknowledge that the UTF8-only and
> > > >UTF-8 + UTF-16 proposals would have the result that a large majority
> > > >of well-formed CIF1 files are also well-formed CIF2 files.  The
> > > >variations of Herb's proposal probably also make all well-formed
> > > >CIF1 files well-formed CIF2 files, but I disfavor them on different
> > > >grounds (mostly that they are too open to differing interpretations).
> > > >
> > > >[...]
> > > >
> > > >>In either case, a degree of work will be required to accommodate
> > > >>user practice and the legacy of CIF1.
> > > >
> > > >I think the entire question reduces to which accommodations for the
> > > >CIF1 legacy are assured by CIF2 vs. which will constitute
> > > >non-standard extensions.  I don't think that individual responses,
> > > >from Chester for example, are likely to depend much on which option
> > > >is adopted, but I do think the overall consistency of responses will
> > > >be affected.  Thus I favor precision of the specification and
> > > >coverage of the likely uses, in hope of achieving the greatest
> > > >consistency of response.
> > > >
> > > >I doubt this has swayed anyone's opinion, so please consider it an
> > > >advance explanation for my upcoming vote (inasmuch as I rely on
> > > >James's previous assurance that voting rights in this context are
> > > >not restricted to COMCIFS members).
> > > >
> > > >
> > > >Best Regards,
> > > >
> > > >John
> > > >--
> > > >John C. Bollinger, Ph.D.
> > > >Department of Structural Biology
> > > >St. Jude Children's Research Hospital
> > > >
> > > >
> > > >Email Disclaimer:
> > > ><http://www.stjude.org/emaildisclaimer>www.stjude.org/emaildisclaimer
> > > >_______________________________________________
> > > >cif2-encoding mailing list
> > > ><mailto:cif2-encoding@iucr.org>cif2-encoding@iucr.org
> > > ><http://scripts.iucr.org/mailman/listinfo/cif2-encoding>http://scripts.iucr.org/mailman/listinfo/cif2-encoding
> > > >
> > > >
> > >
> > >
> > > --
> > > =====================================================
> > >   Herbert J. Bernstein, Professor of Computer Science
> > >     Dowling College, Kramer Science Center, KSC 121
> > >         Idle Hour Blvd, Oakdale, NY, 11769
> > >
> > >                   +1-631-244-3035
> > >                   yaya@dowling.edu
> > > =====================================================
> > >
> > >
> >
> >
>
>
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding
