Re: [Cif2-encoding] How we wrap this up
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] How we wrap this up
- From: SIMON WESTRIP <simonwestrip@xxxxxxxxxxxxxx>
- Date: Sat, 25 Sep 2010 19:34:14 +0000 (GMT)
OK - as promised, I won't pursue the matter :-)
From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@iucr.org>
Sent: Saturday, 25 September, 2010 19:18:54
Subject: Re: [Cif2-encoding] How we wrap this up
Dear Simon,
Unfortunately, that is likely to take us back into our infinite loop or into a diverging spiral. Right now, we would have UTF8 as no more or less a default for CIF2 than ASCII is for CIF1 -- i.e. a not too bad first guess as the likely default encoding for any given CIF, but not a formal constraint. I would suggest we leave the wording in that imprecise state, get CIF2 out and accepted and then work further on the encoding issue.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================
On Sat, 25 Sep 2010, SIMON WESTRIP wrote:
> Dear all
>
> In the event that CIF2 adopts the 'any encoding' approach, would there be
> any objections to
> explicitly defining a default encoding in the specification, to be defaulted
> to when there were no indications
> to the contrary? At worst this would give CIF2 service providers an excuse
> to interpret CIFs as e.g. UTF8 if they couldn't
> determine the encoding by other means - but such intolerant service
> providers would soon find that their service is
> not successful - while at best this might raise awareness of the issues
> regarding encoding once non-ASCII is used in
> a CIF. Essentially, it does not require users to change their working
> practices, which is one of the main arguments for
> 'any encoding'.
>
> So, CIF2 would remain 'any encoding', and specifications in terms of e.g.
> "Herbert's as for CIF1..."
> might only require a single sentence to define the default after stating
> what the 'preferred' encoding was;
> the proposal might be phrased as "Herbert's as for CIF1..." + "explicit
> default encoding"?
>
> I do not wish to prolong this debate - if there are objections I will not
> launch into an endless round of exchanges
> that cover the same ground that has led us this far.
>
> Cheers
>
> Simon
>
>
>
>
>
>
> ____________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding@iucr.org>
> Sent: Friday, 24 September, 2010 20:10:13
> Subject: Re: [Cif2-encoding] How we wrap this up
>
> Dear James
>
> As you may have gathered I have been reconsidering my position on this
> issue.
> Please forgive me, but I would like to change my vote if that is OK, in
> favour of the 'any encoding' camp.
> This apparent U-turn is not a response to recent contributions; rather it is
> the outcome of a meeting I had this morning
> where I demonstrated some new software to the Managing Editor of IUCr
> journals.
>
> By way of explanation:
>
> I have been developing a new docx template which the IUCr editorial office
> is shortly to release for use by
> authors. The template will be packaged with some tools to extract data from
> CIFs
> and tabulate them in the Word document, e.g. open an mmCIF, click a button,
> and standard
> tables populated with data from the CIF will be included in the document,
> acting as
> table templates for the author to edit as appropriate for their manuscript.
>
> Inclusion of the mmCIF tools is part of an unofficial policy to 'coax'
> biologists to start using/accepting mmCIF
> as a useful medium, rather than as a product of their deposition to the PDB,
> and to encourage them to become comfortable
> with passing mmCIFs between applications, and even to edit the things (in
> the same way as the core-CIF community
> treats CIFs). For example, our perception is that there is no reason why an
> author should not feel free to take an mmCIF
> that has been created by e.g. pdb_extract and populate it using third-party
> software before uploading to the PDB for
> deposition.
>
> This cause would not be furthered by effectively invalidating an mmCIF if it
> were not to be encoded in one of
> the specified encodings.
>
> So although I am uneasy about a specification that propagates uncertainty,
> I'm also uneasy about alienating users,
> especially when we are struggling to change their mindset as in the case of
> the biological community
> (my perception of the biological community's attitude to mmCIF is based on
> feedback from authors/coeditors to
> IUCr journals).
>
> Granted this may not be the most compelling argument in favour of 'any
> encoding', but recognizing the hurdles that
> may have to be overcome once we move beyond ASCII whatever the CIF2
> specification, I support 'any encoding'
> as 'a means to an end'.
>
> I will not provide my preferences in terms of the numbered options until you
> say so; after all, I have already voted and
> all this has to be signed off by COMCIFs in any case.
>
> Cheers
>
> Simon
>
>
>
>
> ____________________________________________________________________________
> From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding@iucr.org>
> Sent: Friday, 24 September, 2010 14:50:57
> Subject: Re: [Cif2-encoding] How we wrap this up
>
> Dear Simon,
>
> It is exactly this sort of issue that drove me to support more permissive
> encoding rules and ultimately to devise the UTF-8 + UTF-16 + local proposal.
>
> Do please think about the considerations Herb raised. As you reconsider
> your votes, I urge you also to ask yourself what, *precisely*, a "text file"
> is, and to consider whether your answer is functionally different from my
> "local". If you decide not, then please consider what that answer implies
> about CIF2 support of UTF-8 and UTF-16 (which evidently you favor) under
> each option on the table, especially for CIFs containing non-ASCII
> characters. Whatever you decide about the meaning of "text file", please
> consider whether reasonable people might reach a different conclusion, as I
> assert they might do, and to what extent the standard needs to address that.
>
>
> Regards,
>
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
>
> >From: cif2-encoding-bounces@iucr.org
> [mailto:cif2-encoding-bounces@iucr.org] On Behalf Of SIMON WESTRIP
> >Sent: Friday, September 24, 2010 7:53 AM
> >To: Group for discussing encoding and content validation schemes for CIF2
> >Subject: Re: [Cif2-encoding] How we wrap this up
> >
> >Dear Herbert
> >
> >Not for the first time, I find your argument persuasive. Brian's vote and
> explanation have also raised some
> >questions that I would like to look into.
> >
> >I will confirm or otherwise my vote as soon as possible, assuming that is
> OK with James and assuming that
> >this round of votes might wrap this up.
> >
> >Cheers
> >
> >Simon
> >
> >________________________________________
> >From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding@iucr.org>
> >Sent: Friday, 24 September, 2010 13:17:14
> >Subject: Re: [Cif2-encoding] How we wrap this up
> >
> >If he ignores the standard, in most cases all he has to do to comply with
> CIF2 is to run whatever applications he currently runs to produce CIF1 and,
> perhaps, in some cases, run a minor edit pass at the end, to convert for the
> minor syntactic differences and/or changed tags required to comply with
> CIF2 and the new dictionaries, but he is unlikely to have to do anything to
> deal with the messy business of whether his encoding is really a proper UTF8
> encoding or not.
>
> >The punishment, if he tries to comply, is that he has to totally uproot and
> reconfigure the environment in which he produces CIFs from whatever he is
> currently doing to create an environment in which he can reliably create and,
> more importantly, transmit compliant UTF8 files. This can be very tricky if
> he does only a partial job, say fudging in one special application (yet to
> be written), because if he stays with his old system, all kinds of tools
> will keep trying to transcode whatever he has produced back to whatever his
> system considers a standard. Those of us who have files, applications and
> tools that have lived through several generations of Macs are living proof
> of the problem. Macs now have excellent UTF8/16 Unicode support, but every
> once in a while in working with a Unicode file I find it has been strangely
> and unexpectedly converted to something else, and it can be really tricky to
> spot when the unaccented roman text part has been left untouched but just a
> few accented letters have gotten different accents.
>
> >Mandating UTF8 is simply trying to shift a serious software problem from
> the central handlers of CIF (IUCr, PDB, etc.) to the external users. Most
> users will probably have the good sense to simply ignore the demand and
> leave the burden just where it is now. A few sophisticated users will
> probably adapt with no trouble, but the punishment for those users who
> blindly follow orders before we have a complete multiplatform supporting
> infrastructure in place by mandating UTF8 is severe, expensive and
> undeserved. Until and unless we have developed solid support, we will just
> be alienating people from CIF. I will continue to oppose such a move.
>
> [...]
>
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
> _______________________________________________
> cif2-encoding mailing list
> cif2-encoding@iucr.org
> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>
>