Re: [Cif2-encoding] Fundamental source of disagreement
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Fundamental source of disagreement
- From: James Hester <jamesrhester@xxxxxxxxx>
- Date: Mon, 16 Aug 2010 16:53:25 +1000
- In-Reply-To: <alpine.BSF.2.00.1008100619590.21755@epsilon.pair.com>
- References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><AANLkTimLmnpS-HHP9en-zwUDeVKtbHSUJa36tUCOlQtL@mail.gmail.com><826180.50656.qm@web87010.mail.ird.yahoo.com><563298.52532.qm@web87005.mail.ird.yahoo.com><520427.68014.qm@web87001.mail.ird.yahoo.com><a06240800c84ac1b696bf@192.168.2.104><614241.93385.qm@web87016.mail.ird.yahoo.com><alpine.BSF.2.00.1006251827270.70846@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166122952D@SJMEMXMBS11.stjude.sjcrh.local><33483.93964.qm@web87012.mail.ird.yahoo.com><8F77913624F7524AACD2A92EAF3BFA541661229533@SJMEMXMBS11.stjude.sjcrh.local><AANLkTilqKa_vZJEmfjEtd_MzKhH1CijEIglJzWpFQrrC@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229542@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikTee4PicHKjnnbAdipegyELQ6UWLXz9Zm08aVL@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229552@SJMEMXMBS11.stjude.sjcrh.local><AANLkTinZ4KNsnREOOU6sVFdGYR_aQHcjdWr_ko648NGm@mail.gmail.com><alpine.BSF.2.00.1008100619590.21755@epsilon.pair.com>
I'm not sure that you have identified the fundamental source of disagreement, but if we disagree on our approaches to optional behaviour we will have trouble finalising the standard, so I have addressed Herb's comments below.

On Tue, Aug 10, 2010 at 8:42 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
>
> With all due respect to James and others who adhere to the view that:
>
> "There is no such thing as 'optional' for an information interchange standard."
>
> I believe this the fundamental source of our disagreement on the
> the direction for CIF2.
>
> Optional features are common in almost all current successful standards
> for information interchange, including HTML4, XMF and CIF1. As a
> practical matter, one tries to have strict writers and liberal readers
> for interchange standards to encourage migration to as common a
> convention as possible. Even so, if we are too strict in our rules
> for what is and is not a proper CIF, we will probably encourage
> the growth of multiple unofficial, unmanaged and non-interchangeable
> CIF2 dialects.

I dispute all three unsupported statements in the above paragraph. Taking the first one, where HTML, XML and CIF1 are put forward as successful standards for information interchange that have optional features:

(1) HTML is no more an information interchange standard than Rich Text Format. It is primarily a standard for marking up documents for presentation to the human reader. If you wish to argue by analogy with HTML, you will need to draw a much tighter parallel to the goals of CIF.

(2) I agree that the goals of XML are similar to those of CIF, and I would be pleased if we adopted their approach to optional behaviour. The fifth of the 10 design goals for XML was (see the XML 1.1 standard at http://www.w3.org/TR/2006/REC-xml11-20060816):

"5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero."
So, if XML is to be our guiding light, then we should avoid optional behaviour.

(3) As for CIF1 having optional behaviours, what might those be? I would assert that regardless of the wording of the standard, those optional features are either never supported, or else always supported, or else irrelevant to the core use of CIF.

So: I don't think that appealing to HTML or XML proves that optional behaviour is a good thing, and lacking supporting argument, the appeal to CIF1 does not prove it either.

Moving on to your second assertion about liberal readers and strict writers: while that philosophy has its adherents, an alternative philosophy also exists: readers should exit gracefully on standards violations. I quote a recent Linux Weekly News article which bears on this discussion:

"The notion that one should be liberal in what one accepts while being conservative in what one sends is often expressed in the networking field, but it shows up in a number of other areas as well. Often, though, it can make more sense to be conservative on the accepting side; the condition of many web pages would have been far better had early browsers not been so forgiving of bad HTML." (http://lwn.net/Articles/394175/)

So: the approach you adopt to writing standards-conformant readers requires some thought, particularly given the possibility that liberal readers will encourage liberal writers.

The final assertion about the consequences of being too strict might in theory be true, but it needs clear use-cases to support it rather than being asserted as a truism. I would suggest that we are nowhere near the point of forcing incompatible dialects to emerge, given that the addition of UTF8 to the standard does not meaningfully restrict the choices offered to CIF users, and any other restrictions that we have introduced into CIF2 relative to CIF1 are very minor.
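
The contrast drawn above between a reader that exits gracefully on a standards violation and a liberal reader that guesses can be sketched as follows. This is only an illustration, not part of any CIF2 proposal: it assumes, purely for the example, that the only rule being checked is "the file is well-formed UTF8", and the names read_strict, read_liberal and CifEncodingError are hypothetical.

```python
class CifEncodingError(ValueError):
    """Hypothetical error raised when input breaks the assumed UTF-8 rule."""


def read_strict(data: bytes) -> str:
    # Strict reader: refuse anything that is not well-formed UTF-8 and
    # report where the violation occurred instead of guessing.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise CifEncodingError(
            f"input is not valid UTF-8 (first bad byte at offset {exc.start})"
        ) from exc


def read_liberal(data: bytes) -> str:
    # Liberal reader: try UTF-8, then silently fall back to Latin-1.
    # Latin-1 decoding never fails, so a wrong guess goes unnoticed.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")
```

With the liberal reader, a writer that emits Latin-1 never learns that its output is non-conformant, which is the "liberal readers encourage liberal writers" effect described above; with the strict reader the same file is rejected with a diagnostic.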
Based on this observation, my expectation is that CIF2 will no more produce incompatible dialects than CIF1 did, *provided we have no optional behaviour*.

I will address what I believe is the real source of disagreement on this point, which is my statement that "Standards-conformant readers must be able to read all files produced by standards-conformant writers", in an answer to John B's other post.

>
> As for John's hashing scheme, I suspect some variation of it will find signficant use in major archives, just as associating MD5 checksums
> with tarballs does for many software distributors, but that we also
> will need some easier-to-generate-and-transfer _optional_ encoding
> hint schemes, such as the accented "o's". One simple way to handle
> it would be:
>
> 1. Put some variant of the accented "o's" into the _optional_
> magic number; and
> 2. Adopt the tarball approach to MD5 checksums by having it not
> in the header but in a separate file, simply generating it from
> a canonical UTF8 representation of the CIF2 file.
>
> The accented o's are easy to carry along as an encoding hint, and
> if you get the encoding hint right, then you will easily be able
> to generate a canonical UTF8 file to validate the MD5 checksum against
> if you wish for a critical file transfer, e.g. to an archive or a journal.
>
> Regards,
> Herbert
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding
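
For readers following the quoted suggestion of a separate MD5 checksum generated from a canonical UTF8 representation, a minimal sketch is given below. The thread does not define what "canonical" means, so the example assumes it means "decode using the encoding hint, normalise line endings to LF, re-encode as UTF-8"; that definition and the function names are assumptions made for illustration only.

```python
import hashlib


def canonical_utf8(data: bytes, encoding_hint: str) -> bytes:
    # Assumed canonical form: decode with the hinted encoding,
    # normalise line endings to LF, then encode as UTF-8.
    text = data.decode(encoding_hint)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def md5_of_canonical(data: bytes, encoding_hint: str) -> str:
    # Hex digest that would be stored in the separate checksum file.
    return hashlib.md5(canonical_utf8(data, encoding_hint)).hexdigest()


def verify(data: bytes, encoding_hint: str, expected_hex: str) -> bool:
    # Recompute the digest on the receiving side and compare it with
    # the value carried in the separate checksum file.
    return md5_of_canonical(data, encoding_hint) == expected_hex
```

Under these assumptions the same text yields the same digest whatever encoding it was transferred in, as long as the hint is right; a wrong hint produces a different canonical form and the check fails, which is how the checksum would expose a mis-identified encoding.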