Re: [Cif2-encoding] Fundamental source of disagreement

  • To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
  • Subject: Re: [Cif2-encoding] Fundamental source of disagreement
  • From: James Hester <jamesrhester@xxxxxxxxx>
  • Date: Mon, 16 Aug 2010 16:53:25 +1000
  • In-Reply-To: <alpine.BSF.2.00.1008100619590.21755@epsilon.pair.com>
  • References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><AANLkTimLmnpS-HHP9en-zwUDeVKtbHSUJa36tUCOlQtL@mail.gmail.com><826180.50656.qm@web87010.mail.ird.yahoo.com><563298.52532.qm@web87005.mail.ird.yahoo.com><520427.68014.qm@web87001.mail.ird.yahoo.com><a06240800c84ac1b696bf@192.168.2.104><614241.93385.qm@web87016.mail.ird.yahoo.com><alpine.BSF.2.00.1006251827270.70846@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166122952D@SJMEMXMBS11.stjude.sjcrh.local><33483.93964.qm@web87012.mail.ird.yahoo.com><8F77913624F7524AACD2A92EAF3BFA541661229533@SJMEMXMBS11.stjude.sjcrh.local><AANLkTilqKa_vZJEmfjEtd_MzKhH1CijEIglJzWpFQrrC@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229542@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikTee4PicHKjnnbAdipegyELQ6UWLXz9Zm08aVL@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229552@SJMEMXMBS11.stjude.sjcrh.local><AANLkTinZ4KNsnREOOU6sVFdGYR_aQHcjdWr_ko648NGm@mail.gmail.com><alpine.BSF.2.00.1008100619590.21755@epsilon.pair.com>
I'm not sure that you have identified the fundamental source of
disagreement, but if we disagree on our approaches to optional
behaviour we will have trouble finalising the standard, so I have
addressed Herb's comments below.

On Tue, Aug 10, 2010 at 8:42 PM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
>
> With all due respect to James and others who adhere to the view that:
>
> "There is no such thing as 'optional' for an information interchange standard."
>
> I believe this is the fundamental source of our disagreement on
> the direction for CIF2.
>
> Optional features are common in almost all current successful standards
> for information interchange, including HTML4, XML and CIF1.  As a
> practical matter, one tries to have strict writers and liberal readers
> for interchange standards to encourage migration to as common a
> convention as possible.  Even so, if we are too strict in our rules
> for what is and is not a proper CIF, we will probably encourage
> the growth of multiple unofficial, unmanaged and non-interchangeable
> CIF2 dialects.

I dispute all three unsupported statements in the above paragraph.
Taking the first one, where HTML, XML and CIF1 are put forward as
successful standards for information interchange that have optional
features:
(1) HTML is no more an information interchange standard than Rich Text
Format.  It is primarily a standard for marking up documents for
presentation to the human reader.  If you wish to argue by analogy
with HTML, you will need to draw a much tighter parallel to the goals
of CIF.

(2) I agree that the goals of XML are similar to those of CIF, and I
would be pleased if we adopted their approach to optional behaviour.
The fifth of the 10 design goals for XML was (see the XML 1.1 standard
at http://www.w3.org/TR/2006/REC-xml11-20060816):
"5. The number of optional features in XML is to be kept to the
absolute minimum, ideally zero."

So, if XML is to be our guiding light, then we should avoid optional behaviour.

(3) As for CIF1 having optional behaviours, what might those be?  I
would assert that regardless of the wording of the standard, those
optional features are either never supported, or else always
supported, or else irrelevant to the core use of CIF.

So: I don't think that appealing to HTML or XML proves that optional
behaviour is a good thing, and lacking supporting argument, the appeal
to CIF1 does not prove it either.

Moving on to your second assertion about liberal readers and strict
writers: while that philosophy has its adherents, an alternative
philosophy also exists: readers should exit gracefully on standards
violations.  I quote a recent Linux Weekly News article which bears on
this discussion:
"The notion that one should be liberal in what one accepts while being
conservative in what one sends is often expressed in the networking
field, but it shows up in a number of other areas as well. Often,
though, it can make more sense to be conservative on the accepting
side; the condition of many web pages would have been far better had
early browsers not been so forgiving of bad HTML."
(http://lwn.net/Articles/394175/)

So: which approach you adopt to writing standards-conformant readers
requires some thought, particularly given the possibility that liberal
readers will encourage liberal writers.

The final assertion about the consequences of being too strict might
in theory be true, but it will require clear use-cases to support it
rather than simply asserting it as a truism.  I would suggest that we
are nowhere near the point of forcing incompatible dialects to emerge,
given that the addition of UTF8 to the standard does not meaningfully
restrict the choices offered to CIF users, and any other restrictions
that we have introduced into CIF2 relative to CIF1 are very minor.
Based on this observation, my expectation is that CIF2 will no more
produce incompatible dialects than CIF1, *provided we have no optional
behaviour*.

I will address what I believe is the real source of disagreement on
this point, which is my statement that "Standards-conformant readers
must be able to read all files produced by standards-conformant
writers", in an answer to John B's other post.

>
> As for John's hashing scheme, I suspect some variation of it will find
> significant use in major archives, just as associating MD5 checksums
> with tarballs does for many software distributors, but that we also
> will need some easier-to-generate-and-transfer _optional_ encoding
> hint schemes, such as the accented "o's".  One simple way to handle
> it would be:
>
>  1.  Put some variant of the accented "o's" into the _optional_
> magic number; and
>  2.  Adopt the tarball approach to MD5 checksums by having it not
> in the header but in a separate file, simply generating it from
> a canonical UTF8 representation of the CIF2 file.
>
> The accented o's are easy to carry along as an encoding hint, and
> if you get the encoding hint right, then you will easily be able
> to generate a canonical UTF8 file to validate the MD5 checksum against
> if you wish for a critical file transfer, e.g. to an archive or a journal.
>
> Regards,
>   Herbert
>
>
>
>
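
For concreteness, Herbert's second point (an MD5 checksum kept in a
separate file and computed from a canonical UTF8 representation) could
be sketched roughly as below. This is only an illustration in Python,
not a proposal from the thread: the thread does not define what
"canonical" means, so the sketch assumes UTF-8 with LF line endings,
and the function names and the md5sum-style sidecar line are
hypothetical.

```python
import hashlib


def canonical_utf8_md5(raw: bytes, source_encoding: str) -> str:
    """Return the MD5 hex digest of a file's assumed-canonical UTF-8 form.

    `source_encoding` plays the role of the encoding hint: the bytes are
    decoded with it, then re-encoded as UTF-8 before hashing.
    """
    text = raw.decode(source_encoding)
    # Assumption: "canonical" means UTF-8 with LF line endings only.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sidecar_line(raw: bytes, source_encoding: str, filename: str) -> str:
    """Build a line for a separate checksum file, in md5sum's layout."""
    return f"{canonical_utf8_md5(raw, source_encoding)}  {filename}"


def verify(raw: bytes, source_encoding: str, expected_hex: str) -> bool:
    """Check received bytes against a checksum from the sidecar file."""
    return canonical_utf8_md5(raw, source_encoding) == expected_hex
```

With this arrangement the same CIF text transferred in two different
encodings yields the same checksum, provided the encoding hint is
right; a wrong hint either fails to decode or produces a different
digest.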
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148