Discussion List Archives

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

On 12/10/09 5:50 PM, "Brian McMahon" <bm@iucr.org> wrote:

> I am leaning increasingly to the viewpoint that - so long as it is
> technically possible to interconvert old and new CIFs without loss
> of information (although there may be loss of functionality in the
> CIF1.2->CIF1.1 direction) - the new specification should be maximally
> disruptive.

By maximally disruptive, you mean strict. To this I strongly agree. This is
what I have been insisting on, not because I am trying to be an SOB, but
because it is a specification and hence has to be strictly defined. How
strictly it is enforced in the short term is a different matter, and that is
what has dominated the discussion to date.

I do have a problem with the way the term "we should deprecate" has been
thrown around. I think most have grossly underestimated how difficult it
will be to parse according to the new specification and then fall back to
the older spec when something violates the new one. I can create any number
of strings that appear to be in the old spec but are actually in the new
spec, and the deprecating parser will be fooled completely. Needless to say,
I will build "maximally disruptive" parsers so that I can hopefully fulfil
the full, new specification.

> "Maximally" of course doesn't mean "gratuitously". Only changes
> should be introduced that have clear purpose, and those changes
> should be clearly and cleanly documented on the CIF web site,
> with an accompanying description of the intended purpose.
> As Herbert says, there are many communities. They should be reassured
> that there is no need for them to change *anything*, until and unless
> they need to use the new functionality offered by CIF1.2.

But we will need to know in some way that it is not CIF1.2, hence the magic
comment etc. It can possibly be done with a parsing phase that detects the
context and establishes the version, but the magic comment makes it much
easier.
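
A minimal sketch of the magic-comment check, in Python. It assumes the
identifier sits on the first line and has the form "#\#CIF_" followed by a
version number; the 1.2 identifier is the one proposed in this thread, and
the function name declared_version is hypothetical. What a missing comment
should mean is left to the caller, since that question is still open.

```python
import re

# Assumed form of the version comment: "#\#CIF_" plus a version number,
# alone on the first line. "1.2" is the proposal from this thread, not a
# published identifier.
MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)\s*$")


def declared_version(text):
    """Return the version named by a first-line magic comment, else None."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    match = MAGIC.match(first_line)
    return match.group(1) if match else None
```

A reader could dispatch on the result: "1.2" selects the strict parser, and
None sends the file down whatever path is agreed for undeclared files.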

> We do need to provide interconversion tools, and I think it may
> make life a little easier if we *mandate* a "#\#CIF_1.2" magic
> comment version identifier.

Agreed. Does its absence strictly mean it cannot be CIF1.2?

> Note also that it is very likely that the vast majority of data
> CIFs created by many (if not most) 1.2-capable applications will
> in practice be fully conformant to the existing 1.1 specs.

99% of them, to the point that ASCII will remain by far the dominant encoding.

> With this philosophy, therefore, my votes are:
> "1.2 Terminate all quoted strings on the occurrence of the
>      trailing quoted delimiter without consideration of the
>      next character"


> BUT with James' proviso: "I would require that there was always
> whitespace between tokens, regardless of the status of embedded
> delimiters."

You are requiring this of CIF1.2 writers.
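
To make the difference concrete, here is a rough Python sketch of the two
termination rules for a quoted string on a single line. The CIF1.1 rule
closes the string only at a quote followed by whitespace or end of line;
option 1.2 closes it at the first matching quote. The function names are
hypothetical, and multi-line and semicolon-delimited text is ignored.

```python
WHITESPACE = " \t\r\n"


def end_of_quoted_cif11(line, start):
    """Index of the closing quote under the CIF1.1 rule.

    The quote at `start` is closed by the first matching quote that is
    followed by whitespace or by the end of the line.
    """
    quote = line[start]
    for i in range(start + 1, len(line)):
        if line[i] == quote and (i + 1 == len(line) or line[i + 1] in WHITESPACE):
            return i
    return -1  # unterminated


def end_of_quoted_option_1_2(line, start):
    """Index of the closing quote under option 1.2 from this thread.

    The first matching quote closes the string, whatever follows it.
    """
    return line.find(line[start], start + 1)  # -1 if unterminated


def whitespace_follows(line, end):
    """James' proviso: the token must be followed by whitespace or end of line."""
    return end + 1 == len(line) or line[end + 1] in WHITESPACE
```

For the line 'a'b' 1.5 the CIF1.1 rule closes the string at index 4, giving
the value a'b. Option 1.2 closes it at index 2, and the proviso check then
fails because b follows the quote, so a strict reader would reject the line
rather than guess.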

> "2.3 Modify the current convention to forbid the use of
>      any characters other than a strictly limited set
>      of characters, making it an error to read a non-delimited
>      string that does not comply even if the intention
>      can be inferred from context"
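
Option 2.3 amounts to a strict membership test on every non-delimited
string. A minimal Python sketch follows. The alphabet in ALLOWED is a
placeholder chosen only for illustration, not the set under discussion
here, and the names check_non_delimited and NonDelimitedStringError are
hypothetical.

```python
import string

# Placeholder alphabet, NOT the set under discussion in this thread: ASCII
# letters, digits and a few punctuation marks, chosen only for illustration.
ALLOWED = frozenset(string.ascii_letters + string.digits + "+-._")


class NonDelimitedStringError(ValueError):
    """Raised when a non-delimited string uses a character outside the set."""


def check_non_delimited(token, allowed=ALLOWED):
    """Return token unchanged, or raise on the first disallowed character."""
    for position, char in enumerate(token):
        if char not in allowed:
            raise NonDelimitedStringError(
                f"character {char!r} at position {position} is not allowed"
            )
    return token
```

The check raises instead of repairing the token, which matches the wording
of the option: it is an error even if the intention can be inferred.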

> "3.2  Use UTF-8"

I still have reservations about this. Simon accepts that text editors
can't handle these easily (especially in edit-insert mode) but holds that
soon no-one will be using CIFs in this way; it will all be application
driven. Somebody said this nearly 15 years ago. Namely me. I was wrong then,
and I think Simon will be proven wrong also. I hope not, but I fear he will
be. People underestimate the power of the Unix philosophy: inter-piped tool
sets that deal with non-binary input, editing in text, the dominance of
ASCII, etc. I believe James makes the same presumption about the emerging
dominance of UTF-8. The UTF-8 spec is 16 years old! It is not as new as you
think, and ASCII still dominates.

I am a little biased. Flex and Bison are a little wobbly on UTF-8, so it
makes life much harder (I am not about to build a full STAR parser by hand).
JavaCC is cool with UTF-8, but I don't want to build everything in Java,
just some of it.



Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

ddlm-group mailing list
