
Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
From: Brian McMahon <[email protected]>
Date: Sat, 10 Oct 2009 13:59:24 +0100
In-Reply-To: <[email protected]>
References: <C6F5BF24.1200E%[email protected]><[email protected]><[email protected]><[email protected]>

Some general comments.

In agreeing (in principle) to adopt DDLm, COMCIFS has accepted the
need for changes to STAR and to DDL applications that are incompatible
with the original formulations. Such changes need not extend to CIF data
files (as Nick points out, COMCIFS can still mandate invariant data
files), but I think the mood of COMCIFS is to accept this as an
opportunity to improve CIF.

I also think that introducing new constructs such as bracket delimiters
to STAR/DDL will inevitably lead to pressure to include them in CIF.
COMCIFS may do this quickly or slowly, depending on the pressure from
the community, but we should suppose that at some point CIFs will
exist that have whatever syntactic changes we introduce here into
STAR and DDL.

The trick then is ensuring that the community can handle a universe
containing "old" and "new" CIFs. "Remediation" is not the answer,
because one can always use legacy software to create an "old" CIF
that is perfectly valid against the original specifications.

It is also unlikely that all CIF software will be upgraded to handle new
CIFs. We might want it to be, but suppose Ton Spek is unwilling or unable
to modify PLATON to read UTF-8 (is this easy to do with Fortran?). This
would have a severe impact on Acta's validation procedures. And for
the purposes of that particular program, the proposed CIF enhancements
have little relevance.

So there will need to be procedures allowing old software to handle
"new" CIFs to the extent that that is useful - and as in my PLATON
example, it may still be very useful. Hence I would like to be sure
that the new features we introduce will at least allow lossless
"old"->"new"->"old" AND "new"->"old"->"new" cycles of conversion.

Such conversions might actually be performed by standalone applications
or by library subroutines allowing on-the-fly management of CIFs of
both the new and the old type.

Lossless need not require the initial and final files to be
identical, so
  _name  O'Neill  ->  _name  "O\u27Neill"  ->  _name "O'Neill"
is acceptable (where I use \u27 in this email to stand for
whatever Unicode encoding we decide to support; though if I
understand things correctly, the UTF-8 encoding of that character
(U+0027) is the single byte 0x27, identical to an ASCII apostrophe,
so it would not be permitted under the current proposal!).

This is still somewhat problematic, as one could not guarantee
that PLATON, let us say, will actually treat the atom label
identically in these two cases:
    _atom_site_label       O1'
    _atom_site_label      "O1'"
and one may therefore need additional normalization or translation
tools for specific legacy applications; but I think you need at
least to ensure that the information content can go through several
such cycles without loss.

Taking "new" CIFs with bracketed delimiters through the inverse cycle
should not be problematic, to the extent that one assumes "old" software
can't do anything useful with the contents of a bracketed data value,
so you just surround it with semicolon delimiters and some "magic
number" to indicate that's what you have done.
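A sketch of that wrapping, assuming a hypothetical marker string (the email only says "some magic number") placed on the opening line of a CIF 1.1 text field. Such a field opens with a semicolon at the start of a line and closes at the next line that begins with a semicolon, so the sketch refuses bracketed values containing such a line rather than guessing at an escape. The function names are hypothetical.

```python
# Hypothetical marker; not an agreed COMCIFS convention.
MAGIC = "@bracketed-value@"


def wrap_for_old(bracketed: str) -> str:
    """Embed a bracketed 'new' value in an 'old' semicolon text field."""
    lines = bracketed.split("\n")
    # A line starting with ';' would close the text field early.
    if any(line.startswith(";") for line in lines):
        raise ValueError("a line starting with ';' would close the text field")
    # Marker on the opening line, value on the following lines, closing ';'.
    return "\n".join([";" + MAGIC, *lines, ";"])


def unwrap_for_new(field: str) -> str:
    """Recover the bracketed value from a field made by wrap_for_old."""
    lines = field.split("\n")
    if lines[0] != ";" + MAGIC or lines[-1] != ";":
        raise ValueError("not a wrapped bracketed value")
    return "\n".join(lines[1:-1])


if __name__ == "__main__":
    field = wrap_for_old("[1, 2, 3]")
    print(field)
    assert unwrap_for_new(field) == "[1, 2, 3]"
```

Legacy software would see an ordinary text string whose first line is the marker; a "new"-aware reader can strip the marker and recover the bracketed value unchanged.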

As I finish writing this, I see that James' vote and comments have just
come in, and some of what he says resonates with at least some of
these ideas.

I'll send more comments later.
Brian
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group

References:

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (Nick Spadaccini)

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (SIMON WESTRIP)

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (James Hester)

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (James Hester)
