Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: Brian McMahon <bm@iucr.org>
- Date: Sat, 10 Oct 2009 13:59:24 +0100
- In-Reply-To: <279aad2a0910100513u1e9ef18dua5f984cc20ac9a9b@mail.gmail.com>
- References: <C6F5BF24.1200E%nick@csse.uwa.edu.au><645410.77656.qm@web87015.mail.ird.yahoo.com><279aad2a0910100249o2c09897anb767ab28b06cbdcf@mail.gmail.com><279aad2a0910100513u1e9ef18dua5f984cc20ac9a9b@mail.gmail.com>
Some general comments.

In agreeing (in principle) to adopt DDLm, COMCIFS has accepted the need for changes to STAR and to DDL applications that are incompatible with the original formulations. Not necessarily to CIF data files (as Nick points out, COMCIFS can still mandate invariant data files); but I think the mood of COMCIFS is to accept this as an opportunity to improve CIF. I also think that introducing new constructs such as bracket delimiters to STAR/DDL will inevitably lead to pressure to include them in CIF. COMCIFS may do this quickly or slowly, depending on the pressure from the community, but we should suppose that at some point CIFs will exist that have whatever syntactic changes we introduce here into STAR and DDL.

The trick then is ensuring that the community can handle a universe containing "old" and "new" CIFs. "Remediation" is not the answer, because one can always use legacy software to create an "old" CIF that is perfectly valid against the original specifications. It is also unlikely that all CIF software will be upgraded to handle new CIFs. We might want it to be, but suppose Ton Spek is unwilling or unable to modify PLATON to read UTF-8 (is this easy to do with Fortran?). This would have a severe impact on Acta's validation procedures. And for the purposes of that particular program, the proposed CIF enhancements have little relevance.

So there will need to be procedures allowing old software to handle "new" CIFs to the extent that that is useful - and as in my PLATON example, it may still be very useful. Hence I would like to be sure that the new features we introduce will at least allow lossless "old"->"new"->"old" AND "new"->"old"->"new" cycles of conversion. Such conversions might actually be performed by standalone applications or by library subroutines allowing on-the-fly management of CIFs of both the new and the old type.
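
A minimal sketch of what checking such a conversion cycle might look like, written in Python. Everything here is an assumption made for illustration: `to_new`, `to_old`, `parse_value` and `same_content` are hypothetical names, the quoting rules are a toy simplification rather than the real CIF or STAR grammar, and values containing a double quote are not handled. The point it tries to show is only that a cycle can be judged by the data value carried, not by the exact characters written to the file.

```python
# Toy model: a "token" is the text of one single-line data value as written
# in a file, with or without quote delimiters. Not the real CIF grammar.

QUOTES = "'\""
# Assumed (simplified) set of characters that may not start an undelimited value.
BAD_FIRST = "_#$'\"[];"


def parse_value(token: str) -> str:
    """Return the value a token carries, dropping matching quote delimiters."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in QUOTES:
        return token[1:-1]
    return token


def to_new(token: str) -> str:
    """Hypothetical 'old' -> 'new' step: always write a double-quoted value."""
    return '"' + parse_value(token) + '"'


def to_old(token: str) -> str:
    """Hypothetical 'new' -> 'old' step: drop the quotes when it looks safe."""
    value = parse_value(token)
    needs_quotes = (
        value == ""
        or any(ch.isspace() for ch in value)
        or value[0] in BAD_FIRST
    )
    return '"' + value + '"' if needs_quotes else value


def same_content(a: str, b: str) -> bool:
    """Two tokens are equivalent if they carry the same value."""
    return parse_value(a) == parse_value(b)


def old_cycle_is_lossless(token: str) -> bool:
    """Check 'old' -> 'new' -> 'old' for one token."""
    return same_content(token, to_old(to_new(token)))


def new_cycle_is_lossless(token: str) -> bool:
    """Check 'new' -> 'old' -> 'new' for one token."""
    return same_content(token, to_new(to_old(token)))
```

In this toy model the token `'two words'` comes back from the "old" cycle as `"two words"`: the text differs but `same_content` still reports the value as unchanged. A real tool would need the actual grammar of both syntaxes and would have to apply the same check to whole files.
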
Lossless need not require the initial and final files to be identical, so

    _name O'Neill  ->  _name "O\u27Neill"  ->  _name "O'Neill"

is acceptable (where I use \u27 in this email to stand for whatever Unicode encoding we decide to support; though if I understand things correctly, the UTF-8 encoding of that character is the same as an ASCII apostrophe, so would not be permitted under the current proposal!). This is still somewhat problematic, as one could not guarantee that PLATON, let us say, will actually treat the atom label identically in these two cases:

    _atom_site_label O1'
    _atom_site_label "O1'"

and one may therefore need additional normalization or translation tools for specific legacy applications; but I think you need at least to ensure that the information content can go through several such cycles without loss.

Taking "new" CIFs with bracketed delimiters through the inverse cycle should not be problematic, to the extent that one assumes "old" software can't do anything useful with the contents of a bracketed data value, so you just surround it with semicolon delimiters and some "magic number" to indicate that's what you have done.

As I finish writing this, I see that James' vote and comments have just come in, and in some of what he says I see resonances with at least some of these ideas. I'll send more comments later.

Brian

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group