Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: Brian McMahon <bm@iucr.org>
- Date: Thu, 10 Dec 2009 12:03:08 +0000
- In-Reply-To: <alpine.BSF.2.00.0912091314470.49946@epsilon.pair.com>
- References: <20091209144035.GB29341@emerald.iucr.org><a06240801c74578ec8b59@[192.168.2.104]><alpine.BSF.2.00.0912091314470.49946@epsilon.pair.com>
On Wed, Dec 09, 2009 at 01:18:20PM -0500, Herbert J. Bernstein wrote:
> The only problem with recognizing CIF2 files by the period in tag names is
> that mmCIF and pdbx CIFs also use such data names. Hopefully those
> files would have DDLm dictionaries early on, so that the presumption
> of using DDLm dictionaries for all CIFs with names using periods would
> be valid.

Well, of course. My point is just that there will be some subset of
single-crystal users for whom the transition to a dot-separated form of
data name will cue them to the fact that they're dealing with a "new"
version of CIF that might have other changes they need to be wary of.
If that increases slightly their chances of editing their CIFs
correctly, it's to their benefit and ours (in the editorial office).

The subsequent discussions on this topic about "users" are certainly
relevant. Although David has drawn a couple of very detailed
descriptions of how computer software should be designed to facilitate
an evolutionary change to CIF2, in the real world people will continue
to create CIFs in all manner of ways, and there will always have to be
validation, normalization and remediation procedures to minimise error
and ambiguity (but I don't think we'll succeed in eliminating them
completely). The IUCr CIF archive contains known flaws - susceptible of
remediation, but we haven't yet dedicated the resources to do that.
Outside the IUCr world, there are all manner of creative "CIF" files in
the archives of other publishers, in laboratories and on private
desktops.

My feeling is that the best analogy currently for the CIF world is the
various HTML standards. I have responsibility for the IUCr web site,
and consequently create hundreds if not thousands of web pages. Yet as
a "user" of HTML I have only a hazy grasp of the distinctions between
HTML 4.01 Strict, HTML 4.01 Transitional, XHTML 1.1 etc., and so my
hand coding is certainly wrong in places; our site uses, for historical
reasons, incompatible features of different HTML versions, yet the
individual pages are not properly labelled with their correct (or most
nearly correct) DTDs; and sometimes the machine-generated pages are at
best suspect, if not downright wrong. Yet, thanks to forgiving
browsers, toleration of alternative renderings by different browsers,
policy decisions to drop legacy support for some features, and a dose
of good luck, the whole edifice somehow holds together and fulfils a
useful function. It's not perfect - and not how you would want it to be
designed if starting afresh with a blank sheet - but it gets the job
done.

Likewise CIF. And it's worth remembering that - rather like HTML - CIF
remains, for the most part, pretty user-friendly. Most of what
exercises us in these discussions are edge cases. In practice, a
careful reading of a relatively short specification will allow anyone
who pays attention to detail to create fully compliant CIFs with rather
little effort.

Regards
Brian

> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Wed, 9 Dec 2009, Herbert J. Bernstein wrote:
>
>> Personally, I would greatly prefer to allow all data names that do not
>> create a major lexer/parser conflict to appear in a data CIF and
>> only apply the strong restrictions to data names that appear in CIF2
>> dictionaries as defined data names (not as aliases). -- Herbert
>>
>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>> I have one remaining niggle that I'd like to revisit before we put
>>>> this finally to bed. As has been mentioned a couple of times
>>>> recently, restricting the data-name character set does invalidate
>>>> syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>>>> We have discussed strategies for handling this, and I think these
>>>> are workable strategies, but will involve investment and hence expense
>>>> in workflow management in CIF archives.
>>>>
>>>> I understand the rationale behind this restriction is to simplify
>>>> future processing of data names in areas such as dREL
>>>> applications. The question really is whether we're choosing the right
>>>> trade-off in making things cleaner at that end of the processing
>>>> chain. I would suppose that a dREL or other application could ingest a
>>>> data name with dangerous characters, convert it internally into a
>>>> "safe" identifier that's used for all processing, and then restore the
>>>> original form upon output; but writing that intermediate layer of
>>>> processing is of course expensive (especially if there aren't readily
>>>> available libraries that will do this transparently).
>>>>
>>>> I suspect that some of the original proposed syntactic changes also
>>>> had the effect (whether by design or collaterally) of simplifying i/o,
>>>> data structure management, symbol table processing etc., but those may
>>>> have suffered in the subsequent revision exercise we've just been
>>>> practising. Given the consensus we are now approaching, would the code
>>>> builders now be prepared to incur the additional expense of handling
>>>> "dangerous" data names?
>>>>
>>>> I really don't want to spark off a long discussion on this - if a
>>>> quick round of response shows that there's no appetite to allow
>>>> the additional punctuation characters in data names, I'll accept that
>>>> gracefully.
>>>>
>>>> ***
>>>>
>>>> One last comment while I have the floor, though it is related in part
>>>> to the above question. A concern raised in the editorial office was
>>>> that there would be circumstances where users didn't know if they were
>>>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>>> to the vi editor - and we're imagining most of them are dealing with
>>>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>>>> editorial offices would only want to use CIF2 seriously in association
>>>> with DDLm dictionaries, and that we would expect the revised core
>>>> dictionaries to use the dot component in data names to signal this
>>>> further evolution. So even a superficial glimpse of the middle of a
>>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>>
>>>> Does that fit in with how others see this progressing?
>>>>
>>>> Cheers
>>>> Brian
>>> --
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
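
For illustration only, the "intermediate layer" suggested in the quoted message (ingest a data name containing "dangerous" characters, use a "safe" identifier internally, restore the original form on output) might look something like the sketch below. Nothing here comes from dREL or any existing CIF library: the class name DataNameTable, its methods, and the choice of [A-Za-z0-9_] as the "safe" character set are all assumptions made for the example.

```python
import re

# Assumption: "safe" means ASCII letters, digits and underscore only.
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


class DataNameTable:
    """Hypothetical two-way map between original data names and safe identifiers."""

    def __init__(self):
        self._safe_by_orig = {}
        self._orig_by_safe = {}

    def to_safe(self, name):
        """Return a safe identifier for `name`, creating one on first use."""
        if name in self._safe_by_orig:
            return self._safe_by_orig[name]
        base = _UNSAFE.sub("_", name)
        candidate, n = base, 1
        # Two different originals can collapse to the same base
        # (e.g. "a/b" and "a_b"), so add a numeric suffix on collision.
        while candidate in self._orig_by_safe:
            n += 1
            candidate = f"{base}_{n}"
        self._safe_by_orig[name] = candidate
        self._orig_by_safe[candidate] = name
        return candidate

    def to_original(self, safe):
        """Restore the original data name for output."""
        return self._orig_by_safe[safe]


table = DataNameTable()
safe = table.to_safe("_refine_ls_shift/esd_max")
restored = table.to_original(safe)
```

Keeping the mapping in a table, rather than encoding the original characters into the identifier itself, keeps the internal names readable, at the cost of having to carry the table through to the output stage.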