Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: John Westbrook <jwest@pdb-mail.rutgers.edu>
- Date: Wed, 09 Dec 2009 15:21:35 -0500
- In-Reply-To: <4B1FF3BB.8010601@niehs.nih.gov>
- References: <20091209144035.GB29341@emerald.iucr.org> <a06240801c74578ec8b59@[192.168.2.104]><4B1FF3BB.8010601@niehs.nih.gov>
Hi all -

On the issue of reserved characters in mmCIF/PDBx data items, these
generally have been inherited from the style of items from the core.
The majority of items in this class are data items related to short
matrices/tensors and vectors (e.g. items including []). Virtually all
have a syntax which could reasonably be interpreted as a programmatic
reference.

For instance,

_atom_sites.fract_transf_matrix[1][1]   0.007738
_atom_sites.fract_transf_matrix[1][2]   0.000000
_atom_sites.fract_transf_matrix[1][3]   0.004298
_atom_sites.fract_transf_matrix[2][1]   0.000000
_atom_sites.fract_transf_matrix[2][2]   0.016545
_atom_sites.fract_transf_matrix[2][3]   0.000000
_atom_sites.fract_transf_matrix[3][1]   0.000000
_atom_sites.fract_transf_matrix[3][2]   0.000000
_atom_sites.fract_transf_matrix[3][3]   0.020200
_atom_sites.fract_transf_vector[1]      0.00000
_atom_sites.fract_transf_vector[2]      0.00000
_atom_sites.fract_transf_vector[3]      0.00000

Are we close to being able to treat these as legal in the context of
CIF2/DDL+? I suppose I am asking what will constitute a legal
assignment for an element of a matrix/array -

Only this -

_a.vec   [1,2,3]

or also expanded assignment by element such as -

_a.vec[1]  1
_a.vec[2]  2
_a.vec[3]  3

If the latter is to be considered, then this will solve most of the
data name issues for our data.

Regards,

John

Joe Krahn wrote:
> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
> formatted file. The question is whether these files should be allowed as
> valid CIF2, or just for convenience as a non-standard CIF2.
>
> When CIF files are used as working data files, the restrictions should
> be relaxed. For long-term archival files, it makes sense to be more
> restrictive. I would just make the CIF1 names inaccessible to dREL.
> Alternatively, an implementation could allow CIF1 names only on reading,
> and require dictionary alias mappings to CIF2 names.
>
> One argument in favor of allowing them would be that someone wants to
> convert all data files to CIF2 format, but they want to preserve the
> original data as-is, without alias mapping.
>
> I think that the current CIF2 syntax makes it possible to use CIF1 names
> without any ambiguities. The question is whether they should be
> considered valid CIF2, or just a non-standard version that will be
> useful for the transitional period.
>
> Joe
>
>
> Herbert J. Bernstein wrote:
>> Personally, I would greatly prefer to allow all data names that do not
>> create a major lexer/parser conflict to appear in a data CIF and
>> only apply the strong restrictions to data names that appear in CIF2
>> dictionaries as defined data names (not as aliases). -- Herbert
>>
>>
>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>> I have one remaining niggle that I'd like to revisit before we put
>>> this finally to bed. As has been mentioned a couple of times
>>> recently, restricting the data-name character set does invalidate
>>> syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>>> We have discussed strategies for handling this, and I think these
>>> are workable strategies, but will involve investment and hence expense
>>> in workflow management in CIF archives.
>>>
>>> I understand the rationale behind this restriction is to simplify
>>> future processing of data names in areas such as dREL
>>> applications. The question really is whether we're choosing the right
>>> trade-off in making things cleaner at that end of the processing
>>> chain. I would suppose that a dREL or other application could ingest a
>>> data name with dangerous characters, convert it internally into a
>>> "safe" identifier that's used for all processing, and then restore the
>>> original form upon output; but writing that intermediate layer of
>>> processing is of course expensive (especially if there aren't readily
>>> available libraries that will do this transparently).
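The intermediate layer Brian describes (ingest a data name with dangerous characters, work with a "safe" identifier internally, restore the original on output) might look roughly like the sketch below. This is only an illustration under stated assumptions: the class name `NameTable`, the rule of replacing every character outside `[A-Za-z0-9_]` with an underscore, and the numeric suffix used to resolve collisions are all hypothetical and are not taken from any CIF or dREL specification. It also ignores the case-insensitivity of CIF data names.

```python
import re


class NameTable:
    """Hypothetical two-way map between CIF data names and safe identifiers."""

    def __init__(self):
        self._to_safe = {}      # original data name -> safe identifier
        self._to_original = {}  # safe identifier -> original data name

    def safe(self, data_name):
        """Return a stable safe identifier for data_name, creating one if needed."""
        if data_name in self._to_safe:
            return self._to_safe[data_name]
        # Assumed rule: anything that is not a letter, digit or underscore
        # becomes an underscore.
        candidate = re.sub(r"[^A-Za-z0-9_]", "_", data_name)
        # Two different data names can collapse to the same candidate,
        # so append a counter until the identifier is unused.
        unique = candidate
        suffix = 1
        while unique in self._to_original:
            suffix += 1
            unique = f"{candidate}_{suffix}"
        self._to_safe[data_name] = unique
        self._to_original[unique] = data_name
        return unique

    def original(self, safe_name):
        """Restore the data name that was registered for safe_name."""
        return self._to_original[safe_name]


table = NameTable()
ident = table.safe("_refine_ls_shift/esd_max")
restored = table.original(ident)  # the original name, unchanged
```

The table has to be kept for the whole read/process/write cycle, since the original name cannot be recovered from the safe identifier alone.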
>>>
>>> I suspect that some of the original proposed syntactic changes also
>>> had the effect (whether by design or collaterally) of simplifying i/o,
>>> data structure management, symbol table processing etc., but those may
>>> have suffered in the subsequent revision exercise we've just been
>>> practising. Given the consensus we are now approaching, would the code
>>> builders now be prepared to incur the additional expense of handling
>>> "dangerous" data names?
>>>
>>> I really don't want to spark off a long discussion on this - if a
>>> quick round of response shows that there's no appetite to allow
>>> the additional punctuation characters in data names, I'll accept that
>>> gracefully.
>>>
>>> ***
>>>
>>> One last comment while I have the floor, though it is related in part
>>> to the above question. A concern raised in the editorial office was
>>> that there would be circumstances where users didn't know if they were
>>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>> to the vi editor - and we're imagining most of them are dealing with
>>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>>> editorial offices would only want to use CIF2 seriously in association
>>> with DDLm dictionaries, and that we would expect the revised core
>>> dictionaries to use the dot component in data names to signal this
>>> further evolution. So even a superficial glimpse of the middle of a
>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>
>>> Does that fit in with how others see this progressing?
>>>
>>> Cheers
>>> Brian
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290
Fax: (732) 445-4320
******************************************************************
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
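To make John's question about the two assignment forms concrete, here is a minimal sketch of how a reader could fold element-wise names such as `_a.vec[1]`, `_a.vec[2]`, `_a.vec[3]` into the same structure that a single list-valued `_a.vec [1,2,3]` would give. Everything in it is an assumption for illustration: the function names `fold_indexed` and `to_list`, the regular expression used to recognise a trailing `[n]` index, and the choice to leave values as strings. It is not part of any CIF2 or DDLm specification, and the thread does not settle whether the expanded form is legal.

```python
import re

# Assumed shape of an element-wise name: a base name followed by one or
# more bracketed integer indices, e.g. _atom_sites.fract_transf_matrix[1][2].
INDEXED_NAME = re.compile(r"^([^\[\]]+)((?:\[\d+\])+)$")
INDEX = re.compile(r"\[(\d+)\]")


def to_list(node):
    """Turn nested {index: value} dicts into nested lists ordered by index."""
    if not isinstance(node, dict):
        return node
    return [to_list(node[i]) for i in sorted(node)]


def fold_indexed(items):
    """Group element-wise data names under their base name.

    items maps data names to (string) values. Names without indices are
    passed through unchanged. Gaps in the indices are not detected; the
    remaining elements are simply packed in index order.
    """
    folded = {}
    for name, value in items.items():
        match = INDEXED_NAME.match(name)
        if match is None:
            folded[name] = value
            continue
        base = match.group(1)
        indices = [int(i) for i in INDEX.findall(match.group(2))]
        node = folded.setdefault(base, {})
        for i in indices[:-1]:
            node = node.setdefault(i, {})
        node[indices[-1]] = value
    return {name: to_list(value) for name, value in folded.items()}


folded = fold_indexed({
    "_a.vec[1]": "1",
    "_a.vec[2]": "2",
    "_a.vec[3]": "3",
})
# folded["_a.vec"] is ['1', '2', '3']
```

With a folding step like this, a file that spells out each element and a file that gives one list value could be handed to downstream code in the same form, which is the practical benefit John points to for the existing mmCIF/PDBx names.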