Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Fri, 11 Dec 2009 09:08:21 +0800
- In-Reply-To: <a06240800c746ed907fcc@[192.168.2.104]>
Many of you need to read the dREL part of the dictionary much more closely.

dREL extensively exploits access to matrix and vector types by index addressing at a programmatic level. That's how it gets the things done that it has to. So within the dREL programming language you will see, littered everywhere, a matrix which is accessed via standard indexing (as you would with any language supporting array structures).

So let's have a matrix _atom_site.U - within dREL I have access to _atom_site.U[0][0] etc. as part of the language (I'll stick with 0 initial indexing but this really is a trivial problem, solved many times over).

But now you ALSO want a scalar data item called _atom_site.U[0][0] within CIF. The dictionary says _atom_site.U[0][0] is a single scalar value. The dREL constructor method for _atom_site.U has

_atom_site.U = Matrix([[_atom_site.U[0][0] ...]...])

This obviously won't work.

This is why the dictionary in DDLm uses the equivalent of _atom_site.U_0_0 for the scalar value, so that the above constructor will make sense and still allow me to access _atom_site.U[0][0] from within dREL. It is why I am keen to restrict the syntax of the data names.

On 11/12/09 2:46 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com> wrote:

> Dear Colleagues,
>
> One very neat resolution to this problem would be to allow a
> list or array-typed CIF2 tag to be referenced in a data file either
> as a whole or element by element.
>
> Thus
>
> _a.vec
>
> being defined as an array or list in CIF2 would automatically make
> the tags
>
> _a.vec[1]
> _a.vec[2]
> ...
>
> defined CIF2 tags. If the array or list were nested, then
>
> _a.vec[1][1]
> _a.vec[1][2]
>
> etc. would be valid tags.
>
> I would propose that this be general and automatic, applying to
> all tags defined as lists or arrays.
> In view of past practice in
> CIF1, there is a slight conflict with respect to the default starting
> index in dREL versus the common CIF1 practice in indexing arrays
> from 1, but that can (and should) be solved with explicit specification
> of a starting index, so we can carry over the tag name usage from
> CIF1 without confusing people with an index shift. So, if _a.vec
> were an array of dimension 5, starting from index 0, _a.vec[0]
> through _a.vec[4] would be valid, but if the starting index were
> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
> CIF1 conventions.
>
> The aliasing mechanism might have to be extended or clarified to
> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
> but, to me, this has a very intuitive feel.
>
> Regards,
> Herbert
>
>
> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>> Hi all -
>>
>> On the issue of reserved characters in mmCIF/PDBx data items, these
>> generally have been inherited from the style of items from the core. The
>> majority of items in this class are data items related to short
>> matrices/tensors
>> and vectors (e.g. items including []). Virtually all have a syntax which
>> could reasonably be interpreted as a programmatic reference. For instance,
>>
>>
>> _atom_sites.fract_transf_matrix[1][1]   0.007738
>> _atom_sites.fract_transf_matrix[1][2]   0.000000
>> _atom_sites.fract_transf_matrix[1][3]   0.004298
>> _atom_sites.fract_transf_matrix[2][1]   0.000000
>> _atom_sites.fract_transf_matrix[2][2]   0.016545
>> _atom_sites.fract_transf_matrix[2][3]   0.000000
>> _atom_sites.fract_transf_matrix[3][1]   0.000000
>> _atom_sites.fract_transf_matrix[3][2]   0.000000
>> _atom_sites.fract_transf_matrix[3][3]   0.020200
>> _atom_sites.fract_transf_vector[1]      0.00000
>> _atom_sites.fract_transf_vector[2]      0.00000
>> _atom_sites.fract_transf_vector[3]      0.00000
>>
>> Are we close to being able to treat these as legal in the context of
>> CIF2/DDL+?
>> I suppose I am asking what will constitute a legal assignment for an element
>> of a matrix/array -
>>
>> Only this -
>>
>> _a.vec [1,2,3]
>>
>> or also expanded assignment by element such as -
>>
>> _a.vec[1] 1
>> _a.vec[2] 2
>> _a.vec[3] 3
>>
>> If the latter is to be considered, then this will solve most of the data name
>> issues for our data.
>>
>> Regards,
>>
>> John
>>
>> Joe Krahn wrote:
>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>> formatted file. The question is whether these files should be allowed as
>>> valid CIF2, or just for convenience as a non-standard CIF2.
>>>
>>> When CIF files are used as working data files, the restrictions should
>>> be relaxed. For long-term archival files, it makes sense to be more
>>> restrictive. I would just make the CIF1 names inaccessible to dREL.
>>> Alternatively, an implementation could allow CIF1 names only on reading,
>>> and require dictionary alias mappings to CIF2 names.
>>>
>>> One argument in favor of allowing them would be that someone wants to
>>> convert all data files to CIF2 format, but they want to preserve the
>>> original data as-is, without alias mapping.
>>>
>>> I think that the current CIF2 syntax makes it possible to use CIF1 names
>>> without any ambiguities. The question is whether they should be
>>> considered valid CIF2, or just a non-standard version that will be
>>> useful for the transitional period.
>>>
>>> Joe
>>>
>>>
>>> Herbert J. Bernstein wrote:
>>>> Personally, I would greatly prefer to allow all data names that do not
>>>> create a major lexer/parser conflict to appear in a data CIF and
>>>> only apply the strong restrictions to data names that appear in CIF2
>>>> dictionaries as defined data names (not as aliases).  -- Herbert
>>>>
>>>>
>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>> this finally to bed.
>>>>> As has been mentioned a couple of times
>>>>> recently, restricting the data-name character set does invalidate
>>>>> syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>>>>> We have discussed strategies for handling this, and I think these
>>>>> are workable strategies, but will involve investment and hence expense
>>>>> in workflow management in CIF archives.
>>>>>
>>>>> I understand the rationale behind this restriction is to simplify
>>>>> future processing of data names in areas such as dREL
>>>>> applications. The question really is whether we're choosing the right
>>>>> trade-off in making things cleaner at that end of the processing
>>>>> chain. I would suppose that a dREL or other application could ingest a
>>>>> data name with dangerous characters, convert it internally into a
>>>>> "safe" identifier that's used for all processing, and then restore the
>>>>> original form upon output; but writing that intermediate layer of
>>>>> processing is of course expensive (especially if there aren't readily
>>>>> available libraries that will do this transparently).
>>>>>
>>>>> I suspect that some of the original proposed syntactic changes also
>>>>> had the effect (whether by design or collaterally) of simplifying i/o,
>>>>> data structure management, symbol table processing etc., but those may
>>>>> have suffered in the subsequent revision exercise we've just been
>>>>> practising. Given the consensus we are now approaching, would the code
>>>>> builders now be prepared to incur the additional expense of handling
>>>>> "dangerous" data names?
>>>>>
>>>>> I really don't want to spark off a long discussion on this - if a
>>>>> quick round of response shows that there's no appetite to allow
>>>>> the additional punctuation characters in data names, I'll accept that
>>>>> gracefully.
>>>>>
>>>>> ***
>>>>>
>>>>> One last comment while I have the floor, though it is related in part
>>>>> to the above question.
>>>>> A concern raised in the editorial office was
>>>>> that there would be circumstances where users didn't know if they were
>>>>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>>>> to the vi editor - and we're imagining most of them are dealing with
>>>>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>>>>> editorial offices would only want to use CIF2 seriously in association
>>>>> with DDLm dictionaries, and that we would expect the revised core
>>>>> dictionaries to use the dot component in data names to signal this
>>>>> further evolution. So even a superficial glimpse of the middle of a
>>>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>>>
>>>>> Does that fit in with how others see this progressing?
>>>>>
>>>>> Cheers
>>>>> Brian
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>> --
>> ******************************************************************
>> John Westbrook, Ph.D.
>> Rutgers, The State University of New Jersey
>> Department of Chemistry and Chemical Biology
>> 610 Taylor Road
>> Piscataway, NJ 08854-8087
>> e-mail: jwest@rcsb.rutgers.edu
>> Ph: (732) 445-4290 Fax: (732) 445-4320
>> ******************************************************************
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
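
The clash described at the top of this message (a matrix _atom_site.U that dREL indexes as _atom_site.U[0][0], alongside a scalar data item literally named _atom_site.U[0][0]) can be sketched roughly as follows. This is Python, not dREL, and only an illustration under stated assumptions: a data block is modelled as a plain dict, indices are taken as 0-based, and the helper names (split_indexed_name, safe_scalar_name, build_matrix, readings) are hypothetical and not part of dREL, DDLm, or any CIF library.

```python
import re

# Hypothetical pattern: a data name followed by one or more [n] index groups.
INDEXED_NAME = re.compile(r"^(?P<base>_[^\[\]\s]+)(?P<idx>(?:\[\d+\])+)$")


def split_indexed_name(name):
    """Split '_a.vec[1][2]' into ('_a.vec', (1, 2)); unindexed names give (name, ())."""
    match = INDEXED_NAME.match(name)
    if match is None:
        return name, ()
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", match.group("idx")))
    return match.group("base"), indices


def safe_scalar_name(name):
    """Rewrite a bracketed scalar name into underscore form: U[0][0] -> U_0_0."""
    base, indices = split_indexed_name(name)
    return base + "".join(f"_{i}" for i in indices)


def build_matrix(block, base, rows, cols):
    """Assemble a nested list from scalar items named base_i_j (0-based, assumed)."""
    return [[block[f"{base}_{i}_{j}"] for j in range(cols)] for i in range(rows)]


def readings(block, expression):
    """List every way 'expression' could be read against the data block."""
    found = []
    if expression in block:
        found.append("scalar data name")
    base, indices = split_indexed_name(expression)
    if indices and base in block:
        found.append("indexed element of " + base)
    return found


if __name__ == "__main__":
    # Both a matrix and a bracket-named scalar are present: two readings clash.
    clashing = {"_atom_site.U": [[0.01, 0.0], [0.0, 0.02]], "_atom_site.U[0][0]": 0.01}
    print(readings(clashing, "_atom_site.U[0][0]"))

    # With underscore-style scalar names the constructor is unambiguous.
    scalars = {
        "_atom_site.U_0_0": 0.01, "_atom_site.U_0_1": 0.0,
        "_atom_site.U_1_0": 0.0, "_atom_site.U_1_1": 0.02,
    }
    scalars["_atom_site.U"] = build_matrix(scalars, "_atom_site.U", 2, 2)
    print(readings(scalars, "_atom_site.U[0][0]"))
```

With bracket-named scalars the expression has two competing readings; with the underscore form only the indexed reading remains, which is the motivation given above for restricting the data-name syntax.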