Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Fri, 11 Dec 2009 20:27:08 +0800
- In-Reply-To: <alpine.BSF.2.00.0912110631210.4757@epsilon.pair.com>
I don't think a mechanism for specifying a starting index will work at the
individual definition level. All objects will have to start at the same
index; otherwise, if I try to access some other object from within dREL, how
do I know what its starting index is? Best to decide on a starting index and
fix it.

There is a historical precedent in CIF that has it starting at 1. As wrong as
I would argue that is, it is set in stone, so stick with it. In my code I will
simply offset the index by -1 to get to the real storage point (I don't
program in languages that index starting at 1) - it is easy enough to do.
Seems a solution to me.

The _xxx_yyy[] syntax is an ancient category-like definition that never
appears in data.


On 11/12/09 7:37 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
wrote:

> I am saying that any declaration of an array or of a list would
> make its individual elements available for use in any data CIF without
> the need for any further declarations in the dictionary. This is
> simple and clear and completely consistent with dREL. The only really
> new thing would be some mechanism(s) to specify the starting index.
>
> I think this covers John's need. The only thing it would not cover
> is something like _xxx_yyy[] which appears in some CIF1 dictionaries
> but not in the data files, so I don't think there should be an
> issue with not allowing those in CIF2.
>
> Does anyone see a problem with this?
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>
>> I can agree with that, if you are saying only the matrix object is
>> available to the user.
>>
>> OR alternatively are you saying there will ONLY be one object defined in
>> the dictionary, let's say the 3x3 matrix
>>
>>     _atom_site.U
>>
>> But NEVER have definitions in the dictionary for the individual
>> _atom_site.U[i][j] elements.
>>
>> As we parse a CIF data file, if we detect _atom_site.U[i][j], it isn't in
>> the defined dictionary so this would normally raise an error. BUT because
>> of the specific trailing syntax [i][j] this informs the parser there must
>> be an object of matching rank with the name _atom_site.U (i.e. the
>> _atom_site.U[i][j] with the [i][j] truncated) in the dictionary - and
>> therefore populate the appropriate element of _atom_site.U with that value.
>>
>> This would circumvent the problem of two different identifiers called
>> _atom_site.U[i][j] in the dictionary BUT would necessarily mean that
>> [i][j] syntax in a data name was reserved for objects that are defined in
>> the dictionary as, in this case, a 2D matrix. They can't (shouldn't?) be
>> used for general data names.
>>
>> Does this cover what John wanted also?
>>
>>
>> On 11/12/09 10:12 AM, "Herbert J. Bernstein"
>> <yaya@bernstein-plus-sons.com> wrote:
>>
>>> Actually, the suggestion comes from reading the dREL documentation and
>>> the DDLm documentation and noticing how clumsy the access to array
>>> elements in DDLm is compared to the access in dREL. What I am suggesting
>>> is to promote the dREL access, making it fully available at the DDLm
>>> level, replacing the clumsy element-by-element definitions with one
>>> automatic definition that looks and works just the way one might expect.
>>>
>>> Regards,
>>> Herbert
>>>
>>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>>
>>>> Many of you need to read the dREL part of the dictionary much more
>>>> closely.
>>>>
>>>> dREL extensively exploits access to matrix and vector types by index
>>>> addressing at a programmatic level. That's how it gets done the things
>>>> it has to. So within the dREL programming language you will see littered
>>>> everywhere a matrix which is accessed via standard indexing (as you
>>>> would with any language supporting array structures).
>>>>
>>>> So let's have a matrix _atom_site.U - within dREL I have access to
>>>> _atom_site.U[0][0] etc. as part of the language (I'll stick with 0
>>>> initial indexing but this really is a trivial problem, solved many times
>>>> over).
>>>>
>>>> But now you ALSO want a scalar data item called _atom_site.U[0][0]
>>>> within CIF. The dictionary says _atom_site.U[0][0] is a single scalar
>>>> value.
>>>>
>>>> The dREL constructor method for _atom_site.U has
>>>>
>>>>     _atom_site.U = Matrix([[_atom_site.U[0][0] ...]...])
>>>>
>>>> This obviously won't work. This is why the dictionary in DDLm uses the
>>>> equivalent of _atom_site.U_0_0 for the scalar value so that the above
>>>> constructor will make sense and still allows me to access
>>>> _atom_site.U[0][0] from within dREL. It is why I am keen to restrict the
>>>> syntax of the data names.
>>>>
>>>>
>>>> On 11/12/09 2:46 AM, "Herbert J. Bernstein"
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>>> Dear Colleagues,
>>>>>
>>>>> One very neat resolution to this problem would be to allow a
>>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>>> as a whole or element by element.
>>>>>
>>>>> Thus
>>>>>
>>>>>     _a.vec
>>>>>
>>>>> being defined as an array or list in CIF2 would automatically make
>>>>> the tags
>>>>>
>>>>>     _a.vec[1]
>>>>>     _a.vec[2]
>>>>>     ...
>>>>>
>>>>> defined CIF2 tags. If the array or list were nested, then
>>>>>
>>>>>     _a.vec[1][1]
>>>>>     _a.vec[1][2]
>>>>>
>>>>> etc. would be valid tags.
>>>>>
>>>>> I would propose that this be general and automatic, applying to
>>>>> all tags defined as lists or arrays. In view of past practice in
>>>>> CIF1, there is a slight conflict between the default starting
>>>>> index in dREL and the common CIF1 practice of indexing arrays
>>>>> from 1, but that can (and should) be solved with explicit specification
>>>>> of a starting index, so we can carry over the tag name usage from
>>>>> CIF1 without confusing people with an index shift. So, if _a.vec
>>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>>> through _a.vec[4] would be valid, but if the starting index were
>>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>>> CIF1 conventions.
>>>>>
>>>>> The aliasing mechanism might have to be extended or clarified to
>>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>>> but, to me, this has a very intuitive feel.
>>>>>
>>>>> Regards,
>>>>> Herbert
>>>>>
>>>>>
>>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>>> Hi all -
>>>>>>
>>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>>> generally have been inherited from the style of items from the core.
>>>>>> The majority of items in this class are data items related to short
>>>>>> matrices/tensors and vectors (e.g. items including []). Virtually all
>>>>>> have a syntax which could reasonably be interpreted as a programmatic
>>>>>> reference. For instance,
>>>>>>
>>>>>>     _atom_sites.fract_transf_matrix[1][1]   0.007738
>>>>>>     _atom_sites.fract_transf_matrix[1][2]   0.000000
>>>>>>     _atom_sites.fract_transf_matrix[1][3]   0.004298
>>>>>>     _atom_sites.fract_transf_matrix[2][1]   0.000000
>>>>>>     _atom_sites.fract_transf_matrix[2][2]   0.016545
>>>>>>     _atom_sites.fract_transf_matrix[2][3]   0.000000
>>>>>>     _atom_sites.fract_transf_matrix[3][1]   0.000000
>>>>>>     _atom_sites.fract_transf_matrix[3][2]   0.000000
>>>>>>     _atom_sites.fract_transf_matrix[3][3]   0.020200
>>>>>>     _atom_sites.fract_transf_vector[1]      0.00000
>>>>>>     _atom_sites.fract_transf_vector[2]      0.00000
>>>>>>     _atom_sites.fract_transf_vector[3]      0.00000
>>>>>>
>>>>>> Are we close to being able to treat these as legal in the context of
>>>>>> CIF2/DDL+? I suppose I am asking what will constitute a legal
>>>>>> assignment for an element of a matrix/array -
>>>>>>
>>>>>> Only this -
>>>>>>
>>>>>>     _a.vec [1,2,3]
>>>>>>
>>>>>> or also expanded assignment by element such as -
>>>>>>
>>>>>>     _a.vec[1] 1
>>>>>>     _a.vec[2] 2
>>>>>>     _a.vec[3] 3
>>>>>>
>>>>>> If the latter is to be considered, then this will solve most of the
>>>>>> data name issues for our data.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Joe Krahn wrote:
>>>>>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>>> formatted file. The question is whether these files should be allowed
>>>>>>> as valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>>
>>>>>>> When CIF files are used as working data files, the restrictions
>>>>>>> should be relaxed. For long-term archival files, it makes sense to be
>>>>>>> more restrictive. I would just make the CIF1 names inaccessible to
>>>>>>> dREL. Alternatively, an implementation could allow CIF1 names only on
>>>>>>> reading, and require dictionary alias mappings to CIF2 names.
>>>>>>>
>>>>>>> One argument in favor of allowing them would be that someone wants to
>>>>>>> convert all data files to CIF2 format, but they want to preserve the
>>>>>>> original data as-is, without alias mapping.
>>>>>>>
>>>>>>> I think that the current CIF2 syntax makes it possible to use CIF1
>>>>>>> names without any ambiguities. The question is whether they should be
>>>>>>> considered valid CIF2, or just a non-standard version that will be
>>>>>>> useful for the transitional period.
>>>>>>>
>>>>>>> Joe
>>>>>>>
>>>>>>>
>>>>>>> Herbert J. Bernstein wrote:
>>>>>>>> Personally, I would greatly prefer to allow all data names that do
>>>>>>>> not create a major lexer/parser conflict to appear in a data CIF and
>>>>>>>> only apply the strong restrictions to data names that appear in CIF2
>>>>>>>> dictionaries as defined data names (not as aliases). -- Herbert
>>>>>>>>
>>>>>>>>
>>>>>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>>>>>> this finally to bed. As has been mentioned a couple of times
>>>>>>>>> recently, restricting the data-name character set does invalidate
>>>>>>>>> syntactically many existing CIF 1 files (e.g.
>>>>>>>>> _refine_ls_shift/esd_max). We have discussed strategies for
>>>>>>>>> handling this, and I think these are workable strategies, but will
>>>>>>>>> involve investment and hence expense in workflow management in CIF
>>>>>>>>> archives.
>>>>>>>>>
>>>>>>>>> I understand the rationale behind this restriction is to simplify
>>>>>>>>> future processing of data names in areas such as dREL
>>>>>>>>> applications. The question really is whether we're choosing the
>>>>>>>>> right trade-off in making things cleaner at that end of the
>>>>>>>>> processing chain. I would suppose that a dREL or other application
>>>>>>>>> could ingest a data name with dangerous characters, convert it
>>>>>>>>> internally into a "safe" identifier that's used for all processing,
>>>>>>>>> and then restore the original form upon output; but writing that
>>>>>>>>> intermediate layer of processing is of course expensive (especially
>>>>>>>>> if there aren't readily available libraries that will do this
>>>>>>>>> transparently).
>>>>>>>>>
>>>>>>>>> I suspect that some of the original proposed syntactic changes also
>>>>>>>>> had the effect (whether by design or collaterally) of simplifying
>>>>>>>>> i/o, data structure management, symbol table processing etc., but
>>>>>>>>> those may have suffered in the subsequent revision exercise we've
>>>>>>>>> just been practising. Given the consensus we are now approaching,
>>>>>>>>> would the code builders now be prepared to incur the additional
>>>>>>>>> expense of handling "dangerous" data names?
>>>>>>>>>
>>>>>>>>> I really don't want to spark off a long discussion on this - if a
>>>>>>>>> quick round of response shows that there's no appetite to allow
>>>>>>>>> the additional punctuation characters in data names, I'll accept
>>>>>>>>> that gracefully.
>>>>>>>>>
>>>>>>>>> ***
>>>>>>>>>
>>>>>>>>> One last comment while I have the floor, though it is related in
>>>>>>>>> part to the above question. A concern raised in the editorial
>>>>>>>>> office was that there would be circumstances where users didn't
>>>>>>>>> know if they were dealing with a CIF 1 or 2 ("users" meaning
>>>>>>>>> authors, perhaps resorting to the vi editor - and we're imagining
>>>>>>>>> most of them are dealing with small-molecule/inorganic CIFs). My
>>>>>>>>> supposition is that the IUCr editorial offices would only want to
>>>>>>>>> use CIF2 seriously in association with DDLm dictionaries, and that
>>>>>>>>> we would expect the revised core dictionaries to use the dot
>>>>>>>>> component in data names to signal this further evolution. So even a
>>>>>>>>> superficial glimpse of the middle of a CIF would make it clear
>>>>>>>>> whether it was CIF1 or CIF2.
>>>>>>>>>
>>>>>>>>> Does that fit in with how others see this progressing?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Brian
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ******************************************************************
>>>>>> John Westbrook, Ph.D.
>>>>>> Rutgers, The State University of New Jersey
>>>>>> Department of Chemistry and Chemical Biology
>>>>>> 610 Taylor Road
>>>>>> Piscataway, NJ 08854-8087
>>>>>> e-mail: jwest@rcsb.rutgers.edu
>>>>>> Ph: (732) 445-4290 Fax: (732) 445-4320
>>>>>> ******************************************************************
>>>>>
>>>>
>>>> cheers
>>>>
>>>> Nick
>>>
>>
>> cheers
>>
>> Nick

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA      w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
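
The scheme discussed in this message (one dictionary definition per matrix, a trailing [i][j] in a data name resolved against that definition, and a fixed starting index of 1 offset by -1 to reach storage) might be sketched in Python roughly as below. This is only an illustration under stated assumptions, not code from dREL or any CIF library: the names CIF_INDEX_BASE, split_indexed_name, store_element and the shapes table standing in for the dictionary are all hypothetical.

```python
import re

# Assumption: one fixed starting index for every array, as argued above.
CIF_INDEX_BASE = 1

# A base name followed by one or more [n] groups, e.g. _atom_site.U[1][2].
_INDEXED = re.compile(r"^(?P<base>[^\[\]]+)(?P<idx>(?:\[\d+\])+)$")


def split_indexed_name(name):
    """Return (base_name, indices); indices is () when there is no [n] suffix."""
    match = _INDEXED.match(name)
    if match is None:
        return name, ()
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", match.group("idx")))
    return match.group("base"), indices


def _empty(shape):
    """Build a nested list of None values with the given shape."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [_empty(shape[1:]) for _ in range(shape[0])]


def store_element(storage, shapes, name, value):
    """Place value into the array named by the truncated data name.

    shapes is a hypothetical stand-in for the dictionary: it maps each
    defined array name to its shape, e.g. {"_atom_site.U": (3, 3)}.
    """
    base, indices = split_indexed_name(name)
    shape = shapes.get(base)
    # The suffix is only legal if an object of matching rank is defined.
    if not indices or shape is None or len(shape) != len(indices):
        raise KeyError(f"{name} does not match any defined array")
    # Offset by the fixed base to reach the real (0-based) storage point.
    offsets = [i - CIF_INDEX_BASE for i in indices]
    if any(o < 0 or o >= n for o, n in zip(offsets, shape)):
        raise IndexError(f"{name} lies outside shape {shape}")
    target = storage.setdefault(base, _empty(shape))
    for o in offsets[:-1]:
        target = target[o]
    target[offsets[-1]] = value
```

With shapes = {"_atom_site.U": (3, 3)}, calling store_element(storage, shapes, "_atom_site.U[1][2]", 0.5) fills storage["_atom_site.U"][0][1]. A name such as _xxx_yyy[] carries no numeric index, so split_indexed_name returns it unchanged and it is treated as an ordinary data name.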
- Follow-Ups:
- Re: [ddlm-group] Data-name character restrictions - one last time (Herbert J. Bernstein)
- Re: [ddlm-group] Data-name character restrictions - one last time (Joe Krahn)
- Re: [ddlm-group] Data-name character restrictions - one last time (David Brown)
- References:
- Re: [ddlm-group] Data-name character restrictions - one last time (Herbert J. Bernstein)