Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: Joe Krahn <krahn@niehs.nih.gov>
- Date: Fri, 11 Dec 2009 14:53:33 -0500
- In-Reply-To: <C7485B9C.128AD%nick@csse.uwa.edu.au>
- References: <C7485B9C.128AD%nick@csse.uwa.edu.au>
Instead of a start index, it would be possible to define an offset, if and when a different base index is really needed. It would be clearer in expressions that a non-standard base is being used, because the offset has to be written explicitly, as in "_array[index+offset]". Another advantage is that it works just as well with a negative starting index, if that is ever needed.

Joe

Nick Spadaccini wrote:
> I don't think a mechanism for specifying a starting index will work at the
> individual definition level. They will all have to start at the same address,
> otherwise if I try to access within dREL some other object, how do I know
> what its starting index is?
>
> Best to decide on a starting index and fix it. There is an historical
> precedent in CIF that has it starting at 1. As wrong as I would argue that
> is, it is in stone so stick with it.
>
> In my code I will simply offset the index by -1 to get to the real storage
> point (I don't program in languages that index starting at 1) - it is easy
> enough to do.
>
> Seems a solution to me.
>
> The _xxx_yyy[] syntax is an ancient category-like definition that never
> appears in data.
>
>
> On 11/12/09 7:37 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
> wrote:
>
>> I am saying that any declaration of an array or of a list would
>> make its individual elements available for use in any data CIF without
>> the need for any further declarations in the dictionary. This is
>> simple and clear and completely consistent with dREL. The only really
>> new thing would be some mechanism(s) to specify the starting index.
>>
>> I think this covers John's need. The only thing it would not cover
>> is something like _xxx_yyy[] which appears in some CIF1 dictionaries
>> but not in the data files, so I don't think there should be an
>> issue with not allowing those in CIF2.
>>
>> Does anyone see a problem with this?
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>
>>> I can agree with that, if you are saying only the matrix object is available
>>> to the user.
>>>
>>> OR alternatively are you saying there will ONLY be one object defined in the
>>> dictionary, let's say the 3x3 matrix
>>>
>>> _atom_site.U
>>>
>>> But NEVER have definitions in the dictionary for the individual
>>> _atom_site.U[i][j] elements.
>>>
>>> As we parse a CIF data file, if we detect _atom_site.U[i][j], it isn't in
>>> the defined dictionary so this would normally raise an error. BUT because of
>>> the specific trailing syntax [i][j] this informs the parser there must be an
>>> object of matching rank with the name _atom_site.U (i.e. the
>>> _atom_site.U[i][j] with the [i][j] truncated) in the dictionary - and
>>> therefore populate the appropriate element of _atom_site.U with that value.
>>>
>>> This would circumvent the problem of two different identifiers called
>>> _atom_site.U[i][j] in the dictionary BUT would necessarily mean that [i][j]
>>> syntax in a data name was reserved for objects that are defined in the
>>> dictionary as, in this case, a 2D matrix. They can't (shouldn't?) be used
>>> for general data names.
>>>
>>> Does this cover what John wanted also?
>>>
>>>
>>> On 11/12/09 10:12 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>> wrote:
>>>
>>>> Actually, the suggestion comes from reading the dREL documentation and the
>>>> DDLm documentation and noticing how clumsy the access to array elements in
>>>> DDLm is compared to the access in dREL.
>>>> What I am suggesting is to
>>>> promote the dREL access making it fully available at the DDLm level,
>>>> replacing the clumsy element-by-element definitions with one automatic
>>>> definition that looks and works just the way one might expect.
>>>>
>>>> Regards,
>>>> Herbert
>>>>
>>>> =====================================================
>>>> Herbert J. Bernstein, Professor of Computer Science
>>>> Dowling College, Kramer Science Center, KSC 121
>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>
>>>> +1-631-244-3035
>>>> yaya@dowling.edu
>>>> =====================================================
>>>>
>>>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>>>
>>>>> Many of you need to read the dREL part of the dictionary much more closely.
>>>>>
>>>>> dREL extensively exploits access to matrix and vector types by index
>>>>> addressing at a programmatic level. That's how it does the things it
>>>>> has to. So within the dREL programming language you will see littered
>>>>> everywhere a matrix which is accessed via standard indexing (as you would
>>>>> with any language supporting array structures).
>>>>>
>>>>> So let's have a matrix _atom_site.U - within dREL I have access to
>>>>> _atom_site.U[0][0] etc as part of the language (I'll stick with 0 initial
>>>>> indexing but this really is a trivial problem, solved many times over).
>>>>>
>>>>> But now you ALSO want a scalar data item called _atom_site.U[0][0] within
>>>>> CIF. The dictionary says _atom_site.U[0][0] is a single scalar value.
>>>>>
>>>>> The dREL constructor method for _atom_site.U has
>>>>>
>>>>> _atom_site.U = Matrix([[atom_site.U[0][0] ...]...])
>>>>>
>>>>> This obviously won't work. This is why the dictionary in DDLm uses the
>>>>> equivalent of _atom_site.U_0_0 for the scalar value so that the above
>>>>> constructor will make sense and still allows me to access
>>>>> _atom_site.U[0][0] from within dREL. It is why I am keen to restrict the
>>>>> syntax of the data names.
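A minimal Python sketch of the element-by-element scheme discussed above. In this scheme only the whole array is defined in the dictionary. A data name such as _atom_sites.fract_transf_matrix[1][3] is resolved by stripping the trailing indices, checking that the base name is defined with matching rank, and filling the corresponding element. ARRAY_DEFS, split_indexed_name and assemble are hypothetical names, and the dictionary contents are invented for illustration. The start parameter stands for the single fixed starting index the thread argues for (1 for CIF1-style names). It is subtracted to reach 0-based storage, much as Nick describes offsetting by -1.

```python
import re

# Hypothetical stand-in for a DDLm dictionary: only whole-array objects are
# defined, keyed by data name, with their shape. Element names such as
# _atom_sites.fract_transf_matrix[1][1] are deliberately NOT defined.
ARRAY_DEFS = {
    "_atom_sites.fract_transf_matrix": (3, 3),
    "_atom_sites.fract_transf_vector": (3,),
}

# A base name followed by one or more trailing [n] index groups.
_INDEXED = re.compile(r"^(?P<base>[^\[\]]+)(?P<idx>(?:\[\d+\])+)$")


def split_indexed_name(name):
    """Split '_a.b[1][2]' into ('_a.b', (1, 2)); None if no trailing indices."""
    m = _INDEXED.match(name)
    if m is None:
        return None
    indices = tuple(int(i) for i in re.findall(r"\d+", m.group("idx")))
    return m.group("base"), indices


def _empty(shape):
    """Build a nested list of the given shape, filled with None."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [_empty(shape[1:]) for _ in range(shape[0])]


def assemble(items, defs=ARRAY_DEFS, start=1):
    """Fold (element name, value) pairs into whole arrays.

    start is the index of the first element as written in the file
    (1 for CIF1-style names); it is subtracted to reach 0-based storage.
    """
    arrays = {}
    for name, value in items:
        parts = split_indexed_name(name)
        if parts is None:
            raise KeyError(f"not an indexed data name: {name}")
        base, idx = parts
        shape = defs.get(base)
        # The trailing [i][j] is only legal if the base has matching rank.
        if shape is None or len(shape) != len(idx):
            raise KeyError(f"no array of rank {len(idx)} defined for {base}")
        pos = [i - start for i in idx]
        if not all(0 <= p < n for p, n in zip(pos, shape)):
            raise IndexError(f"index out of range in {name}")
        target = arrays.setdefault(base, _empty(shape))
        for p in pos[:-1]:
            target = target[p]
        target[pos[-1]] = value
    return arrays


rows = [
    ("_atom_sites.fract_transf_matrix[1][1]", 0.007738),
    ("_atom_sites.fract_transf_matrix[1][3]", 0.004298),
    ("_atom_sites.fract_transf_vector[2]", 0.0),
]
arrays = assemble(rows)
print(arrays["_atom_sites.fract_transf_matrix"][0])  # [0.007738, None, 0.004298]
print(arrays["_atom_sites.fract_transf_vector"])     # [None, 0.0, None]
```

Passing start=0 would accept dREL-style 0-based element names instead. Either way, the base has to be one agreed value, or be written explicitly, so that other code reading the same arrays knows which convention is in use.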
>>>>>
>>>>>
>>>>> On 11/12/09 2:46 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>> One very neat resolution to this problem would be to allow a
>>>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>>>> as a whole or element by element.
>>>>>>
>>>>>> Thus
>>>>>>
>>>>>> _a.vec
>>>>>>
>>>>>> being defined as an array or list in CIF2 would automatically make
>>>>>> the tags
>>>>>>
>>>>>> _a.vec[1]
>>>>>> _a.vec[2]
>>>>>> ...
>>>>>>
>>>>>> defined CIF2 tags. If the array or list were nested, the
>>>>>>
>>>>>> _a.vec[1][1]
>>>>>> _a.vec[1][2]
>>>>>>
>>>>>> etc. would be valid tags.
>>>>>>
>>>>>> I would propose that this be general and automatic, applying to
>>>>>> all tags defined as lists or arrays. In view of past practice in
>>>>>> CIF1, there is a slight conflict with respect to the default starting
>>>>>> index in dREL versus the common CIF1 practice in indexing arrays
>>>>>> from 1, but that can (and should) be solved with explicit specification
>>>>>> of a starting index, so we can carry over the tag name usage from
>>>>>> CIF1 without confusing people with an index shift. So, if _a.vec
>>>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>>>> through _a.vec[4] would be valid, but if the starting index were
>>>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>>>> CIF1 conventions.
>>>>>>
>>>>>> The aliasing mechanism might have to be extended or clarified to
>>>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>>>> but, to me, this has a very intuitive feel.
>>>>>>
>>>>>> Regards,
>>>>>> Herbert
>>>>>>
>>>>>>
>>>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>>>> Hi all -
>>>>>>>
>>>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>>>> generally have been inherited from the style of items from the core.
>>>>>>> The majority of items in this class are data items related to short
>>>>>>> matrices/tensors and vectors (e.g. items including []). Virtually all
>>>>>>> have a syntax which could reasonably be interpreted as a programmatic
>>>>>>> reference. For instance,
>>>>>>>
>>>>>>>
>>>>>>> _atom_sites.fract_transf_matrix[1][1] 0.007738
>>>>>>> _atom_sites.fract_transf_matrix[1][2] 0.000000
>>>>>>> _atom_sites.fract_transf_matrix[1][3] 0.004298
>>>>>>> _atom_sites.fract_transf_matrix[2][1] 0.000000
>>>>>>> _atom_sites.fract_transf_matrix[2][2] 0.016545
>>>>>>> _atom_sites.fract_transf_matrix[2][3] 0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][1] 0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][2] 0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][3] 0.020200
>>>>>>> _atom_sites.fract_transf_vector[1] 0.00000
>>>>>>> _atom_sites.fract_transf_vector[2] 0.00000
>>>>>>> _atom_sites.fract_transf_vector[3] 0.00000
>>>>>>>
>>>>>>> Are we close to being able to treat these as legal in the context of
>>>>>>> CIF2/DDL+? I suppose I am asking what will constitute a legal
>>>>>>> assignment for an element of a matrix/array -
>>>>>>>
>>>>>>> Only this -
>>>>>>>
>>>>>>> _a.vec [1,2,3]
>>>>>>>
>>>>>>> or also expanded assignment by element such as -
>>>>>>>
>>>>>>> _a.vec[1] 1
>>>>>>> _a.vec[2] 2
>>>>>>> _a.vec[3] 3
>>>>>>>
>>>>>>> If the latter is to be considered, then this will solve most of the
>>>>>>> data name issues for our data.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> Joe Krahn wrote:
>>>>>>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>>>> formatted file. The question is whether these files should be allowed
>>>>>>>> as valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>>>
>>>>>>>> When CIF files are used as working data files, the restrictions should
>>>>>>>> be relaxed. For long-term archival files, it makes sense to be more
>>>>>>>> restrictive.
>>>>>>>> I would just make the CIF1 names inaccessible to dREL.
>>>>>>>> Alternatively, an implementation could allow CIF1 names only on
>>>>>>>> reading, and require dictionary alias mappings to CIF2 names.
>>>>>>>>
>>>>>>>> One argument in favor of allowing them would be that someone wants to
>>>>>>>> convert all data files to CIF2 format, but they want to preserve the
>>>>>>>> original data as-is, without alias mapping.
>>>>>>>>
>>>>>>>> I think that the current CIF2 syntax makes it possible to use CIF1
>>>>>>>> names without any ambiguities. The question is whether they should be
>>>>>>>> considered valid CIF2, or just a non-standard version that will be
>>>>>>>> useful for the transitional period.
>>>>>>>>
>>>>>>>> Joe
>>>>>>>>
>>>>>>>>
>>>>>>>> Herbert J. Bernstein wrote:
>>>>>>>>> Personally, I would greatly prefer to allow all data names that do not
>>>>>>>>> create a major lexer/parser conflict to appear in a data CIF and
>>>>>>>>> only apply the strong restrictions to data names that appear in CIF2
>>>>>>>>> dictionaries as defined data names (not as aliases). -- Herbert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>>>>>>> this finally to bed. As has been mentioned a couple of times
>>>>>>>>>> recently, restricting the data-name character set does invalidate
>>>>>>>>>> syntactically many existing CIF 1 files (e.g.
>>>>>>>>>> _refine_ls_shift/esd_max). We have discussed strategies for handling
>>>>>>>>>> this, and I think these are workable strategies, but will involve
>>>>>>>>>> investment and hence expense in workflow management in CIF archives.
>>>>>>>>>>
>>>>>>>>>> I understand the rationale behind this restriction is to simplify
>>>>>>>>>> future processing of data names in areas such as dREL
>>>>>>>>>> applications.
>>>>>>>>>> The question really is whether we're choosing the right
>>>>>>>>>> trade-off in making things cleaner at that end of the processing
>>>>>>>>>> chain. I would suppose that a dREL or other application could ingest
>>>>>>>>>> a data name with dangerous characters, convert it internally into a
>>>>>>>>>> "safe" identifier that's used for all processing, and then restore
>>>>>>>>>> the original form upon output; but writing that intermediate layer of
>>>>>>>>>> processing is of course expensive (especially if there aren't readily
>>>>>>>>>> available libraries that will do this transparently).
>>>>>>>>>>
>>>>>>>>>> I suspect that some of the original proposed syntactic changes also
>>>>>>>>>> had the effect (whether by design or collaterally) of simplifying
>>>>>>>>>> i/o, data structure management, symbol table processing etc., but
>>>>>>>>>> those may have suffered in the subsequent revision exercise we've
>>>>>>>>>> just been practising. Given the consensus we are now approaching,
>>>>>>>>>> would the code builders now be prepared to incur the additional
>>>>>>>>>> expense of handling "dangerous" data names?
>>>>>>>>>>
>>>>>>>>>> I really don't want to spark off a long discussion on this - if a
>>>>>>>>>> quick round of response shows that there's no appetite to allow
>>>>>>>>>> the additional punctuation characters in data names, I'll accept that
>>>>>>>>>> gracefully.
>>>>>>>>>>
>>>>>>>>>> ***
>>>>>>>>>>
>>>>>>>>>> One last comment while I have the floor, though it is related in part
>>>>>>>>>> to the above question. A concern raised in the editorial office was
>>>>>>>>>> that there would be circumstances where users didn't know if they
>>>>>>>>>> were dealing with a CIF 1 or 2 ("users" meaning authors, perhaps
>>>>>>>>>> resorting to the vi editor - and we're imagining most of them are
>>>>>>>>>> dealing with small-molecule/inorganic CIFs).
>>>>>>>>>> My supposition is that the IUCr
>>>>>>>>>> editorial offices would only want to use CIF2 seriously in
>>>>>>>>>> association with DDLm dictionaries, and that we would expect the
>>>>>>>>>> revised core dictionaries to use the dot component in data names to
>>>>>>>>>> signal this further evolution. So even a superficial glimpse of the
>>>>>>>>>> middle of a CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>>>>>
>>>>>>>>>> Does that fit in with how others see this progressing?
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Brian
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ddlm-group mailing list
>>>>>>>>>> ddlm-group@iucr.org
>>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>> --
>>>>>>> ******************************************************************
>>>>>>> John Westbrook, Ph.D.
>>>>>>> Rutgers, The State University of New Jersey
>>>>>>> Department of Chemistry and Chemical Biology
>>>>>>> 610 Taylor Road
>>>>>>> Piscataway, NJ 08854-8087
>>>>>>> e-mail: jwest@rcsb.rutgers.edu
>>>>>>> Ph: (732) 445-4290 Fax: (732) 445-4320
>>>>>>> ******************************************************************
>>>>> cheers
>>>>>
>>>>> Nick
>>>>>
>>>>> --------------------------------
>>>>> Associate Professor N. Spadaccini, PhD
>>>>> School of Computer Science & Software Engineering
>>>>>
>>>>> The University of Western Australia t: +61 (0)8 6488 3452
>>>>> 35 Stirling Highway f: +61 (0)8 6488 1089
>>>>> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
>>>>> MBDP M002
>>>>>
>>>>> CRICOS Provider Code: 00126G
>>>>>
>>>>> e: Nick.Spadaccini@uwa.edu.au
>>> cheers
>>>
>>> Nick
>
> cheers
>
> Nick
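Brian's supposition is that an application could ingest a data name containing "dangerous" characters and convert it into a safe internal identifier. It would then restore the original form on output. That only requires a reversible encoding. Below is a minimal Python sketch, assuming the internal identifiers may contain only ASCII letters, digits and underscores. The escape scheme is invented here for illustration and is not part of CIF, DDLm or dREL: a doubled underscore stands for a literal "_", and "_" plus six hex digits of the code point stands for any other character. The names to_safe and from_safe are hypothetical.

```python
import string

# Characters copied through unchanged; '_' is handled separately below.
_PLAIN = frozenset(string.ascii_letters + string.digits)


def to_safe(name: str) -> str:
    """Map a data name to a string of ASCII letters, digits and '_' only.

    '_' becomes '__'; any other character becomes '_' plus six hex digits
    of its code point, which is enough for every Unicode character.
    """
    out = []
    for ch in name:
        if ch in _PLAIN:
            out.append(ch)
        elif ch == "_":
            out.append("__")
        else:
            out.append("_" + format(ord(ch), "06x"))
    return "".join(out)


def from_safe(ident: str) -> str:
    """Invert to_safe, restoring the original data name exactly."""
    out = []
    i = 0
    while i < len(ident):
        ch = ident[i]
        if ch != "_":
            out.append(ch)
            i += 1
        elif ident[i + 1:i + 2] == "_":
            out.append("_")  # doubled underscore: a literal '_'
            i += 2
        else:
            code = ident[i + 1:i + 7]  # escape: exactly six hex digits follow
            if len(code) != 6:
                raise ValueError(f"truncated escape at position {i} in {ident!r}")
            out.append(chr(int(code, 16)))
            i += 7
    return "".join(out)


name = "_refine_ls_shift/esd_max"
safe = to_safe(name)
print(safe)                     # __refine__ls__shift_00002fesd__max
print(safe.isidentifier())      # True
print(from_safe(safe) == name)  # True
```

The decoding is unambiguous because an escape's second character is always a hex digit, never "_". The mapping preserves case exactly. CIF data names compare case-insensitively, so any case folding for dictionary lookups would be a separate step. CIF data names always begin with an underscore, so the encoded form is always a valid identifier in languages that accept a leading "_".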