Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 10 Dec 2009 18:30:35 -0500
- In-Reply-To: <4B21791A.7020701@niehs.nih.gov>
- References: <20091209144035.GB29341@emerald.iucr.org><a06240801c74578ec8b59@[192.168.2.104]> <4B1FF3BB.8010601@niehs.nih.gov><4B2008B3.6090008@pdb-mail.rutgers.edu><a06240800c746ed907fcc@[192.168.2.104]> <4B214893.3080707@mcmaster.ca><a06240804c746f9b45839@[192.168.10.237]><4B21791A.7020701@niehs.nih.gov>
The dREL/DDLm documents explicitly use ":" for ranges. The main reason
to forbid negative starting indices is for simplicity in a python
implementation.

At 5:41 PM -0500 12/10/09, Joe Krahn wrote:
>The range notation "1:4" is familiar to Fortran90 programmers. Many
>Fortran77 compilers also supported range notation. But, if ':' is
>disallowed in unquoted strings, it would have to be written as:
>
>   _type.dimension ['1:4','1:4']
>
>Also, why not allow negative indices? It may make sense to allow only 0
>or 1, but why make it a mandatory restriction?
>
>Joe
>
>Herbert J. Bernstein wrote:
>> No, the implicit zero comes from the dREL documentation in both the
>> 2007 and 2008 versions. This is a very serious issue for people with
>> a Fortran background, and causes many mistakes. Simply being able
>> to specify the starting index would solve the problem.
>>
>> I agree that we need to keep in touch, but I am working from the
>> dREL/DDLm documentation, and hope you are, too. What we need to do
>> is to stop focusing on stylistic issues and work on getting the
>> documentation to be clear and unambiguous with more examples, so we
>> do not go another 3+ years without people being aware of such
>> critical issues as the default starting index for arrays.
>>
>> You will find the statement about the default index for arrays in
>> section 3.4 of dREL_spec_aug08.pdf. All we need to fix it is to
>> adopt a new tag to identify the starting index, such as
>>
>>    _type.starting_index
>>
>> or allow the dimensions of an array to be ranges. The only problem
>> with that is that there is a strange python convention which would
>> suggest that
>>
>>    _type.dimension [1:5]
>>
>> would be declaring an array of dimension 4, starting at index 1.
>> To avoid the confusion that would cause for Fortran programmers, I
>> would suggest that we write dictionaries with
>>
>>    _type.starting_index [1,1]
>>    _type.dimension [3,3]
>>
>> instead of
>>
>>    _type.dimension [1:4,1:4]
>>
>> which would be natural in a python world, but not for Fortran programmers.
>> To make implementation easy, I would not allow negative starting indices.
>>
>> -- Herbert
>>
>> At 2:14 PM -0500 12/10/09, David Brown wrote:
>>> I was not aware that there was a default indexing of arrays. The
>>> only place where this arises in DDL1 is in the list of symmetry
>>> operations where we originally failed to define a key for the symop
>>> loop. But there, as far as I am aware, the assumed indexing always
>>> starts at 1 for the first item. This is strictly a fix since CIF1
>>> specifically states that the order within a loop has no significance.
>>> Later additions to the dictionary have corrected this oversight by
>>> adding an explicit key, but it is not yet often used. Otherwise, in
>>> DDL1 (and DDL2?) the elements of an array have explicit data names
>>> that start at 1, not 0. The assumption that arrays are numbered
>>> from zero must be an imgCIF convention. It would always be better
>>> to include explicit indexing to avoid these problems.
>>>
>>> The DDLm dictionaries have methods for constructing arrays from
>>> their elements, and methods for the reverse process could be added.
>>> In this case it would not be necessary to decompose (or assemble) an
>>> array on first reading as the necessary action would be taken as
>>> soon as the array or its elements are invoked by a method or by a
>>> list of items to output.
>>>
>>> This raises another concern. Herbert, if you are writing DDLm
>>> dictionaries for imgCIF and I am writing them for coreCIF, we need
>>> to keep in contact to make sure we are not introducing conflicting
>>> conventions.
>>>
>>> David
>>>
>>> Herbert J.
>>> Bernstein wrote:
>>>
>>>> Dear Colleagues,
>>>>
>>>> One very neat resolution to this problem would be to allow a
>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>> as a whole or element by element.
>>>>
>>>> Thus
>>>>
>>>>    _a.vec
>>>>
>>>> being defined as an array or list in CIF2 would automatically make
>>>> the tags
>>>>
>>>>    _a.vec[1]
>>>>    _a.vec[2]
>>>>    ...
>>>>
>>>> defined CIF2 tags. If the array or list were nested, the
>>>>
>>>>    _a.vec[1][1]
>>>>    _a.vec[1][2]
>>>>
>>>> etc. would be valid tags.
>>>>
>>>> I would propose that this be general and automatic, applying to
>>>> all tags defined as lists or arrays. In view of past practice in
>>>> CIF1, there is a slight conflict with respect to the default starting
>>>> index in dREL versus the common CIF1 practice in indexing arrays
>>>> from 0, but that can (and should) be solved with explicit specification
>>>> of a starting index, so we can carry over the tag name usage from
>>>> CIF1 without confusing people with an index shift. So, if _a.vec
>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>> through _a.vec[4] would be valid, but if the starting index were
>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>> CIF1 conventions.
>>>>
>>>> The aliasing mechanism might have to be extended or clarified to
>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>> but, to me, this has a very intuitive feel.
>>>>
>>>> Regards,
>>>> Herbert
>>>>
>>>>
>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>> generally have been inherited from the style of items from the core. The
>>>>> majority of items in this class are data items related to short
>>>>> matrices/tensors and vectors (e.g. items including []). Virtually
>>>>> all have a syntax which could reasonably be interpreted as a
>>>>> programmatic reference.
>>>>> For instance,
>>>>>
>>>>>
>>>>> _atom_sites.fract_transf_matrix[1][1]   0.007738
>>>>> _atom_sites.fract_transf_matrix[1][2]   0.000000
>>>>> _atom_sites.fract_transf_matrix[1][3]   0.004298
>>>>> _atom_sites.fract_transf_matrix[2][1]   0.000000
>>>>> _atom_sites.fract_transf_matrix[2][2]   0.016545
>>>>> _atom_sites.fract_transf_matrix[2][3]   0.000000
>>>>> _atom_sites.fract_transf_matrix[3][1]   0.000000
>>>>> _atom_sites.fract_transf_matrix[3][2]   0.000000
>>>>> _atom_sites.fract_transf_matrix[3][3]   0.020200
>>>>> _atom_sites.fract_transf_vector[1]      0.00000
>>>>> _atom_sites.fract_transf_vector[2]      0.00000
>>>>> _atom_sites.fract_transf_vector[3]      0.00000
>>>>>
>>>>> Are we close to being able to treat these as legal in the context of
>>>>> CIF2/DDL+?
>>>>> I suppose I am asking what will constitute a legal assignment
>>>>> for an element of a matrix/array -
>>>>>
>>>>> Only this -
>>>>>
>>>>>    _a.vec [1,2,3]
>>>>>
>>>>> or also expanded assignment by element such as -
>>>>>
>>>>>    _a.vec[1] 1
>>>>>    _a.vec[2] 2
>>>>>    _a.vec[3] 3
>>>>>
>>>>> If the latter is to be considered, then this will solve most of
>>>>> the data name issues for our data.
>>>>>
>>>>> Regards,
>>>>>
>>>>> John
>>>>>
>>>>> Joe Krahn wrote:
>>>>>
>>>>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>> formatted file. The question is whether these files should be allowed as
>>>>>> valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>
>>>>>> When CIF files are used as working data files, the restrictions should
>>>>>> be relaxed. For long-term archival files, it makes sense to be more
>>>>>> restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>>>> Alternatively, an implementation could allow CIF1 names only
>>>>>> on reading, and require dictionary alias mappings to CIF2 names.
>>>>>>
>>>>>> One argument in favor of allowing them would be that someone wants to
>>>>>> convert all data files to CIF2 format, but they want to preserve the
>>>>>> original data as-is, without alias mapping.
>>>>>>
>>>>>> I think that the current CIF2 syntax makes it possible to use
>>>>>> CIF1 names without any ambiguities. The question is whether they
>>>>>> should be considered valid CIF2, or just a non-standard version
>>>>>> that will be useful for the transitional period.
>>>>>>
>>>>>> Joe
>>>>>>
>>>>>>
>>>>>> Herbert J. Bernstein wrote:
>>>>>>
>>>>>>> Personally, I would greatly prefer to allow all data names that do not
>>>>>>> create a major lexer/parser conflict to appear in a data CIF and
>>>>>>> only apply the strong restrictions to data names that appear in CIF2
>>>>>>> dictionaries as defined data names (not as aliases). -- Herbert
>>>>>>>
>>>>>>>
>>>>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>
>>>>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>>>>> this finally to bed. As has been mentioned a couple of times
>>>>>>>> recently, restricting the data-name character set does invalidate
>>>>>>>> syntactically many existing CIF 1 files (e.g.
>>>>>>>> _refine_ls_shift/esd_max ).
>>>>>>>> We have discussed strategies for handling this, and I think these
>>>>>>>> are workable strategies, but will involve investment and
>>>>>>>> hence expense in workflow management in CIF archives.
>>>>>>>>
>>>>>>>> I understand the rationale behind this restriction is to simplify
>>>>>>>> future processing of data names in areas such as dREL
>>>>>>>> applications. The question really is whether we're choosing the right
>>>>>>>> trade-off in making things cleaner at that end of the processing
>>>>>>>> chain.
>>>>>>>> I would suppose that a dREL or other application could ingest a
>>>>>>>> data name with dangerous characters, convert it internally into a
>>>>>>>> "safe" identifier that's used for all processing, and then
>>>>>>>> restore the original form upon output; but writing that
>>>>>>>> intermediate layer of processing is of course expensive
>>>>>>>> (especially if there aren't readily available libraries that
>>>>>>>> will do this transparently).
>>>>>>>>
>>>>>>>> I suspect that some of the original proposed syntactic changes also
>>>>>>>> had the effect (whether by design or collaterally) of
>>>>>>>> simplifying i/o, data structure management, symbol table
>>>>>>>> processing etc., but those may have suffered in the subsequent
>>>>>>>> revision exercise we've just been practising. Given the
>>>>>>>> consensus we are now approaching, would the code builders now
>>>>>>>> be prepared to incur the additional expense of handling
>>>>>>>> "dangerous" data names?
>>>>>>>>
>>>>>>>> I really don't want to spark off a long discussion on this - if a
>>>>>>>> quick round of response shows that there's no appetite to allow
>>>>>>>> the additional punctuation characters in data names, I'll accept that
>>>>>>>> gracefully.
>>>>>>>>
>>>>>>>> ***
>>>>>>>>
>>>>>>>> One last comment while I have the floor, though it is related in part
>>>>>>>> to the above question. A concern raised in the editorial office was
>>>>>>>> that there would be circumstances where users didn't know
>>>>>>>> if they were dealing with a CIF 1 or 2 ("users" meaning authors,
>>>>>>>> perhaps resorting to the vi editor - and we're imagining most of
>>>>>>>> them are dealing with small-molecule/inorganic CIFs).
>>>>>>>> My supposition is that the IUCr editorial offices would only
>>>>>>>> want to use CIF2 seriously in association with DDLm
>>>>>>>> dictionaries, and that we would expect the revised core
>>>>>>>> dictionaries to use the dot component in data names to signal this
>>>>>>>> further evolution. So even a superficial glimpse of the middle of a
>>>>>>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>>>
>>>>>>>> Does that fit in with how others see this progressing?
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Brian
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
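
Illustrative note on the indexing discussion in this thread: a minimal Python sketch, under stated assumptions, of how an explicit starting index would determine which element-wise tags are valid for an array-typed tag. The function name `element_tags` and its parameters are hypothetical and are not part of dREL, DDLm, or any CIF library. The default starting index of 0 mirrors the dREL default described in the message, and negative starting indices are rejected, as the message suggests.

```python
from itertools import product


def element_tags(tag, dimension, starting_index=None):
    """Return the element-wise tags implied by an array-typed tag.

    dimension      -- extent along each axis, e.g. [3, 3]
    starting_index -- first index along each axis (hypothetical
                      counterpart of _type.starting_index); defaults
                      to 0 on every axis, as an assumption matching
                      the dREL default discussed in the thread
    """
    if starting_index is None:
        starting_index = [0] * len(dimension)
    if len(starting_index) != len(dimension):
        raise ValueError("starting_index and dimension must have equal length")
    if any(start < 0 for start in starting_index):
        raise ValueError("negative starting indices are not allowed")
    # One index range per axis: start, start + 1, ..., start + extent - 1.
    axes = [range(start, start + extent)
            for start, extent in zip(starting_index, dimension)]
    # Row-major enumeration, so the last axis varies fastest.
    return [tag + "".join(f"[{i}]" for i in index) for index in product(*axes)]


if __name__ == "__main__":
    # Dimension 5 with the default start of 0: _a.vec[0] .. _a.vec[4].
    print(element_tags("_a.vec", [5]))
    # Dimension 5 starting at 1: _a.vec[1] .. _a.vec[5].
    print(element_tags("_a.vec", [5], [1]))
    # A 3x3 matrix indexed from 1 on both axes.
    print(element_tags("_atom_sites.fract_transf_matrix", [3, 3], [1, 1]))
```

The Python range convention mentioned in the message is visible here as well: `range(1, 5)` covers the four indices 1 through 4, which is why a Python-style "1:5" would describe an array of dimension 4 rather than 5.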