
Re: [ddlm-group] Data-name character restrictions - one last time

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: David Brown <idbrown@McMaster.ca>
Date: Thu, 10 Dec 2009 14:14:27 -0500
In-Reply-To: <a06240800c746ed907fcc@[192.168.2.104]>
References: <20091209144035.GB29341@emerald.iucr.org> <a06240801c74578ec8b59@[192.168.2.104]> <4B1FF3BB.8010601@niehs.nih.gov> <4B2008B3.6090008@pdb-mail.rutgers.edu><a06240800c746ed907fcc@[192.168.2.104]>

I was not aware that there was a default indexing of arrays. The only place where this arises in DDL1 is in the list of symmetry operations, where we originally failed to define a key for the symop loop. But there, as far as I am aware, the assumed indexing always starts at 1 for the first item. This is strictly a fix, since CIF1 specifically states that the order within a loop has no significance. Later additions to the dictionary have corrected this oversight by adding an explicit key, but it is not yet often used. Otherwise, in DDL1 (and DDL2?) the elements of an array have explicit data names that start at 1, not 0. The assumption that arrays are numbered from zero must be an imgCIF convention. It would always be better to include explicit indexing to avoid these problems.

The DDLm dictionaries have methods for constructing arrays from their elements, and methods for the reverse process could be added. In this case it would not be necessary to decompose (or assemble) an array on first reading, as the necessary action would be taken as soon as the array or its elements are invoked by a method or by a list of items to output.

This raises another concern. Herbert, if you are writing DDLm dictionaries for imgCIF and I am writing them for coreCIF, we need to keep in contact to make sure we are not introducing conflicting conventions.

David

Herbert J. Bernstein wrote:

Dear Colleagues,

One very neat resolution to this problem would be to allow a list or array-typed CIF2 tag to be referenced in a data file either as a whole or element by element. Thus _a.vec being defined as an array or list in CIF2 would automatically make the tags

    _a.vec[1]
    _a.vec[2]
    ...

defined CIF2 tags. If the array or list were nested, then

    _a.vec[1][1]
    _a.vec[1][2]

etc. would be valid tags.

I would propose that this be general and automatic, applying to all tags defined as lists or arrays.
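A minimal sketch of the element-by-element naming proposed above, written in Python purely for illustration. The function name `expand_elements` and the `start` parameter are hypothetical; nothing here is taken from an existing CIF2 or dREL library. The default of 1 follows the DDL1 practice described earlier in this message, and `start=0` gives zero-based tags.

```python
from typing import Any, Dict


def expand_elements(tag: str, value: Any, start: int = 1) -> Dict[str, Any]:
    """Map an array- or list-valued tag onto per-element tag names.

    Each nesting level of `value` contributes one bracketed index.
    `start` is the index given to the first element (an assumption).
    """
    if not isinstance(value, list):
        # A scalar is already a single element: keep the tag as it is.
        return {tag: value}
    expanded: Dict[str, Any] = {}
    for offset, item in enumerate(value):
        element_tag = f"{tag}[{start + offset}]"
        expanded.update(expand_elements(element_tag, item, start))
    return expanded


# One-based element tags for a vector and for a nested list.
vec_tags = expand_elements("_a.vec", [10, 20, 30])
nested_tags = expand_elements("_a.mat", [[1, 2], [3, 4]])

# The same vector with an explicitly specified starting index of 0.
zero_based = expand_elements("_a.vec", [10, 20, 30], start=0)
```

Making the starting index an explicit argument, rather than a hidden default, is the point of the sketch: the same stored array yields either set of tag names without any change to the data.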
In view of past practice in CIF1, there is a slight conflict with respect to the default starting index in dREL versus the common CIF1 practice in indexing arrays from 0, but that can (and should) be solved with explicit specification of a starting index, so we can carry over the tag name usage from CIF1 without confusing people with an index shift. So, if _a.vec were an array of dimension 5, starting from index 0, _a.vec[0] through _a.vec[4] would be valid, but if the starting index were specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching CIF1 conventions.

The aliasing mechanism might have to be extended or clarified to handle the mapping against CIF1 tags in bulk for _a.vec as a whole, but, to me, this has a very intuitive feel.

Regards,
Herbert

At 3:29 PM -0500 12/9/09, John Westbrook wrote:

Hi all -

On the issue of reserved characters in mmCIF/PDBx data items, these generally have been inherited from the style of items from the core. The majority of items in this class are data items related to short matrices/tensors and vectors (e.g. items including []). Virtually all have a syntax which could reasonably be interpreted as a programmatic reference. For instance,

    _atom_sites.fract_transf_matrix[1][1]   0.007738
    _atom_sites.fract_transf_matrix[1][2]   0.000000
    _atom_sites.fract_transf_matrix[1][3]   0.004298
    _atom_sites.fract_transf_matrix[2][1]   0.000000
    _atom_sites.fract_transf_matrix[2][2]   0.016545
    _atom_sites.fract_transf_matrix[2][3]   0.000000
    _atom_sites.fract_transf_matrix[3][1]   0.000000
    _atom_sites.fract_transf_matrix[3][2]   0.000000
    _atom_sites.fract_transf_matrix[3][3]   0.020200
    _atom_sites.fract_transf_vector[1]      0.00000
    _atom_sites.fract_transf_vector[2]      0.00000
    _atom_sites.fract_transf_vector[3]      0.00000

Are we close to being able to treat these as legal in the context of CIF2/DDL+?
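To illustrate how tags of this form can be read as programmatic references, here is a rough Python sketch that splits an element tag into a base name plus indices and reassembles the matrix from the values listed above. The helper names `split_element_tag` and `assemble_matrix`, the regular expression, and the `start` parameter are all assumptions made for this example; they are not part of any CIF specification or library.

```python
import re
from typing import Dict, List, Tuple

# A base name followed by one or more bracketed integer indices.
ELEMENT_TAG = re.compile(r"^(?P<base>[^\[\]\s]+)(?P<indices>(?:\[\d+\])+)$")


def split_element_tag(tag: str) -> Tuple[str, Tuple[int, ...]]:
    """Return (base name, indices); indices is empty for a plain tag."""
    match = ELEMENT_TAG.match(tag)
    if match is None:
        return tag, ()
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", match.group("indices")))
    return match.group("base"), indices


def assemble_matrix(items: Dict[str, str], base: str, start: int = 1) -> List[List[float]]:
    """Collect two-index element tags of `base` into a nested list.

    `start` is the index of the first element (1 in the data above).
    A missing element raises KeyError rather than being guessed.
    """
    cells = {}
    for tag, text in items.items():
        name, idx = split_element_tag(tag)
        if name == base and len(idx) == 2:
            cells[idx] = float(text)
    if not cells:
        raise ValueError(f"no two-index elements found for {base}")
    n_rows = max(i for i, _ in cells) - start + 1
    n_cols = max(j for _, j in cells) - start + 1
    return [[cells[(r + start, c + start)] for c in range(n_cols)]
            for r in range(n_rows)]


# Values copied from the example in the message above.
items = {
    "_atom_sites.fract_transf_matrix[1][1]": "0.007738",
    "_atom_sites.fract_transf_matrix[1][2]": "0.000000",
    "_atom_sites.fract_transf_matrix[1][3]": "0.004298",
    "_atom_sites.fract_transf_matrix[2][1]": "0.000000",
    "_atom_sites.fract_transf_matrix[2][2]": "0.016545",
    "_atom_sites.fract_transf_matrix[2][3]": "0.000000",
    "_atom_sites.fract_transf_matrix[3][1]": "0.000000",
    "_atom_sites.fract_transf_matrix[3][2]": "0.000000",
    "_atom_sites.fract_transf_matrix[3][3]": "0.020200",
}
matrix = assemble_matrix(items, "_atom_sites.fract_transf_matrix")
```

The sketch treats the bracketed suffix as syntax only when it parses as integer indices, so an ordinary tag passes through `split_element_tag` unchanged.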
I suppose I am asking what will constitute a legal assignment for an element of a matrix/array. Only this:

    _a.vec [1,2,3]

or also expanded assignment by element, such as:

    _a.vec[1] 1
    _a.vec[2] 2
    _a.vec[3] 3

If the latter is to be considered, then this will solve most of the data name issues for our data.

Regards,
John

Joe Krahn wrote:

In practice, CIF2 parsers should allow CIF1 data names within a CIF2 formatted file. The question is whether these files should be allowed as valid CIF2, or just for convenience as a non-standard CIF2. When CIF files are used as working data files, the restrictions should be relaxed. For long-term archival files, it makes sense to be more restrictive.

I would just make the CIF1 names inaccessible to dREL. Alternatively, an implementation could allow CIF1 names only on reading, and require dictionary alias mappings to CIF2 names.

One argument in favor of allowing them would be that someone wants to convert all data files to CIF2 format, but they want to preserve the original data as-is, without alias mapping.

I think that the current CIF2 syntax makes it possible to use CIF1 names without any ambiguities. The question is whether they should be considered valid CIF2, or just a non-standard version that will be useful for the transitional period.

Joe

Herbert J. Bernstein wrote:

Personally, I would greatly prefer to allow all data names that do not create a major lexer/parser conflict to appear in a data CIF and only apply the strong restrictions to data names that appear in CIF2 dictionaries as defined data names (not as aliases).

-- Herbert

At 2:40 PM +0000 12/9/09, Brian McMahon wrote:

I have one remaining niggle that I'd like to revisit before we put this finally to bed. As has been mentioned a couple of times recently, restricting the data-name character set does invalidate syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max).
We have discussed strategies for handling this, and I think these are workable strategies, but they will involve investment and hence expense in workflow management in CIF archives.

I understand the rationale behind this restriction is to simplify future processing of data names in areas such as dREL applications. The question really is whether we're choosing the right trade-off in making things cleaner at that end of the processing chain. I would suppose that a dREL or other application could ingest a data name with dangerous characters, convert it internally into a "safe" identifier that's used for all processing, and then restore the original form upon output; but writing that intermediate layer of processing is of course expensive (especially if there aren't readily available libraries that will do this transparently).

I suspect that some of the original proposed syntactic changes also had the effect (whether by design or collaterally) of simplifying i/o, data structure management, symbol table processing etc., but those may have suffered in the subsequent revision exercise we've just been practising.

Given the consensus we are now approaching, would the code builders now be prepared to incur the additional expense of handling "dangerous" data names? I really don't want to spark off a long discussion on this: if a quick round of responses shows that there's no appetite to allow the additional punctuation characters in data names, I'll accept that gracefully.

***

One last comment while I have the floor, though it is related in part to the above question. A concern raised in the editorial office was that there would be circumstances where users didn't know if they were dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting to the vi editor - and we're imagining most of them are dealing with small-molecule/inorganic CIFs).
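The intermediate layer mentioned earlier in this message (ingest a data name, process under a "safe" identifier, restore the original on output) might look roughly like the following Python sketch. The class `NameTable`, its methods, and the choice of which characters count as safe (ASCII letters, digits and underscore) are assumptions for illustration only; the thread does not fix a safe character set here.

```python
import re
from typing import Dict


class NameTable:
    """Two-way map between original data names and safe identifiers."""

    def __init__(self) -> None:
        self._to_safe: Dict[str, str] = {}
        self._to_original: Dict[str, str] = {}

    def to_safe(self, data_name: str) -> str:
        """Return a safe identifier, creating one on first use."""
        if data_name in self._to_safe:
            return self._to_safe[data_name]
        # Assumed rule: anything outside [A-Za-z0-9_] becomes "_".
        stem = re.sub(r"[^A-Za-z0-9_]", "_", data_name)
        candidate = stem
        suffix = 1
        # Two different originals must never share a safe identifier.
        while candidate in self._to_original:
            suffix += 1
            candidate = f"{stem}_{suffix}"
        self._to_safe[data_name] = candidate
        self._to_original[candidate] = data_name
        return candidate

    def to_original(self, safe_name: str) -> str:
        """Restore the data name exactly as it was read."""
        return self._to_original[safe_name]


table = NameTable()
safe = table.to_safe("_refine_ls_shift/esd_max")
restored = table.to_original(safe)

# A name that already looks like the safe form gets a distinct identifier.
clash = table.to_safe("_refine_ls_shift_esd_max")
```

The table has to be kept for the lifetime of the processing run, since the substitution itself is not reversible; that bookkeeping is the extra expense the message refers to.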
My supposition is that the IUCr editorial offices would only want to use CIF2 seriously in association with DDLm dictionaries, and that we would expect the revised core dictionaries to use the dot component in data names to signal this further evolution. So even a superficial glimpse of the middle of a CIF would make it clear whether it was CIF1 or CIF2.

Does that fit in with how others see this progressing?

Cheers
Brian

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290 Fax: (732) 445-4320
******************************************************************

begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
