Re: [ddlm-group] Data-name character restrictions - one last time

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: Nick Spadaccini <nick@csse.uwa.edu.au>
Date: Fri, 11 Dec 2009 20:27:08 +0800
In-Reply-To: <alpine.BSF.2.00.0912110631210.4757@epsilon.pair.com>
I don't think a mechanism for specifying a starting index will work at the
individual definition level. All objects will have to start at the same
index; otherwise, if I try to access some other object within dREL, how do I
know what its starting index is?

Best to decide on a starting index and fix it. There is a historical
precedent in CIF that has it starting at 1. As wrong as I would argue that
is, it is set in stone, so stick with it.

In my code I will simply offset the index by -1 to get to the real storage
point (I don't program in languages that index starting at 1) - it is easy
enough to do.
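
A minimal sketch of that offset, written in Python rather than dREL. It
assumes the starting index is fixed at 1 for every object; the names
CIF_BASE and storage_indices and the regular expression are hypothetical,
chosen only for illustration, and are not part of any CIF or dREL
specification.

```python
import re

# Assumption: one fixed starting index for all objects, following the
# CIF precedent of indexing from 1.
CIF_BASE = 1

# Hypothetical pattern: a base data name followed by one or more [n] suffixes.
_INDEXED = re.compile(r"^(?P<base>[^\[\]\s]+)(?P<idx>(\[\d+\])+)$")


def storage_indices(data_name):
    """Split an indexed data name into (base name, 0-based storage indices).

    A name without a trailing [n] suffix is returned unchanged with an
    empty index tuple.
    """
    match = _INDEXED.match(data_name)
    if match is None:
        return data_name, ()
    indices = tuple(int(n) for n in re.findall(r"\[(\d+)\]", match.group("idx")))
    if any(i < CIF_BASE for i in indices):
        raise ValueError(f"index below starting index {CIF_BASE}: {data_name}")
    # The -1 offset: CIF index 1 maps to storage position 0.
    return match.group("base"), tuple(i - CIF_BASE for i in indices)


if __name__ == "__main__":
    print(storage_indices("_atom_sites.fract_transf_matrix[3][1]"))
    print(storage_indices("_a.vec[1]"))
```

Because the base is the same for every object, the caller never needs to
look up a per-definition starting index before applying the offset.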

Seems a solution to me.

The _xxx_yyy[] syntax is an ancient category-like definition that never
appears in data.


On 11/12/09 7:37 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
wrote:

> I am saying that any declaration of an array or of a list would
> make its individual elements available for use in any data CIF without
> the need for any further declarations in the dictionary.  This is
> simple and clear and completely consistent with dREL.  The only really
> new thing would be some mechanism(s) to specify the starting index.
> 
> I think this covers John's need.  The only thing it would not cover
> is something like _xxx_yyy[] which appears in some CIF1 dictionaries
> but not in the data files, so I don't think there should be an
> issue with not allowing those in CIF2.
> 
> Does anyone see a problem with this?
> =====================================================
>   Herbert J. Bernstein, Professor of Computer Science
>     Dowling College, Kramer Science Center, KSC 121
>          Idle Hour Blvd, Oakdale, NY, 11769
> 
>                   +1-631-244-3035
>                   yaya@dowling.edu
> =====================================================
> 
> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
> 
>> I can agree with that, if you are saying only the matrix object is available
>> to the user.
>> 
>> OR alternatively are you saying there will ONLY be one object defined in the
>> dictionary, let's say the 3x3 matrix
>> 
>> _atom_site.U
>> 
>> But NEVER have definitions in the dictionary for the individual
>> _atom_site.U[i][j] elements.
>> 
>> As we parse a CIF data file, if we detect _atom_site.U[i][j], it isn't in
>> the defined dictionary so this would normally raise an error. BUT because of
>> the specific trailing syntax [i][j] this informs the parser there must be an
>> object of matching rank with the name _atom_site.U (ie the
>> _atom_site.U[i][j] with the [i][j] truncated) in the dictionary - and
>> therefore populate the appropriate element of _atom_site.U with that value.
>> 
>> This would circumvent the problem of two different identifiers called
>> _atom_site.U[i][j] in the dictionary BUT would necessarily mean that [i][j]
>> syntax in a data name was reserved for objects that are defined in the
>> dictionary as, in this case, a 2D matrix. They can't (shouldn't?) be used
>> for general data names.
>> 
>> Does this cover what John wanted also?
>> 
>> 
>> On 11/12/09 10:12 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>> wrote:
>> 
>>> Actually, the suggestion comes from reading the dREL documentation and the
>>> DDLm documentation and noticing how clumsy the access to array elements in
>>> DDLm is compared to the access in dREL.  What I am suggesting is to
>>> promote the dREL access making it fully available at the DDLm level,
>>> replacing the clumsy element-by-element definitions with one automatic
>>> definition that looks and works just the way one might expect.
>>> 
>>> Regards,
>>>    Herbert
>>> 
>>> 
>>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>> 
>>>> Many of you need to read the dREL part of the dictionary much more closely.
>>>> 
>>>> dREL extensively exploits access to matrix and vector types by index
>>>> addressing at a programmatic level. That's how it gets done the things it
>>>> has to. So within the dREL programming language you will see, littered
>>>> everywhere, a matrix which is accessed via standard indexing (as you would
>>>> with any language supporting array structures).
>>>> 
>>>> So let's have a matrix _atom_site.U - within dREL I have access to
>>>> _atom_site.U[0][0] etc as part of the language (I'll stick with 0 initial
>>>> indexing but this really is a trivial problem, solved many times over).
>>>> 
>>>> But now you ALSO want a scalar data item called _atom_site.U[0][0] within
>>>> CIF. The dictionary says _atom_site.U[0][0] is a single scalar value.
>>>> 
>>>> The dREL constructor method for _atom_site.U has
>>>> 
>>>> _atom_site.U = Matrix([[atom_site.U[0][0] ...]...])
>>>> 
>>>> This obviously won't work. This is why the dictionary in DDLm uses the
>>>> equivalent of _atom_site.U_0_0 for the scalar value, so that the above
>>>> constructor will make sense and still allow me to access
>>>> _atom_site.U[0][0] from within dREL. It is why I am keen to restrict the
>>>> syntax of the data names.
>>>> 
>>>> 
>>>> On 11/12/09 2:46 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
>>>> wrote:
>>>> 
>>>>> Dear Colleagues,
>>>>> 
>>>>>    One very neat resolution to this problem would be to allow a
>>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>>> as a whole or element by element.
>>>>> 
>>>>>    Thus
>>>>> 
>>>>>    _a.vec
>>>>> 
>>>>> being defined as an array or list in CIF2 would automatically make
>>>>> the tags
>>>>> 
>>>>>    _a.vec[1]
>>>>>    _a.vec[2]
>>>>> ...
>>>>> 
>>>>> defined CIF2 tags.  If the array or list were nested, the
>>>>> 
>>>>>    _a.vec[1][1]
>>>>>    _a.vec[1][2]
>>>>> 
>>>>> etc. would be valid tags
>>>>> 
>>>>>    I would propose that this be general and automatic, applying to
>>>>> all tags defined as lists or arrays.  In view of past practice in
>>>>> CIF1, there is a slight conflict between the default starting
>>>>> index of 0 in dREL and the common CIF1 practice of indexing arrays
>>>>> from 1, but that can (and should) be solved with explicit specification
>>>>> of a starting index, so we can carry over the tag name usage from
>>>>> CIF1 without confusing people with an index shift.  So, if _a.vec
>>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>>> through _a.vec[4] would be valid, but if the starting index were
>>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>>> CIF1 conventions.
>>>>> 
>>>>>    The aliasing mechanism might have to be extended or clarified to
>>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>>> but, to me, this has a very intuitive feel.
>>>>> 
>>>>>    Regards,
>>>>>      Herbert
>>>>> 
>>>>> 
>>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>>> Hi all -
>>>>>> 
>>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>>> generally have been inherited from the style of items from the core.  The
>>>>>> majority of items in this class are data items related to short
>>>>>> matrices/tensors and vectors (e.g. items including []).  Virtually all
>>>>>> have a syntax which could reasonably be interpreted as a programmatic
>>>>>> reference.  For instance,
>>>>>> 
>>>>>> 
>>>>>> _atom_sites.fract_transf_matrix[1][1]   0.007738
>>>>>> _atom_sites.fract_transf_matrix[1][2]   0.000000
>>>>>> _atom_sites.fract_transf_matrix[1][3]   0.004298
>>>>>> _atom_sites.fract_transf_matrix[2][1]   0.000000
>>>>>> _atom_sites.fract_transf_matrix[2][2]   0.016545
>>>>>> _atom_sites.fract_transf_matrix[2][3]   0.000000
>>>>>> _atom_sites.fract_transf_matrix[3][1]   0.000000
>>>>>> _atom_sites.fract_transf_matrix[3][2]   0.000000
>>>>>> _atom_sites.fract_transf_matrix[3][3]   0.020200
>>>>>> _atom_sites.fract_transf_vector[1]      0.00000
>>>>>> _atom_sites.fract_transf_vector[2]      0.00000
>>>>>> _atom_sites.fract_transf_vector[3]      0.00000
>>>>>> 
>>>>>> Are we close to being able to treat these as legal in the context of
>>>>>> CIF2/DDL+?  I suppose I am asking what will constitute a legal assignment
>>>>>> for an element of a matrix/array -
>>>>>> 
>>>>>> Only this -
>>>>>> 
>>>>>> _a.vec [1,2,3]
>>>>>> 
>>>>>> or also expanded assignment by element such as -
>>>>>> 
>>>>>> _a.vec[1]  1
>>>>>> _a.vec[2]  2
>>>>>> _a.vec[3]  3
>>>>>> 
>>>>>> If the latter is to be considered, then this will solve most of the data
>>>>>> name issues for our data.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> John
>>>>>> 
>>>>>> Joe Krahn wrote:
>>>>>>>  In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>>>  formatted file. The question is whether these files should be allowed
>>>>>>>  as valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>> 
>>>>>>>  When CIF files are used as working data files, the restrictions should
>>>>>>>  be relaxed. For long-term archival files, it makes sense to be more
>>>>>>>  restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>>>>>  Alternatively, an implementation could allow CIF1 names only on
>>>>>>>  reading, and require dictionary alias mappings to CIF2 names.
>>>>>>> 
>>>>>>>  One argument in favor of allowing them would be that someone wants to
>>>>>>>  convert all data files to CIF2 format, but they want to preserve the
>>>>>>>  original data as-is, without alias mapping.
>>>>>>> 
>>>>>>>  I think that the current CIF2 syntax makes it possible to use CIF1
>>>>>>>  names without any ambiguities. The question is whether they should be
>>>>>>>  considered valid CIF2, or just a non-standard version that will be
>>>>>>>  useful for the transitional period.
>>>>>>> 
>>>>>>>  Joe
>>>>>>> 
>>>>>>> 
>>>>>>>  Herbert J. Bernstein wrote:
>>>>>>>>  Personally, I would greatly prefer to allow all data names that do not
>>>>>>>>  create a major lexer/parser conflict to appear in a data CIF and
>>>>>>>>  only apply the strong restrictions to data names that appear in CIF2
>>>>>>>>  dictionaries as defined data names (not as aliases).  -- Herbert
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>>>  I have one remaining niggle that I'd like to revisit before we put
>>>>>>>>>  this finally to bed. As has been mentioned a couple of times
>>>>>>>>>  recently, restricting the data-name character set does invalidate
>>>>>>>>>  syntactically many existing CIF 1 files (e.g.
>>>>>>>>>  _refine_ls_shift/esd_max).
>>>>>>>>>  We have discussed strategies for handling this, and I think these
>>>>>>>>>  are workable strategies, but will involve investment and hence
>>>>>>>>>  expense in workflow management in CIF archives.
>>>>>>>>> 
>>>>>>>>>  I understand the rationale behind this restriction is to simplify
>>>>>>>>>  future processing of data names in areas such as dREL
>>>>>>>>>  applications. The question really is whether we're choosing the right
>>>>>>>>>  trade-off in making things cleaner at that end of the processing
>>>>>>>>>  chain. I would suppose that a dREL or other application could ingest
>>>>>>>>>  a data name with dangerous characters, convert it internally into a
>>>>>>>>>  "safe" identifier that's used for all processing, and then restore
>>>>>>>>>  the original form upon output; but writing that intermediate layer of
>>>>>>>>>  processing is of course expensive (especially if there aren't readily
>>>>>>>>>  available libraries that will do this transparently).
>>>>>>>>> 
>>>>>>>>>  I suspect that some of the original proposed syntactic changes also
>>>>>>>>>  had the effect (whether by design or collaterally) of simplifying
>>>>>>>>>  i/o, data structure management, symbol table processing etc., but
>>>>>>>>>  those may have suffered in the subsequent revision exercise we've
>>>>>>>>>  just been practising. Given the consensus we are now approaching,
>>>>>>>>>  would the code builders now be prepared to incur the additional
>>>>>>>>>  expense of handling "dangerous" data names?
>>>>>>>>> 
>>>>>>>>>  I really don't want to spark off a long discussion on this - if a
>>>>>>>>>  quick round of response shows that there's no appetite to allow
>>>>>>>>>  the additional punctuation characters in data names, I'll accept that
>>>>>>>>>  gracefully.
>>>>>>>>> 
>>>>>>>>>  ***
>>>>>>>>> 
>>>>>>>>>  One last comment while I have the floor, though it is related in part
>>>>>>>>>  to the above question. A concern raised in the editorial office was
>>>>>>>>>  that there would be circumstances where users didn't know if they
>>>>>>>>>  were dealing with a CIF 1 or 2 ("users" meaning authors, perhaps
>>>>>>>>>  resorting to the vi editor - and we're imagining most of them are
>>>>>>>>>  dealing with small-molecule/inorganic CIFs). My supposition is that
>>>>>>>>>  the IUCr editorial offices would only want to use CIF2 seriously in
>>>>>>>>>  association with DDLm dictionaries, and that we would expect the
>>>>>>>>>  revised core dictionaries to use the dot component in data names to
>>>>>>>>>  signal this further evolution. So even a superficial glimpse of the
>>>>>>>>>  middle of a CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>>>> 
>>>>>>>>>  Does that fit in with how others see this progressing?
>>>>>>>>> 
>>>>>>>>>  Cheers
>>>>>>>>>  Brian
>>>>>>>>>  _______________________________________________
>>>>>>>>>  ddlm-group mailing list
>>>>>>>>>  ddlm-group@iucr.org
>>>>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> ******************************************************************
>>>>>>    John Westbrook, Ph.D.
>>>>>>    Rutgers, The State University of New Jersey
>>>>>>    Department of Chemistry and Chemical Biology
>>>>>>    610 Taylor Road
>>>>>>    Piscataway, NJ 08854-8087
>>>>>>    e-mail: jwest@rcsb.rutgers.edu
>>>>>>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>>>>>> ******************************************************************
>>>>>> 
>>>>> 
>>>> 
>>>> cheers
>>>> 
>>>> Nick
>>>> 
>>>> --------------------------------
>>>> Associate Professor N. Spadaccini, PhD
>>>> School of Computer Science & Software Engineering
>>>> 
>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>>> MBDP  M002
>>>> 
>>>> CRICOS Provider Code: 00126G
>>>> 
>>>> e: Nick.Spadaccini@uwa.edu.au
>>>> 
>> 
>> cheers
>> 
>> Nick
>> 
>> 

cheers

Nick
