Re: [ddlm-group] Data-name character restrictions - one last time

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: Joe Krahn <[email protected]>
Date: Fri, 11 Dec 2009 14:53:33 -0500
In-Reply-To: <C7485B9C.128AD%[email protected]>
References: <C7485B9C.128AD%[email protected]>
Instead of a start index, it would be possible to define an offset, 
if/when a different base index is really needed. It would then be clearer 
in expressions that a non-standard base is in use, because it has to be 
written explicitly, such as "_array[index+offset]". The other advantage 
is that it works just as well with a negative starting index, if that is 
ever needed.
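
A minimal sketch of the explicit-offset idea, written in Python purely for illustration. The names `storage` and `element` are hypothetical and are not part of dREL or DDLm; the sketch assumes zero-based storage underneath.

```python
# Hypothetical illustration: storage is zero-based, and any other base is
# expressed as an explicit offset at the point of use.
storage = [10.0, 20.0, 30.0]

def element(values, index, offset=0):
    """Return values[index + offset], rejecting positions outside the list."""
    position = index + offset
    if not 0 <= position < len(values):
        raise IndexError(f"index {index} with offset {offset} is out of range")
    return values[position]

# CIF1-style indices 1..3: the offset is -1.
first = element(storage, 1, offset=-1)
# A base of -1 (indices -1..1): the offset is +1.
lowest = element(storage, -1, offset=1)
```

Both calls reach the same stored element, and the non-standard base is visible at each call site rather than hidden in a declaration.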

Joe


Nick Spadaccini wrote:
> I don't think a mechanism for specifying a starting index will work at the
> individual definition level. They will all have to start at the same address
> otherwise if I try to access within dREL some other object, how do I know
> what its starting index is?
> 
> Best to decide on a starting index and fix it. There is an historical
> precedent in CIF that has it starting at 1. As wrong as I would argue that
> is, it is set in stone, so stick with it.
> 
> In my code I will simply offset the index by -1 to get to the real storage
> point (I don't program in languages that index starting at 1) - it is easy
> enough to do.
> 
> Seems a solution to me.
> 
> The _xxx_yyy[] syntax is an ancient category like definition that never
> appears in data.
> 
> 
> On 11/12/09 7:37 PM, "Herbert J. Bernstein" <[email protected]>
> wrote:
> 
>> I am saying that any declaration of an array or of a list would
>> make its individual elements available for use in any data CIF without
>> the need for any further declarations in the dictionary.  This is
>> simple and clear and completely consistent with dREL.  The only really
>> new thing would be some mechanism(s) to specify the starting index.
>>
>> I think this covers John's need.  The only thing it would not cover
>> is something like _xxx_yyy[] which appears in some CIF1 dictionaries
>> but not in the data files, so I don't think there should be an
>> issue with not allowing those in CIF2.
>>
>> Does anyone see a problem with this?
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   [email protected]
>> =====================================================
>>
>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>
>>> I can agree with that, if you are saying only the matrix object is available
>>> to the user.
>>>
>>> OR alternatively are you saying there will ONLY be one object defined in the
>>> dictionary, let's say the 3x3 matrix
>>>
>>> _atom_site.U
>>>
>>> But NEVER have definitions in the dictionary for the individual
>>> _atom_site.U[i][j] elements.
>>>
>>> As we parse a CIF data file, if we detect _atom_site.U[i][j], it isn't in
>>> the defined dictionary so this would normally raise an error. BUT because of
>>> the specific trailing syntax [i][j] this informs the parser there must be an
>>> object of matching rank with the name _atom_site.U (ie the
>>> _atom_site.U[i][j] with the [i][j] truncated) in the dictionary - and
>>> therefore populate the appropriate element of _atom_site.U with that value.
>>>
>>> This would circumvent the problem of two different identifiers called
>>> _atom_site.U[i][j] in the dictionary BUT would necessarily mean that [i][j]
>>> syntax in a data name was reserved for objects that are defined in the
>>> dictionary as, in this case, a 2D matrix. They can't (shouldn't?) be used
>>> for general data names.
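
The lookup rule described above could be sketched roughly as follows, in Python for illustration only. The table `DICTIONARY_RANKS` and the function `resolve` are hypothetical names, and the dictionary contents are made up; a real parser would take ranks from the loaded DDLm dictionary.

```python
import re

# Hypothetical dictionary: base data name -> rank (number of dimensions).
DICTIONARY_RANKS = {"_atom_site.U": 2, "_a.vec": 1}

# A data name followed by one or more trailing [integer] groups.
INDEXED_NAME = re.compile(r"^(?P<base>.+?)(?P<indices>(?:\[\d+\])+)$")

def resolve(data_name):
    """Return (base_name, indices) for a name found in a data file.

    A name defined directly in the dictionary resolves to itself with no
    indices. Otherwise the trailing [i][j]... groups are stripped and the
    base name must be defined with a matching rank.
    """
    if data_name in DICTIONARY_RANKS:
        return data_name, ()
    match = INDEXED_NAME.match(data_name)
    if match is None:
        raise KeyError(f"{data_name} is not defined")
    base = match.group("base")
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", match.group("indices")))
    if DICTIONARY_RANKS.get(base) != len(indices):
        raise KeyError(f"{data_name} does not match a defined object of rank {len(indices)}")
    return base, indices
```

Under these assumptions `resolve("_atom_site.U[1][2]")` yields the base name plus the index pair, while a name with the wrong number of index groups is rejected as undefined.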
>>>
>>> Does this cover what John wanted also?
>>>
>>>
>>> On 11/12/09 10:12 AM, "Herbert J. Bernstein" <[email protected]>
>>> wrote:
>>>
>>>> Actually, the suggestion comes from reading the dREL documentation and the
>>>> DDLm documentation and noticing how clumsy the access to array elements in
>>>> DDLm is compared to the access in dREL.  What I am suggesting is to
>>>> promote the dREL access making it fully available at the DDLm level,
>>>> replacing the clumsy element-by-element definitions with one automatic
>>>> definition that looks and works just the way one might expect.
>>>>
>>>> Regards,
>>>>    Herbert
>>>>
>>>>
>>>> On Fri, 11 Dec 2009, Nick Spadaccini wrote:
>>>>
>>>>> Many of you need to read the dREL part of the dictionary much more closely.
>>>>>
>>>>> dREL extensively exploits access to matrix and vector types by index
>>>>> addressing at a programmatic level. That's how it gets done the things
>>>>> it has to. So within the dREL programming language you will see,
>>>>> littered everywhere, a matrix which is accessed via standard indexing
>>>>> (as you would with any language supporting array structures).
>>>>>
>>>>> So let's have a matrix _atom_site.U - within dREL I have access to
>>>>> _atom_site.U[0][0] etc. as part of the language (I'll stick with 0 initial
>>>>> indexing but this really is a trivial problem, solved many times over).
>>>>>
>>>>> But now you ALSO want a scalar data item called _atom_site.U[0][0] within
>>>>> CIF. The dictionary says _atom_site.U[0][0] is a single scalar value.
>>>>>
>>>>> The dREL constructor method for _atom_site.U has
>>>>>
>>>>> _atom_site.U = Matrix([[atom_site.U[0][0] ...]...])
>>>>>
>>>>> This obviously won't work. This is why the dictionary in DDLm uses the
>>>>> equivalent of _atom_site.U_0_0 for the scalar value, so that the above
>>>>> constructor will make sense and still allow me to access
>>>>> _atom_site.U[0][0] from within dREL. It is why I am keen to restrict the
>>>>> syntax of the data names.
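
To make the naming point concrete, here is a small Python sketch (not dREL) of a constructor that builds the matrix from scalar items named in the `_U_i_j` style. The `items` mapping, its values, and `build_matrix` are hypothetical.

```python
# Hypothetical scalar items as they might be read from a data file.
items = {f"_atom_site.U_{i}_{j}": float(i == j) for i in range(3) for j in range(3)}

def build_matrix(items, name, size=3):
    """Assemble the matrix object from its separately named scalar items."""
    return [[items[f"{name}_{i}_{j}"] for j in range(size)] for i in range(size)]

U = build_matrix(items, "_atom_site.U")
# Indexing is applied to the matrix object, so U[0][0] cannot be
# confused with a scalar item whose name ends in "[0][0]".
```

Because the scalar names carry no brackets, the expression `U[0][0]` has only one possible reading.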
>>>>>
>>>>>
>>>>> On 11/12/09 2:46 AM, "Herbert J. Bernstein" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>>    One very neat resolution to this problem would be to allow a
>>>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>>>> as a whole or element by element.
>>>>>>
>>>>>>    Thus
>>>>>>
>>>>>>    _a.vec
>>>>>>
>>>>>> being defined as an array or list in CIF2 would automatically make
>>>>>> the tags
>>>>>>
>>>>>>    _a.vec[1]
>>>>>>    _a.vec[2]
>>>>>> ...
>>>>>>
>>>>>> defined CIF2 tags.  If the array or list were nested, the
>>>>>>
>>>>>>    _a.vec[1][1]
>>>>>>    _a.vec[1][2]
>>>>>>
>>>>>> etc. would be valid tags
>>>>>>
>>>>>>    I would propose that this be general and automatic, applying to
>>>>>> all tags defined as lists or arrays.  In view of past practice in
>>>>>> CIF1, there is a slight conflict between the default starting
>>>>>> index in dREL and the common CIF1 practice of indexing arrays
>>>>>> from 1, but that can (and should) be solved with explicit specification
>>>>>> of a starting index, so we can carry over the tag name usage from
>>>>>> CIF1 without confusing people with an index shift.  So, if _a.vec
>>>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>>>> through _a.vec[4] would be valid, but if the starting index were
>>>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>>>> CIF1 conventions.
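
The set of element tags implied by a single array definition can be enumerated mechanically. A short Python sketch, assuming one starting index shared by every axis; `element_tags` is a hypothetical helper name.

```python
from itertools import product

def element_tags(name, dimensions, start=1):
    """List the element tags implied by one array definition.

    `dimensions` gives the extent along each axis; `start` is the first
    valid index on every axis.
    """
    ranges = [range(start, start + extent) for extent in dimensions]
    return [name + "".join(f"[{i}]" for i in combo) for combo in product(*ranges)]

zero_based = element_tags("_a.vec", [5], start=0)  # _a.vec[0] .. _a.vec[4]
one_based = element_tags("_a.vec", [5], start=1)   # _a.vec[1] .. _a.vec[5]
```

For a nested definition, `element_tags("_a.vec", [2, 2])` lists `_a.vec[1][1]` through `_a.vec[2][2]`.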
>>>>>>
>>>>>>    The aliasing mechanism might have to be extended or clarified to
>>>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>>>> but, to me, this has a very intuitive feel.
>>>>>>
>>>>>>    Regards,
>>>>>>      Herbert
>>>>>>
>>>>>>
>>>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>>>> Hi all -
>>>>>>>
>>>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>>>> generally have been inherited from the style of items from the core.  The
>>>>>>> majority of items in this class are data items related to short
>>>>>>> matrices/tensors and vectors (e.g. items including []).  Virtually all
>>>>>>> have a syntax which could reasonably be interpreted as a programmatic
>>>>>>> reference.  For instance,
>>>>>>>
>>>>>>>
>>>>>>> _atom_sites.fract_transf_matrix[1][1]   0.007738
>>>>>>> _atom_sites.fract_transf_matrix[1][2]   0.000000
>>>>>>> _atom_sites.fract_transf_matrix[1][3]   0.004298
>>>>>>> _atom_sites.fract_transf_matrix[2][1]   0.000000
>>>>>>> _atom_sites.fract_transf_matrix[2][2]   0.016545
>>>>>>> _atom_sites.fract_transf_matrix[2][3]   0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][1]   0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][2]   0.000000
>>>>>>> _atom_sites.fract_transf_matrix[3][3]   0.020200
>>>>>>> _atom_sites.fract_transf_vector[1]      0.00000
>>>>>>> _atom_sites.fract_transf_vector[2]      0.00000
>>>>>>> _atom_sites.fract_transf_vector[3]      0.00000
>>>>>>>
>>>>>>> Are we close to being able to treat these as legal in the context of
>>>>>>> CIF2/DDL+?  I suppose I am asking what will constitute a legal assignment
>>>>>>> for an element of a matrix/array -
>>>>>>>
>>>>>>> Only this -
>>>>>>>
>>>>>>> _a.vec [1,2,3]
>>>>>>>
>>>>>>> or also expanded assignment by element such as -
>>>>>>>
>>>>>>> _a.vec[1]  1
>>>>>>> _a.vec[2]  2
>>>>>>> _a.vec[3]  3
>>>>>>>
>>>>>>> If the latter is to be considered, then this will solve most of the data
>>>>>>> name issues for our data.
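
If both forms were accepted, a reader could fold the element-wise form back into the whole-list form. A minimal Python sketch, assuming one-dimensional data and CIF1-style indices starting at 1; `collect_vector` is a hypothetical name and the rejection of gaps is an illustrative choice, not something the thread specifies.

```python
import re

def collect_vector(assignments, name, start=1):
    """Build the whole-list value of `name` from element-wise assignments.

    `assignments` maps data names such as "_a.vec[1]" to values.
    """
    pattern = re.compile(re.escape(name) + r"\[(\d+)\]$")
    found = {}
    for key, value in assignments.items():
        match = pattern.match(key)
        if match:
            found[int(match.group(1))] = value
    expected = list(range(start, start + len(found)))
    if sorted(found) != expected:
        raise ValueError(f"elements of {name} are not contiguous from {start}")
    return [found[i] for i in expected]

expanded = {"_a.vec[1]": 1, "_a.vec[2]": 2, "_a.vec[3]": 3}
whole = collect_vector(expanded, "_a.vec")
```

Here `whole` carries the same content as the single assignment `_a.vec [1,2,3]`.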
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> Joe Krahn wrote:
>>>>>>>>  In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>>>>  formatted file. The question is whether these files should be allowed
>>>>>>>>  as valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>>>
>>>>>>>>  When CIF files are used as working data files, the restrictions should
>>>>>>>>  be relaxed. For long-term archival files, it makes sense to be more
>>>>>>>>  restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>>>>>>  Alternatively, an implementation could allow CIF1 names only on
>>>>>>>>  reading, and require dictionary alias mappings to CIF2 names.
>>>>>>>>
>>>>>>>>  One argument in favor of allowing them would be that someone wants to
>>>>>>>>  convert all data files to CIF2 format, but they want to preserve the
>>>>>>>>  original data as-is, without alias mapping.
>>>>>>>>
>>>>>>>>  I think that the current CIF2 syntax makes it possible to use CIF1
>>>>>>>>  names without any ambiguities. The question is whether they should be
>>>>>>>>  considered valid CIF2, or just a non-standard version that will be
>>>>>>>>  useful for the transitional period.
>>>>>>>>
>>>>>>>>  Joe
>>>>>>>>
>>>>>>>>
>>>>>>>>  Herbert J. Bernstein wrote:
>>>>>>>>>  Personally, I would greatly prefer to allow all data names that do not
>>>>>>>>>  create a major lexer/parser conflict to appear in a data CIF and
>>>>>>>>>  only apply the strong restrictions to data names that appear in CIF2
>>>>>>>>>  dictionaries as defined data names (not as aliases).  -- Herbert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>>>>  I have one remaining niggle that I'd like to revisit before we put
>>>>>>>>>>  this finally to bed. As has been mentioned a couple of times
>>>>>>>>>>  recently, restricting the data-name character set does invalidate
>>>>>>>>>>  syntactically many existing CIF 1 files (e.g.
>>>>>>>>>>  _refine_ls_shift/esd_max). We have discussed strategies for handling
>>>>>>>>>>  this, and I think these are workable strategies, but they will involve
>>>>>>>>>>  investment and hence expense in workflow management in CIF archives.
>>>>>>>>>>
>>>>>>>>>>  I understand the rationale behind this restriction is to simplify
>>>>>>>>>>  future processing of data names in areas such as dREL
>>>>>>>>>>  applications. The question really is whether we're choosing the right
>>>>>>>>>>  trade-off in making things cleaner at that end of the processing
>>>>>>>>>>  chain. I would suppose that a dREL or other application could ingest
>>>>>>>>>>  a data name with dangerous characters, convert it internally into a
>>>>>>>>>>  "safe" identifier that's used for all processing, and then restore
>>>>>>>>>>  the original form upon output; but writing that intermediate layer of
>>>>>>>>>>  processing is of course expensive (especially if there aren't readily
>>>>>>>>>>  available libraries that will do this transparently).
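
The ingest/convert/restore layer described above might look something like the following Python sketch. The class `NameTable` and its replace-with-underscore rule are assumptions made for illustration; no existing library or agreed mangling scheme is implied.

```python
import re

class NameTable:
    """Maps original data names to safe internal identifiers and back."""

    def __init__(self):
        self._to_safe = {}
        self._to_original = {}

    def safe(self, data_name):
        """Return a stable identifier containing only [A-Za-z0-9_]."""
        if data_name not in self._to_safe:
            candidate = re.sub(r"[^A-Za-z0-9_]", "_", data_name)
            # Distinct names may collapse to the same candidate; disambiguate.
            unique, n = candidate, 1
            while unique in self._to_original:
                n += 1
                unique = f"{candidate}_{n}"
            self._to_safe[data_name] = unique
            self._to_original[unique] = data_name
        return self._to_safe[data_name]

    def original(self, identifier):
        """Restore the data name exactly as it appeared on input."""
        return self._to_original[identifier]

table = NameTable()
internal = table.safe("_refine_ls_shift/esd_max")
restored = table.original(internal)
```

Keeping an explicit table, rather than trying to invert the substitution, is what lets the original spelling be restored exactly on output.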
>>>>>>>>>>
>>>>>>>>>>  I suspect that some of the original proposed syntactic changes also
>>>>>>>>>>  had the effect (whether by design or collaterally) of simplifying
>>>>>>>>>>  i/o, data structure management, symbol table processing etc., but
>>>>>>>>>>  those may have suffered in the subsequent revision exercise we've just
>>>>>>>>>>  been practising. Given the consensus we are now approaching, would the
>>>>>>>>>>  code builders now be prepared to incur the additional expense of
>>>>>>>>>>  handling "dangerous" data names?
>>>>>>>>>>
>>>>>>>>>>  I really don't want to spark off a long discussion on this - if a
>>>>>>>>>>  quick round of response shows that there's no appetite to allow
>>>>>>>>>>  the additional punctuation characters in data names, I'll accept that
>>>>>>>>>>  gracefully.
>>>>>>>>>>
>>>>>>>>>>  ***
>>>>>>>>>>
>>>>>>>>>>  One last comment while I have the floor, though it is related in part
>>>>>>>>>>  to the above question. A concern raised in the editorial office was
>>>>>>>>>>  that there would be circumstances where users didn't know if they
>>>>>>>>>>  were dealing with a CIF 1 or 2 ("users" meaning authors, perhaps
>>>>>>>>>>  resorting to the vi editor - and we're imagining most of them are
>>>>>>>>>>  dealing with small-molecule/inorganic CIFs). My supposition is that
>>>>>>>>>>  the IUCr editorial offices would only want to use CIF2 seriously in
>>>>>>>>>>  association with DDLm dictionaries, and that we would expect the
>>>>>>>>>>  revised core dictionaries to use the dot component in data names to
>>>>>>>>>>  signal this further evolution. So even a superficial glimpse of the
>>>>>>>>>>  middle of a CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>>>>>
>>>>>>>>>>  Does that fit in with how others see this progressing?
>>>>>>>>>>
>>>>>>>>>>  Cheers
>>>>>>>>>>  Brian
>>>>>>>>>>  _______________________________________________
>>>>>>>>>>  ddlm-group mailing list
>>>>>>>>>>  [email protected]
>>>>>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>> --
>>>>>>> ******************************************************************
>>>>>>>    John Westbrook, Ph.D.
>>>>>>>    Rutgers, The State University of New Jersey
>>>>>>>    Department of Chemistry and Chemical Biology
>>>>>>>    610 Taylor Road
>>>>>>>    Piscataway, NJ 08854-8087
>>>>>>>    e-mail: [email protected]
>>>>>>>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>>>>>>> ******************************************************************
>>>>>>>
>>>>> cheers
>>>>>
>>>>> Nick
>>>>>
>>>>> --------------------------------
>>>>> Associate Professor N. Spadaccini, PhD
>>>>> School of Computer Science & Software Engineering
>>>>>
>>>>> The University of Western Australia    t: +61 (0)8 6488 3452
>>>>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>>>>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>>>>> MBDP  M002
>>>>>
>>>>> CRICOS Provider Code: 00126G
>>>>>
>>>>> e: [email protected]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>> cheers
>>>
>>> Nick
>>>
>>>
> 
> cheers
> 
> Nick
> 
> 
