
Re: [ddlm-group] Data-name character restrictions - one last time

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: Nick Spadaccini <[email protected]>
Date: Fri, 11 Dec 2009 09:08:21 +0800
In-Reply-To: <a06240800c746ed907fcc@[192.168.2.104]>

Many of you need to read the dREL part of the dictionary much more closely.

dREL extensively exploits access to matrix and vector types by index
addressing at a programmatic level. That's how it gets done the things it
has to. So within the dREL programming language you will see, littered
everywhere, matrices accessed via standard indexing (as you would with any
language supporting array structures).

So let's have a matrix _atom_site.U: within dREL I have access to
_atom_site.U[0][0] etc. as part of the language (I'll stick with 0-based
indexing, but this really is a trivial problem, solved many times over).

But now you ALSO want a scalar data item called _atom_site.U[0][0] within
CIF. The dictionary says _atom_site.U[0][0] is a single scalar value.

The dREL constructor method for _atom_site.U has

_atom_site.U = Matrix([[atom_site.U[0][0] ...]...])

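This obviously won't work: the name _atom_site.U[0][0] would denote both a
scalar data item and element [0][0] of the very matrix being constructed,
so the constructor would be defined in terms of itself. This is why the
dictionary in DDLm uses the equivalent of _atom_site.U_0_0 for the scalar
value, so that the above constructor makes sense and I can still access
_atom_site.U[0][0] from within dREL. That is why I am keen to restrict the
syntax of the data names.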
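To make the clash concrete, here is a minimal Python sketch (not dREL, and not the API of any CIF library). It assumes, hypothetically, that a parser has already returned the scalar items under DDLm-style `_atom_site.U_i_j` names; the helpers `build_matrix` and `resolve` and the numeric values are illustrative inventions. A bracketed suffix is always read as index access on the base item, which is exactly why a scalar literally named `_atom_site.U[0][0]` could not coexist with it.

```python
import re

# Hypothetical values standing in for what a CIF parser might return for the
# DDLm-style scalar items _atom_site.U_i_j (0-based indices, as in the text).
U_VALUES = [[0.021, 0.001, 0.000],
            [0.001, 0.018, 0.002],
            [0.000, 0.002, 0.025]]
scalars = {f"_atom_site.U_{i}_{j}": U_VALUES[i][j]
           for i in range(3) for j in range(3)}


def build_matrix(items, base, n):
    """Assemble an n x n matrix from scalar items named <base>_i_j."""
    return [[items[f"{base}_{i}_{j}"] for j in range(n)] for i in range(n)]


# A name followed by one or more [k] suffixes is treated as index access
# on the base item, never as a data name in its own right.
INDEXED = re.compile(r"(?P<base>[^\[\]]+)(?P<idx>(?:\[\d+\])+)")


def resolve(expr, items, matrices):
    """Return a scalar item by name, or an element of a matrix item by index."""
    m = INDEXED.fullmatch(expr)
    if m is None:
        return items[expr]
    value = matrices[m.group("base")]
    for k in re.findall(r"\[(\d+)\]", m.group("idx")):
        value = value[int(k)]
    return value


matrices = {"_atom_site.U": build_matrix(scalars, "_atom_site.U", 3)}
print(resolve("_atom_site.U[0][0]", scalars, matrices))  # 0.021 (element access)
print(resolve("_atom_site.U_1_2", scalars, matrices))    # 0.002 (scalar item)
```

Keeping the scalar names free of brackets means the lookup rule never has to guess which of the two meanings was intended.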


On 11/12/09 2:46 AM, "Herbert J. Bernstein" <[email protected]>
wrote:

> Dear Colleagues,
> 
>    One very neat resolution to this problem would be to allow a
> list or array-typed CIF2 tag to be referenced in a data file either
> as a whole or element by element.
> 
>    Thus
> 
>    _a.vec
> 
> being defined as an array or list in CIF2 would automatically make
> the tags
> 
>    _a.vec[1]
>    _a.vec[2]
> ...
> 
> defined CIF2 tags.  If the array or list were nested, the
> 
>    _a.vec[1][1]
>    _a.vec[1][2]
> 
> etc. would be valid tags.
> 
>    I would propose that this be general and automatic, applying to
> all tags defined as lists or arrays.  In view of past practice in
> CIF1, there is a slight conflict between the default starting index
> of 0 in dREL and the common CIF1 practice of indexing arrays from 1,
> but that can (and should) be solved with explicit specification
> of a starting index, so we can carry over the tag name usage from
> CIF1 without confusing people with an index shift.  So, if _a.vec
> were an array of dimension 5, starting from index 0, _a.vec[0]
> through _a.vec[4] would be valid, but if the starting index were
> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
> CIF1 conventions.
> 
>    The aliasing mechanism might have to be extended or clarified to
> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
> but, to me, this has a very intuitive feel.
> 
>    Regards,
>      Herbert
> 
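Herbert's automatic expansion, with an explicit starting index, could look roughly like the Python sketch below. It is an illustration under stated assumptions only: `element_tags` and `element_value` are hypothetical helpers, the array is assumed to be stored as nested lists, and the same starting index is assumed to apply to every dimension.

```python
import re
from itertools import product


def element_tags(base, shape, start=0):
    """List the element tags implied by an array-typed tag.

    shape gives the dimensions, e.g. (5,) for a vector or (3, 3) for a
    matrix; start is the index of the first element in every dimension.
    """
    ranges = [range(start, start + n) for n in shape]
    return [base + "".join(f"[{i}]" for i in idx) for idx in product(*ranges)]


def element_value(tag, base, array, start=0):
    """Return the element of a nested list that an element tag refers to."""
    if not tag.startswith(base + "["):
        raise KeyError(tag)
    value = array
    for k in re.findall(r"\[(\d+)\]", tag[len(base):]):
        offset = int(k) - start
        if offset < 0:  # guard against Python's negative indexing
            raise IndexError(tag)
        value = value[offset]
    return value


print(element_tags("_a.vec", (5,), start=1))
# ['_a.vec[1]', '_a.vec[2]', '_a.vec[3]', '_a.vec[4]', '_a.vec[5]']
print(element_value("_a.vec[1]", "_a.vec", [10, 20, 30, 40, 50], start=1))  # 10
```

Because the starting index is a parameter rather than a convention baked into the code, the same array can be exposed with 1-based CIF1-style tags while dREL keeps its own 0-based access.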
> 
> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>> Hi all -
>> 
>> On the issue of reserved characters in mmCIF/PDBx data items, these
>> generally have been inherited from the style of items from the core.  The
>> majority of items in this class are data items related to short
>> matrices/tensors and vectors (e.g. items including []).  Virtually all
>> have a syntax which could reasonably be interpreted as a programmatic
>> reference.  For instance,
>>
>> _atom_sites.fract_transf_matrix[1][1]   0.007738
>> _atom_sites.fract_transf_matrix[1][2]   0.000000
>> _atom_sites.fract_transf_matrix[1][3]   0.004298
>> _atom_sites.fract_transf_matrix[2][1]   0.000000
>> _atom_sites.fract_transf_matrix[2][2]   0.016545
>> _atom_sites.fract_transf_matrix[2][3]   0.000000
>> _atom_sites.fract_transf_matrix[3][1]   0.000000
>> _atom_sites.fract_transf_matrix[3][2]   0.000000
>> _atom_sites.fract_transf_matrix[3][3]   0.020200
>> _atom_sites.fract_transf_vector[1]      0.00000
>> _atom_sites.fract_transf_vector[2]      0.00000
>> _atom_sites.fract_transf_vector[3]      0.00000
>> 
>> Are we close to being able to treat these as legal in the context of
>> CIF2/DDL+?  I suppose I am asking what will constitute a legal assignment
>> for an element of a matrix/array -
>> 
>> Only this -
>> 
>> _a.vec [1,2,3]
>> 
>> or also expanded assignment by element such as -
>> 
>> _a.vec[1]  1
>> _a.vec[2]  2
>> _a.vec[3]  3
>> 
>> If the latter is to be considered, then this will solve most of the data name
>> issues for our data.
>> 
>> Regards,
>> 
>> John
>> 
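If the expanded, element-by-element form John asks about were accepted, a reader would have to regroup the elements into one value per base tag. A minimal Python sketch, assuming plain `name value` lines with ordinary numbers (no standard-uncertainty notation) and a single starting index of 1 as in the existing names; `collect_elements` is a hypothetical helper, not part of any CIF library:

```python
import re

# A tag made of a base name followed by one or more [k] suffixes.
ELEMENT = re.compile(r"(?P<base>[^\s\[\]]+)(?P<idx>(?:\[\d+\])+)")


def _nest(cells, prefix, shape):
    """Recursively build nested lists from an {index_tuple: value} mapping."""
    if len(prefix) == len(shape):
        return cells[prefix]  # a KeyError here means an element was missing
    return [_nest(cells, prefix + (i,), shape) for i in range(shape[len(prefix)])]


def collect_elements(lines, start=1):
    """Gather element-by-element assignments into one nested list per base name.

    Lines are simple 'name value' pairs with plain numbers; tags without a
    [k] suffix are ignored. start is the first index used in the file.
    """
    found = {}
    for line in lines:
        name, value = line.split()
        m = ELEMENT.fullmatch(name)
        if m is None:
            continue
        idx = tuple(int(k) - start for k in re.findall(r"\d+", m.group("idx")))
        found.setdefault(m.group("base"), {})[idx] = float(value)
    result = {}
    for base, cells in found.items():
        depth = len(next(iter(cells)))
        shape = tuple(max(idx[d] for idx in cells) + 1 for d in range(depth))
        result[base] = _nest(cells, (), shape)
    return result


cif_lines = [
    "_atom_sites.fract_transf_vector[1]      0.00000",
    "_atom_sites.fract_transf_vector[2]      0.00000",
    "_atom_sites.fract_transf_vector[3]      0.00000",
    "_a.vec[1]  1",
    "_a.vec[2]  2",
    "_a.vec[3]  3",
]
print(collect_elements(cif_lines))
# {'_atom_sites.fract_transf_vector': [0.0, 0.0, 0.0], '_a.vec': [1.0, 2.0, 3.0]}
```

The result for `_a.vec` is the same list that the compact form `_a.vec [1,2,3]` would give, which is the equivalence the question turns on.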
>> Joe Krahn wrote:
>>>  In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>  formatted file. The question is whether these files should be allowed as
>>>  valid CIF2, or just for convenience as a non-standard CIF2.
>>> 
>>>  When CIF files are used as working data files, the restrictions should
>>>  be relaxed. For long-term archival files, it makes sense to be more
>>>  restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>  Alternatively, an implementation could allow CIF1 names only on reading,
>>>  and require dictionary alias mappings to CIF2 names.
>>> 
>>>  One argument in favor of allowing them would be that someone wants to
>>>  convert all data files to CIF2 format, but they want to preserve the
>>>  original data as-is, without alias mapping.
>>> 
>>>  I think that the current CIF2 syntax makes it possible to use CIF1 names
>>>  without any ambiguities. The question is whether they should be
>>>  considered valid CIF2, or just a non-standard version that will be
>>>  useful for the transitional period.
>>> 
>>>  Joe
>>> 
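Joe's "CIF1 names only on reading, mapped through dictionary aliases" option can be sketched in a few lines of Python. Everything here is an assumption for illustration: the restricted character set (letters, digits, `_` and `.`) is a stand-in for whatever rule is finally adopted, and the alias target name is hypothetical, not taken from any real DDLm dictionary.

```python
import re

# Hypothetical restricted character set for defined CIF2 names: letters,
# digits, '_' and '.'. The actual rule was still being settled in this thread.
SAFE_NAME = re.compile(r"_[A-Za-z0-9_.]+")

# Hypothetical alias table; a real one would be read from the dictionary.
ALIASES = {
    "_refine_ls_shift/esd_max": "_refine_ls.shift_over_esd_max",
}


def canonical_name(name, aliases=ALIASES):
    """Map a data name read from a file to the name used internally.

    Aliased (CIF1-style) names are mapped first; other names pass through only
    if they already satisfy SAFE_NAME, so unmapped legacy names are rejected.
    """
    if name in aliases:
        return aliases[name]
    if SAFE_NAME.fullmatch(name):
        return name
    raise ValueError(f"{name!r} has no alias and contains restricted characters")


print(canonical_name("_refine_ls_shift/esd_max"))  # _refine_ls.shift_over_esd_max
```

Checking the alias table before the character rule means a legacy name is mapped even when it happens to be syntactically safe already.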
>>> 
>>>  Herbert J. Bernstein wrote:
>>>>  Personally, I would greatly prefer to allow all data names that do not
>>>>  create a major lexer/parser conflict to appear in a data CIF and
>>>>  only apply the strong restrictions to data names that appear in CIF2
>>>>  dictionaries as defined data names (not as aliases).  -- Herbert
>>>> 
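The two-tier policy Herbert describes (lenient for names appearing in a data CIF, strict for names a CIF2 dictionary defines) might be expressed as below. Both patterns are hypothetical placeholders chosen for the sketch, not the rules eventually agreed: "lenient" only requires a single non-blank printable ASCII token starting with `_`, and "strict" allows only letters, digits, `_` and `.`.

```python
import re

# Hypothetical rules, for illustration only:
#   lenient: '_' followed by printable, non-blank ASCII (one lexer token);
#   strict:  '_' followed by letters, digits, '_' and '.' only.
LENIENT = re.compile(r"_[!-~]+")
STRICT = re.compile(r"_[A-Za-z0-9_.]+")


def is_valid_name(name, defined_in_dictionary=False):
    """Apply the strict rule only to names that a CIF2 dictionary defines."""
    pattern = STRICT if defined_in_dictionary else LENIENT
    return pattern.fullmatch(name) is not None


for name in ["_refine_ls_shift/esd_max", "_atom_sites.fract_transf_matrix[1][1]"]:
    print(name, is_valid_name(name), is_valid_name(name, defined_in_dictionary=True))
```

Both example names pass the lenient check and fail the strict one, which is the split the proposal relies on: existing data files stay readable while new definitions stay clean.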
>>>> 
>>>>  At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>  I have one remaining niggle that I'd like to revisit before we put
>>>>>  this finally to bed. As has been mentioned a couple of times
>>>>>  recently, restricting the data-name character set does invalidate
>>>>>  syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>>>>>  We have discussed strategies for handling this, and I think these
>>>>>  are workable strategies, but will involve investment and hence expense
>>>>>  in workflow management in CIF archives.
>>>>> 
>>>>>  I understand the rationale behind this restriction is to simplify
>>>>>  future processing of data names in areas such as dREL
>>>>>  applications. The question really is whether we're choosing the right
>>>>>  trade-off in making things cleaner at that end of the processing
>>>>>  chain. I would suppose that a dREL or other application could ingest a
>>>>>  data name with dangerous characters, convert it internally into a
>>>>>  "safe" identifier that's used for all processing, and then restore the
>>>>>  original form upon output; but writing that intermediate layer of
>>>>>  processing is of course expensive (especially if there aren't readily
>>>>>  available libraries that will do this transparently).
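The intermediate layer Brian describes (convert a "dangerous" name to a safe identifier for processing, restore the original on output) can be sketched as a two-way symbol table. This is a minimal Python illustration, not an existing library; the mangling rule (every character outside `[A-Za-z0-9_]` becomes `_`, plus a numeric suffix on collision) is an arbitrary choice made for the example.

```python
import re


class SafeNames:
    """Two-way table between original data names and 'safe' identifiers.

    The mangling rule here (each character outside [A-Za-z0-9_] becomes '_',
    with a numeric suffix on collision) is an arbitrary illustrative choice.
    Since CIF data names begin with '_', results are valid Python identifiers.
    """

    def __init__(self):
        self._to_safe = {}
        self._to_original = {}

    def safe(self, name):
        """Return the safe identifier for name, allocating one if needed."""
        if name in self._to_safe:
            return self._to_safe[name]
        base = re.sub(r"[^A-Za-z0-9_]", "_", name)
        candidate, n = base, 1
        while candidate in self._to_original:  # keep the mapping one-to-one
            n += 1
            candidate = f"{base}_{n}"
        self._to_safe[name] = candidate
        self._to_original[candidate] = name
        return candidate

    def original(self, safe_name):
        """Restore the original data name for output."""
        return self._to_original[safe_name]


table = SafeNames()
s = table.safe("_refine_ls_shift/esd_max")
print(s, table.original(s))  # _refine_ls_shift_esd_max _refine_ls_shift/esd_max
```

A table rather than a pure character encoding keeps the round trip exact even when two different names mangle to the same string; the cost is that the table has to travel with the data through the whole processing chain.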
>>>>> 
>>>>>  I suspect that some of the original proposed syntactic changes also
>>>>>  had the effect (whether by design or collaterally) of simplifying i/o,
>>>>>  data structure management, symbol table processing etc., but those may
>>>>>  have suffered in the subsequent revision exercise we've just been
>>>>>  practising. Given the consensus we are now approaching, would the code
>>>>>  builders now be prepared to incur the additional expense of handling
>>>>>  "dangerous" data names?
>>>>> 
>>>>>  I really don't want to spark off a long discussion on this - if a
>>>>>  quick round of response shows that there's no appetite to allow
>>>>>  the additional punctuation characters in data names, I'll accept that
>>>>>  gracefully.
>>>>> 
>>>>>  ***
>>>>> 
>>>>>  One last comment while I have the floor, though it is related in part
>>>>>  to the above question. A concern raised in the editorial office was
>>>>>  that there would be circumstances where users didn't know if they were
>>>>>  dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>>>>  to the vi editor - and we're imagining most of them are dealing with
>>>>>  small-molecule/inorganic CIFs). My supposition is that the IUCr
>>>>>  editorial offices would only want to use CIF2 seriously in association
>>>>>  with DDLm dictionaries, and that we would expect the revised core
>>>>>  dictionaries to use the dot component in data names to signal this
>>>>>  further evolution. So even a superficial glimpse of the middle of a
>>>>>  CIF would make it clear whether it was CIF1 or CIF2.
>>>>> 
>>>>>  Does that fit in with how others see this progressing?
>>>>> 
>>>>>  Cheers
>>>>>  Brian
>>>>>  _______________________________________________
>>>>>  ddlm-group mailing list
>>>>>  [email protected]
>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>> 
>>> 
>> 
>> --
>> ******************************************************************
>>    John Westbrook, Ph.D.
>>    Rutgers, The State University of New Jersey
>>    Department of Chemistry and Chemical Biology
>>    610 Taylor Road
>>    Piscataway, NJ 08854-8087
>>    e-mail: [email protected]
>>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>> ******************************************************************
>> 
> 

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: [email protected]




