Discussion List Archives


Re: [ddlm-group] Data-name character restrictions - one last time

Data names of the form _xxx[i] only appear in DDL2.  In DDL1 the corresponding items are called _atom_site_aniso_U_11 etc.  In these cases it will still be necessary to have explicit dictionary entries, but presumably they can be aliased to _atom_site.aniso[1][1], which dREL etc. would recognize.

David

Nick Spadaccini wrote:
I don't think a mechanism for specifying a starting index will work at the
individual definition level. They will all have to start at the same address
otherwise if I try to access within dREL some other object, how do I know
what its starting index is?

Best to decide on a starting index and fix it. There is an historical
precedent in CIF that has it starting at 1. As wrong as I would argue that
is, it is in stone so stick with it.

In my code I will simply offset the index by -1 to get to the real storage
point (I don't program in languages that index starting at 1) - it is easy
enough to do.

Seems a solution to me.

The _xxx_yyy[] syntax is an ancient category-like definition that never
appears in data.


On 11/12/09 7:37 PM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
wrote:

I am saying that any declaration of an array or of a list would
make its individual elements available for use in any data CIF without
the need for any further declarations in the dictionary.  This is
simple and clear and completely consistent with dREL.  The only really
new thing would be some mechanism(s) to specify the starting index.

I think this covers John's need.  The only thing it would not cover
is something like _xxx_yyy[] which appears in some CIF1 dictionaries
but not in the data files, so I don't think there should be an
issue with not allowing those in CIF2.

Does anyone see a problem with this?
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 11 Dec 2009, Nick Spadaccini wrote:

I can agree with that, if you are saying only the matrix object is available
to the user.

OR alternatively are you saying there will ONLY be one object defined in the
dictionary, let's say the 3x3 matrix

_atom_site.U

But NEVER have definitions in the dictionary for the individual
_atom_site.U[i][j] elements.

As we parse a CIF data file, if we detect _atom_site.U[i][j], it isn't in
the defined dictionary so this would normally raise an error. BUT because of
the specific trailing syntax [i][j] this informs the parser there must be an
object of matching rank with the name _atom_site.U (ie the
_atom_site.U[i][j] with the [i][j] truncated) in the dictionary - and
therefore populate the appropriate element of _atom_site.U with that value.

This would circumvent the problem of two different identifiers called
_atom_site.U[i][j] in the dictionary BUT would necessarily mean that [i][j]
syntax in a data name was reserved for objects that are defined in the
dictionary as, in this case, a 2D matrix. They can't (shouldn't?) be used
for general data names.
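
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DICTIONARY contents, the function names resolve and populate, and the fixed starting index of 1 are hypothetical and are not taken from any existing CIF parser.

```python
import re

# Hypothetical dictionary: each base data name maps to the shape of the
# single matrix or vector object defined for it (names stored lowercased).
DICTIONARY = {"_atom_site.u": (3, 3), "_a.vec": (3,)}

# A base name followed by one or more [n] subscripts.
_INDEXED = re.compile(r"^(?P<base>[^\[\]]+)(?P<subs>(?:\[\d+\])+)$")


def resolve(tag, start=1):
    """Split an indexed tag into (base name, 0-based storage offsets).

    Raises KeyError if the truncated name is not in the dictionary and
    ValueError if the subscripts do not match the defined rank or bounds.
    """
    m = _INDEXED.match(tag)
    if m is None:
        raise ValueError(f"{tag!r} has no trailing subscripts")
    base = m.group("base").lower()
    indices = [int(s) for s in re.findall(r"\d+", m.group("subs"))]
    shape = DICTIONARY[base]
    if len(indices) != len(shape):
        raise ValueError(f"{tag!r}: rank {len(indices)} does not match {shape}")
    offsets = []
    for i, n in zip(indices, shape):
        if not start <= i < start + n:
            raise ValueError(f"{tag!r}: index {i} is out of range")
        offsets.append(i - start)  # the -1 offset when counting from 1
    return base, tuple(offsets)


def _empty(shape):
    """Build a nested list of None values with the given shape."""
    if len(shape) == 1:
        return [None] * shape[0]
    return [_empty(shape[1:]) for _ in range(shape[0])]


def populate(store, tag, value):
    """Place one element value into the whole object held in `store`."""
    base, offsets = resolve(tag)
    target = store.setdefault(base, _empty(DICTIONARY[base]))
    for k in offsets[:-1]:
        target = target[k]
    target[offsets[-1]] = value


store = {}
populate(store, "_atom_site.U[1][2]", 0.012)
```

Only the whole object needs a dictionary entry in this sketch; an indexed name whose base is undefined, or whose rank does not match, is rejected.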

Does this cover what John wanted also?


On 11/12/09 10:12 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
wrote:

Actually, the suggestion comes from reading the dREL documentation and the
DDLm documentation and noticing how clumsy the access to array elements in
DDLm is compared to the access in dREL.  What I am suggesting is to
promote the dREL access making it fully available at the DDLm level,
replacing the clumsy element-by-element definitions with one automatic
definition that looks and works just the way one might expect.

Regards,
   Herbert


On Fri, 11 Dec 2009, Nick Spadaccini wrote:

Many of you need to read the dREL part of the dictionary much more closely.

dREL extensively exploits access to matrix and vector types by index
addressing at a programmatic level. That's how it gets done the things it
has to do. So within the dREL programming language you will see littered
everywhere a matrix which is accessed via standard indexing (as you would
with any language supporting array structures).

So let's have a matrix _atom_site.U - within dREL I have access to
_atom_site.U[0][0] etc. as part of the language (I'll stick with 0 initial
indexing but this really is a trivial problem, solved many times over).

But now you ALSO want a scalar data item called _atom_site.U[0][0] within
CIF. The dictionary says _atom_site.U[0][0] is a single scalar value.

The dREL constructor method for _atom_site.U has

_atom_site.U = Matrix([[atom_site.U[0][0] ...]...])

This obviously won't work. This is why the dictionary in DDLm uses the
equivalent of _atom_site.U_0_0 for the scalar value so that the above
constructor will make sense and still allows me to access
_atom_site.U[0][0] from within dREL. It is why I am keen to restrict the
syntax of the data names.


On 11/12/09 2:46 AM, "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
wrote:

Dear Colleagues,

   One very neat resolution to this problem would be to allow a
list or array-typed CIF2 tag to be referenced in a data file either
as a whole or element by element.

   Thus

   _a.vec

being defined as an array or list in CIF2 would automatically make
the tags

   _a.vec[1]
   _a.vec[2]
...

defined CIF2 tags.  If the array or list were nested, then

   _a.vec[1][1]
   _a.vec[1][2]

etc. would be valid tags

   I would propose that this be general and automatic, applying to
all tags defined as lists or arrays.  In view of past practice in
CIF1, there is a slight conflict between the default starting
index of 0 in dREL and the common CIF1 practice of indexing arrays
from 1, but that can (and should) be solved with explicit specification
of a starting index, so we can carry over the tag name usage from
CIF1 without confusing people with an index shift.  So, if _a.vec
were an array of dimension 5, starting from index 0, _a.vec[0]
through _a.vec[4] would be valid, but if the starting index were
specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
CIF1 conventions.
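
As a rough illustration of how the set of valid element tags follows from one array definition plus a starting index, here is a minimal sketch. The function name element_tags and its parameters are hypothetical; nothing here is part of DDLm or dREL.

```python
from itertools import product


def element_tags(name, shape, start=1):
    """List every element tag implied by one array or list definition.

    `shape` gives the extent along each dimension and `start` is the
    index of the first element (1 to match CIF1 usage, 0 for 0-based code).
    """
    ranges = [range(start, start + n) for n in shape]
    return [name + "".join(f"[{i}]" for i in idx) for idx in product(*ranges)]


zero_based = element_tags("_a.vec", (5,), start=0)  # _a.vec[0] .. _a.vec[4]
one_based = element_tags("_a.vec", (5,), start=1)   # _a.vec[1] .. _a.vec[5]
nested = element_tags("_a.vec", (2, 2))             # _a.vec[1][1] .. _a.vec[2][2]
```

The same definition yields different tag sets depending only on the starting index, which is why the choice has to be known to whatever reads the data file.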

   The aliasing mechanism might have to be extended or clarified to
handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
but, to me, this has a very intuitive feel.

   Regards,
     Herbert


At 3:29 PM -0500 12/9/09, John Westbrook wrote:
Hi all -

On the issue of reserved characters in mmCIF/PDBx data items, these
generally have been inherited from the style of items from the core.  The
majority of items in this class are data items related to short
matrices/tensors and vectors (e.g. items including []).  Virtually all
have a syntax which could reasonably be interpreted as a programmatic
reference.  For instance,


_atom_sites.fract_transf_matrix[1][1]   0.007738
_atom_sites.fract_transf_matrix[1][2]   0.000000
_atom_sites.fract_transf_matrix[1][3]   0.004298
_atom_sites.fract_transf_matrix[2][1]   0.000000
_atom_sites.fract_transf_matrix[2][2]   0.016545
_atom_sites.fract_transf_matrix[2][3]   0.000000
_atom_sites.fract_transf_matrix[3][1]   0.000000
_atom_sites.fract_transf_matrix[3][2]   0.000000
_atom_sites.fract_transf_matrix[3][3]   0.020200
_atom_sites.fract_transf_vector[1]      0.00000
_atom_sites.fract_transf_vector[2]      0.00000
_atom_sites.fract_transf_vector[3]      0.00000

Are we close to being able to treat these as legal in the context of
CIF2/DDL+?  I suppose I am asking what will constitute a legal assignment
for an element of a matrix/array -

Only this -

_a.vec [1,2,3]

or also expanded assignment by element such as -

_a.vec[1]  1
_a.vec[2]  2
_a.vec[3]  3

If the latter is to be considered, then this will solve most of the data
name issues for our data.
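
To make the equivalence of the two assignment forms concrete, the sketch below folds element-wise assignments back into a whole list or matrix. It assumes 1-based subscripts and that every element is present; the names assemble and _nest are hypothetical.

```python
import re

_SUBSCRIPT = re.compile(r"\[(\d+)\]")


def _nest(cells, prefix=()):
    """Turn {index tuple: value} into nested lists, one level per subscript."""
    rank = len(next(iter(cells)))
    depth = len(prefix)
    size = max(idx[depth] for idx in cells if idx[:depth] == prefix) + 1
    if depth == rank - 1:
        return [cells.get(prefix + (k,)) for k in range(size)]
    return [_nest(cells, prefix + (k,)) for k in range(size)]


def assemble(pairs, start=1):
    """Group (tag, value) pairs such as ('_a.vec[2]', 2) by base name.

    Every tag is expected to carry at least one subscript.
    """
    grouped = {}
    for tag, value in pairs:
        base = tag.split("[", 1)[0]
        index = tuple(int(i) - start for i in _SUBSCRIPT.findall(tag))
        grouped.setdefault(base, {})[index] = value
    return {base: _nest(cells) for base, cells in grouped.items()}


by_element = assemble([("_a.vec[1]", 1), ("_a.vec[2]", 2), ("_a.vec[3]", 3)])
whole = {"_a.vec": [1, 2, 3]}  # the single-assignment form
```

Under these assumptions both forms end up as the same in-memory value, so a reader could accept either without the rest of the program noticing.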

Regards,

John

Joe Krahn wrote:
 In practice, CIF2 parsers should allow CIF1 data names within a CIF2
 formatted file. The question is whether these files should be allowed
 as valid CIF2, or just for convenience as a non-standard CIF2.

 When CIF files are used as working data files, the restrictions should
 be relaxed. For long-term archival files, it makes sense to be more
 restrictive. I would just make the CIF1 names inaccessible to dREL.
 Alternatively, an implementation could allow CIF1 names only on
 reading, and require dictionary alias mappings to CIF2 names.

 One argument in favor of allowing them would be that someone wants to
 convert all data files to CIF2 format, but they want to preserve the
 original data as-is, without alias mapping.

 I think that the current CIF2 syntax makes it possible to use CIF1
 names without any ambiguities. The question is whether they should be
 considered valid CIF2, or just a non-standard version that will be
 useful for the transitional period.

 Joe


 Herbert J. Bernstein wrote:
 Personally, I would greatly prefer to allow all data names that do not
 create a major lexer/parser conflict to appear in a data CIF and
 only apply the strong restrictions to data names that appear in CIF2
 dictionaries as defined data names (not as aliases).  -- Herbert


 At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
 I have one remaining niggle that I'd like to revisit before we put
 this finally to bed. As has been mentioned a couple of times
 recently, restricting the data-name character set does invalidate
 syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max).
 We have discussed strategies for handling this, and I think these
 are workable strategies, but will involve investment and hence expense
 in workflow management in CIF archives.

 I understand the rationale behind this restriction is to simplify
 future processing of data names in areas such as dREL
 applications. The question really is whether we're choosing the right
 trade-off in making things cleaner at that end of the processing
 chain. I would suppose that a dREL or other application could ingest a
 data name with dangerous characters, convert it internally into a
 "safe" identifier that's used for all processing, and then restore the
 original form upon output; but writing that intermediate layer of
 processing is of course expensive (especially if there aren't readily
 available libraries that will do this transparently).
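
 A minimal sketch of such an intermediate layer, under stated assumptions: the class NameTable and the choice of "safe" characters (letters, digits, underscore and dot) are hypothetical and are not taken from any CIF specification or existing library.

```python
import re


class NameTable:
    """Two-way map between original data names and identifier-safe aliases."""

    def __init__(self):
        self._to_safe = {}
        self._to_original = {}

    def safe(self, name):
        """Return a stable alias for `name`, creating one on first use."""
        if name in self._to_safe:
            return self._to_safe[name]
        # Assumed rule: anything outside letters, digits, '_' and '.' is unsafe.
        candidate = re.sub(r"[^A-Za-z0-9_.]", "_", name)
        alias, n = candidate, 1
        while alias in self._to_original:  # avoid clashing with an earlier alias
            n += 1
            alias = f"{candidate}_{n}"
        self._to_safe[name] = alias
        self._to_original[alias] = name
        return alias

    def original(self, alias):
        """Recover the data name exactly as it appeared on input."""
        return self._to_original[alias]


table = NameTable()
alias = table.safe("_refine_ls_shift/esd_max")
restored = table.original(alias)
```

 The alias is used for all internal processing and the original spelling is written back on output; the suffix counter keeps two different input names from sharing one alias.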

 I suspect that some of the original proposed syntactic changes also
 had the effect (whether by design or collaterally) of simplifying i/o,
 data structure management, symbol table processing etc., but those may
 have suffered in the subsequent revision exercise we've just been
 practising. Given the consensus we are now approaching, would the code
 builders now be prepared to incur the additional expense of handling
 "dangerous" data names?

 I really don't want to spark off a long discussion on this - if a
 quick round of responses shows that there's no appetite to allow
 the additional punctuation characters in data names, I'll accept that
 gracefully.

 ***

 One last comment while I have the floor, though it is related in part
 to the above question. A concern raised in the editorial office was
 that there would be circumstances where users didn't know if they were
 dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
 to the vi editor - and we're imagining most of them are dealing with
 small-molecule/inorganic CIFs). My supposition is that the IUCr
 editorial offices would only want to use CIF2 seriously in association
 with DDLm dictionaries, and that we would expect the revised core
 dictionaries to use the dot component in data names to signal this
 further evolution. So even a superficial glimpse of the middle of a
 CIF would make it clear whether it was CIF1 or CIF2.

 Does that fit in with how others see this progressing?

 Cheers
 Brian
 _______________________________________________
 ddlm-group mailing list
 ddlm-group@iucr.org
 http://scripts.iucr.org/mailman/listinfo/ddlm-group
--
******************************************************************
   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320
******************************************************************

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au





begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard


International Union of Crystallography
