Re: [ddlm-group] Data-name character restrictions - one last time

To: John Westbrook <[email protected]>
Subject: Re: [ddlm-group] Data-name character restrictions - one last time
From: "Herbert J. Bernstein" <[email protected]>
Date: Thu, 10 Dec 2009 14:12:42 -0500
Cc: Group finalising DDLm and associated dictionaries <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]><a06240801c74578ec8b59@[192.168.2.104]> <[email protected]><[email protected]><a06240800c746ed907fcc@[192.168.2.104]><[email protected]>

Can we please forget the "non-negotiable" nonsense?  We should start
from the external user requirements and go where they lead us.
We need some workable mechanism to support validating existing data
files whose tags contain embedded [].


At 1:59 PM -0500 12/10/09, John Westbrook wrote:
>Hi all --
>
>To follow up on this again: if we can support element-level assignments
>in data files, then virtually all of our data-item character-set issues
>will be sorted.
>The treatment of the 0/1 starting index and of index ordering can be
>handled at the dictionary level.  I believe that all of this is
>incorporated in the DDL.
>
>I would be curious whether Simon or Brian can comment on the actual usage
>of other potentially reserved characters that are currently in the
>non-negotiable category.
>Regards,
>
>John
>
>
>Herbert J. Bernstein wrote:
>>Dear Colleagues,
>>
>>   One very neat resolution to this problem would be to allow a
>>list or array-typed CIF2 tag to be referenced in a data file either
>>as a whole or element by element.
>>
>>   Thus
>>
>>   _a.vec
>>
>>being defined as an array or list in CIF2 would automatically make
>>the tags
>>
>>   _a.vec[1]
>>   _a.vec[2]
>>...
>>
>>defined CIF2 tags.  If the array or list were nested, the
>>
>>   _a.vec[1][1]
>>   _a.vec[1][2]
>>
>>etc. would be valid tags.
>>
>>   I would propose that this be general and automatic, applying to
>>all tags defined as lists or arrays.  In view of past practice in
>>CIF1, there is a slight conflict between the default starting index
>>of 0 in dREL and the common CIF1 practice of indexing arrays from 1,
>>but that can (and should) be solved with explicit specification
>>of a starting index, so we can carry over the tag name usage from
>>CIF1 without confusing people with an index shift.  So, if _a.vec
>>were an array of dimension 5, starting from index 0, _a.vec[0]
>>through _a.vec[4] would be valid, but if the starting index were
>>specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>CIF1 conventions.
>>
>>   The aliasing mechanism might have to be extended or clarified to
>>handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>but, to me, this has a very intuitive feel.
>>
>>   Regards,
>>     Herbert
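A minimal Python sketch of how the automatic element-level tags in the proposal above could be enumerated. It assumes the dictionary supplies the extent of each dimension and a single starting index (0 or 1) for the whole array; the function name element_tags is hypothetical and not part of DDLm or any CIF library.

```python
from itertools import product


def element_tags(base, shape, start=1):
    """List the element-level tags implied by an array- or list-typed tag.

    base  -- the whole-array data name, e.g. "_a.vec"
    shape -- extent of each dimension, e.g. (5,) or (3, 3)
    start -- starting index assumed for every dimension (0 or 1)
    """
    ranges = [range(start, start + n) for n in shape]
    # product() varies the last index fastest, i.e. row-major order
    return [base + "".join(f"[{i}]" for i in idx) for idx in product(*ranges)]


print(element_tags("_a.vec", (2, 2)))
# ['_a.vec[1][1]', '_a.vec[1][2]', '_a.vec[2][1]', '_a.vec[2][2]']
```

With start=0 and shape (5,) the same function yields _a.vec[0] through _a.vec[4]; with start=1 it yields the CIF1-compatible _a.vec[1] through _a.vec[5]. A validator could then accept an element tag only if it appears in the generated list.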
>>
>>
>>At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>Hi all -
>>>
>>>On the issue of reserved characters in mmCIF/PDBx data items: these
>>>have generally been inherited from the style of items in the core
>>>dictionary.  The majority of items in this class relate to short
>>>matrices/tensors and vectors (e.g. names containing []).  Virtually
>>>all have a syntax that could reasonably be interpreted as a
>>>programmatic reference.  For instance,
>>>
>>>
>>>_atom_sites.fract_transf_matrix[1][1]   0.007738
>>>_atom_sites.fract_transf_matrix[1][2]   0.000000
>>>_atom_sites.fract_transf_matrix[1][3]   0.004298
>>>_atom_sites.fract_transf_matrix[2][1]   0.000000
>>>_atom_sites.fract_transf_matrix[2][2]   0.016545
>>>_atom_sites.fract_transf_matrix[2][3]   0.000000
>>>_atom_sites.fract_transf_matrix[3][1]   0.000000
>>>_atom_sites.fract_transf_matrix[3][2]   0.000000
>>>_atom_sites.fract_transf_matrix[3][3]   0.020200
>>>_atom_sites.fract_transf_vector[1]      0.00000
>>>_atom_sites.fract_transf_vector[2]      0.00000
>>>_atom_sites.fract_transf_vector[3]      0.00000
>>>
>>>Are we close to being able to treat these as legal in the context
>>>of CIF2/DDL+?  I suppose I am asking what will constitute a legal
>>>assignment for an element of a matrix/array -
>>>
>>>Only this -
>>>
>>>_a.vec [1,2,3]
>>>
>>>or also expanded assignment by element such as -
>>>
>>>_a.vec[1]  1
>>>_a.vec[2]  2
>>>_a.vec[3]  3
>>>
>>>If the latter is to be considered, then this will solve most of
>>>the data-name issues for our data.
>>>
>>>Regards,
>>>
>>>John
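To make the element-by-element option concrete, here is a minimal Python sketch that gathers assignments such as _atom_sites.fract_transf_matrix[1][1] back into a single nested-list value per data name. It assumes every element is present, all dimensions start at the same index, and values are kept as the strings read from the file; split_element_tag and collapse are hypothetical names.

```python
import re

# A data name followed by one or more [n] subscripts (pattern assumed for this sketch).
ELEMENT_TAG = re.compile(r"(?P<base>_[^\[\]\s]+)(?P<idx>(?:\[\d+\])+)")


def split_element_tag(tag):
    """Split '_x.m[1][2]' into ('_x.m', (1, 2)); return None for plain names."""
    m = ELEMENT_TAG.fullmatch(tag)
    if m is None:
        return None
    indices = tuple(int(i) for i in re.findall(r"\d+", m.group("idx")))
    return m.group("base"), indices


def _build(cells, shape, start, prefix=()):
    """Recursively turn {(i, j, ...): value} into nested lists."""
    if len(prefix) == len(shape):
        return cells[prefix]  # KeyError here means an element was missing
    dim = len(prefix)
    return [_build(cells, shape, start, prefix + (i,))
            for i in range(start, start + shape[dim])]


def collapse(items, start=1):
    """Group (tag, value) pairs by base name into nested-list values."""
    grouped = {}
    for tag, value in items:
        parts = split_element_tag(tag)
        if parts is None:
            continue  # whole-value assignment; left to the normal reader
        base, idx = parts
        grouped.setdefault(base, {})[idx] = value
    arrays = {}
    for base, cells in grouped.items():
        ndim = len(next(iter(cells)))
        shape = tuple(max(k[d] for k in cells) - start + 1 for d in range(ndim))
        arrays[base] = _build(cells, shape, start)
    return arrays


items = [
    ("_atom_sites.fract_transf_vector[1]", "0.00000"),
    ("_atom_sites.fract_transf_vector[2]", "0.00000"),
    ("_atom_sites.fract_transf_vector[3]", "0.00000"),
]
print(collapse(items))
# {'_atom_sites.fract_transf_vector': ['0.00000', '0.00000', '0.00000']}
```

Under these assumptions the element-wise form and the whole-value form _a.vec [1,2,3] end up as the same internal value, which is what would let a dictionary treat them as two spellings of one item.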
>>>
>>>Joe Krahn wrote:
>>>>  In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>  formatted file. The question is whether these files should be allowed as
>>>>  valid CIF2, or just for convenience as a non-standard CIF2.
>>>>
>>>>  When CIF files are used as working data files, the restrictions should
>>>>  be relaxed. For long-term archival files, it makes sense to be more
>>>>  restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>>  Alternatively, an implementation could allow CIF1 names only on reading,
>>>>  and require dictionary alias mappings to CIF2 names.
>>>>
>>>>  One argument in favor of allowing them would be that someone wants to
>>>>  convert all data files to CIF2 format, but they want to preserve the
>>>>  original data as-is, without alias mapping.
>>>>
>>>>  I think that the current CIF2 syntax makes it possible to use CIF1 names
>>>>  without any ambiguities. The question is whether they should be
>>>>  considered valid CIF2, or just a non-standard version that will be
>>>>  useful for the transitional period.
>>>>
>>>>  Joe
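A minimal Python sketch of the read-only policy described above: CIF1-style names are accepted only on input and must be mapped to a conforming CIF2 name through a dictionary alias. The strict-name pattern and the alias target _refine_ls.shift_esd_max are assumptions made for illustration, not definitions taken from any dictionary.

```python
import re

# Assumed strict CIF2 name rule for this sketch: '_' then letters, digits, '_' or '.'.
STRICT_NAME = re.compile(r"_[A-Za-z0-9_.]+")

# Hypothetical alias table; in practice it would be built from the dictionary.
ALIASES = {"_refine_ls_shift/esd_max": "_refine_ls.shift_esd_max"}


def read_name(name, aliases=ALIASES):
    """Return the CIF2 name to use internally for a data name read from a file."""
    if STRICT_NAME.fullmatch(name):
        return name
    if name in aliases:
        return aliases[name]
    raise ValueError(f"no alias for non-conforming data name {name!r}")


print(read_name("_refine_ls_shift/esd_max"))  # _refine_ls.shift_esd_max
```

Because only the mapped name survives reading, dREL never sees the CIF1 spelling; the cost is exactly the trade-off raised above, since the original form is not preserved as-is.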
>>>>
>>>>
>>>>  Herbert J. Bernstein wrote:
>>>>>  Personally, I would greatly prefer to allow all data names that do not
>>>>>  create a major lexer/parser conflict to appear in a data CIF and
>>>>>  only apply the strong restrictions to data names that appear in CIF2
>>>>>  dictionaries as defined data names (not as aliases).  -- Herbert
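The two-tier rule above could be sketched in Python as follows. Both character rules are assumptions for illustration: a data name is treated as lexically safe if it starts with an underscore and contains no whitespace, and a name defined in a CIF2 dictionary is restricted to letters, digits, underscore and period. The context labels are hypothetical.

```python
import re

# Assumed lexer-level rule: leading '_' and no whitespace.
LEXICALLY_SAFE = re.compile(r"_\S+")
# Assumed strict rule for names *defined* in a CIF2 dictionary.
STRICT = re.compile(r"_[A-Za-z0-9_.]+")


def name_allowed(name, context):
    """context: 'data' (data CIF), 'alias' or 'definition' (dictionary)."""
    if context == "definition":
        return STRICT.fullmatch(name) is not None
    if context in ("data", "alias"):
        return LEXICALLY_SAFE.fullmatch(name) is not None
    raise ValueError(f"unknown context {context!r}")


print(name_allowed("_refine_ls_shift/esd_max", "data"))        # True
print(name_allowed("_refine_ls_shift/esd_max", "definition"))  # False
```

Under this split, legacy names remain readable in data files and usable as aliases, while only the names a dictionary actually defines carry the strict restriction.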
>>>>>
>>>>>
>>>>>  At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>  I have one remaining niggle that I'd like to revisit before we put
>>>>>>  this finally to bed. As has been mentioned a couple of times
>>>>>>  recently, restricting the data-name character set syntactically
>>>>>>  invalidates many existing CIF 1 files (e.g. _refine_ls_shift/esd_max).
>>>>>>  We have discussed strategies for handling this, and I think these
>>>>>>  are workable strategies, but will involve investment and hence expense
>>>>>>  in workflow management in CIF archives.
>>>>>>
>>>>>>  I understand the rationale behind this restriction is to simplify
>>>>>>  future processing of data names in areas such as dREL
>>>>>>  applications. The question really is whether we're choosing the right
>>>>>>  trade-off in making things cleaner at that end of the processing
>>>>>>  chain. I would suppose that a dREL or other application could ingest a
>>>>>>  data name with dangerous characters, convert it internally into a
>>>>>>  "safe" identifier that's used for all processing, and then restore the
>>>>>>  original form upon output; but writing that intermediate layer of
>>>>>>  processing is of course expensive (especially if there aren't readily
>>>>>>  available libraries that will do this transparently).
>>>>>>
>>>>>>  I suspect that some of the original proposed syntactic changes also
>>>>>>  had the effect (whether by design or collaterally) of simplifying i/o,
>>>>>>  data structure management, symbol table processing etc., but those may
>>>>>>  have suffered in the subsequent revision exercise we've just been
>>>>>>  practising. Given the consensus we are now approaching, would the code
>>>>>>  builders now be prepared to incur the additional expense of handling
>>>>>>  "dangerous" data names?
>>>>>>
>>>>>>  I really don't want to spark off a long discussion on this - if a
>>>>>>  quick round of response shows that there's no appetite to allow
>>>>>>  the additional punctuation characters in data names, I'll accept that
>>>>>>  gracefully.
>>>>>>
>>>>>>  ***
>>>>>>
>>>>>>  One last comment while I have the floor, though it is related in part
>>>>>>  to the above question. A concern raised in the editorial office was
>>>>>>  that there would be circumstances where users didn't know if they were
>>>>>>  dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>>>>>  to the vi editor - and we're imagining most of them are dealing with
>>>>>>  small-molecule/inorganic CIFs). My supposition is that the IUCr
>>>>>>  editorial offices would only want to use CIF2 seriously in association
>>>>>>  with DDLm dictionaries, and that we would expect the revised core
>>>>>>  dictionaries to use the dot component in data names to signal this
>>>>>>  further evolution. So even a superficial glimpse of the middle of a
>>>>>>  CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>
>>>>>>  Does that fit in with how others see this progressing?
>>>>>>
>>>>>>  Cheers
>>>>>>  Brian
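The intermediate layer suggested above, mapping a name with dangerous characters to a safe internal identifier and restoring it on output, could be as small as the following Python sketch. It assumes anything other than letters, digits, underscore and period is dangerous, and it uses a two-way lookup table rather than a reversible escaping scheme so that two different original names can never collide; NameTable is a hypothetical class, not an existing library.

```python
import re

# Assumption for this sketch: these are the only "safe" characters.
UNSAFE = re.compile(r"[^A-Za-z0-9_.]")


class NameTable:
    """Bidirectional map between original data names and safe identifiers."""

    def __init__(self):
        self._to_safe = {}
        self._to_original = {}

    def safe(self, name):
        """Return a stable safe identifier for name, creating one if needed."""
        if name in self._to_safe:
            return self._to_safe[name]
        base = UNSAFE.sub("_", name)
        candidate, n = base, 1
        while candidate in self._to_original:  # never reuse an issued identifier
            candidate = f"{base}_{n}"
            n += 1
        self._to_safe[name] = candidate
        self._to_original[candidate] = name
        return candidate

    def original(self, safe_name):
        """Restore the original data name on output."""
        return self._to_original[safe_name]


table = NameTable()
internal = table.safe("_refine_ls_shift/esd_max")
print(internal)                  # _refine_ls_shift_esd_max
print(table.original(internal))  # _refine_ls_shift/esd_max
```

The table has to travel with the data between input and output, but in exchange the mapping stays unambiguous even when a sanitised name happens to match a name that already exists.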
>>>>>>  _______________________________________________
>>>>>>  ddlm-group mailing list
>>>>>>  [email protected]
>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>>
>>>
>>>--
>>>******************************************************************
>>>    John Westbrook, Ph.D.
>>>    Rutgers, The State University of New Jersey
>>>    Department of Chemistry and Chemical Biology
>>>    610 Taylor Road
>>>    Piscataway, NJ 08854-8087
>>>    e-mail: [email protected]
>>>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>>>******************************************************************
>>>


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================
