
Re: [ddlm-group] Data-name character restrictions - one last time

Hi all -

On the issue of reserved characters in mmCIF/PDBx data items, their use
has generally been inherited from the style of items in the core.  The
majority of the affected items relate to short matrices/tensors and
vectors (i.e. items whose names include []).  Virtually all have a syntax
which could reasonably be interpreted as a programmatic reference.  For instance,


_atom_sites.fract_transf_matrix[1][1]   0.007738
_atom_sites.fract_transf_matrix[1][2]   0.000000
_atom_sites.fract_transf_matrix[1][3]   0.004298
_atom_sites.fract_transf_matrix[2][1]   0.000000
_atom_sites.fract_transf_matrix[2][2]   0.016545
_atom_sites.fract_transf_matrix[2][3]   0.000000
_atom_sites.fract_transf_matrix[3][1]   0.000000
_atom_sites.fract_transf_matrix[3][2]   0.000000
_atom_sites.fract_transf_matrix[3][3]   0.020200
_atom_sites.fract_transf_vector[1]      0.00000
_atom_sites.fract_transf_vector[2]      0.00000
_atom_sites.fract_transf_vector[3]      0.00000

Are we close to being able to treat these as legal in the context of CIF2/DDL+?
I suppose I am asking what will constitute a legal assignment for an element
of a matrix/array -

Only this -

_a.vec [1,2,3]

or also expanded assignment by element such as -

_a.vec[1]  1
_a.vec[2]  2
_a.vec[3]  3

If the latter is accepted as well, it would resolve most of the data-name
issues for our data.
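
To make the question concrete, here is a rough Python sketch of how a reader
might fold element-wise items such as _a.vec[1] into the same structure as the
list form _a.vec [1,2,3]. It is only an illustration, not a proposal for the
spec: the helper names (split_indexed_name, collapse) are hypothetical, and it
assumes 1-based integer indices, a trailing run of [n] groups on the data name,
and that every element of the vector or matrix is present.

```python
import re

# Hypothetical helpers for illustration; not part of any CIF library.
# A name is "indexed" if it ends in one or more [n] groups.
_INDEXED = re.compile(r"^(?P<base>[^\[\]]+)(?P<idx>(?:\[\d+\])+)$")


def split_indexed_name(name):
    """Return (base, (i, j, ...)) for '_a.vec[2]'; (name, ()) otherwise."""
    m = _INDEXED.match(name)
    if m is None:
        return name, ()
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", m.group("idx")))
    return m.group("base"), indices


def _build(cells, prefix):
    """Recursively build nested lists from {(i, j, ...): value}."""
    if prefix in cells:
        return cells[prefix]
    depth = len(prefix)
    # Largest index seen at this depth under the current prefix.
    # Raises ValueError if an element is missing (assumption: none are).
    n = max(k[depth] for k in cells if k[:depth] == prefix)
    return [_build(cells, prefix + (i,)) for i in range(1, n + 1)]


def collapse(items):
    """Fold (name, value) pairs so indexed names become nested lists."""
    grouped = {}
    out = {}
    for name, value in items:
        base, idx = split_indexed_name(name)
        if idx:
            grouped.setdefault(base, {})[idx] = value
        else:
            out[name] = value
    for base, cells in grouped.items():
        out[base] = _build(cells, ())
    return out
```

With this, collapse([("_a.vec[1]", 1), ("_a.vec[2]", 2), ("_a.vec[3]", 3)])
gives the same mapping as reading _a.vec [1,2,3] directly, and a name like
_refine_ls_shift/esd_max passes through unchanged.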

Regards,

John

Joe Krahn wrote:
> In practice, CIF2 parsers should allow CIF1 data names within a CIF2 
> formatted file. The question is whether these files should be allowed as 
> valid CIF2, or just for convenience as a non-standard CIF2.
> 
> When CIF files are used as working data files, the restrictions should 
> be relaxed. For long-term archival files, it makes sense to be more 
> restrictive. I would just make the CIF1 names inaccessible to dREL. 
> Alternatively, an implementation could allow CIF1 names only on reading, 
> and require dictionary alias mappings to CIF2 names.
> 
> One argument in favor of allowing them would be that someone wants to 
> convert all data files to CIF2 format, but they want to preserve the 
> original data as-is, without alias mapping.
> 
> I think that the current CIF2 syntax makes it possible to use CIF1 names 
> without any ambiguities. The question is whether they should be 
> considered valid CIF2, or just a non-standard version that will be 
> useful for the transitional period.
> 
> Joe
> 
> 
> Herbert J. Bernstein wrote:
>> Personally, I would greatly prefer to allow all data names that do not
>> create a major lexer/parser conflict to appear in a data CIF and
>> only apply the strong restrictions to data names that appear in CIF2
>> dictionaries as defined data names (not as aliases).  -- Herbert
>>
>>
>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>> I have one remaining niggle that I'd like to revisit before we put
>>> this finally to bed. As has been mentioned a couple of times
>>> recently, restricting the data-name character set does invalidate
>>> syntactically many existing CIF 1 files (e.g. _refine_ls_shift/esd_max ).
>>> We have discussed strategies for handling this, and I think these
>>> are workable strategies, but will involve investment and hence expense
>>> in workflow management in CIF archives.
>>>
>>> I understand the rationale behind this restriction is to simplify
>>> future processing of data names in areas such as dREL
>>> applications. The question really is whether we're choosing the right
>>> trade-off in making things cleaner at that end of the processing
>>> chain. I would suppose that a dREL or other application could ingest a
>>> data name with dangerous characters, convert it internally into a
>>> "safe" identifier that's used for all processing, and then restore the
>>> original form upon output; but writing that intermediate layer of
>>> processing is of course expensive (especially if there aren't readily
>>> available libraries that will do this transparently).
>>>
>>> I suspect that some of the original proposed syntactic changes also
>>> had the effect (whether by design or collaterally) of simplifying i/o,
>>> data structure management, symbol table processing etc., but those may
>>> have suffered in the subsequent revision exercise we've just been
>>> practising. Given the consensus we are now approaching, would the code
>>> builders now be prepared to incur the additional expense of handling
>>> "dangerous" data names?
>>>
>>> I really don't want to spark off a long discussion on this - if a
>>> quick round of response shows that there's no appetite to allow
>>> the additional punctuation characters in data names, I'll accept that
>>> gracefully.
>>>
>>> ***
>>>
>>> One last comment while I have the floor, though it is related in part
>>> to the above question. A concern raised in the editorial office was
>>> that there would be circumstances where users didn't know if they were
>>> dealing with a CIF 1 or 2 ("users" meaning authors, perhaps resorting
>>> to the vi editor - and we're imagining most of them are dealing with
>>> small-molecule/inorganic CIFs). My supposition is that the IUCr
>>> editorial offices would only want to use CIF2 seriously in association
>>> with DDLm dictionaries, and that we would expect the revised core
>>> dictionaries to use the dot component in data names to signal this
>>> further evolution. So even a superficial glimpse of the middle of a
>>> CIF would make it clear whether it was CIF1 or CIF2.
>>>
>>> Does that fit in with how others see this progressing?
>>>
>>> Cheers
>>> Brian
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
> 

-- 
******************************************************************
   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320
******************************************************************
