Re: [ddlm-group] Data-name character restrictions - one last time
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] Data-name character restrictions - one last time
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Thu, 10 Dec 2009 18:30:35 -0500
The dREL/DDLm documents explicitly use ":" for ranges.
The main reason to forbid negative starting indices is for simplicity
in a python implementation.
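
To make the two readings of a range concrete, here is a minimal Python
sketch. The function name parse_range and its convention parameter are
hypothetical and are not part of dREL or DDLm. It only shows how the same
text "1:4" gives a different element count under a Fortran-style
(inclusive) reading and a Python-style (half-open) reading, and that
forbidding negative starting indices reduces to a single check.

```python
def parse_range(text, convention="fortran"):
    """Return (start, count) for a range written as 'lo:hi'.

    Hypothetical helper, for illustration only.
    convention='fortran': hi is inclusive, so '1:4' has 4 elements.
    convention='python':  hi is exclusive, so '1:4' has 3 elements.
    """
    lo_text, hi_text = text.split(":")
    lo, hi = int(lo_text), int(hi_text)
    if lo < 0:
        # The restriction discussed in this thread: no negative starts.
        raise ValueError("negative starting index not allowed: %d" % lo)
    count = hi - lo + 1 if convention == "fortran" else hi - lo
    if count < 0:
        raise ValueError("reversed range: %r" % text)
    return lo, count


print(parse_range("1:4"))            # (1, 4)
print(parse_range("1:4", "python"))  # (1, 3)
```
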
At 5:41 PM -0500 12/10/09, Joe Krahn wrote:
>The range notation "1:4" is familiar to Fortran90 programmers. Many
>Fortran77 compilers also supported range notation. But, if ':' is
>disallowed in unquoted strings, it would have to be written as:
>
> _type.dimension ['1:4','1:4']
>
>Also, why not allow negative indices? It may make sense to allow only 0
>or 1, but why make it a mandatory restriction?
>
>Joe
>
>Herbert J. Bernstein wrote:
>> No, the implicit zero comes from the dREL documentation in both the
>> 2007 and 2008 versions. This is a very serious issue for people with
>> a Fortran background, and causes many mistakes. Simply being able
>> to specify the starting index would solve the problem.
>>
>> I agree that we need to keep in touch, but I am working from the
>> dREL/DDLm documentation, and hope you are, too. What we need to do
>> is to stop focusing on stylistic issues and work on getting the
>> documentation to be clear and unambiguous with more examples, so we
>> do not go another 3+ years without people being aware of such
>> critical issues as the default starting index for arrays.
>>
>> You will find the statement about the default index for arrays in
>> section 3.4 of dREL_spec_aug08.pdf. All we need to fix it is to adopt
>> a new tag to identify the starting index, such as
>>
>> _type.starting_index
>>
>> or allow the dimensions of an array to be ranges. The only problem
>> with that is that there is a strange python convention which would
>> suggest that
>>
>> _type.dimension [1:5]
>>
>> would be declaring an array of dimension 4, starting at index 1. To avoid
>> the confusion that would cause for Fortran programmers, I would suggest
>> that we write dictionaries with
>>
>> _type.starting_index [1,1]
>> _type.dimension [3,3]
>>
>> instead of
>>
>> _type.dimension [1:4,1:4]
>>
>> which would be natural in a python world, but not for Fortran programmers.
>> To make implementation easy, I would not allow negative starting indices.
>>
>> -- Herbert
>>
>> At 2:14 PM -0500 12/10/09, David Brown wrote:
>>> I was not aware that there was a default indexing of arrays. The
>>> only place where this arises in DDL1 is in the list of symmetry
>>> operations where we originally failed to define a key for the symop
>>> loop. But there, as far as I am aware, the assumed indexing always
>>> starts at 1 for the first item. This is strictly a fix since CIF1
>>> specifically states that the order within a loop has no significance.
>>> Later additions to the dictionary have corrected this oversight by
>>> adding an explicit key, but it is not yet often used. Otherwise, in
>>> DDL1 (and DDL2?) the elements of an array have explicit data names
>>> that start at 1, not 0. The assumption that arrays are numbered
>>> from zero must be an imgCIF convention. It would always be better
>>> to include explicit indexing to avoid these problems.
>>>
>>> The DDLm dictionaries have methods for constructing arrays from
>>> their elements, and methods for the reverse process could be added.
>>> In this case it would not be necessary to decompose (or assemble) an
>>> array on first reading as the necessary action would be taken as
>>> soon as the array or its elements are invoked by a method or by a
>>> list of items to output.
>>>
>>> This raises another concern. Herbert, if you are writing DDLm
>>> dictionaries for imgCIF and I am writing them for coreCIF, we need
>>> to keep in contact to make sure we are not introducing conflicting
>>> conventions.
>>>
>>> David
>>>
>>> Herbert J. Bernstein wrote:
>>>
>>>> Dear Colleagues,
>>>>
>>>> One very neat resolution to this problem would be to allow a
>>>> list or array-typed CIF2 tag to be referenced in a data file either
>>>> as a whole or element by element.
>>>>
>>>> Thus
>>>>
>>>> _a.vec
>>>>
>>>> being defined as an array or list in CIF2 would automatically make
>>>> the tags
>>>>
>>>> _a.vec[1]
>>>> _a.vec[2]
>>>> ...
>>>>
>>>> defined CIF2 tags. If the array or list were nested, then
>>>>
>>>> _a.vec[1][1]
>>>> _a.vec[1][2]
>>>>
>>>> etc. would be valid tags.
>>>>
>>>> I would propose that this be general and automatic, applying to
>>>> all tags defined as lists or arrays. In view of past practice in
>>>> CIF1, there is a slight conflict with respect to the default starting
>>>> index in dREL versus the common CIF1 practice in indexing arrays
>>>> from 0, but that can (and should) be solved with explicit specification
>>>> of a starting index, so we can carry over the tag name usage from
>>>> CIF1 without confusing people with an index shift. So, if _a.vec
>>>> were an array of dimension 5, starting from index 0, _a.vec[0]
>>>> through _a.vec[4] would be valid, but if the starting index were
>>>> specified as 1, _a.vec[1] through _a.vec[5] would be valid, matching
>>>> CIF1 conventions.
>>>>
>>>> The aliasing mechanism might have to be extended or clarified to
>>>> handle the mapping against CIF1 tags in bulk for _a.vec as a whole,
>>>> but, to me, this has a very intuitive feel.
>>>>
>>>> Regards,
>>>> Herbert
>>>>
>>>>
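
The proposal above, that an array-typed tag automatically defines its
element-level tags, can be sketched in Python as follows. The function
element_tags and its parameters are hypothetical names chosen for this
illustration. The default of 0 follows the dREL default described in this
thread, and the explicit starting index plays the role of the proposed
_type.starting_index.

```python
from itertools import product


def element_tags(tag, dimensions, starting_index=None):
    """List the element-level tag names implied by an array-typed tag.

    Hypothetical helper: `dimensions` gives the extent of each axis and
    `starting_index` the first index of each axis (default 0 per axis).
    """
    if starting_index is None:
        starting_index = [0] * len(dimensions)
    axes = [range(s, s + n) for s, n in zip(starting_index, dimensions)]
    return [tag + "".join("[%d]" % i for i in idx) for idx in product(*axes)]


print(element_tags("_a.vec", [5]))       # _a.vec[0] through _a.vec[4]
print(element_tags("_a.vec", [5], [1]))  # _a.vec[1] through _a.vec[5]
```

With a starting index of 1 on each axis, a nested 3 by 3 array would yield
names of the form _a.vec[1][1] through _a.vec[3][3], matching the CIF1
style of element names.
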
>>>> At 3:29 PM -0500 12/9/09, John Westbrook wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> On the issue of reserved characters in mmCIF/PDBx data items, these
>>>>> generally have been inherited from the style of items from the core. The
>>>>> majority of items in this class are data items related to short
>>>>> matrices/tensors and vectors (e.g. items including []). Virtually
>>>>> all have a syntax which could reasonably be interpreted as a
>>>>> programmatic reference. For instance,
>>>>>
>>>>>
>>>>> _atom_sites.fract_transf_matrix[1][1] 0.007738
>>>>> _atom_sites.fract_transf_matrix[1][2] 0.000000
>>>>> _atom_sites.fract_transf_matrix[1][3] 0.004298
>>>>> _atom_sites.fract_transf_matrix[2][1] 0.000000
>>>>> _atom_sites.fract_transf_matrix[2][2] 0.016545
>>>>> _atom_sites.fract_transf_matrix[2][3] 0.000000
>>>>> _atom_sites.fract_transf_matrix[3][1] 0.000000
>>>>> _atom_sites.fract_transf_matrix[3][2] 0.000000
>>>>> _atom_sites.fract_transf_matrix[3][3] 0.020200
>>>>> _atom_sites.fract_transf_vector[1] 0.00000
>>>>> _atom_sites.fract_transf_vector[2] 0.00000
>>>>> _atom_sites.fract_transf_vector[3] 0.00000
>>>>>
>>>>> Are we close to being able to treat these as legal in the context of
>>>>> CIF2/DDL+? I suppose I am asking what will constitute a legal
>>>>> assignment for an element of a matrix/array -
>>>>>
>>>>> Only this -
>>>>>
>>>>> _a.vec [1,2,3]
>>>>>
>>>>> or also expanded assignment by element such as -
>>>>>
>>>>> _a.vec[1] 1
>>>>> _a.vec[2] 2
>>>>> _a.vec[3] 3
>>>>>
>>>>> If the latter is to be considered, then this will solve most of
>>>>> the data name issues for our data.
>>>>>
>>>>> Regards,
>>>>>
>>>>> John
>>>>>
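
The equivalence between the two forms of assignment asked about above
(whole-array versus element by element) can be sketched in Python. The
function assemble and its arguments are hypothetical. It assumes the
element-wise items have already been read into a dictionary of data name
to value, and that element names start at index 1 as in the mmCIF/PDBx
examples.

```python
def assemble(items, base, shape, starting_index=1):
    """Build a nested list for `base` from element-wise items.

    Hypothetical helper: `items` maps names such as '_a.vec[2]' to
    values; `shape` is the extent of each axis. A missing element
    raises KeyError.
    """
    def build(prefix, dims):
        if not dims:
            suffix = "".join("[%d]" % (i + starting_index) for i in prefix)
            return items[base + suffix]
        return [build(prefix + (i,), dims[1:]) for i in range(dims[0])]

    return build((), tuple(shape))


items = {"_a.vec[1]": "1", "_a.vec[2]": "2", "_a.vec[3]": "3"}
print(assemble(items, "_a.vec", [3]))  # ['1', '2', '3']
```

Values are left as the strings read from the file; converting them to
numbers would be a separate step governed by the dictionary type.
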
>>>>> Joe Krahn wrote:
>>>>>
>>>>>> In practice, CIF2 parsers should allow CIF1 data names within a CIF2
>>>>>> formatted file. The question is whether these files should be
>>>>>> allowed as valid CIF2, or just for convenience as a non-standard CIF2.
>>>>>>
>>>>>> When CIF files are used as working data files, the restrictions should
>>>>>> be relaxed. For long-term archival files, it makes sense to be more
>>>>>> restrictive. I would just make the CIF1 names inaccessible to dREL.
>>>>>> Alternatively, an implementation could allow CIF1 names only on
>>>>>> reading, and require dictionary alias mappings to CIF2 names.
>>>>>>
>>>>>> One argument in favor of allowing them would be that someone wants to
>>>>>> convert all data files to CIF2 format, but they want to preserve the
>>>>>> original data as-is, without alias mapping.
>>>>>>
>>>>>> I think that the current CIF2 syntax makes it possible to use CIF1
>>>>>> names without any ambiguities. The question is whether they should
>>>>>> be considered valid CIF2, or just a non-standard version that will
>>>>>> be useful for the transitional period.
>>>>>>
>>>>>> Joe
>>>>>>
>>>>>>
>>>>>> Herbert J. Bernstein wrote:
>>>>>>
>>>>>>> Personally, I would greatly prefer to allow all data names that do not
>>>>>>> create a major lexer/parser conflict to appear in a data CIF and
>>>>>>> only apply the strong restrictions to data names that appear in CIF2
>>>>>>> dictionaries as defined data names (not as aliases). -- Herbert
>>>>>>>
>>>>>>>
>>>>>>> At 2:40 PM +0000 12/9/09, Brian McMahon wrote:
>>>>>>>
>>>>>>>> I have one remaining niggle that I'd like to revisit before we put
>>>>>>>> this finally to bed. As has been mentioned a couple of times
>>>>>>>> recently, restricting the data-name character set does invalidate
>>>>>>>> syntactically many existing CIF 1 files (e.g.
>>>>>>>> _refine_ls_shift/esd_max ).
>>>>>>>> We have discussed strategies for handling this, and I think these
>>>>>>>> are workable strategies, but will involve investment and hence
>>>>>>>> expense in workflow management in CIF archives.
>>>>>>>>
>>>>>>>> I understand the rationale behind this restriction is to simplify
>>>>>>>> future processing of data names in areas such as dREL
>>>>>>>> applications. The question really is whether we're choosing the right
>>>>>>>> trade-off in making things cleaner at that end of the processing
>>>>>>>> chain. I would suppose that a dREL or other application could
>>>>>>>> ingest a data name with dangerous characters, convert it internally
>>>>>>>> into a "safe" identifier that's used for all processing, and then
>>>>>>>> restore the original form upon output; but writing that
>>>>>>>> intermediate layer of processing is of course expensive (especially
>>>>>>>> if there aren't readily available libraries that will do this
>>>>>>>> transparently).
>>>>>>>>
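
The intermediate layer described above (ingest a name, convert it to a
"safe" identifier, restore the original on output) might look like the
following Python sketch. The names make_safe and restore and the _xNN_
escape scheme are assumptions made for this illustration, not anything
defined by CIF2 or dREL.

```python
import re


def make_safe(name, table):
    """Return an identifier-safe alias for `name` and record it in `table`.

    Hypothetical scheme: every character outside [A-Za-z0-9_] is
    replaced by '_xNN_', where NN is its code point in hexadecimal.
    """
    safe = re.sub(r"[^A-Za-z0-9_]", lambda m: "_x%02x_" % ord(m.group()), name)
    table[safe] = name  # remembered so the original form can be written out
    return safe


def restore(safe, table):
    """Look up the original data name for a safe alias."""
    return table[safe]


table = {}
alias = make_safe("_refine_ls_shift/esd_max", table)
print(alias)                  # _refine_ls_shift_x2f_esd_max
print(restore(alias, table))  # _refine_ls_shift/esd_max
```

One limitation of this sketch: a name that already contains a literal
sequence such as _x2f_ could collide with an escaped name, so a real
implementation would need to detect or escape that case.
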
>>>>>>>> I suspect that some of the original proposed syntactic changes also
>>>>>>>> had the effect (whether by design or collaterally) of simplifying
>>>>>>>> i/o, data structure management, symbol table processing etc., but
>>>>>>>> those may have suffered in the subsequent revision exercise we've
>>>>>>>> just been practising. Given the consensus we are now approaching,
>>>>>>>> would the code builders now be prepared to incur the additional
>>>>>>>> expense of handling "dangerous" data names?
>>>>>>>>
>>>>>>>> I really don't want to spark off a long discussion on this - if a
>>>>>>>> quick round of response shows that there's no appetite to allow
>>>>>>>> the additional punctuation characters in data names, I'll accept that
>>>>>>>> gracefully.
>>>>>>>>
>>>>>>>> ***
>>>>>>>>
>>>>>>>> One last comment while I have the floor, though it is related in part
>>>>>>>> to the above question. A concern raised in the editorial office was
>>>>>>>> that there would be circumstances where users didn't know if they
>>>>>>>> were dealing with a CIF 1 or 2 ("users" meaning authors, perhaps
>>>>>>>> resorting to the vi editor - and we're imagining most of them are
>>>>>>>> dealing with small-molecule/inorganic CIFs). My supposition is that
>>>>>>>> the IUCr editorial offices would only want to use CIF2 seriously in
>>>>>>>> association with DDLm dictionaries, and that we would expect the
>>>>>>>> revised core dictionaries to use the dot component in data names to
>>>>>>>> signal this further evolution. So even a superficial glimpse of the
>>>>>>>> middle of a CIF would make it clear whether it was CIF1 or CIF2.
>>>>>>>>
>>>>>>>> Does that fit in with how others see this progressing?
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Brian
>_______________________________________________
>ddlm-group mailing list
>[email protected]
>http://scripts.iucr.org/mailman/listinfo/ddlm-group
--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================