Discussion List Archives


Re: [ddlm-group] Finalizing DDLm

Herbert, let me restate your suggestion in DDLm terms. Datanames in DDLm are currently either 'List' (loopable) or 'Set' (unloopable on most readings, although we need to clarify this).  You favour allowing 'List' items to be presented unlooped when only a single packet is present, which would make them equivalent to DDL2 datanames, none of which has an explicit loopability status.

Regarding your previous email:

On Fri, Mar 12, 2010 at 11:44 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:

>  For a validating parser that consults the dictionary, a programmer
> should see no difference with the interpretation I suggest.  With the
> dictionary available, a validating parser would present a single row
> loop that the dictionary prefers to have be treated as single values
> as those single values, and a collection of single values that the
> dictionary prefers to have treated as a loop as a loop.  Nothing
> is lost, and the user is saved from dealing with an error that
> need not be treated as an error, but at most as a warning.

Absolutely: this is trivially true for a *validating* parser, which by definition has access to the dictionary information.  I am however keen that we allow the CIF standards to be implemented in a modular way, so the parsing and validation steps can be separated.  Therefore, under the unlooping suggestion the validation/dictionary use stage has to take an extra step of reconstructing any one-packet loops that were not presented as loops in the parsing stage, which was my original point.
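
As a rough illustration of that extra step, here is a minimal Python sketch. It assumes a dictionary-free parser that returns unlooped key-value pairs separately from explicit loops, and a lookup table giving each dataname's category and 'List'/'Set' status. The function name, the data layout and the example datanames are all hypothetical and do not come from any existing CIF library.

```python
# Hypothetical sketch: none of these names come from a real CIF library.
# A dictionary-free parser returns plain key-value pairs and explicit loops;
# a later validation stage uses dictionary information to regroup the pairs.

def rebuild_one_packet_loops(pairs, loops, ddl):
    """Move unlooped 'List' items into one-packet loops, grouped by category.

    pairs: dict of dataname -> value, as found outside any loop
    loops: list of loops, each a list of packets (dataname -> value dicts)
    ddl:   dict of dataname -> (category, 'List' or 'Set')
    """
    remaining = {}
    packets_by_category = {}
    for name, value in pairs.items():
        category, cat_class = ddl.get(name, (None, None))
        if cat_class == "List":
            # Collect all unlooped items of one category into a single packet.
            packets_by_category.setdefault(category, {})[name] = value
        else:
            # 'Set' items and names unknown to the dictionary stay as pairs.
            remaining[name] = value
    rebuilt = loops + [[packet] for packet in packets_by_category.values()]
    return remaining, rebuilt


# Illustrative dictionary information and parse result (assumed values).
ddl = {
    "_atom_site.label": ("atom_site", "List"),
    "_atom_site.fract_x": ("atom_site", "List"),
    "_cell.length_a": ("cell", "Set"),
}
pairs = {
    "_atom_site.label": "C1",
    "_cell.length_a": "5.43",
    "_atom_site.fract_x": "0.25",
}
remaining, loops = rebuild_one_packet_loops(pairs, [], ddl)
```

If 'List' items may only ever appear inside a loop, this regrouping pass is unnecessary: the parser's loops can be handed to the validation stage as they are.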

> Inasmuch as, under the current rules, all DDLm methods have to come
> from dictionaries and not from data files, if the parser is working
> on a data file for which no dictionary is available, then the approach
> I propose allows for at least some useful interpretation of a
> data file -- the same interpretation used by current DDL2 parsers
> -- treat any collection of single values from the same category
> as if they came from a single row loop.  I am not saying that every
> CIF2 data file will be properly readable this way without a dictionary,
> but many of them will be.

If a parser is working on a data file for which no dictionary is available, you can do whatever nefarious thing you wish to do (including 'useful work') with the results of that parse, as the concepts of category and loopability have no meaning without a DDL.  All syntactically correct CIF2 data files will most definitely be properly readable, as a violation of 'loopability' rules is a violation of the DDLm rules, not the syntax rules.  Of course, without a dictionary you will be unable to identify what items belong to the same 'category' (especially if you unloop single-packet loops, and whatever 'category' might mean without a DDL).

> This is just another matter of providing a reasonable default coercion
> for a case that need not be a fatal error to allow more people to get
> useful work done.

I hope I have demonstrated in the previous paragraph that not allowing single-packet DDLm loops to be presented outside a loop in no way hinders useful work that does not rely on a dictionary.  So, moving back to the original issue of changing the DDLm standard to allow single-packet loops to be unlooped: I cannot imagine what useful work is being prevented by not allowing one-packet loops to be presented as plain key-value pairs.  I would have thought that, far from preventing useful work, the fact that a programmer can assume that a given dataname must only ever appear in a loop is a simplifying assumption for both reader and writer.  Offering looping choice makes the programmer's task more difficult, with no demonstrated benefit (but feel free to provide an example of the useful work that is being prevented).

Furthermore, this extra complexity for implementors is compounded by the extra significance that DDLm ascribes to categories, and by the fact that there are now relationships between categories similar to database table joins.  Efficiently programming the loop-traversing dREL constructs will require that all 'List' category packets are held in the same datastructure, and allowing one-packet loops to be 'unlooped' will mean that at some stage these loops will have to be internally reconstructed.  I have not seen any concrete example of an offsetting benefit that would compensate for this extra work.
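
The join-like relationship between categories can be sketched in Python as follows. This is only an illustration under assumed names: the two categories, their item names and the helper functions are hypothetical, and the code is not a rendering of any actual dREL method.

```python
# Hypothetical sketch; the category and item names are illustrative only.

def index_by_key(packets, key_name):
    """Index the packets of one 'List' category by their key item."""
    return {packet[key_name]: packet for packet in packets}


def join(child_packets, child_link, parent_index):
    """Pair each child packet with the parent packet its link item points to."""
    return [(child, parent_index[child[child_link]]) for child in child_packets]


# A category with several packets and a category with exactly one packet are
# held in the same way, so the join needs no special case for single packets.
atom_type = [{"symbol": "C", "mass": 12.011}]
atom_site = [
    {"label": "C1", "type_symbol": "C"},
    {"label": "C2", "type_symbol": "C"},
]

types = index_by_key(atom_type, "symbol")
total_mass = sum(parent["mass"] for _, parent in join(atom_site, "type_symbol", types))
```

Here the one-packet category is usable directly only because it arrives as a list of packets. Had it been presented as unlooped key-value pairs, it would first have to be rebuilt into this form before the join could run.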

But there's more! If we go down the path of allowing unrolling of single-packet loops, the next step is to say, well, what if I represent the values taken by my looped datanames as lists?  Then I can express *any* looped dataname as a key-value pair! For example, we can turn

loop_
item_a
item_b
item_c

1 2 3
q r s
x y z

into

...
item_a [1 q x]
...
item_c [3 s z]
...
item_b [2 r y]

And why stop at either/or?  Couldn't I equally well allow that some of my looped datanames appear outside the loop with list values, and the rest appear in the loop, depending on which bits of my program produce them? Should we allow these presentations as well, with concomitant extra burden on implementors?  I for one think not.
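
To give a feel for that burden, the following minimal Python sketch converts the loop in the example above into list-valued pairs and back again. Plain Python lists and dicts stand in for parsed CIF content, and both function names are hypothetical.

```python
# Hypothetical sketch: plain Python data standing in for parsed CIF content.

def loop_to_list_pairs(names, packets):
    """Turn a loop (rows of values) into dataname -> list-of-values pairs."""
    return {name: [packet[i] for packet in packets] for i, name in enumerate(names)}


def list_pairs_to_loop(pairs):
    """Rebuild the loop, refusing columns of unequal length."""
    names = list(pairs)
    lengths = {len(pairs[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    packets = [list(row) for row in zip(*(pairs[name] for name in names))]
    return names, packets


names = ["item_a", "item_b", "item_c"]
packets = [["1", "2", "3"], ["q", "r", "s"], ["x", "y", "z"]]

pairs = loop_to_list_pairs(names, packets)
restored_names, restored_packets = list_pairs_to_loop(pairs)
```

Every reader that accepted both presentations would need the equivalent of `list_pairs_to_loop`, including the length check, and would presumably also need dictionary information to tell a column of loop values apart from an item whose single value happens to be a list.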

all the best,
James.

> Regards,
>  Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Fri, 12 Mar 2010, James Hester wrote:
>
>> Herbert: I think David was more concerned about the dictionary
>> structure rather than the datafile presentation, but you raise an
>> interesting point regarding the equivalence of one-row loops and a set
>> of key-value pairs.  Semantic information *is* lost by turning a
>> one-row loop into key-value pairs:  we lose the information that these
>> dataitems 'belong together', as the key-value pairs are semantically
>> on the same level as all the other datablock key-value pairs. This
>> information can, of course, be recovered by reference to a dictionary.
>>
>> The practical effect of this semantic loss is that programmers will
>> not be able to count on the parser pre-packaging loops for them, and
>> will thus need to allow for reconstruction of loop datastructures when
>> performing per-loop operations (such as in dREL methods).   I don't
>> think this is a big problem, but it is an increase in complexity, and
>> I do not see what we gain by allowing such an equivalence.
>>
>> On Fri, Mar 12, 2010 at 4:37 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>>
>>> I would favor treating a looped presentation of a single row of items as
>>> valid in all cases, and treating the presentation as individual tags
>>> and values as equally valid and equivalent.  I also like David's
>>> suggestion of allowing an individual tag and value to be distributed
>>> over a loop for the same category.  This would start to put us into a
>>> parallel position to the handling of XML attributes.
>>
>> Regarding distributing individual tags and values over a loop, I think
>> the current DDLm approach of 'Set' and 'List' categories combined with
>> parent/child relationships is adequate for our needs.
>>
>>>
>>> At 10:35 AM -0500 3/11/10, David Brown wrote:
>>>>
>>>> Dear Colleagues,
>>>>
>>>> I assume that we are essentially finished in resolving syntax
>>>> problems, but in that discussion some items were identified as being
>>>> related to DDLm rather than syntax, so before we settle into serious
>>>> dictionary writing we need to understand the DDLm rules.
>>>>
>>>> One item that I believe was raised under this heading was whether,
>>>> if a loop contained a single set of items, it was necessary to
>>>> formally include this in a loop structure.  If this is deemed to be
>>>> necessary, then there has to be some way of identifying the items
>>>> that must appear in a loop.  The presence in the dictionary of a
>>>> _category_key.* item would seem to flag this, but it is applied at
>>>> the level of the category rather than at the level of an individual
>>>> item.  If the requirement is that the loop structure must always be
>>>> used, then all the items in the category must be loopable, i.e., the
>>>> category cannot include items that would not normally be included in
>>>> the loop, items for example that apply equally to all the listed
>>>> items such as a scale factor that is the same for all the structure
>>>> factors in a loop.  This seems to be workable, but I am not sure how
>>>> the legacy CIFs would fit in, since categories may include some
>>>> listable items and some non-listable items, and I am sure the
>>>> listable items do not always appear in a loop if there is only one
>>>> set of such items reported in the CIF.
>>>>
>>>> Is this something that can be clarified fairly easily?  It has an
>>>> important bearing on how the CIF dictionaries are written.
>>>>
>>>> David
>>>>
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>>
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>



