Re: [ddlm-group] Finalizing DDLm

I have not had any time to respond to David's original email or the
subsequent discussions. However, I have discussed it with Syd.

DDLm defines loopable categories strictly, and sub-categories of those have
strictly enforced outer joins (given we have ONLY considered sub-categories
that are List). This is how we overcome the split versus non-split versions
of atom_site and atom_site_aniso loops. List is strictly looped, Set is
strictly non-looped, contrary to your reading and possibly to Syd's original
text. He has since re-read what he wrote and clarified the ambiguity. I will
send you that re-write shortly.

We had not considered a SET category being a sub-category of a List
category, but if it is allowed then it would not be an outer-join as you
suggested in your previous contribution but a relational Cartesian product
(which is very different).
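
To make the distinction concrete, here is a minimal Python sketch, not taken from the DDLm specification. It assumes the outer join is a left outer join on a shared key, and the helper names, the U_11 and length_a columns and the sample values are purely illustrative.

```python
from itertools import product

# Illustrative rows only; the values are made up for this sketch.
atom_site = [
    {"label": "Cu", "frac_x": 0.0},
    {"label": "O1", "frac_x": 0.25},
]
atom_site_aniso = [{"label": "Cu", "U_11": 0.01}]  # List sub-category, one row
cell = [{"length_a": 4.2}]                         # Set category, always one row


def left_outer_join(parent, child, key):
    """Keep every parent row; fill child columns with None when no row matches."""
    child_by_key = {row[key]: row for row in child}
    child_cols = {col for row in child for col in row if col != key}
    joined = []
    for row in parent:
        match = child_by_key.get(row[key], {})
        merged = dict(row)
        for col in child_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


def cartesian_product(left, right):
    """Pair every left row with every right row; no key is involved."""
    return [{**l, **r} for l, r in product(left, right)]
```

With a one-row Set category the product simply attaches the same Set values to every List row, whereas the outer join matches rows by key and leaves gaps where the child has no matching row.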

The DDLm specification requires that List category data MUST appear in a loop
(irrespective of how many rows there are), and SET categories are strictly
singular (non-looped data). The formal specification will remain
that way.
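
As a rough illustration of that rule, the following Python sketch checks an already-parsed data block against a table of category classes. It is only a sketch under stated assumptions: the function name, the input shapes, and the shortcut of reading the category from the text before the "." in a dataname are all hypothetical; a real implementation would take the category from the dictionary.

```python
def loopability_violations(loops, pairs, category_class):
    """Report datanames whose presentation contradicts their category class.

    loops          -- list of loops, each a list of datanames found in one loop_
    pairs          -- list of datanames found as plain tag-value pairs
    category_class -- dict mapping category name to "List" or "Set"
                      (hypothetical stand-in for a dictionary lookup)
    """
    def category_of(tag):
        # Assumption for this sketch: "_atom_site.label" -> "atom_site".
        return tag.lstrip("_").split(".", 1)[0]

    problems = []
    for loop in loops:
        for tag in loop:
            if category_class.get(category_of(tag)) == "Set":
                problems.append((tag, "Set item inside loop_"))
    for tag in pairs:
        if category_class.get(category_of(tag)) == "List":
            problems.append((tag, "List item outside loop_"))
    return problems
```

Because the check needs only the parsed structure plus the dictionary classes, it can run as a separate step after a dictionary-free parse.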

HOWEVER the IUCr is free to "extend" the specification of the DDLm for its
own internal and private use, so long as you appreciate that the FORMAL
published specification of what Syd and I have created can't include it.

You might explain why you feel you have trouble with

loop_
 _atom_site.label 
 _atom_site.frac_x 
 _atom_site.frac_y 
 _atom_site.frac_z
 Cu 0 0 0

and yet

 _atom_site.label  Cu
 _atom_site.frac_x  0.
 _atom_site.frac_y  0.
 _atom_site.frac_z  0.

is so much more obvious? Given that people understand what the loop is, I
can't see what they would gain from the unrolled version (apart from
confusion). The real danger is that less experienced users who DON'T read a
dictionary, and who read the latter form, may be encouraged to replicate it
when there is more than 1 atom (thus corrupting the CIF structure).
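
The corruption is easy to demonstrate: a dataname may appear only once in a data block, so copying the unrolled form for a second atom produces repeated tags. A minimal Python sketch follows; the function name is hypothetical and the second atom's values are invented for illustration.

```python
from collections import Counter


def duplicated_tags(pairs):
    """Return the datanames that occur more than once in (tag, value) pairs."""
    counts = Counter(tag for tag, _ in pairs)
    return sorted(tag for tag, n in counts.items() if n > 1)


# The unrolled form copied for a second atom (illustrative values only).
unrolled_two_atoms = [
    ("_atom_site.label", "Cu"), ("_atom_site.frac_x", "0."),
    ("_atom_site.label", "O1"), ("_atom_site.frac_x", "0.5"),
]
```

Every tag reported by duplicated_tags(unrolled_two_atoms) is one that a reader would have to reject or silently overwrite, which is exactly what the looped form avoids.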

However these are just personal observations, and if the IUCr wants to
qualify the use of DDLm with its own tweaks there is nothing stopping them
from doing so.



On 25/03/10 11:44 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> Herbert, if I can transform your suggestion into a DDLm context: datanames in
> DDLm are currently either 'List' (loopable) or 'Set' (probably unloopable on
> most readings, we need to clarify this).  You are in favour of allowing 'List'
> items to be potentially unlooped if only a single packet is present, that is,
> become equivalent to all DDL2 datanames, which do not have an explicit
> loopability status.
> 
> Regarding your previous email:
> 
> On Fri, Mar 12, 2010 at 11:44 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
> 
>>  For a validating parser that consults the dictionary, a programmer
>> should see no difference with the interpretation I suggest.  With the
>> dictionary available, a validating parser would present a single row
>> loop that the dictionary prefers to have be treated as single values
>> as those single values, and a collection of single values that the
>> dictionary prefers to have treated as a loop as a loop.  Nothing
>> is lost, and the user is saved from dealing with an error that
>> need not be treated as an error, but at most as a warning.
> 
> Absolutely: this is trivially true for a *validating* parser, which by
> definition has access to the dictionary information.  I am however keen that
> we allow the CIF standards to be implemented in a modular way, so the parsing
> and validation steps can be separated.  Therefore, under the unlooping
> suggestion the validation/dictionary use stage has to take an extra step of
> reconstructing any one-packet loops that were not presented as loops in the
> parsing stage, which was my original point.
> 
>> Inasmuch as, under the current rules, all DDLm methods have to come
>> from dictionaries and not from data files, if the parser is working
>> on a data file for which no dictionary is available, then the approach
>> I propose allows for at least some useful interpretation of a
>> data file -- the same interpretation used by current DDL2 parsers
>> -- treat any collection of single values from the same category
>> as if they came from a single row loop.  I am not saying that every
>> CIF2 data file will be properly readable this way without a dictionary,
>> but many of them will be.
> 
> If a parser is working on a data file for which no dictionary is available,
> you can do whatever nefarious thing you wish to do (including 'useful work')
> with the results of that parse, as the concepts of category and loopability
> have no meaning without a DDL.  All syntactically correct CIF2 data files will
> most definitely be properly readable, as a violation of 'loopability' rules is
> a violation of the DDLm rules, not the syntax rules.  Of course, without a
> dictionary you will be unable to identify what items belong to the same
> 'category' (especially if you unloop single-packet loops, and whatever
> 'category' might mean without a DDL).
> 
>> This is just another matter of providing a reasonable default coercion
>> for a case that need not be a fatal error to allow more people to get
>> useful work done.
> 
> I hope I have demonstrated in the previous paragraph that not allowing
> single-packet DDLm loops to be presented outside a loop in no way hinders
> useful work that does not rely on a dictionary.  So, moving back to the
> original issue of changing the DDLm standard to allow single packet loops to
> be unlooped: I cannot imagine what useful work is being prevented by not
> allowing one-packet loops to be presented as plain key-value pairs.   I would
> have thought that, far from preventing useful work, the fact that a programmer
> can assume that a given dataname must only ever appear in a loop is a
> simplifying assumption for both reader and writer.  Offering looping choice
> makes the programmer's task more difficult, with no demonstrated benefit (but
> feel free to provide an example of the useful work that is being prevented).
> 
> Furthermore, this extra complexity for implementors is compounded because of
> the extra significance that DDLm ascribes to categories, and the fact that
> there are now relationships between categories similar to database table
> joins.  Efficiently programming the loop-traversing dREL constructs will
> require that all 'List' category packets are held in the same datastructure,
> and allowing one-packet loops to be 'unlooped' will mean that at some stage
> these loops will have to be internally reconstructed.  I have not seen any
> concrete example of an offsetting benefit that would compensate for this extra
> work.
> 
> But there's more! If we go down the path of allowing unrolling of
> single-packet loops, the next step is to say, well, what if I represent the
> values taken by my looped datanames as lists?  Then I can express *any* looped
> dataname as a key-value pair! For example, we can turn
> 
> loop_
>  item_a
>  item_b
>  item_c
> 
> 1 2 3
> q r s
> x y z
> 
> into
> 
> ...
> item_a [1 q x]
> ...
> item_c [3 s z]
> ...
> item_b [2 r y]
> 
> And why stop at either/or?  Couldn't I equally well allow that some of my
> looped datanames appear outside the loop with list values, and the rest appear
> in the loop, depending on which bits of my program produce them? Should we
> allow these presentations as well, with concomitant extra burden on
> implementors?  I for one think not.
> 
> all the best,
> James.
> 
>> Regards,
>>  Herbert
>> 
>> =====================================================
>>  Herbert J. Bernstein, Professor of Computer Science
>>   Dowling College, Kramer Science Center, KSC 121
>>        Idle Hour Blvd, Oakdale, NY, 11769
>> 
>>                 +1-631-244-3035
>>                 yaya@dowling.edu
>> =====================================================
>> 
>> On Fri, 12 Mar 2010, James Hester wrote:
>> 
>>> Herbert: I think David was more concerned about the dictionary
>>> structure rather than the datafile presentation, but you raise an
>>> interesting point regarding the equivalence of one-row loops and a set
>>> of key-value pairs.  Semantic information *is* lost by turning a
>>> one-row loop into key-value pairs:  we lose the information that these
>>> dataitems 'belong together', as the key-value pairs are semantically
>>> on the same level as all the other datablock key-value pairs. This
>>> information can, of course, be recovered by reference to a dictionary.
>>> 
>>> The practical effect of this semantic loss is that programmers will
>>> not be able to count on the parser pre-packaging loops for them, and
>>> will thus need to allow for reconstruction of loop datastructures when
>>> performing per-loop operations (such as in dREL methods).   I don't
>>> think this is a big problem, but it is an increase in complexity, and
>>> I do not see what we gain by allowing such an equivalence.
>>> 
>>> On Fri, Mar 12, 2010 at 4:37 AM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>> 
>>>> I would favor treating a looped presentation of a single row of items as
>>>> valid in all cases, and treating the presentation as individual tags
>>>> and value as equally valid and equivalent.  I also like David's
>>>> suggestion
>>>> of allowing an individual tag and value to be distributed over a loop
>>>> for the same category.  This would start to put us into a parallel
>>>> position
>>>> to the handling of XML attributes.
>>> 
>>> Regarding distributing individual tags and values over a loop, I think
>>> the current DDLm approach of 'Set' and 'List' categories combined with
>>> parent/child relationships is adequate for our needs.
>>> 
>>>> 
>>>> At 10:35 AM -0500 3/11/10, David Brown wrote:
>>>>> 
>>>>> Dear Colleagues,
>>>>> 
>>>>> I assume that we are essentially finished in resolving syntax
>>>>> problems, but in that discussion some items were identified as being
>>>>> related to DDLm rather than syntax, so before we settle into serious
>>>>> dictionary writing we need to understand the DDLm rules.
>>>>> 
>>>>> One item that I believe was raised under this heading was whether,
>>>>> if a loop contained a single set of items, it was necessary to
>>>>> formally include this in a loop structure.  If this is deemed to be
>>>>> necessary, then there has to be some way of identifying the items
>>>>> that must appear in a loop.  The presence in the dictionary of a
>>>>> _category_key.* item would seem to flag this, but it is applied at
>>>>> the level of the category rather than at the level of an individual
>>>>> item.  If the requirement is that the loop structure must always be
>>>>> used, then all the items in the category must be loopable, i.e., the
>>>>> category cannot include items that would not normally be included in
>>>>> the loop, items for example that apply equally to all the listed
>>>>> items such as a scale factor that is the same for all the structure
>>>>> factors in a loop.  This seems to be workable, but I am not sure how
>>>>> the legacy CIFs would fit in, since categories may include some
>>>>> listable items and some non-listable items, and I am sure the
>>>>> listable items do not always appear in a loop if there is only one
>>>>> set of such items reported in the CIF.
>>>>> 
>>>>> Is this something that can be clarified fairly easily?  It has an
>>>>> important bearing on how the CIF dictionaries are written.
>>>>> 
>>>>> David
>>>>> 
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>> 
>>>> 
>>>> --
>>>> =====================================================
>>>>  Herbert J. Bernstein, Professor of Computer Science
>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>> 
>>>>                  +1-631-244-3035
>>>>                  yaya@dowling.edu
>>>> =====================================================
>>> 
>>> 
>>> 
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148
>> 
> 
> 

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au



