Discussion List Archives


Re: [ddlm-group] import and loops in ddlm

An addition to my previous email: the _import.if_dupl flag can be used to fine-tune the behaviour of the import process when datanames are duplicated.  The current spec doesn't specify the use of this flag for 'Att' type imports, but by extension we have the choice of 'Ignore', 'Replace' or 'Exit'.  The behaviour that Herbert and I have suggested would correspond to adding a new alternative: 'Merge'.  This is not a crucial point, as we are talking about dictionaries over which we have control, so we could leave the addition of the 'Merge' option until the need arises.
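
As a rough illustration of how the if_dupl alternatives might differ for one duplicated dataname, here is a minimal Python sketch. The function name resolve_duplicate and the representation of a loop as a list of packets are hypothetical. Reading 'Ignore' as "keep what the definition already has" and 'Merge' as "append packets not already present" are assumptions made for illustration, not wording taken from the spec.

```python
def resolve_duplicate(existing, imported, if_dupl="Exit"):
    """Pick the value to keep when a dataname occurs on both sides of an import.

    'existing' is the value already in the importing definition and
    'imported' is the value arriving from the template (hypothetical layout).
    """
    if if_dupl == "Ignore":
        # Assumption: the imported value is dropped, the existing one stays.
        return existing
    if if_dupl == "Replace":
        # The imported value overwrites the existing one.
        return imported
    if if_dupl == "Merge":
        # Proposed alternative: both values are lists of loop packets;
        # keep the existing packets and append any new imported ones.
        merged = list(existing)
        for packet in imported:
            if packet not in merged:
                merged.append(packet)
        return merged
    if if_dupl == "Exit":
        raise ValueError("duplicate dataname encountered during import")
    raise ValueError(f"unknown if_dupl mode: {if_dupl!r}")
```

With this layout, only 'Merge' needs to know that the values are loops; the other three modes treat the value as opaque.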

On Wed, Jul 28, 2010 at 10:29 AM, James Hester <jamesrhester@gmail.com> wrote:
I have had time to think about the looping issue, and hope we can now resolve it.   The question has been: will it be legitimate for DDLm data files and dictionaries to unroll loops that have only a single packet and present them as a series of tag-value pairs?

Background

In DDL1, individual data items could be looped, unlooped or both.  In DDL2, no such attribute was defined, and so the behaviour was the same as DDL1 'both'.  In DDLm, the current draft *requires* looping for 'List' categories. 

Previous discussion

In our previous discussion, James and Nick advocated sticking with the current draft.  James pointed out that some semantic information is lost when unrolling, i.e. the fact that a set of datanames belong together.  Nick and James also pointed out that loops could be partially unrolled as well, which didn't look appealing.  Herbert and John W were keen to maintain DDL2-style behaviour for backwards compatibility and to avoid presenting users with unnecessary warnings.

And now....

In the previous discussion I had failed to appreciate that DDL1 often behaved in the way that Herbert was advocating, and that therefore DDLm presented a significant restriction compared to DDL1 (and DDL2).  I am not aware of any issues with DDL1 working in this way. I will therefore withdraw my objection to the proposed unrolling behaviour.  Note, of course, that the DDLm and dREL data model requires that Lists are loop structures, but as loop unrolling is essentially a presentational change this model is not affected. 
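
To make the "presentational change" point concrete, the following minimal Python sketch unrolls a single-packet loop into tag-value pairs and rolls it back up again. It assumes a loop is held as a list of tags plus a list of packets; that representation and the function names unroll and reroll are hypothetical, chosen only for illustration.

```python
def unroll(tags, packets):
    """Present a loop with exactly one packet as tag-value pairs."""
    if len(packets) != 1:
        raise ValueError("only a loop with exactly one packet can be unrolled")
    return dict(zip(tags, packets[0]))


def reroll(pairs):
    """Recover the one-packet loop from the tag-value pairs of one category."""
    tags = list(pairs)
    return tags, [tuple(pairs[tag] for tag in tags)]


tags = ["_description_example.case", "_description_example.detail"]
packets = [("1M", "Earth mass")]
pairs = unroll(tags, packets)
```

Because reroll(unroll(...)) gives back the same tags and packet, nothing in the List data model is lost; what disappears in the unrolled file is only the explicit grouping of the tags.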

As a side note for the record, the argument that an unrolled loop is easier to read contains an implicit assumption that the tag-value pairs of an unrolled loop will appear next to each other in the CIF file.  This will indeed be the case for a dictionary, which is carefully edited by humans.  This seems also to have been the case for DDL1 and DDL2 data files, perhaps because software has to be prepared to output these items together in a loop, so outputting them together as tag-value pairs is the most convenient (and logical) choice when unrolling the loop.  So, even though there is no strict requirement that the unrolled tag-values appear together, they most likely will.

If anybody has an objection to this loop unrolling behaviour, now is the time to speak.

So: moving on to David's point 1: assuming that loop unrolling is approved, the dictionary writer can choose freely whether or not to loop DDLm List categories with only one packet.

Point 2: _import statement.  I believe the following: DDLm attributes in the "import" category behave syntactically like any other DDLm attributes.  In particular: (i) they are not significant inside datavalues or comments; (ii) they cannot occur in loops with attributes from other categories; (iii) they may only appear once in each definition block; (iv) if multiple imports are required in a single frame, attributes from the import category should be looped; (v) the order in which imports are placed in such a loop is not significant.

In a logical sense, the dictionary expansion is performed only after the parsing stage.

There remains the question of what to do when attributes imported into a definition contain loops in categories that are already present in the definition.  I think I am agreeing with Herbert when I propose the following behaviour: if a loop that is imported into a definition using 'ATT'-type importation belongs to a category for which a loop already exists in that definition frame, then the loops should be merged according to the CIF1 dictionary merging protocol.  An example is given below (I'm not guaranteeing that I've remembered the attribute names exactly right): a definition that has looped some example cases, but this definition has also imported a template which itself contains some looped examples.  The correct behaviour would be to merge the two example loops into a single long loop.  Please note that this example is *not* the best way to use CIF for an astronomical ontology.

Example
------------
The main dictionary contains:

save_planet.mass
    _name.object_id 'mass'
    _name.category_id 'planet'
    _import_list.id [['Att','mass','template.dic']]
    loop_
        _description_example.case
        _description_example.detail
        '20M'   'A planet twenty times earth mass'
        '0.1M'  'A planet one tenth of earth mass'


And file 'template.dic' contains:

save_mass
    _name.object_id 'mass'
    _type.contents   code
    _type.container   Single
    _type.purpose Observed
    _description.text 'A large mass'
    loop_
        _description_example.case
        _description_example.detail
        1M          'Earth mass'
        1000000M    'A million times earth mass'
  

Resulting in:
save_planet.mass
    _name.object_id 'mass'
    _name.category_id 'planet'
    _type.contents   code
    _type.container   Single
    _type.purpose Observed
    _description.text 'A large mass'
    loop_
        _description_example.case  
        _description_example.detail
        '20M'   'A planet twenty times earth mass'
        '0.1M'  'A planet one tenth of earth mass'
        1M          'Earth mass'
        1000000M    'A million times earth mass'
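
The merge shown in the example above can be sketched in Python. This is only an illustration under stated assumptions: a definition frame is modelled as a dict with "items" (unlooped tag-value pairs) and "loops" (category name mapped to tags and packets), the function name import_attributes is hypothetical, unlooped clashes are resolved in favour of the importing definition, and loops are merged by appending packets not already present. The actual CIF1 dictionary merging protocol may impose further rules that are not modelled here.

```python
def import_attributes(target, template):
    """Sketch of an 'Att'-style import that merges loops of the same category."""
    # Unlooped attributes: start from the template, then let the importing
    # definition win on clashes (an assumption, not taken from the spec).
    items = dict(template["items"])
    items.update(target["items"])

    # Copy the target's loops so the inputs are left unmodified.
    loops = {cat: (list(tags), list(rows))
             for cat, (tags, rows) in target["loops"].items()}
    for cat, (tags, rows) in template["loops"].items():
        if cat not in loops:
            loops[cat] = (list(tags), list(rows))
            continue
        have_tags, have_rows = loops[cat]
        if have_tags != list(tags):
            raise ValueError(f"loops in category {cat!r} list different tags")
        for row in rows:
            if row not in have_rows:  # skip packets that are already present
                have_rows.append(row)
    return {"items": items, "loops": loops}


example_tags = ["_description_example.case", "_description_example.detail"]
main = {
    "items": {"_name.object_id": "mass", "_name.category_id": "planet"},
    "loops": {"description_example": (example_tags, [
        ("20M", "A planet twenty times earth mass"),
        ("0.1M", "A planet one tenth of earth mass"),
    ])},
}
template = {
    "items": {"_name.object_id": "mass", "_type.contents": "code",
              "_type.container": "Single", "_type.purpose": "Observed",
              "_description.text": "A large mass"},
    "loops": {"description_example": (example_tags, [
        ("1M", "Earth mass"),
        ("1000000M", "A million times earth mass"),
    ])},
}
merged = import_attributes(main, template)
```

Here merged["loops"]["description_example"] holds the two packets from the main dictionary followed by the two from the template, i.e. the single long loop described above.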



On Thu, Jul 22, 2010 at 4:24 AM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
Dear Colleagues,

 David raises interesting points.

 1.  I think it makes dictionaries much harder to read when tags for
a single row category are required to be looped instead of being
presented as tag-value pairs, and urge a relaxation of this DDLm
rule to be more consistent with DDL1 and DDL2.  It is trivial to
make an automatic converter to impose the more stringent rule,
but I think it to be a mistake.

 2.  This then relates to the second issue that David raises.
If we allow unlooped presentation of tag-value pairs from looped
categories, then we will very rarely, if ever, need to loop imports.
I would urge that the few cases that might justify looping
imports be avoided.  Let us consider the specific scopes of
possible imports:

Dic 'all saveframes in the source file'
Cat 'all saveframes in the specific category'
Grp 'all saveframes in the category with children'
Def 'one saveframe containing a definition'
Att 'import attributes within a saveframe'
Sta 'import enumeration state list only'
Val 'import enumeration default value list only'

Only the last three can raise issues of looping.

Consider ATT first.  From the ddl_import_aug08.pdf presentation, slide
15, it is clear that attributes are to be brought in as tag value
pairs, and it is also clear that this example could not work if
the NAME category were required to be looped.  Fortunately for
this example, the NAME category is classified as a set, but what
are we to do if we wish to be able to write a similar example
for a dictionary item that does have a key item?  David is
suggesting that we should include the _import within the loop,
but what are we then to import -- a complete loop with the loop_?
a partial loop without the loop_ and without some of the tags?
first an import of some tags and then an import of some value?
...?  Of all of these, importing a complete loop with the loop_
seems the safest and easiest to read, but if we do that we need to
permit the implicit merging or joining of the loop we import
with whatever is already in the dictionary or has been previously
imported.  That would be simple and clear and consistent with
the later handling of STA and VAL imports as complete loops
on slides 16 and 17.

Bottom line:  What I am suggesting is that the import mechanism
automatically merge or join multiple categories/loops that
obviously should be merged or joined.  This would allow us to
pull together material for a single category from multiple
dictionary imports, and would use precisely the same mechanism
as has long been used in DDL1 cifs and DDL2 cifs and dictionaries
to merge individual tag value pairs into common categories when
needed.

I think this would be a lot simpler than looping imports.

Regards,
 Herbert
=====================================================

 Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
       Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Wed, 21 Jul 2010, David Brown wrote:

I will try again to start a discussion on the DDLm dictionary as opposed to CIF3.

The two items below indicate my interpretation of the way DDLm will deal with a couple
of items where the practice may change from CIF1 or we need to agree on how a new
feature is used.  I don't believe these to be controversial, but I would like to
receive some feedback, even if you think my interpretation is correct.  I have other
more contentious questions to raise later.

1. Are loops required to be present explicitly even when they contain only one set of
items?
-------------------------------------------------------------------------------------

Checking the DDLm dictionary and the proof-of-concept CIF dictionary prepared by Syd,
it seems that the answer is 'yes'.  Therefore I intend to include explicit loops for
all dictionary items where a category key is defined.  Whether this rule can be
enforced in data files at the CIF level is something we might discuss.  In DDL1 such
loops have been optional, but that need not deter us from requiring them in CIFm data
files.  However, in the dictionaries, over which we have some control, there is no
reason why the rule should not be enforced.

2. Use of the IMPORT statement
----------------------------------------
Since the _import statement is used to expand a CIFm dictionary by inserting the same
piece of text in different places, _import must be executed before the dictionary can
be used.  Therefore the _import dataname has a different function from all the other
datanames.  During expansion, the dictionary is read as text with the reading routine
only acting when it encounters an _import data item which it replaces with the
designated text.  '_import' can appear anywhere in the unexpanded dictionary,
including in a loop even though it belongs to a different category since '_import'
will never be encountered when the loop is executed.  It can also appear in delimited
text or as part of a list, even in a comment (though this would not normally make much
sense).  I can see no reason why import statements may not be nested though I have not
yet come across a situation where this might be useful.

Comments please.

David





_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group




--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148



