Discussion List Archives


Re: DDLm, dREL, images and NeXus

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIFStandard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Re: DDLm, dREL, images and NeXus
  • From: Nick Spadaccini <nick@csse.uwa.edu.au>
  • Date: Tue, 20 Jan 2009 10:07:50 +0900
  • In-Reply-To: <C56D9039.10460%nick@csse.uwa.edu.au>
Here is something I sent last month regarding the discussions on NeXus. I suspect it was blocked at the server.


On 16/12/08 5:04 PM, "Nick Spadaccini" <nick@csse.uwa.edu.au> wrote:

I am tracking this discussion but don’t have time at the moment for a long and considered response. I am slowly getting something together, though. I can see a way of doing much of what Herb suggests without making too great a change to the current form of DDLm/dREL, and certainly avoiding the need to extend DDLm to deal with various alien attributes. It has to do with making the use of methods in a dictionary context-sensitive (really just exploiting the import mechanism).

What I am thinking is that, for instance, _cell_volume has an evaluation method which will generate its value from _cell_vector_a etc. What is important is all the definition information associated with _cell_volume; that has to be consistent. But I can import all this into another dictionary, where I have an override of the method. In this dictionary a request for _cell_volume executes its method, which pokes into the DOM representation of an imported NeXus file and extracts its value, if it is there. It is OK if it isn’t there, because I can take what I find back to the original dictionary and the method there will calculate _cell_volume for me. I can have a method that takes a CIF-formalised data item and injects it into a DOM representation ready for export OUT to NeXus. The essence is that I use imports to bring in the method I want, “fit for purpose”. The neatness of this approach is that most of the dictionary is constant, consistent and correct; ONLY the methods change as needed.
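
To make the idea concrete, here is a rough Python sketch of the override-with-fallback behaviour I have in mind; the function names and the NXsample/unit_cell_volume path are illustrative only, not any existing dREL or NeXus API:

def cell_volume_from_vectors(a, b, c):
    # base dictionary method: scalar triple product |a . (b x c)|
    bxc = (b[1]*c[2] - b[2]*c[1],
           b[2]*c[0] - b[0]*c[2],
           b[0]*c[1] - b[1]*c[0])
    return abs(a[0]*bxc[0] + a[1]*bxc[1] + a[2]*bxc[2])

def cell_volume_from_nexus(nexus_dom, data_block):
    # overriding method imported by the NeXus-aware dictionary: take the
    # value from the NeXus tree if it is there, otherwise fall back to the
    # base calculation from the cell vectors
    value = nexus_dom.get('NXsample', {}).get('unit_cell_volume')
    if value is not None:
        return value
    return cell_volume_from_vectors(data_block['_cell_vector_a'],
                                    data_block['_cell_vector_b'],
                                    data_block['_cell_vector_c'])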

The problem now is the API. The guts of the dREL parser will do most of what you want. We will need to develop an extension that takes a NeXus file and reads it into its DOM formalism. This can be generalized, and much of the Java (and Python) library already exists. But what about the complications of extending the API? Well, in the newest form of DDLm we created a new category called _function, where all the “functions” to be used in dREL are defined. Since we can access all of Python in our current implementation, we should be able to build functions that connect dREL to a DOM trawler relatively easily (says the man who hasn’t had time to look at dREL in the last 6 months).
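
As a sketch of the kind of “DOM trawler” such a _function could call (assuming the HDF5 flavour of NeXus and the h5py library; nexus_to_dom is an invented name), reading a NeXus file into nested Python dicts is roughly this much work:

import h5py

def nexus_to_dom(path):
    # walk the HDF5 tree, turning groups into dicts and reading datasets
    def walk(group):
        node = {}
        for name, item in group.items():
            if isinstance(item, h5py.Group):
                node[name] = walk(item)
            else:                      # a dataset
                node[name] = item[()]  # pull the value into memory
        return node
    with h5py.File(path, 'r') as f:
        return walk(f)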

These are my initial thoughts; I will go and mull them over to see if I am making sense.


On 16/12/08 11:37 AM, "James Hester" <jamesrhester@gmail.com> wrote:

Before responding to Doug, I might comment that, although we are thinking about NeXuS in particular here, we should make sure that whatever scheme we come up with is generic enough to allow translations to be implemented from (and to) other data description schemes (e.g. data repositories).

On Mon, Dec 15, 2008 at 3:22 PM, Doug <doug.duboulay@gmail.com> wrote:

On Fri, 12 Dec 2008, James Hester wrote:
> Let me flesh out a proposal in some detail, so that holes can be picked in
> it.

The fact that the NeXus data model effectively supports infinite recursion on
some elements and also that NXdata can hold many things that will not
have CIF equivalents both suggest that NeXus -> CIF conversion could be lossy.

> First, an overall view.
>
> At the moment, a DDLm/dREL engine is initialised with a set of DDLm
> dictionaries.

To elaborate a little bit, a dictionary is compiled to jython/java byte
code as a set of classes, one for each category, each containing
methods for get/set and evaluate.  Although DDLm goes to some
effort to express a hierarchy of categories, at the implementation level
(at least in the dREL prototype engine) those categories were flattened
to the two-level CIF model.
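
Very roughly, and with all names invented for illustration (the real generated jython/java classes will differ), each compiled category behaves something like this Python class:

class CompiledCell:
    # one generated class per category, flattened to item -> value pairs
    def __init__(self):
        self._values = {}              # item name -> value, '?' if unknown

    def get(self, item):
        return self._values.get(item, '?')

    def set(self, item, value):
        self._values[item] = value

    def evaluate(self, item):
        # body generated from the dREL method in the item's definition;
        # the formula below is a placeholder, not the real dREL code
        if item == 'volume' and self.get('volume') == '?':
            try:
                a = float(self.get('length_a'))
                b = float(self.get('length_b'))
                c = float(self.get('length_c'))
            except (TypeError, ValueError):
                return '?'             # prerequisites not available
            self.set('volume', a * b * c)
        return self.get(item)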

As an aside, there are currently two alternative implementations for dealing with dREL and DDLm.  One has been produced by Doug, Nick, Syd and Ian, which I would characterise as 'static': the DDLm dictionary is converted to executable code at compile time, allowing an executable dictionary to be distributed to end users.   The alternative approach, which I have been pursuing, is to load the DDLm dictionary into memory at runtime and execute the dREL code as needed.  In either case I think my abstract description above gives the essential gist of what happens.

The nice part about the CIF + DDL way of working is that no particular implementation is mandated, but the correct behaviour is specified.  I think this is why Herbert would prefer to see as much as possible of the NeXus-to-CIF conversion logic expressed in DDLm/dREL form.

> When passed a CIF instance, it will return values of any
> datanames that are contained in the CIF instance or that it is capable of
> calculating from those datanames that are already in the instance.

An instance of the 2-level dictionary object is created and then populated
with the raw CIF data. Any items against which a "?" was recorded are
subsequently evaluated where possible.

Thereafter, to print the CIF, the 2-level dictionary object is walked/visited
and CIF tag/values are written to some output device.
To generate hierarchical NeXus from CIF, the dREL engine would have to be
reworked, if it hasn't been already.
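
In pseudo-Python (the two-level {category: {item: value}} layout and the evaluator objects are stand-ins for the real generated code), the whole cycle is roughly:

def fill_and_print(cif, evaluators):
    # cif: {category: {item: value}}, with '?' marking unknown items
    # evaluators: {category: object exposing evaluate(item)}
    for cat, items in cif.items():
        for item, value in items.items():
            if value == '?' and cat in evaluators:
                items[item] = evaluators[cat].evaluate(item)
    # then walk the populated structure and emit flat CIF tag/value pairs
    for cat, items in cif.items():
        for item, value in items.items():
            print('_%s.%s  %s' % (cat, item, value))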

To be honest, I was tackling only the 'from NeXuS to CIF' issues at this stage, as they are the most difficult.

> Now,
> what I envision as a 'translating' DDLm engine is initialised as before
> with the standard DDLm dictionaries, but also with two further
> dictionaries: a 'NeXuS dictionary' and a 'translation dictionary' (contents
> of these explained later).

Those two dictionaries would currently be precompiled and created as above.
I suspect the current dREL interpreter cannot understand more than one
dictionary simultaneously. Concatenation of dictionaries at the compilation
stage might be possible, but probably isn't what you want, because that
would likely embed NeXus names and values in the resulting CIF.

My understanding is that the implementation of which you speak only fills in the question marks in the supplied CIF file, so presumably any NeXus-specific names would not be output.

> Finally, it requires a 'NeXuS plugin'. Now,
> when passed a CIF instance the DDLm engine works as before.  When passed a
> NeXuS instance, it returns values of CIF datanames that it can calculate.
>
> Now for an explanation of these various extra bits.
>
> 1.  The 'NeXuS' dictionary is just another DDLm dictionary.  It contains
> definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.
> The linkage to a NeXuS file is accomplished using a set of new DDLm
> attributes, which work like the current 'xref' attributes: in the header
> section of this 'NeXuS' dictionary file the various versions of the NeXuS
> standard are assigned a short code in a loop.  Each of the definitions in
> the body of the dictionary then contains two new DDLm attributes:
> _alien.code (referencing the version of the standard in the header) and
> _alien.location (where to find the dataname).  The syntax of the value of
> _alien.location might be borrowed from, for example, XPath in the case of
> NeXuS.

XPath can provide a mechanism to locate items in an XML document tree,
but it doesn't provide a mechanism to specify/generate the structure of that
tree.  e.g. //NXdata/@name  might get a nodeset corresponding to a list of
name attribute nodes for potential use as CIF tags, but says nothing about
the location of the NXdata elements.
i.e. this is helpful for NeXus -> CIF, but not for CIF -> NeXus
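
For the NeXus -> CIF direction, even the standard library's limited XPath support is enough to locate the values (scan.nxs.xml is an assumed XML export of a NeXus file):

import xml.etree.ElementTree as ET

tree = ET.parse('scan.nxs.xml')
# find every NXdata group and pull out its name attribute, e.g. as a
# candidate CIF tag; nothing here says where NXdata should live when
# generating the tree in the other direction
for nxdata in tree.findall('.//NXdata'):
    print(nxdata.get('name'))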

Yes, I was only aiming to solve the NeXuS -> CIF problem.

> The data definitions containing _alien.location attributes could be
> considered 'raw' NeXuS data, which may not map easily onto CIF datanames.
> Therefore, this dictionary could contain further DDLm definitions of
> dataitems (still in the CIF 'nexus' namespace) which contained dREL methods
> for manipulating the raw datanames into something that mapped more directly
> into CIF.  This is where one might foresee adding a few more builtin
> functions to dREL to ease e.g. image processing.

I get the feeling that somewhere there will need to be a list that says
something like:
nexus4:some_cat.some_item1   ?
nexus4:some_cat.some_item2   ?
...
- in order to trigger the evaluations. Though maybe they would be deduced
from a CIF full of "?" on the request side.

The idea would be to trigger all the evaluations as usual, and because you have loaded the 'translate' DDLm dictionary in over the top of the normal dictionary, at some stage the evaluation chain will access NeXus-derived values instead of primitive values.

[example from previous email deleted]


Just as an alternative:

<xsl:stylesheet>
<xsl:output method="text"/>
<xsl:template match="NXuser">
   <xsl:if test="position()= 1">  <!-- if multiple NXuser elements -->
     <xsl:text>loop_&#xA;</xsl:text>   <!--  append newline char in hex -->
     <xsl:text>          _audit_author.name&#xA;</xsl:text>
     <xsl:text>          _audit_author.affiliation&#xA;</xsl:text>
     <xsl:text>          _audit_author.phone&#xA;</xsl:text>
     <xsl:text>          _audit_author.fax&#xA;</xsl:text>
     <xsl:text>          _audit_author.email&#xA;</xsl:text>
   </xsl:if>

   <xsl:for-each select="./name">
      <xsl:variable name="audit_author" select="parent::node()"/>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="."/><!--i.e. name -->
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/affiliation"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
         <xsl:with-param name="item" select="$audit_author/telephone_number"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/fax_number"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/email"/>
        </xsl:call-template>
      <xsl:text>&#xA;</xsl:text>
   </xsl:for-each>
</xsl:template>

<xsl:template name="dumpItem">
  <xsl:param name="item"/>
  <xsl:text> </xsl:text>
  <xsl:choose>
    <xsl:when test="$item !=''">
        <!-- add space and parentheses checks here -->
       <xsl:value-of select="$item"/>
    </xsl:when>
    <xsl:otherwise>
       <xsl:text>.</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
</xsl:stylesheet>


- a simple (untested) XSLT stylesheet, usable by a significant number of
current XSLT processing engines that could transform NeXus/NXuser data in
XML format directly into CIF. Some XSLT engines provide extension options
for doing more complicated transformations when needed. NeXus HDF would need
transformation to XML first. A separate stylesheet would need to be defined
to do the reverse transformation, assuming that the CIF was first converted to
some XML format.

(not that XSLT is really what I would be looking for in a "mapping" file,
but it's good to be aware of other possibilities - though maybe there is
already a CIF->CML->NeXus converter and vice versa?)
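
For what it's worth, applying the stylesheet above is only a few lines in most environments; with lxml in Python (the file names here are assumed) it would look like:

from lxml import etree

# compile the stylesheet, then apply it to the XML form of the NeXus file
transform = etree.XSLT(etree.parse('nxuser2cif.xsl'))
cif_text = str(transform(etree.parse('scan.nxs.xml')))
print(cif_text)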

This is an intriguing example and I think, if the actual values themselves don't need manipulation, it would do a good job.  Perhaps the initial transformation to what I previously called a 'raw NeXus' CIF could best be done by XSLT, using the conventions of that program to do the renormalisation.   Manipulations of data values could then be done by dREL routines in a 'translate' dictionary.  There is, however, an important practical limitation of this scheme, which is that trying to deal with XML files that have images in them is ridiculously slow, even with current desktop processing power (that is our experience at the Bragg, anyway).

Also, Nick S. tells me that back in the late 90s he produced an XSLT-based transformation from CIF and DDL to XML, and was able to use standard XML tools to validate the CIF-derived XML file against the DDL-derived XML schema.  Maybe the time has come for this tool to be dusted off.

James.

cheers

Nick

--------------------------------
Dr N. Spadaccini
School of Computer Science & Software Engineering

The University of Western Australia    t: +(61 8) 6488 3452
35 Stirling Highway                    f: +(61 8) 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au





