
Re: DDLm, dREL, images and NeXus

Before responding to Doug, I might comment that, although we are thinking about NeXuS in particular here, we should make sure that whatever scheme we come up with is generic enough to allow translations to be implemented from (and to) other data description schemes (e.g. data repositories).

On Mon, Dec 15, 2008 at 3:22 PM, Doug <doug.duboulay@gmail.com> wrote:

On Fri, 12 Dec 2008, James Hester wrote:
> Let me flesh out a proposal in some detail, so that holes can be picked in
> it.

The fact that the NeXus data model effectively supports infinite recursion on
some elements and also that NXdata can hold many things that will not
have CIF equivalents both suggest that NeXus -> CIF conversion could be lossy.

> First, an overall view.
>
> At the moment, a DDLm/dREL engine is initialised with a set of DDLm
> dictionaries.

To elaborate a little, a dictionary is compiled to Jython/Java bytecode
as a set of classes, one per category, each containing methods for
get/set and evaluate.  Although DDLm goes to some effort to express a
hierarchy of categories, in the dREL prototype engine at least, those
categories were flattened at the implementation level to the two-level
CIF model.

As an aside, there are currently two alternative implementations for dealing with dREL and DDLm.  One, produced by Doug, Nick, Syd and Ian, I would characterise as 'static': a DDLm dictionary is converted to executable code at compile time, allowing an executable dictionary to be distributed to end users.  The alternative approach, which I have been pursuing, is to load the DDLm dictionary into memory at runtime and execute the dREL code as needed.  In either case I think my abstract description above gives the essential gist of what happens.

The nice part about the CIF + DDL way of working is that no particular implementation is mandated, but the correct behaviour is specified.  I think this is why Herbert would prefer to see as much of the NeXuS to CIF conversion logic in a DDLm/dREL form.

> When passed a CIF instance, it will return values of any
> datanames that are contained in the CIF instance or that it is capable of
> calculating from those datanames that are already in the instance.

An instance of the 2-level dictionary object is created and then populated
with the raw CIF data. Any items whose value was recorded as "?" are
subsequently evaluated where possible.
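
As a rough illustration of this populate-then-evaluate step, here is a
minimal Python sketch. It is not the prototype engine's code: the METHODS
table and the evaluate_missing function are invented for illustration, a
plain Python function stands in for a dREL method, and the volume formula
is only valid for an orthogonal cell.

```python
# Hypothetical stand-in for dREL methods: each derivable dataname maps to
# (the datanames it depends on, a function computing it from their values).
METHODS = {
    "_cell.volume": (
        ("_cell.length_a", "_cell.length_b", "_cell.length_c"),
        lambda a, b, c: a * b * c,  # orthogonal cell only, for illustration
    ),
}

UNKNOWN = "?"


def evaluate_missing(block):
    """Return a copy of `block` with '?' items filled in where possible."""
    result = dict(block)
    for name, value in block.items():
        if value != UNKNOWN or name not in METHODS:
            continue  # already known, or no method to derive it
        deps, func = METHODS[name]
        args = [result.get(dep, UNKNOWN) for dep in deps]
        if UNKNOWN not in args:  # evaluate only when every input is known
            result[name] = func(*args)
    return result


cif_block = {
    "_cell.length_a": 2.0,
    "_cell.length_b": 3.0,
    "_cell.length_c": 4.0,
    "_cell.volume": "?",
    "_exptl_crystal.density_diffrn": "?",  # no method here, so it stays '?'
}
filled = evaluate_missing(cif_block)
print(filled["_cell.volume"])
```

Items with no applicable method are left as "?", which matches the
"where possible" qualification above.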

Thereafter, to print the CIF, the 2-level dictionary object is walked/visited
and CIF tag/values are written to some output device.
To generate hierarchical NeXus from CIF, the dREL engine would have to be
reworked, if it hasn't been already.

To be honest, I was tackling only the 'from NeXuS to CIF' issues at this stage, as they are the most difficult.

> Now,
> what I envision as a 'translating' DDLm engine is initialised as before
> with the standard DDLm dictionaries, but also with two further
> dictionaries: a 'NeXuS dictionary' and a 'translation dictionary' (contents
> of these explained later).

Those two dictionaries would currently be precompiled and created as above.
I suspect the current dREL interpreter cannot understand more than one
dictionary simultaneously. Concatenation of dictionaries at the compilation
stage might be possible, but probably isn't what you want, because that
would likely embed NeXus names and values in the resulting CIF.

My understanding is that the implementation of which you speak only fills in the question marks in the supplied CIF file: so presumably any NeXuS-specific names would not be output.

> Finally, it requires a 'NeXuS plugin'. Now,
> when passed a CIF instance the DDLm engine works as before.  When passed a
> NeXuS instance, it returns values of CIF datanames that it can calculate.
>
> Now for an explanation of these various extra bits.
>
> 1.  The 'NeXuS' dictionary is just another DDLm dictionary.  It contains
> definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.
> The linkage to a NeXuS file is accomplished using a set of new DDLm
> attributes, which work like the current 'xref' attributes: in the header
> section of this 'NeXuS' dictionary file the various versions of the NeXuS
> standard are assigned a short code in a loop.  Each of the definitions in
> the body of the dictionary then contains two new DDLm attributes:
> _alien.code (referencing the version of the standard in the header) and
> _alien.location (where to find the dataname).  The syntax of the value of
> _alien.location might be borrowed from, for example, XPath in the case of
> NeXuS.

XPath can provide a mechanism to locate items in an XML document tree,
but it doesn't provide a mechanism to specify/generate the structure of that
tree.  e.g. //NXdata/@name  might get a nodeset corresponding to a list of
name attribute nodes for potential use as CIF tags, but says nothing about
the location of the NXdata elements.
That is, XPath is helpful for NeXus -> CIF, but not for CIF -> NeXus.

Yes, I was only aiming to solve the NeXuS -> CIF problem.
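
To make the NeXuS -> CIF direction concrete, here is a minimal Python sketch of how a plugin might resolve an _alien.location value against a NeXuS XML file. Everything named here is an assumption for illustration: the ALIEN_LOCATION table, the lookup function, the dataname _nexus.data_name and the sample XML are all made up. Note also that Python's standard ElementTree supports only a subset of XPath and cannot select attribute nodes, so a location such as //NXdata/@name is split into an element path and an attribute name.

```python
import xml.etree.ElementTree as ET

# Made-up fragment standing in for a NeXuS file in XML form.
NEXUS_XML = """
<NXroot>
  <NXentry name="entry1">
    <NXdata name="detector_counts"/>
    <NXdata name="monitor_counts"/>
  </NXentry>
</NXroot>
"""

# Hypothetical dictionary content: dataname -> (element path, attribute).
# An attribute of None means "take the element text instead".
ALIEN_LOCATION = {
    "_nexus.data_name": (".//NXdata", "name"),
}


def lookup(root, dataname):
    """Return the list of values found at the location for `dataname`."""
    path, attribute = ALIEN_LOCATION[dataname]
    nodes = root.findall(path)
    if attribute is None:
        return [(node.text or "").strip() for node in nodes]
    return [node.get(attribute) for node in nodes]


root = ET.fromstring(NEXUS_XML)
print(lookup(root, "_nexus.data_name"))
```

The lookup only reads from the tree, which reflects the point above: a location expression of this kind finds values but says nothing about how to build the tree.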

> The data definitions containing _alien.location attributes could be
> considered 'raw' NeXuS data, which may not map easily onto CIF datanames.
> Therefore, this dictionary could contain further DDLm definitions of
> dataitems (still in the CIF 'nexus' namespace) which contained dREL methods
> for manipulating the raw datanames into something that mapped more directly
> into CIF.  This is where one might foresee adding a few more builtin
> functions to dREL to ease e.g. image processing.

I get the feeling that somewhere there will need to be a list that says
something like:
nexus4:some_cat.some_item1   ?
nexus4:some_cat.some_item2   ?
...
- in order to trigger the evaluations. Though maybe they would be deduced
by a CIF full of "?" on the request side.

The idea would be to trigger all the evaluations as usual, and because you have loaded in the 'translate' DDLm dictionary over the top of the normal dictionary, at some stage the evaluation chain will access NeXuS-derived values instead of primitive values.
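
A minimal Python sketch of this layering, under stated assumptions: each dictionary is reduced to a mapping from dataname to a function, the 'translate' dictionary is consulted before the standard one, and all datanames, values and the unit conversion are made up for illustration. It is not how either existing implementation works.

```python
from collections import ChainMap

# Values a hypothetical 'NeXuS plugin' has read from the file.
nexus_values = {"_nexus.wavelength_nm": 0.154}

# Each definition is a function of `get`, the engine's own lookup routine,
# so a method can ask for any other dataname it depends on.
nexus_dict = {
    "_nexus.wavelength_nm": lambda get: nexus_values["_nexus.wavelength_nm"],
}
translation_dict = {
    # nm -> angstrom; supplies what would otherwise be a primitive CIF value
    "_diffrn_radiation_wavelength.value":
        lambda get: get("_nexus.wavelength_nm") * 10.0,
}
standard_dict = {
    # made-up derived item, present only to show the evaluation chain
    "_example.wavelength_squared":
        lambda get: get("_diffrn_radiation_wavelength.value") ** 2,
}

# Earlier mappings win, so the translation layer sits on top.
definitions = ChainMap(translation_dict, nexus_dict, standard_dict)


def get(name):
    """Evaluate `name` using the first definition found in the layers."""
    return definitions[name](get)


print(round(get("_example.wavelength_squared"), 4))
```

The request is made for an ordinary CIF dataname; the NeXuS-derived value is reached only because the translation layer answers the intermediate lookup.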

[example from previous email deleted]


Just as an alternative:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="NXuser">
  <xsl:if test="not(preceding-sibling::NXuser)">  <!-- first NXuser only: emit the loop header once -->
    <xsl:text>loop_&#xA;</xsl:text>   <!--  append newline char in hex -->
    <xsl:text>          _audit_author.name&#xA;</xsl:text>
    <xsl:text>          _audit_author.affiliation&#xA;</xsl:text>
    <xsl:text>          _audit_author.phone&#xA;</xsl:text>
    <xsl:text>          _audit_author.fax&#xA;</xsl:text>
    <xsl:text>          _audit_author.email&#xA;</xsl:text>
  </xsl:if>

  <xsl:for-each select="./name">
     <xsl:variable name="audit_author" select="parent::node()"/>
       <xsl:call-template name="dumpItem">
          <xsl:with-param name="item" select="."/><!--i.e. name -->
       </xsl:call-template>
       <xsl:call-template name="dumpItem">
          <xsl:with-param name="item" select="$audit_author/affiliation"/>
       </xsl:call-template>
       <xsl:call-template name="dumpItem">
        <xsl:with-param name="item" select="$audit_author/telephone_number"/>
       </xsl:call-template>
       <xsl:call-template name="dumpItem">
          <xsl:with-param name="item" select="$audit_author/fax_number"/>
       </xsl:call-template>
       <xsl:call-template name="dumpItem">
          <xsl:with-param name="item" select="$audit_author/email"/>
       </xsl:call-template>
     <xsl:text>&#xA;</xsl:text>
  </xsl:for-each>
</xsl:template>

<xsl:template name="dumpItem">
 <xsl:param name="item"/>
 <xsl:text> </xsl:text>
 <xsl:choose>
   <xsl:when test="$item !=''">
       <!-- add space and parentheses checks here -->
      <xsl:value-of select="$item"/>
   </xsl:when>
   <xsl:otherwise>
      <xsl:text>.</xsl:text>
   </xsl:otherwise>
 </xsl:choose>
</xsl:template>
</xsl:stylesheet>


- a simple (untested) XSLT stylesheet, usable by a significant number of
current XSLT processing engines that could transform NeXus/NXuser data in
XML format directly into CIF. Some XSLT engines provide extension options
for doing more complicated transformations when needed. NeXus HDF would need
transformation to XML first. A separate stylesheet would need to be defined
to do the reverse transformation, assuming that the CIF was first converted to
some XML format.
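
For comparison, the same NXuser mapping can be sketched in Python with the
standard library. This is an illustrative equivalent, not a tested
converter: the function names, the sample XML and the quoting rule are
assumptions, and the quoting is deliberately simplistic (values that
themselves contain quotes or newlines would need more care).

```python
import xml.etree.ElementTree as ET

# (CIF dataname, NXuser child element) pairs.
FIELDS = [
    ("_audit_author.name", "name"),
    ("_audit_author.affiliation", "affiliation"),
    ("_audit_author.phone", "telephone_number"),
    ("_audit_author.fax", "fax_number"),
    ("_audit_author.email", "email"),
]


def cif_value(text):
    """Format one value; '.' marks an absent or empty item."""
    if text is None or not text.strip():
        return "."
    text = text.strip()
    # Simplistic rule: quote any value that contains whitespace.
    if any(ch.isspace() for ch in text):
        return "'%s'" % text
    return text


def nxuser_to_cif(xml_text):
    """Return a CIF loop built from every NXuser element in `xml_text`."""
    root = ET.fromstring(xml_text)
    users = list(root.iter("NXuser"))
    if not users:
        return ""
    lines = ["loop_"] + ["  " + tag for tag, _ in FIELDS]
    for user in users:
        lines.append(" ".join(cif_value(user.findtext(child))
                              for _, child in FIELDS))
    return "\n".join(lines) + "\n"


SAMPLE = """<NXentry>
  <NXuser><name>A. Person</name><email>a@example.org</email></NXuser>
</NXentry>"""
print(nxuser_to_cif(SAMPLE))
```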

(not that XSLT is really what I would be looking for in a "mapping" file,
but it's good to be aware of other possibilities - but maybe there is already
a CIF->CML->NeXus converter and vice versa?)

This is an intriguing example, and I think that if the actual values themselves don't need manipulation it would do a good job.  Perhaps the initial transformation to what I previously called a 'raw NeXuS' CIF could best be done by XSLT, using the conventions of that program to do the renormalisation.  Manipulations of data values could then be done by dREL routines in a 'translate' dictionary.  There is, however, an important practical limitation of this scheme: trying to deal with XML files that have images in them is ridiculously slow even with current desktop processing power (that is our experience at the Bragg, anyway).

Also, Nick S. tells me that back in the late 90s he produced an XSLT-based transformation from CIF and DDL to XML, and was able to use standard XML tools to validate the CIF-derived XML file against the DDL-derived XML schema.  Maybe the time has come for this tool to be dusted off.

James.

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://scripts.iucr.org/mailman/listinfo/comcifs
