Discussion List Archives

Re: DDLm, dREL, images and NeXus

Let me flesh out a proposal in some detail, so that holes can be picked in it.

First, an overall view. 

At the moment, a DDLm/dREL engine is initialised with a set of DDLm dictionaries.  When passed a CIF instance, it will return values of any datanames that are contained in the CIF instance, or that it can calculate from datanames already in the instance.  What I envision as a 'translating' DDLm engine is initialised as before with the standard DDLm dictionaries, but also with two further dictionaries: a 'NeXuS dictionary' and a 'translation dictionary' (contents of these explained below).  It also requires a 'NeXuS plugin'.  When passed a CIF instance, the translating engine works as before.  When passed a NeXuS instance, it returns values of any CIF datanames that it can calculate from that instance.

Now for an explanation of these various extra bits.

1.  The 'NeXuS' dictionary is just another DDLm dictionary.  It contains definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.  The linkage to a NeXuS file is accomplished using a set of new DDLm attributes, which work like the current 'xref' attributes: in the header section of this 'NeXuS' dictionary file the various versions of the NeXuS standard are assigned a short code in a loop.  Each of the definitions in the body of the dictionary then contains two new DDLm attributes: _alien.code (referencing the version of the standard in the header) and _alien.location (where to find the dataname).  The syntax of the value of _alien.location might be borrowed from, for example, XPath in the case of NeXuS.

The data definitions containing _alien.location attributes could be considered 'raw' NeXuS data, which may not map easily onto CIF datanames.  Therefore, this dictionary could contain further DDLm definitions of dataitems (still in the CIF 'nexus' namespace) which contain dREL methods for manipulating the raw datanames into something that maps more directly onto CIF.  This is where one might foresee adding a few more builtin functions to dREL to ease e.g. image processing.

Note also that one might, instead of using an '_alien.location' DDLm attribute, define a new dREL builtin (like 'nexus_locate(string)'), and provide a simple dREL expression calling this builtin.

2. The role of the 'NeXuS' plugin is to enable the DDLm engine to understand the '_alien.location' attribute and use it to return a dREL-compatible value.  It would also supply extra builtin functions if necessary.

3. The 'translation dictionary' contains alternative definitions of items in the standard CIF dictionaries, where the dREL methods used to derive those items in the standard dictionaries are replaced by dREL manipulations of dataitems defined in the 'NeXuS' dictionary.  When it is read into the DDLm/dREL engine, these new dREL methods could either replace the old ones, or be added as alternative methods of type 'translation'.  This then allows lots of flexibility in terms of checking derived values in a NeXuS file against those values derived via standard CIF dREL methods from primitive values from that same NeXuS file.
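
To make the interplay of the three pieces concrete, here is a rough Python sketch of how an engine might combine them.  Everything in it is hypothetical: the class names, the '_demo.*' and '_nexus_raw.*' datanames, the 'NXdemo:...' locations and the dictionary standing in for a NeXuS file are invented for illustration, and the dREL methods are replaced by plain Python functions.  It assumes that the alternative methods are stored alongside the standard ones, keyed by purpose ('translate' as proposed above, 'Evaluation' for the standard method).

```python
# Toy model only: every class, method and dataname below is hypothetical.

class ToyNexusPlugin:
    """Stands in for the 'NeXuS plugin': resolves an _alien.location value."""

    def __init__(self, instance):
        # A real plugin would read a NeXuS file; here the "file" is a dict
        # keyed directly by the location string.
        self.instance = instance

    def locate(self, location):
        return self.instance[location]


class ToyEngine:
    def __init__(self, plugin):
        self.plugin = plugin
        self.locations = {}  # dataname -> _alien.location string
        self.methods = {}    # dataname -> {purpose: function(engine)}

    def add_raw(self, dataname, location):
        self.locations[dataname] = location

    def add_method(self, dataname, purpose, func):
        self.methods.setdefault(dataname, {})[purpose] = func

    def value(self, dataname, purpose="translate"):
        # Raw datanames come straight from the plugin.
        if dataname in self.locations:
            return self.plugin.locate(self.locations[dataname])
        by_purpose = self.methods[dataname]
        # Prefer the requested purpose, fall back to the standard method.
        func = by_purpose.get(purpose, by_purpose.get("Evaluation"))
        if func is None:
            raise KeyError(f"no usable method for {dataname}")
        return func(self)


plugin = ToyNexusPlugin(
    {"NXdemo:width": 2.0, "NXdemo:height": 3.0, "NXdemo:area": 6.0}
)
engine = ToyEngine(plugin)

# 'NeXuS dictionary': raw datanames tied to locations in the file.
engine.add_raw("_nexus_raw.width", "NXdemo:width")
engine.add_raw("_nexus_raw.height", "NXdemo:height")
engine.add_raw("_nexus_raw.area", "NXdemo:area")

# 'Translation dictionary': alternative methods of purpose 'translate'.
engine.add_method("_demo.width", "translate", lambda e: e.value("_nexus_raw.width"))
engine.add_method("_demo.height", "translate", lambda e: e.value("_nexus_raw.height"))
engine.add_method("_demo.area", "translate", lambda e: e.value("_nexus_raw.area"))

# Standard dictionary: the usual derivation from more primitive items.
engine.add_method(
    "_demo.area", "Evaluation",
    lambda e: e.value("_demo.width") * e.value("_demo.height"),
)

translated = engine.value("_demo.area", purpose="translate")
derived = engine.value("_demo.area", purpose="Evaluation")
consistent = abs(translated - derived) < 1e-9
```

Keeping both methods, rather than letting the translation replace the standard one, is what makes the final comparison possible: the value stored in the file and the value derived from primitives in the same file can be checked against each other.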

So, in summary, in order to implement this scheme, one needs about 6 new DDLm attributes and a few more builtin dREL functions.  The crucial work is in designing the behaviour of the '_alien.location' attribute to properly capture the information.

One thing to think about is the introduction of an opaque data type into dREL, so that e.g. image data returned from a NeXuS file need not be specified as an array of numbers, but simply as an object, which can only be passed to a builtin function and not otherwise manipulated.
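
As a rough illustration of what such an opaque type could look like in a Python-based engine, the sketch below wraps the data in a handle that offers no indexing or arithmetic, and provides one builtin that is allowed to look inside.  The names 'OpaqueValue' and 'builtin_image_total' are hypothetical, and Python cannot truly hide the payload, so this only models the intended restriction.

```python
class OpaqueValue:
    """Handle for data that dREL code may pass around but not inspect."""

    __slots__ = ("_payload",)

    def __init__(self, payload):
        self._payload = payload


def builtin_image_total(image):
    """Hypothetical builtin: the only code meant to look inside the handle."""
    if not isinstance(image, OpaqueValue):
        raise TypeError("expected an opaque image value")
    return sum(sum(row) for row in image._payload)


# A 2x2 "image" as it might be handed back by a plugin.
frame = OpaqueValue([[1, 2], [3, 4]])
total = builtin_image_total(frame)

# Ordinary manipulation is not available on the handle itself.
try:
    frame[0]
    indexable = True
except TypeError:
    indexable = False
```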

Advantages of this scheme:
(1) dREL simplicity is preserved (I think this is important);
(2) it can be implemented as a modular addon to a normal DDLm engine;
(3) it is easily extensible to other data formats.

Here follows a simple example.  Note that a NeXuS file may contain multiple 'NXuser' groups, each of which may contain one or more names but only a single affiliation.  By specifying that the corresponding CIF category is a 'List' category, it can be deduced by the NeXuS plugin that each name in the NeXuS file is a new entry in the category loop.  I have accomplished renormalisation by walking the tree and returning the items in the order encountered.  The category key is used to determine when to duplicate an item; so in this case, every time the key is encountered, the associated value of the target item is returned.  Otherwise, there would be fewer affiliations than names in cases where more than one name belongs to a single affiliation.

There is no additional processing of the raw nexus datanames in this example.
==============================
The 'NeXuS' dictionary file:
==============================
data_nexus
 _dictionary.namespace       nexus
  loop_
   _dictionary_alien.code
   _dictionary_alien.type              
   _dictionary_alien.version         
   _dictionary_alien.uri          
   nexus4        nexus        4.0     www.nexusformat.org
...
save_RAW_AUTHOR
    _definition.class        List     #items go into a loop
    _category_key.generic   '_raw_author.user_name'  #one value per user_name
    ...
save_

save_raw_author.user_name
    _category.id       raw_author
    _type.container    Single
    _type.contents     Text
    _alien.code        nexus4
    _alien.location   "NXuser:name" #renormalise by returning all names in order
                                    #encountered when walking the tree

save_

save_raw_author.affiliation
    _category.id       raw_author
    _type.container    Single
    _type.contents     Text
    _alien.code        nexus4
    _alien.location    "NXuser:affiliation" #renormalise by returning affiliation
                                            #each time name encountered in tree
save_
...

================
The translation file
================
data_nexus_translate
    ...
save_audit_author.name
loop_
    _method.purpose
    _method.expression
    'translate'
;
with aa as audit_author
with rn as nexus:raw_author
aa.name = rn.user_name
;
save_
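
The renormalisation described for this example can be sketched in a few lines of Python.  This is only a toy under stated assumptions: the NeXuS tree is modelled as nested dictionaries with hypothetical 'class', 'fields' and 'children' entries, the person and institution names are made up, and the function names are mine rather than part of any plugin.

```python
def walk(node):
    """Yield every group in the tree, in the order encountered."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def renormalise(root, group_class, key_field, target_field):
    """Return one (key, target) row per key value found in matching groups.

    The target value is repeated each time a key is encountered, so that a
    group with several names and one affiliation yields several full rows.
    """
    rows = []
    for node in walk(root):
        if node.get("class") != group_class:
            continue
        fields = node.get("fields", {})
        target = fields.get(target_field)
        for key in fields.get(key_field, []):
            rows.append((key, target))
    return rows


# Hypothetical tree: two NXuser groups, the first holding two names.
tree = {
    "class": "NXentry",
    "children": [
        {"class": "NXuser",
         "fields": {"name": ["A. Author", "B. Author"],
                    "affiliation": "Institute X"}},
        {"class": "NXuser",
         "fields": {"name": ["C. Author"],
                    "affiliation": "Institute Y"}},
    ],
}

author_rows = renormalise(tree, "NXuser", "name", "affiliation")
```

Each row corresponds to one packet of the raw_author loop, with user_name as the category key.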


On Thu, Dec 11, 2008 at 11:32 AM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
Dear Colleagues,

  This is a distillation of an email conversation James Hester and I
have been having since the Osaka meeting.  We both feel that it
would be helpful if others were to join in and express their views.
We have been discussing the interaction among DDLm, dREL, images and
NeXus.  We agree on most points, and disagree on a few, and hope,
by opening up the discussion, to arrive at a consensus.

  What is driving this discussion is a need to understand how best to
manage image data in the context of both imgCIF and NeXus, and to do
so in a way that is consistent with the recent adoption of DDLm as
the target framework for new work on CIF dictionaries.

  It must be clearly understood that it is highly unlikely that a
single standard will ever be adopted for crystallographic diffraction
images, much less for the broader context of pixel-based data in
structural biology.  The best we can hope for right now is to have
some number of clearly defined image data frameworks, and agreed
algorithms for conversion among them.  There are many frameworks
to consider, but two that are very close to achieving the goal of becoming
inter-operable in the immediate future are imgCIF and NeXus.  What is
missing is a formal language within which to specify how to move between
them.

  We could, of course, just come up with a verbal description of how
to move between imgCIF and NeXus and a couple of example conversion
programs written ad hoc in whatever language might come to mind.  However,
the effort being expended on dREL, the supporting language for DDLm,
suggests the possibility of building on dREL as a base to do this job
by extending dREL to have the capability of working with NeXus (dREL
is already capable of dealing with CIF).  James has made the counter
proposal of leaving dREL as just a CIF-specific language and keeping
the CIF-NeXus conversion algorithm specification as a matter for a
different language and/or API.  James has also suggested the further
step of stripping out the built-in functions from dREL and dealing with
just a very stable dREL language specification in one instance and a
perhaps evolving API (list of builtin functions available in dREL) on the
other:

"My comment at this stage would be that defining a coupling mechanism
between CIF and a given language is not a large task, due to the
simplicity of the CIF syntax, whereas adding lots of stuff to dREL
would be a serious task and has some important downsides (loss of
simplicity being an important one). Apropos the
simplicity of the coupling mechanism, I (predictably) quite like my
Python model of a CIF file as a hash table of CIF data block objects
indexed by datablock name, and the datablock objects are themselves
hash tables of strings/lists of strings indexed by dataname.  This
model would appear to translate pretty easily into most other
languages.  What then remains is some syntactic sugar (the use of
square brackets to do key-based lookup is nice in dREL) which can be
replaced in another language by a few standard methods."
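
A minimal sketch of the model described in the quoted passage, using plain Python dictionaries.  The block name, the values and the 'lookup' helper are hypothetical and are not taken from any existing CIF library.

```python
# A CIF file as a hash table of data blocks, each of which is a hash table
# of strings (single values) or lists of strings (looped values).
cif = {
    "example_block": {
        "_cell.length_a": "5.4310",
        "_audit_author.name": ["A. Author", "B. Author"],
    }
}


def lookup(cif_file, block_name, dataname):
    """Return the values of a dataname as a list, looped or not."""
    value = cif_file[block_name][dataname]
    return list(value) if isinstance(value, list) else [value]


cell_a = lookup(cif, "example_block", "_cell.length_a")
authors = lookup(cif, "example_block", "_audit_author.name")
```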

There was a lot more to the discussion, but let us try to settle a
direction:

Should we be trying to extend dREL to support more than just CIF,
specifically NeXus, making something we might call dREL++, or should the
language for this broader task be something distinct from dREL with a
distinct name?  In practice, in either case, I suspect all of this will be
built on a python base, or something similar, as James suggests, but
names do matter.

Comments please.

Regards,
  Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
