

Re: [ddlm-group] Interactions with methods

Conveniently, I am sitting here at ANSTO with James, pondering many things about DDLm, and we will be posting several different threads to this mailing list over the next day. We are looking at tightening several aspects of the dictionary and the new syntax.

This is a timely revisit of this thread on “pathways through algorithms”.

I will remind everyone that the philosophy behind how methods are written in dREL is simplicity. What we are trying to do in general is encode the relationships between data entities as simply as possible. We are NOT interested in efficiency, since dREL is not the calculation engine for a complex data run. It is a symbolic representation of the “correct algorithm” that can be used to check the answer and define relationships. While it can be done, one would not use dREL to calculate the structure factors of tens of thousands of reflections in a protein structure.

If the sequence of calculations can be represented by a linear chain, then James and I can see no need for more than two evaluation methods to be defined per data item. To date the methods we have implemented (or will eventually implement) are of this type. The two possible types of method are: (1) the typical evaluation of a data item from more primitive data items, for example Fsquared from I, and (2) an attempt to determine the value of more primitive data from derivative data that is in the data file, for example Fsquared from F.

Many, if not most, data items will have only one evaluation method. Primitive data which has not been submitted (because derivative data has been submitted) can be reverse engineered by method (2).  

Example: the linear sequence for determining the [U] matrix (or anything in this chain) is

Umatrix<->Uij->Uiso
 ^
 |
 v
Bmatrix<->Bij->Biso

The methods would be something like

Umatrix.METHOD1 {create array from Uij components}
Umatrix.METHOD2 {create array from Bmatrix/(8pi^2)}

Bmatrix.METHOD1 {create array from Bij components}
Bmatrix.METHOD2 {create array from Umatrix*8pi^2}

Uij.METHOD1 {extract my values from the Umatrix}
Uij.METHOD2 {create array from Uiso if I was isotropically refined}

Let's say I request Umatrix and there are only Bij in the data file. The sequence will be this. Try Umatrix.METHOD1: there is no Uij, so invoke Uij.METHOD1. That in turn needs Umatrix, so I detect a circular pathway (such detection will have to be implemented in every interpreter because you have no idea what the user has created in dREL; circular routes are a distinct possibility). Now invoke Uij.METHOD2. I am not an isotropically refined atom, so that fails also. Back up the calling stack to invoke Umatrix.METHOD2. There is no Bmatrix, so invoke Bmatrix.METHOD1. The Bij’s are there. Bmatrix is created. Now Umatrix is created. Voila.

Take the same scenario, but suppose I wanted Uij. You can see the same sequence would occur, so that from the Bij present in the data I create the Uij. BUT there is no method directly linking Bij and Uij, yet one can be derived from the other.
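The walk-through above can be sketched in Python. This is only an illustration of the idea, not dREL and not any existing interpreter: the names METHODS, resolve and the helper functions are hypothetical, tensor components are held as a 6-tuple (11, 22, 33, 12, 13, 23), the conversion uses the standard relation B = 8*pi^2*U, and building Uij from Uiso as a diagonal tensor is an assumption that strictly holds only on orthogonal axes.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # standard relation: B = 8*pi^2 * U


def to_matrix(c):
    """(x11, x22, x33, x12, x13, x23) -> symmetric 3x3 nested list."""
    x11, x22, x33, x12, x13, x23 = c
    return [[x11, x12, x13], [x12, x22, x23], [x13, x23, x33]]


def to_components(m):
    """Symmetric 3x3 nested list -> (x11, x22, x33, x12, x13, x23)."""
    return (m[0][0], m[1][1], m[2][2], m[0][1], m[0][2], m[1][2])


def scale(m, k):
    return [[x * k for x in row] for row in m]


def from_iso(x):
    # Assumption: diagonal tensor, strictly valid only on orthogonal axes.
    return (x, x, x, 0.0, 0.0, 0.0)


# At most two methods per item, each given as (required inputs, function).
# Hypothetical table mirroring the METHOD1/METHOD2 pseudocode in the text.
METHODS = {
    "Umatrix": [(("Uij",), to_matrix),
                (("Bmatrix",), lambda b: scale(b, 1 / EIGHT_PI_SQ))],
    "Bmatrix": [(("Bij",), to_matrix),
                (("Umatrix",), lambda u: scale(u, EIGHT_PI_SQ))],
    "Uij": [(("Umatrix",), to_components), (("Uiso",), from_iso)],
    "Bij": [(("Bmatrix",), to_components), (("Biso",), from_iso)],
}


def resolve(name, data, log=None, _active=None):
    """Return the value of `name`, deriving it if possible, else None."""
    if log is None:
        log = []
    if _active is None:
        _active = set()
    if name in data:
        return data[name]
    if name in _active:          # circular pathway: this route fails
        return None
    _active.add(name)
    try:
        for number, (inputs, func) in enumerate(METHODS.get(name, ()), start=1):
            args = [resolve(dep, data, log, _active) for dep in inputs]
            if all(a is not None for a in args):
                data[name] = func(*args)      # the data block grows
                log.append(f"{name}.METHOD{number}")
                return data[name]
        return None              # neither method worked: not derivable
    finally:
        _active.discard(name)    # back up the calling stack


data = {"Bij": tuple(EIGHT_PI_SQ * u for u in (0.02, 0.03, 0.04, 0.001, 0.002, 0.003))}
log = []
uij = resolve("Uij", data, log)
print(log)  # ['Bmatrix.METHOD1', 'Umatrix.METHOD2', 'Uij.METHOD1']
```

Failed attempts are deliberately not remembered: an item that cannot be derived while one of its dependants is on the calling stack may still be derivable when requested along another route.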

You can obviously find more efficient ways of calculating these numbers, but dREL is not designed to be an efficient calculating machine. It is designed to be a symbolic language that defines relationships.  

Note that in my dREL methods I continually create circular routes, for example the relationship between Umatrix.METHOD1 and Uij.METHOD1, but because there is a second available method I can continue. If that also fails, then it can only be that the data cannot be derived.

So far we (James and I) can’t see the need for more than two methods if the calculations can be represented in such a chain (I am sure this is formally provable). Moreover the order in which the two methods are executed is NOT relevant. Finally I am confident that the relationships within all the CIFs can be represented in such a chain structure.

Conclusion (until you all contradict this example) is yes, more than one evaluation method is required, but the max is 2. AND their order is not important.

Finally I can inject the method to calculate the Beta_matrix at any point in this chain, while hiding it from the user community, so they never make the mistake of using that dastardly formalism that nearly brought crystallography to a halt :)

Nick

PS David, with respect to previous discussions, we agree that Uiso and Uequiv are not the same and need their own data items. Uequiv can be derived from Uij, but NOT the reverse.




On 9/09/09 3:59 AM, "David Brown" <idbrown@mcmaster.ca> wrote:


Here is something to get the ddlm discussion group started.  It is a
question that arises out of my attempts to produce a core CIF dictionary in
DDLm

Interaction between users and methods
-------------------------------------

The introduction of methods will result in a dramatic change in the
character of CIF.  The current versions of CIF  are static.  They are
the equivalent of a photograph that records a single event.  It can be
passed to others to study or it can be archived for future use.  The
addition of methods makes CIF dynamic.  To continue the analogy, it is
more like a video that can evolve under dictionary control as the CIF
expands itself by adding additional items.  It might be more accurate to
compare CIFm to a video game in which the user interacts with the CIF to
direct the way in which it evolves.  This raises some questions that need
to be answered before we can complete the conversion of the dictionaries
to DDLm.  The way we write the dictionary will determine which route we
choose.

There are various levels at which interaction with the user can occur.  
We can provide for the widest possible interaction or we can restrict
the activities that the dictionary will permit.  Three different levels
of interaction are described and for the sake of having a concrete
example I consider the case of Fobs^2 which can be calculated either by
squaring Fobs or by applying absorption and Lp corrections to I.

1. We can limit each item to a single method.  In this case we have to
decide which is the best method to include.  In the structure factor
example should F^2 be calculated from F or from I?  A hierarchical tree
of items would be needed to make sure that all calculation routes were
covered.  We discussed this option in Osaka and the consensus was that
we should allow multiple methods.  However, allowing only one method
would simplify the interaction between the user and the CIF - the user
could ask for an item and if it were possible to do so,  it would be
calculated in a unique way from the other items present.  The user
interaction is minimal.  If a suitable application was written, the user
at this level could validate all the values in the CIF, or optionally
replace the existing derived values with newly calculated ones.  This
route would not require a change to DDLm but the dictionary manager
needs to ensure a proper hierarchy of data items.

2. If we allow multiple methods, we need a protocol for what happens if
the value can be calculated in more than one way.  We could either
arrange the methods in an order of priority, or we could allow the user
to select the method to be used (option 3 below).  In the first case the
user needs to know which method has been used even though the choice is
beyond his or her control.  It is not possible to use hindsight to infer
which method has been used  because the method selected depends not only
on the dictionary but also on what items are available in the CIF at the
time the selection was made.  Since the CIF is now dynamic it can evolve
over time, so that the same request made at another time might result in
the calculation following a different path.  It is naive to argue that
since all methods are equivalent it is immaterial how the calculation
was performed.  This would be true if the CIF were fully
self-consistent, but in the real world this cannot be guaranteed and at
the very least the user needs to be able to troubleshoot if things go
wrong.   The program must therefore provide the user with an
intelligible record of the methods used.  Since from the point of view
of the application all methods are equivalent, any information about the
method invoked must come from the dictionary.  One way this can be done
is by providing each method with a brief description (e.g., 'Fobs^2
calculated from Fobs') that can be written to the log every time the
method is invoked.  If we go this route we need to modify DDLm to
provide for the log message.

3. Alternatively the user could select the method to be used.  This
would allow the user, for example, to change the absorption correction
(e.g. by editing the CIF) and then ask the CIF to recalculate the values
of F^2 from the intensities rather than from the structure factors.  In
this case the user must be able to select the method and specify that
the existing values of F^2  should be overwritten.  This level of
interaction would involve a mixture of editing and calculation, with the
user controlling both the CIF and the way the dictionary is accessed.  
It would provide the maximum degree of interaction and allow many
different manipulations to be carried out using the CIF, for example
looking to see what effect the absorption correction has on the
structure.  If we adopt this approach, not only will we need messages to
inform the user what methods are available as well as messages to
create a log (as in 2 above), but we must also make provision for methods
to be directly accessed.  Again a change in DDLm is needed.
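
Levels 2 and 3 can be illustrated with a short Python sketch. It is not a proposal for DDLm syntax. Everything in it is hypothetical: the Method record, the functions available and evaluate, the item names "F", "I" and "correction", and the single multiplicative correction factor that stands in for the absorption and Lp corrections. Each method carries a brief description that is written to a log when it is invoked (level 2), and the caller may optionally pick a method by index instead of accepting the priority order (level 3).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Method:
    description: str          # brief text written to the log when invoked
    inputs: Sequence[str]     # items that must be present in the CIF
    func: Callable[..., float]


# Hypothetical methods for Fobs^2, listed in priority order.
F_SQUARED_METHODS = [
    Method("Fobs^2 calculated from Fobs", ("F",), lambda f: f * f),
    Method("Fobs^2 calculated from I and a correction factor",
           ("I", "correction"), lambda i, c: i * c),
]


def available(methods, cif):
    """Descriptions of the methods whose inputs are all present."""
    return [m.description for m in methods
            if all(name in cif for name in m.inputs)]


def evaluate(methods, cif, log, choice: Optional[int] = None):
    """Level 2: first usable method in priority order (choice is None).
    Level 3: only the method the user selected by index."""
    candidates = methods if choice is None else [methods[choice]]
    for m in candidates:
        if all(name in cif for name in m.inputs):
            log.append(m.description)
            return m.func(*(cif[name] for name in m.inputs))
    return None


cif = {"F": 3.0, "I": 20.0, "correction": 0.5}
log = []
by_priority = evaluate(F_SQUARED_METHODS, cif, log)           # uses Fobs
by_choice = evaluate(F_SQUARED_METHODS, cif, log, choice=1)   # uses I
```

With the deliberately inconsistent values above, the two calls give different answers, which is the situation in which the log of descriptions becomes necessary.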

The decision we need to take is what level of interaction we are aiming
for.  If we decide to restrict CIFm to, say, level 2 above, we should
ask what happens if applications attempt to override this restriction.  
To paraphrase (the real) Murphy's Law: if an application can be written
to override the artificial restrictions we impose, sooner or later
someone will write such an application.  Should we make life easy for
them (and ourselves) by building this feature into the dictionary now,
or should we keep our sights low and possibly regret it later?

Whatever decision we make will affect the way in which the dictionary
has to be written.  In case 1 we need to develop the hierarchy, in case
2 we need to develop a priority and provide some means of logging the
method used, in case 3 we need to provide a way of addressing individual
methods.  I cannot complete the conversion of the core dictionaries
until we have decided at what level the user will be able to interact
with the dictionary.




_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au


