Modern crystallographic algorithms and data structures (a very personal approach)

Joerg Bergmann

Ludwig-Renn-Allee 14, D-01217 Dresden, Germany.
E-mail: email@jbergmann.de - WWW: http://www.bgmn.de

1. Introduction

We will discuss the structure of crystallographic data indepth. As explained, advanced crystallographic algorithms need advanced crystallographic data structures for success.

I started learning programming in 1978 on some main-frame like computers. Well, we had some interactive computer facilities in our X-ray lab, but the computer programs as written in those days demanded a non-intuitive amount of input; showing nearly no inner intelligence of chosing default values. I begun to change that. Thus, I learnt that an intelligent program which hides the user from the need of input is a step ahead for an interactive program.

By developing these programs for some years, my programs were enriched with a lot of non-numeric stuff. While dealing with lots of non-numeric code, I looked around for solutions. Cite: "The choice of data representation is much complicated, often; it is determined by more than only the possibilities. It is related to the operations which should be carried out on the data." [1] From this point on, arrays (the usual data representation in Fortran) became one minor data representation amongst dozens of other possible data representations. Arrays are most likely good for numerics. They can be ineffective for handling non-numeric data. A reflection table or table of atomic positions is partially of non-numeric content.

After two dozens of years of writing computer programs (one dozen years working with C) and learning the common standardizations for crystallographic data I feel that Niklaus Wirth's idea is valid for external data representations too. Most of the common crystallographic standardizations date back to the time of old Fortran coded mainframes. Storage area was restricted, and so was the possible code. The external data representations of those days correspond to this type of code. The crystallographic community has fixed the basic ideas of such external data representations as their standards. By doing so, the algorithms of those days were also fixed. The development of up-to-date algorithms using such antiquated external data representations is very expensive. I will explain this by some examples.

2. Representation of crystal structure

From a practical point of view, the common representation of atomic positions is somewhat strange. It defines an asymmetric part of the full unit cell, and this part is filled with (possibly partial filled) atomic sites. This assymetric unit is expanded to generate the full contents of the unit cell. This is a purely theoretical point of view to crystallography, dealing with some periodic distribution of arbitrary values. But, in practice, we deal with atoms in a variety of situations. By handling atoms in the assymetric unit, one needs extra code and data to:

decide whether the atom is on a special position and on which type

therefore decide what is the maximum occupation factor of the atom

therefore decide about the symmetry conditions of the atom

To overcome these problems, I do not use such data representation in my programs. Instead, I have an external file containing ASCII information about all possible space groups. For example:

Wyckoff=i N=6

x 2*x 0

-2*x -x 1/3

x -x 2/3

-x -2*x 0

2*x x 1/3

-x x 2/3

is part of the entry for spacegroup no. 181. The expressions will be handled by the DATA module as mentioned below.

In the structure description, one needs to give the space group (Hermann Mauguin) and the Wyckoff symbol for every atomic position.

Thus, a difference to the common representation is that input data must define only the free parameters of the Wyckoff position. In the above case, only x=... must be defined.

Another difference to the common representation is that for every Wyckoff position, a setting of 1 (one) atom is valid. No partial atom values are needed to define full occupancy. Setting of 0.5 always means 50 per cent probability of having an atom at this position.

As mentioned above, this code is able to generate the symmetry of the special positions automatically. So an anisotropic Debye-Waller factor can be set simply by typing

TDS=ANISO

The algorithms which do that will be described in detail in the next section.

3. Automatic generation of special positions symmetry

By using this new external format, automatic correction of the Debye-Waller factor symmetry becomes simple. As explained, we have one set of atomic coordinates for each atomic position, instead of multiple coordinates according to the multiplicity ratio of general and special positions. So, we have only one positive definite matrice Bij in the anisotropic Debye-Waller factor:

exp(-B11*h*h-B22*k*k-B33*l*l-2*B12*h*k-2*B13*h*l-2*B23*k*l)

The remaining task is to automatically adjust the symmetries of the Bij of the atomic positions according to the crystal symmetry. This will be done by the following algorithm:

calculate the numeric coordinates of the first atomic position of the special position.

use these for x,y,z of the general coordinate triplets.

This will result in a set of general positions. A multiple of them (the ratio of general to special position) will coincide with each atomic position

Compute the derivatives of each general position to x,y,z. You will get a 3x3 transformation matrix for each general position.

So, you have a set of transformation matrices for each atomic position (the ratio of general to special positions again).

From each atomic position's set of transformation matrices, construct the G¹ operator. For details, see [2].

Set up one (parameterised) positive definite matrix.

Transform it using the Gamma1 matrices for each atomic position.

You will get a set of positive definite matrices (number of atomic positions), which will automatically hold the symmetry of the crystal. Use the positive definite matrix of each atomic position while computing the Debye-Waller factor. You will get a set of Debye-Waller factors for each hkl, one for each atomic position. By doing so, the correct structure factor of the whole crystal will be computed.

4. Advanced external formats

Most standard data formats as fixed by the crystallographic community use Fortran-like input until now: a number at a fixed position on a punch-card like data sheet for each input value. I will explain a more advanced modern format:

At first, we no longer use fixed positions and/or data sheets of fixed length (no more punch cards!).

We use assignments as input. This is well known as DATA input in PL/1. By doing so, one may decide which set of information one will input. The program then tries to use default values for the remaining input. As a side effect, the sequence of the input is arbitrary:

A=3 B=4 C=5

is the same input as

B=4 A=3 C=5

We use expressions instead of numbers for each value. And the input may contain arbitrary assignments, which will be held in a special data structure while the program runs. So:

A=3 i=1 B=A+i C=A+2*i

is fully equivalent to the previous input.

The assignments will be evaluated in order. In the case of the BGMN Rietveld software (http://www.bgmn.de/), this special data structure is the DATA module. By evaluating the data instructions in order, the input data itself is a kind of code. The input data/code must follow the rules of a specialized language, the input language of BGMN in our case.

The user of the program may define dependencies of input values, each from each other or from generalized variables. In Rietveld programs, you must give a key (also called codewords) to define which input values are to be refined. By using generalized variables, you may refine those variables instead of the native structure variables.

I will explain this in the following two sections.

5. Example: Chain modelling of organic molecules

This example gives the structure description of an organic compound from [3].

Fig 1: Image of organic compound that can be defined by the following style of structure description.

Comments start with //. The input data/code (structure description file) is as follows:

// part 1

// General crystal definition as to be set in every structure definition.

SpacegroupNo=11 TITEL=i24tn_tr1_14_b3_e

PHASE=i24tn_tr1_14_b3_e GEWICHT=SPHAR4

// Debye Waller Factors specific to the atoms H,C,N

PARAM=TDSH=0_0 PARAM=TDSC=0_0 PARAM=TDSN=0_0

PARAM=A=2.808_2.75^2.85 PARAM=B=1.462_1.4^1.5 PARAM=C=0.441_0.4^0.5

PARAM=BETA=90.05_89^91

RP=4 B1=ANISO PARAM=k1=0_0^1 PARAM=k2=0_0^0.00001

// General behaviour of the force field extension.

Theory=100.0 BondLevel=2

// Definition of bonded atoms. The force field will be computed

// for non-bonded atoms, only. Up to the 2nd bonding level each

// (see BondLevel=2).

Bondings=C1,N2,C3,C4,C6,C8,C1 Bondings=C3,N10,C11,N13,C21

...

Bondings=C51,H52 Bondings=N53,H54

// Definition of force field.

WW(aCC,-12,bCC,-6,(C1,C3,C4,C6,C8,C11,C21,C23,C24,C26,C28,C31,

C41,C43,C44,C46,C48,C51))

...

WW(cHNN,-12,dHNN,-10,(N50),(H54))

// Definition of some pseudo-positions in cartesian

// coordinates, which are useful for Eulerian rotation.

set(E0,0.000,0.000,0.000)

set(EX,1.000,0.000,0.000)

set(EY,0.000,1.000,0.000)

set(EZ,0.000,0.000,1.000)

// Definition of cartesian atomic coordinates of the

// (unfolded, unshifted) molecule.

set(O0,0.4800,0.0000,0.0600)

set(C1,0.0000,0.2425,0.2800)

set(N2,0.0000,0.2425,0.1400)

...

set(H54,-0.0000,0.1212,0.4600)

// Rotation of the molecule for 3 Eulerian angles rho1...rho3

PARAM=rho1=38.0_25.0^55.0 PARAM=rho2=71.0_55.0^90.0

PARAM=rho3=-12.0_-45.0^45.0

D(E0,EY,rho2,C1,N2,C3,C4,H5,C6,H7,C8,H9,N10)

...

D(E0,EY,rho2,N53,H54)

D(E0,EZ,rho3,C1,N2,C3,C4,H5,C6,H7,C8,H9,N10)

...

D(E0,EZ,rho3,N53,H54)

D(E0,EX,rho1,C1,N2,C3,C4,H5,C6,H7,C8,H9,N10)

...

D(E0,EX,rho1,N53,H54)

// Shift of the molecule for xx...zz

PARAM=xx=0.48_-0.48^2.8 PARAM=yy=0.0_-0.4^0.4 PARAM=zz=0.06_-0.40^0.4

T(xx,yy,zz,0,0,0,C1,N2,C3,C4,H5,C6,H7,C8,H9,N10)

...

T(xx,yy,zz,0,0,0,N53,H54)

// Shift of the single oxygen atom for xx0...zz0

PARAM=xx0=0.48_0.0^2.8 PARAM=yy0=0.0_0.0^0.4 PARAM=zz0=0.06_-0.1^0.4

T(xx0,yy0,zz0,0,0,0,O0)

// Rotation/torsion of parts of the molecule

// (folding of the molecular chain).

PARAM=rho0310=0_-90^90

D(C3,N10,rho0310,C11,H12,N13,H14,C21,N22,C23,C24,H25)

...

D(C3,N10,rho0310,H52,N53,H54)

PARAM=rho1011=0_-90^90

D(N10,C11,rho1011,N13,H14,C21,N22,C23,C24,H25)

...

D(N10,C11,rho1011,H52,N53,H54)

...

// Part 2: Definition of the Wyckoff positions. In this case,

// only general postions (Wyckoff=f). Note the following two points:

// -The atomic coordinates are computed by the 3 functions X()...Z(),

// which are predefined by the Rietveld program BGMN

// -The (isotropic) Debye-Waller factors of the multiple H,C,N atoms

// are neither independent nor generalized into a total Debye-Waller

// factor of the structure. Instead of specific Debye-Waller factors

// for each of H,C,N atoms were parametrized in part 1.

// In following, this values are assigned to each TDS in part 2.

E=O Wyckoff=f x=X(O0) y=Y(O0) z=Z(O0) TDS=ANISO

E=C Wyckoff=f x=X(C1) y=Y(C1) z=Z(C1) TDS=TDSC

E=N Wyckoff=f x=X(N2) y=Y(N2) z=Z(N2) TDS=TDSN

...

6. Example: Real structure of the clay mineral kaolinite

The asymmetric bands of disordered clay minerals are described by hkl dependent peak broadening and shifting [4]. The disordering model of kaolinite [6] was tried to approximate as well as possible.

Fig 2 (a, b): Rietveld plots of Koalinite quantitative analysis as modelled (a) without hkl dependent peak broadening (Rwp=19.8%, resulting kaolinite content = 73.0(8)%), and (b) with hkl dependent peak broadening (Rwp=9.0%, resulting kaolinite content = 89.5(4)%).

The input data/code (structure description file) is as follows:

// part 1

TITEL=KAOLC1F SpacegroupNo=1 HermannMauguin=C1

PARAM=A=0.5155_0.5^0.52 PARAM=B=0.8945_0.88^0.91 PARAM=C=0.7405_0.72^0.75

PARAM=ALPHA=91.700_90^94 PARAM=BETA=104.900_100^107

PARAM=GAMMA=89.800_88.6^91

// RefMult=2: Each hkl causes two lines in the pattern,

// different shifts, different widths, different intensities,

// which generate strongly asymmetric peak shapes as common in clays

RefMult=2 GEWICHT=SPHAR6 RP=4

PARAM=g1=0.5_0^1

// different intensities

GEWICHT[1]=g1*GEWICHT GEWICHT[2]=(1-g1)*GEWICHT

GOAL:kaolfehlneu=GEWICHT

b10=ANISOLIN^0.1

// The length of the rots for the sub phase i (i=1,2) is b11*c1[iref]

PARAM=cb1=0.1_0^1

PARAM=b11=0.6^0.7_0

c1[2]==1 c1[1]=cb1

// The square of the additional with the squared lorentzian functions is

// c12sqr*sqr(c1[iref]*b11)

c12sqr==1

// Shift of the line is c2*sqr(c1[iref]*b11)

c2==1

// Change of lattice constants by tau

PARAM=c3=0_-3^3

PARAM=k2=0_0 // micro strain

// taua, taub ,tauc is the difference vector between tau as given in

// "X-Ray Diffraction by Disordered Lamellar Structures"

// Victor A. Drits - Cyril Tchoubar, Springer Berlin Heidelberg 1990

// ISBN 3-540-51222-5, ISBN 0-387-51222-5

// Kap. 8.1.5 Models for the Stacking Faults in Kaolinite

// Pkt. 4: Model containing only enantiomorphic B layers, S. 245

// and the ideal translation b/3, multiplied with an arbitrary

// factor. This new tau should be a small value compared to

// lattice constants.

// tau in spherical coordinates

// as a difference to the above citation, tau has a significant

// value in c direction

PARAM=theta=0 PARAM=phi=0 PARAM=tau=0.2_-0.5^0.5

// tau in common coordinates

taua==tau*cos(theta)*cos(phi)

taub==tau*cos(theta)*sin(phi)

tauc==tau*sin(theta)

pi==2*acos(0)

// some helper variables

cosALPHA==cos(ALPHA*pi/180)

cosBETA==cos(BETA*pi/180)

sinGAMMA==sin(GAMMA*pi/180)

cosGAMMA==cos(GAMMA*pi/180)

K13==(cosALPHA*cosGAMMA-cosBETA)/A

K23==(cosBETA*cosGAMMA-cosALPHA)/B

K33==sqr(sinGAMMA)/C

// differnt widths B1,B2

// different shifts DELTAsk

// each depending from hkl as well as from iref

B1=cat(hklK==h*K13+k*K23+l*K33,hkltau==h*taua+k*taub+l*tauc,

b1k==K33*sqr(hkltau),

B2==k2*sqr(sk)+sqr(sqrt(sqr(sk)+

sqrt(c12sqr)*sqr(c1[iref]*ifthenelse(mod(k,3),b11+b1k,b1k)))-sk),

DELTAsk==sqrt(sqr(sk)+c2*sqr(c1[iref]*ifthenelse(mod(k,3),b11+b1k,b1k)))-sk+

c3*c1[iref]*K33*hklK*hkltau/sk,

b10+c1[iref]*ifthenelse(mod(k,3),b11+b1k,b1k)*abs(hklK)/sk)

GOAL=taua GOAL=taub GOAL=tauc

// part 2, not refined in this case

E=SI+4 Wyckoff=a x=0.9942 y=0.3393 z=0.0909 TDS=0.0044

E=SI+4 Wyckoff=a x=0.5064 y=0.1665 z=0.0913 TDS=0.0044

E=AL+3 Wyckoff=a x=0.2971 y=0.4957 z=0.4721 TDS=0.0083

E=AL+3 Wyckoff=a x=0.7926 y=0.3300 z=0.4699 TDS=0.0083

E=O Wyckoff=a x=0.0501 y=0.3539 z=0.3170 TDS=0.0071

E=O Wyckoff=a x=0.1214 y=0.6604 z=0.3175 TDS=0.0071

E=O Wyckoff=a x=0.0000 y=0.5000 z=0.0000 TDS=0.0071

E=O Wyckoff=a x=0.2085 y=0.2305 z=0.0247 TDS=0.0071

E=O Wyckoff=a x=0.2012 y=0.7657 z=0.0032 TDS=0.0071

E=O Wyckoff=a x=0.0510 y=0.9698 z=0.3220 TDS=0.0090

E=O Wyckoff=a x=0.9649 y=0.1665 z=0.6051 TDS=0.0090

E=O Wyckoff=a x=0.0348 y=0.4769 z=0.6080 TDS=0.0090

E=O Wyckoff=a x=0.0334 y=0.8570 z=0.6094 TDS=0.0090

In this case, only the lattice constants and the real structure will be refined. The ideal structure remains fixed.

7. Conclusion

The advantage of modern crystallographic data representations over traditional forms has been demonstrated. Representations of two totally different data sets (molecular chains, real structure of clay minerals) were given using the same frame set: the input language designed for the Rietveld program BGMN. Traditional crystallographic data represent a set of numbers. Advanced crystallographic data must represent a set of dependencies.

[1] Niklaus Wirth, "Algorithmen und Datenstrukturen (pascal version)", 5. edition, B.G. Teubner, Stuttgart (2000).

[2] J. Bergmann, T. Monecke, R. Kleeberg, "Alternative algorithm for the correction of preferred orientation in Rietveld analysis", J. Appl. Cryst. 34 (2001) pp. 16-19

[3] P. Friedel, J. Tobisch, D. Jehnichen, J. Bergmann, T. Taut, M. Rillich, C. Kunert, F. "Structure investigations of molecular crystals containing the ring cyclo-tris(2,6-pyridyl formamidine) by means of XPD and force field constrained Rietveld refinement", J. Appl. Cryst. 31 (1998), p. 874

[4] J. Bergmann and R. Kleeberg, Rietveld analysis of disordered layer silicates, proceedings of the 5th European Conference on Powder Diffraction (EPDIC 5) held in Parma, Italy, May 24-28, 1997, Mat. Sci. Forum, 278-281 (1998) part 1 pp. 300-305

[5] R. Kleeberg, J. Bergmann, Quantifizierung von fehlgeordneten Schichtsilikaten mit der Rietveld-Methode, Berichte der Deutschen Ton- und Tonmineralgruppe e.V. 5 (1997), pp. 35-44

[6] Victor A. Drits - Cyril Tchoubar, "X-Ray Diffraction by Disordered Lamellar Structures", Springer Berlin Heidelberg 1990

These pages are maintained by the Commission Last updated: 15 Oct 2021

Commission on Crystallographic Computing