Armel Le Bail

*Université du Maine, Laboratoire des Fluorures, CNRS UMR 6010, Avenue O. Messiaen, 72085 Le Mans Cedex 9, France. Email : *

When exact equations do not exist, or if they are too complex to manipulate, a problem-solving approach in crystallography (anywhere in fact), inelegant, maybe not deserving the Gregori Aminof Price (GAP) though able to fill a gap sometimes, brute force, but efficient, is the use of random numbers. Randomness finally can end in the finding of a desired order (compatible with both diffraction data and crystal chemistry). This is nothing else than a large scale trial and error process, involving 10^{6} to 10^{9} trials, if not much more. Quite easy and fast to develop with the programming language of your choice. Good languages offer efficient-enough intrinsic functions which can produce pseudo-random number sequences (for instance the RAN subroutine in Fortran). These random numbers are used for building algorithms producing "random walks" (stochastic techniques, ordinary or Markov chain Monte Carlo, Metropolis algorithm, etc... but this text will avoid the specialized terminology) of which applications have grown exponentially during the past twenty years, in absolutely all domains (literature is quite huge, introductions can be found on the web, see references [1-3]). The reason which may preclude their use is a prohibitive CPU time if a trial needs too long calculations for checking if the change in the parameter randomly modified should be retained or not. But computers are going faster so that software are already written today for problems that will be solved in a matter of minutes or seconds in a near future (10-20 years), even if several days of calculations can be necessary currently. On a standard PC running at 2.5GHz, the 10^{9} trials are made in one night if 20000 trials are performed in 1 second. This opens horizons, moreover trial and error process are easy to adapt for parallel computing.

Order is only a kind of disorder (and vice versa) so that if disorder is desired rather than order, a small change in the algorithm is all the business. An non-exhaustive list of computer programs applying random numbers for adjusting parameters (atom or molecule coordinates, cell dimensions, etc) that should match crystallographic data (diffraction or diffusion, intensity and/or peak position and peak shape, etc) follows :

- Small angle scattering, Monte Carlo calculation of the interparticle interference [4].
- RMCA [5] :for modelling disorder in amorphous compounds, matching neutron and X-ray interference functions plus other extra data like EXAFS, NMR (...). More than 100 applications of this program were published.
- DISCUS [6] : for analysis of diffuse scattering from single crystals.
- OCTOPUS, SA, DASH, PowderSolve, Endeavour, PSSP, FOX, TOPAS, EAGER, FOCUS, SAFE, MAUD, EXPO, ESPOIR (... no place for all references, typing the program names as keyword with Google should return some hyperlinks) : for structure solution from powder diffraction data, matching either the raw diffraction pattern [7] or the extracted structure factor amplitudes, or mathematical representations of them. See the recent SDPD (Structure Determination by Powder Diffractometry) Round Robin 2 which seems to establish the triumph of this approach [8].

In most of these programs, a parameter, an atom or a molecule is selected at random, moved randomly, the observed data is then compared with the calculated one, the fit quality is estimated, the move is accepted if the fit quality is improved, but also times to times if it is not improved (in order to not being trapped in a false minima), annealing may be simulated by reducing progressively the amplitude of the moves, and this is all the deal. Absurdities may well occur during the atoms (or molecules) random walk. They quite easily virtually pierce each other, without provoking any nuclear fusion or particle/radiation emission, fortunately. Introducing anti-bump distance restraint during the process is not always a good idea, some problems can be solved better without it. Even the order constraint of a given space group may not be a good idea, and the initially disordered atoms seem to appreciate more to come to order in the less constrained space group P1, you just need then to check for higher symmetry at the end of the process. At the beginning of the process, some program developers find more attractive to mimic the Big Bang (even if controversial), getting all atoms at the origin first, letting them choose another place at random. Some start from atoms randomly distributed. On the figure 1 is shown the ESPOIR [9] program in action, solving the La_{2}W_{2}O_{9} structure [10] from neutron diffraction data (extracted "|Fobs|"), searching randomly 9 oxygen atoms while the La and W atoms known from X-ray data are immobile. In five minutes, 20 tests of 200000 oxygen atom moves are realized of which few are retained (~500), leading 2 times to the solution with a final Rp~6% on the pseudo-powder pattern regenerated from the extracted '|Fobs|' in order to overcome overlapping problems.

**Fig. 1:** *ESPOIR finding oxygen atoms in the La _{2}W_{2}O_{9} structure by Monte Carlo from neutron data.*

In some circumstances, the Monte Carlo method is deliberately used as a kind of refinement process (in fact parameter adjustment, the parameter being either an atom coordinate or any crystallographic characteristic) replacing the least-square method. And the applications are not limited to atom moves, other problems in crystallography are candidates :

- Optimization of the design of neutron scattering instruments [11]. No less than five programs are described in Neutron News volume 13, issue 4, 2002.
- Magnetism : magnetic structures simulation by MCMAG [12], spin orientation estimations in FULLPROF [13].
- Crystallite size and microstrain distribution functions estimation from line profile broadening analysis [14].
- Structure factor amplitudes extraction from powder data by the Le Bail method revisited by Altomare in EXPO [15]: no equipartition of overlapping reflections but random values agreeing with the total amplitude sum seems to lead to more efficiency in further structure solution attempts by the direct methods. Etc.

Applications to indexing powder patterns was made recently by using a genetic algorithm [16] or Monte Carlo in the McMaille program [17]. In the latter case, random numbers are used for deciding which cell parameter to vary next in the process, for attributing a new value (between imposed Pmin and Pmax) to the cell parameter selected for a change. This is illustrated below in Fortran language by two calls of the RAN function (returning a random number between 0. and 1.) for deciding which parameter to vary in hexagonal or tetragonal symmetry (CELP(IP=1) = *a*, CELP(IP=3) = *c*) and what new value the selected parameter will have, large changes being allowed at this level :

C Which parameter to vary ? a or c ?

IP=3 ! *c* parameter selected

IF(RAN(ISEED).GT.0.5)IP=1 ! or *a* in 50% cases (GT = greater than)

CELP(IP)=PMIN(IP)+DELTA(IP)*RAN(ISEED) ! give a new value to *a* or *c*

NTRIAL=NTRIAL+1 ! count the trials

**Fig. 2:** *The cimetidine (pseudo-) powder pattern Monte Carlo-indexed by McMaille.*

When a cell proposal is found to explain all observed reflection positions (or a part of them if unindexed reflections are allowed), or if the profile fit reliability Rp is lower than a selected value (say 50%), then random numbers are used again for applying that time very small cell parameter changes accepted only if Rp decreases (the large changes being accepted sometimes even if Rp increases). This is more similar to simulated quenching than to annealing. A least-square refinement would not converge starting from a 50% reliability factor, but trial and error does it well (in 100-3000 trials from cubic to triclinic). Unfortunately, this apparently simple problem with up to 6 cell parameters maximum to determine from a set of 20 reflection diffracting angles is transformed into a difficult one (not one clear cell proposal is obtained, but several dubious cells are ranked with similar figures if merit) essentially because of data inaccuracies and sample impurity, but also in special cases when the data which would allow the finding of some parameter are obscured by too much data corresponding to the other parameters (anisotropic cells, for instance characterized by 2 long cell parameters and a very short one). Due to large computing time, McMaille will really be competitive with more mathematically elegant approaches (ITO, TREOR, DICVOL) in ten years, but can already propose a monoclinic cell (figure 2) in one minute, after "only" 10^{6} cell trials, when a systematic grid search with 1000 steps for each of the four monoclinic parameters would need 10^{12} trials (2 years of calculation). No need to explain the difficulty with triclinic cells, or with adding a supplementary adjustable parameter which would be the zeropoint... Is that interesting? Well, indexing from powder diffraction data is not a pure mathematical problem and requires the use of complementary approaches, because the various computer programs are at their best in different cases. Any new efficient approach is welcome.

Now, think to your own problems in terms of adjustment by random parameter changes, evaluate the feasibility and interest of your project if compared to other calculation techniques, and, if conclusive, just build the program (preferably open source). Developing McMaille needed 2 months, full time, to one crystallographer/developer.

References

[1] http://www.taygeta.com/mcarlo/references.html

[2] http://www.statslab.cam.ac.uk/~mcmc/pages/list.html

[3] http://www.cooper.edu/engineering/chemechem/monte.html

[4] S.R. Hubbard, S. R. & S. Doniach, S. (1988). *J. Appl. Cryst*. **21**, 953-959.

[5] D. A. Keen & R.L. McGreevy (1990). *Nature* **344**, 423-424.

RMCA program : http://www.studsvik.uu.se/software/rmc/rmc.htm

[6] T.R. Welberry & Th. Proffen (1998). *J. Appl. Cryst*. **31**, 309-317. DISCUS program : http://www.uni-wuerzburg.de/mineralogie/crystal/discus/discus.html

[7] Tremayne, M., Kariuki, B. M. & Harris, K. D. M. (1996). *J. Appl. Cryst.* 29, 211-214.

[8] SDPD Round Robin - 2 (2002). http://www.cristal.org/sdpdrr2/

[9] A. Le Bail (2001). *Mat. Sci. Forum*, **378-381**, 65-70. ESPOIR program : http://www.cristal.org/sdpd/espoir/

[10] Y. Laligant, A. Le Bail & F. Goutenoire (2001). *J. Solid State Chem*. **159**, 223-237.

[11] P.A. Seeger, L.L. Daemen, E. Farhi, W.-T. Lee , X.-L. Wang, L. Passell, J. Ĺ aroun & G. Zsigmond (2002). *Neutron News* **13(4)**, 24-29.

[12] P. Lacorre & J. Pannetier (1987). *J. Magn. Magn. Mat*. **71**, 63-82.

[13] J. Rodriguez-Carvajal (2001). *Mat. Sci. Forum* **378-381**, 268-273. FULLPROF program : http://www-llb.cea.fr/fullweb/powder.htm

[14] P.E. Di Nunzio, S. Martelli & R. Ricci Bitti (1995). *J. Appl. Cryst*. **28**, 146-159.

[15] A. Altomare, C. Giacovazzo, A.G.G. Moliterni & R. Rizzi, R. (2001). *J. Appl. Cryst*. **34**, 704-709.

EXPO program : http://www.ic.cnr.it

[16] B.M. Kariuki, S.A. Belmonte, M.I. McMahon,, R.L. Johnston, K.D.M. Harris and R.J. Nelmes (1999). *J. Synchrotron Rad*. **6**, 87-92.

[17] McMaille powder indexing program (2002). http://www.cristal.org/McMaille/