Why do so many crystal structures remain unpublished?
In describing some uses of the Cambridge Structural Database (CSD) in research and teaching in a recent IUCr Newsletter (Clegg, 2020), I mentioned the general agreement that this and other crystallographic databases undoubtedly contain only a minority of crystal structures that have been successfully solved and refined.
The 2010 British Crystallographic Association (BCA) Spring Meeting included a session devoted to issues of publication of small-molecule crystal structures. As part of that session, I gave an invited lecture with the title “Reasons good and bad for not publishing crystal structures,” which was followed by a lively discussion. This article is adapted from the material of that lecture, with appropriate updates. Much has changed in ten years!
The continued development of high-throughput equipment for diffraction measurements, of ever more brilliant X-ray sources, and of sophisticated and increasingly automatic software for structure solution and refinement in the intervening years has surely added enormously to the wealth of unpublished results of good technical quality. At the same time, opportunities and facilities for formal publication have seen no major advances on a comparable scale, and reasons for not publishing have probably multiplied through competing pressures on time and other resources.
Here I pose and seek to address a number of related questions: Why publish at all? Why do structures not get published? Why should structures not get published? What do we mean by publication anyway? The views expressed are, of course, personal and are based on decades of experience in a rapidly changing environment; they are intended, as at the BCA meeting, to provoke reaction and discussion.
Why publish at all?
It is right to ask such fundamental questions of motive, especially when publication significantly consumes time and other resources. Several positive answers can readily be given.
1. To justify public funding: in recent years this has become a definite formal policy of research funders, who often ask for at least outline publication plans when they are considering grant applications, combined with a requirement of some kind of open-access status of resulting publications, for which an earmarked proportion of the funding may be provided. It should also be considered a moral imperative – public finance demands public access to the results of the research.
2. To further careers: this raises the follow-up question, whose careers? Research usually involves both established personnel and those who are at an early stage or still in formal training, and publication, preferably with high profile and impact, is important at all stages of career development.
3. For reputation: not only personal but also institutional. There is no doubt that publication approaches and practices are strongly influenced by official research assessment exercises, by all sorts of metrics, and by overused performance league tables.
4. To contribute to knowledge: this is, of course, the fundamental purpose of research in general. Clearly, some knowledge is more useful and more important than others, but there is a good argument in crystallography for extending our knowledge and understanding of structures as widely as possible to enable the recognition of patterns and notable exceptions to them. The failure to bring “less interesting” results into the public domain is a form of bias in the available complete set of data.
5. To avoid unnecessary duplicate work: there must be many structures that have not been published because they were not the expected or desired result. In some cases, these are recovered starting materials, unwanted by-products, or the products of “failed” reactions, but nevertheless are previously unreported structures. Having these available, at least in an accessible database if not in a journal publication, can save other researchers time and effort if a matching unit cell is recognised in the first stages of a data collection.
Worthy as these motives are, they are not sufficiently strong to prevent a growing mountain of unpublished structures accumulated in research laboratories across the world. The factors militating against publication are many and forceful.
Why do structures not get published?
Here are a number of reasons – and excuses – that might be given in response. Their validity is itself a matter for debate.
1. There are too many results to be able to publish them all. In the days of photographic data collection and then of serial diffractometers measuring one reflection at a time, when it took days or weeks to determine a crystal structure, it was possible for publication to keep up with the output of results, but it began to fall behind as data collection (Clegg, 1981) and computing facilities became faster. With the widespread introduction of area detectors for routine use in the 1990s, the flood of new structures overwhelmed the channel of publications, and large reservoirs of results began to build up. Now we have various criteria and priorities to decide which structures make it through to publication and which are left behind, many of them not based on the intrinsic crystallographic merit of the individual structures.
2. There is too much time and effort involved in publication. It can certainly take far longer to generate a manuscript, or even just the crystallographic component of one, than to carry out a complete structure determination. This is true despite the enormous electronic advances in publishing in recent decades, significantly aided by the virtually universal adoption of CIF as an archiving and dissemination medium for crystal structures alongside powerful validation, analysis and graphical tools.
3. Publication is not my responsibility. This depends on the status of crystallographers and the terms and conditions under which they operate, ranging from independent research leaders in their own right to managers and staff of an externally funded technical service. It is tied up with potentially contentious issues, such as the question of who is the primary “owner” (in terms of intellectual property rights and of accountability) of a crystal structure as a piece of information. Who actually chooses whether, and how, and when and where, any particular structure is submitted for publication?
4. Other people involved (chemists, research group leaders) have moved, retired or died. Are the relevant information and expertise needed to generate a suitable publication still available? This may be combined with, or have similar impacts to, the next two reasons.
5. Funding for this work has ended.
6. We are no longer interested in this topic. Priorities, funding opportunities and fashions do change in scientific research, for both good and bad reasons. In some cases, a series of crystal structures may be generated as part of a mainly synthetic PhD project, a primary aim being to obtain definitive confirmation of the chemical identity of the compounds or some specific structural feature such as conformation or absolute configuration. Once the results are included in a successfully defended thesis, and the student takes up another post, this particular project is left behind, and any good intention to convert the results into publications may get buried in other priorities and new projects.
7. The result is not interesting, or not what we wanted. In particular, it may just not be important enough for possible inclusion in performance-related assessments, such as in the UK Research Excellence Framework or similar exercises elsewhere. Unexpected results can, of course, turn out to be particularly exciting and lead to publications in high-profile journals, but many are regarded as disappointing and even a waste of time and resources. As one example of this contrast, my own involvement in characterising polyoxometalates has revealed some unexpected condensed products with structural complexity (Errington et al., 2007). But reactions in this field can often lead to the thermodynamic sink of the [M₆O₁₉]²⁻ cluster anion (M = Mo, W; Fig. 1). We have encountered many salts of these anions with various organic cations, including numerous polymorphs and different solvates, only some of which have previously been reported. These structures are never going to be considered important enough for a publication.
8. There are further experiments to be done for a full publication. This may be combined with or be an alternative version of the next one.
9. We do not wish to reveal these results (yet, if ever). Avoiding a “salami-slicing” approach to the publication of individual results that would be much better in a combined authoritative account of an entire project is commendable; but I have known cases in which the “further experiments” take many years and are never successfully completed, or are simply overtaken by new priorities. A more acceptable reason for delay is the preparation and filing of a patent for which the structure is part of the evidence, so its premature appearance in the public domain must be avoided. Most crystallographic work carried out in the chemical or pharmaceutical industry is never published; outsourcing of commercial work to other structural service providers is usually covered by non-disclosure agreements, with the full intellectual property rights explicitly assigned to the funders.
10. The result is not of good quality (at least, not yet). Crystal structures vary considerably in their overall quality, precision and reliability. Some of the factors that can adversely affect the result include very small crystal size, poor crystal quality, disorder, twinning and radiation damage. In some cases, improvements can be achieved through recrystallisation, use of a more intense X-ray source and/or a more sensitive detector, or the preparation of a different derivative or solvate, among other approaches. As an example, in a study of the impact of different substituents on the cage geometry of the ortho-dicarbaborane C₂B₁₀H₁₂, especially the effect on bond lengths, varying the substituent at one C atom while leaving the other C–H unsubstituted leads to unresolvable disorder of the C–H and four B–H positions; the result is a statistically averaged geometry with no useful outcome. Attachment of a methyl or (better) phenyl group to the second C atom prevents this disorder and gives clear results for a series of substituents at the first C atom (Fig. 2), with a well-defined substantial effect on the C–C bond length (Coult et al., 1992; Boyd et al., 2004; Fox, MacBride et al., 2009; Fox, Peace et al., 2009).
11. We tried, but the paper was rejected. This rejection may not be because of the crystallography component, though some chemical journals do have a regrettable attitude that denigrates a manuscript judged to be “only crystallography”. Sometimes criticism is made by identifiably crystallographic referees. Constructive criticism is, of course, the proper responsibility of referees, and it often leads to improvement and correction. However, I take issue with what I categorise as three particular types of poor crystallographic referees:
(a) the sloppy, whose review of the manuscript appears to be superficial, offering no significant comment and, in some cases, failing to notice obvious faults that consequently appear in publications;
(b) the robotic, who merely cite PLATON/checkCIF alerts (Spek, 2020) and incorrectly call these “errors” that need to be corrected, basing a recommendation of rejection purely on these with no consideration of their possible causes or author responses to them;
(c) the opinionated, who seem to consider that their personally preferred approach to the palette of available refinement tools (particularly the use of restraints) is the only acceptable one and who demand a revision carried out in their particular way.
Why should structures not get published?
While there are journal referees who do a poor job, most are surely conscientious and impartial. Their role is essential to ensure the accurate and reliable reporting of valuable structural results. They, and editors, serve as a defence against the publication of unsuitable results. Crystallographers themselves, however, have the primary responsibility of judging their own results, as to how appropriate they are for release in the public domain, and in what way, for use by others. Some structures should simply not be published, as they do not serve the purposes outlined in the first section of this article. What features can make a structure unsuitable for publication?
1. It is of poor quality. Quality is a relative property on a continuous scale, not a binary characteristic, so this requires judgement against relevant criteria. It depends on the purpose of the experiment in the first place, and on the way in which the results are to be used in the context of the overall work. The “quality” of a structure may also not be uniform in all its features. For example, a badly disordered solvent molecule or counterion that has not been well modelled, despite best attempts, does not usually detract from the value of the main part of the structure (though it does generally mean a reduced precision overall). On the other hand, a poorly defined H atom could be a key feature of the structure and severely limit important conclusions about the chemistry.
2. It is not fit for purpose. What is the point of the experimentally determined structure? A lower quality can be tolerated for a result intended to confirm the identity of a product from among a number of possibilities, provided the answer is unambiguous, than for a study of detailed molecular geometry in which numerical standard uncertainties need to be minimised if sensible conclusions are to be drawn. Whatever the purpose, however, there is no excuse for sloppy work; the goal should always be the best possible structural result, with all stages of experiment and computation geared to this.
3. It is incorrect. Various experimental failings, lack of appropriate corrections for known systematic errors and effects, and poor use of software can lead to inappropriate structural models and misinterpretation of the results. Possible errors include wrongly assigned atom types, missing or misplaced H atoms, and a failure to recognise symmetry-related connections in a polymeric structure. I was particularly amused, while also being horrified, by one example from my work as a founding Section Editor of Acta Crystallographica Section E in the early 2000s. It would be unethical to give details, but it became clear to me on close inspection that the authors had incorrectly identified F atoms as OH groups and a nitro group as a carboxylic acid (Fig. 3), leading to some unacceptable geometry for supposed hydrogen bonds; correct atom assignments resulted in a structure with markedly lower refinement residuals that was actually already known. Correspondence with the principal author, who would not accept these findings, was protracted until I recommended that he submit the paper to Nature, because the reported structure, if correct, would be unusual enough to make the front cover! I preserve to this day the comment sent to me that “I fear you are suffering from a disillumination of the mind”!
4. It is fraudulent. Fortunately, as far as we are aware, this is rare in reported crystal structures. Sadly, there was a spate of fraudulent structures based on manipulated diffraction data in Acta Crystallographica Section E around 2007 (Harrison et al., 2010), perpetrated by a small number of dishonest authors. It was uncovered by careful use of validation software, which has subsequently been enhanced to provide better detection of such fraud and other problems and is routinely used by IUCr journals and many others. Several other science journals, some very prestigious, were prompted by the up-front honesty of the Acta editors to investigate, and found serious fraud in some of their own publications. The nature of crystal structure results and their relationship to the original experimental data (deposition of which is demanded by IUCr journals) makes this kind of abuse much harder to cover up than in many other areas of science, where deliberate fraud can be difficult to detect.
What do we mean by publication anyway?
The word “publication” covers a range of ways in which scientific research results are placed in the public domain. The conventional publication used to be, principally, a physical manuscript submitted to, peer-reviewed for, and accepted by a publisher for inclusion in a printed journal issue (or a book), with a recognised literature reference. The term could also be used for articles published without a peer-review step. It has been broadened over the years by the appearance of hybrid print/electronic and electronic-only journals and by the distinction between subscription-paid and open-access publication, with various levels of “open access” having a range of embargo and release stages. Other forms of presence in the public domain, with different degrees of effective accessibility, include conference presentations and abstracts, doctoral theses usually lodged in an institutional library, institutional and organisational electronic document repositories, databases that may be freely accessible or available on a subscription basis, and personal or corporate websites.
Important designations and properties that have been introduced over the years for publications, associated with different perceived degrees of status and value, include acceptance into indexing services such as Chemical Abstracts (a division of the American Chemical Society) or Web of Science (Clarivate Analytics PLC, formerly ISI Web of Knowledge), and the assignment of Digital Object Identifiers (DOI). On the basis of these and other criteria such as journal impact factors, different kinds of publication, and different publications within these kinds, are generally considered to have different degrees of worth in uses such as personal CVs and institutional research assessment submissions.
Whatever form of publication we choose, the dissemination of our crystal structure results is an important part of the research work overall. What, may we ask, is the point of doing research at all unless we show and explain to others what we have done?
References
Boyd, L. A., Clegg, W., Copley, R. C. B., Davidson, M. G., Fox, M. A., Hibbert, T. G., Howard, J. A. K., Mackinnon, A., Peace, R. J. & Wade, K. (2004). Dalton Trans. pp. 2786–2799.
Coult, R., Fox, M. A., Gill, W. R., Wade, K. & Clegg, W. (1992). Polyhedron, 11, 2717–2721.
Errington, R. J., Petkar, S. S., Middleton, P. S., McFarlane, W., Clegg, W., Coxall, R. A. & Harrington, R. W. (2007). J. Am. Chem. Soc. 129, 12181–12196.
Fox, M. A., MacBride, J. A. H., Peace, R. J., Clegg, W., Elsegood, M. R. J. & Wade, K. (2009). Polyhedron, 28, 789–795.
Fox, M. A., Peace, R. J., Clegg, W., Elsegood, M. R. J. & Wade, K. (2009). Polyhedron, 28, 2359–2370.
Copyright © - All Rights Reserved - International Union of Crystallography