A high-throughput infrastructure for density functional theory calculations

https://doi.org/10.1016/j.commatsci.2011.02.023

Abstract

The use of high-throughput density functional theory (DFT) calculations to screen for new materials and conduct fundamental research presents an exciting opportunity for materials science and materials innovation. High-throughput DFT typically involves computations on hundreds, thousands, or tens of thousands of compounds, and such a change of scale requires new calculation and data management methodologies. In this article, we describe aspects of the necessary data infrastructure for such projects to handle data generation and data analysis in a scalable way. We discuss the problem of accurately computing properties of compounds across diverse chemical spaces with a single exchange–correlation functional, and demonstrate that errors in the generalized gradient approximation are highly dependent on chemical environment.

Highlights

► We describe theoretical methodologies to tackle high-throughput computation.
► We describe how k-point mesh and convergence criteria affect energy and cell volume.
► We analyze systematic errors in ternary formation enthalpies calculated with GGA.

Introduction

The benefits of density functional theory (DFT) [1] calculations in the design and optimization of new materials have now been demonstrated across several research fields [2], [3], [4], [5], [6], [7]. The scalability of computations makes it possible – at least in principle – to make predictions on thousands of compounds, and potentially for all known inorganic materials. The objective of the “Materials Genome” project [8] described in this paper is to enable accelerated materials discovery, and ultimately to develop a database of calculated properties and structural information on all known inorganic compounds for the materials community. In this paper, we describe the calculation infrastructure used to compute some properties of approximately 80,000 compounds, encompassing the majority of unique compounds in the Inorganic Crystal Structure Database (ICSD) [9], [10], as well as many newly predicted systems. The number of calculations achievable is limited only by the prevailing computing technology and resources. Subsets of calculations performed with this methodology have been applied to structure prediction [11], screening of Hg sorbents [12], band gap prediction [13], and battery design [14], [15].

The potential benefits of automating and scaling computational property predictions have been demonstrated in recent years by several research groups. Curtarolo et al. investigated the effect of structure on the stability of binary alloys using about 14,000 energies calculated from first principles [16]. More recently, Curtarolo et al. presented an overview of technical issues in high-throughput band structure calculations [17]. Ortiz et al. screened about 22,000 materials to suggest new materials for radiation detectors [18]. Several smaller DFT studies, involving hundreds of materials, have been performed by various research groups for catalysis and hydrogen storage [19], [20], [21], [22], [23], [24]. In addition, a small number of general-purpose online electronic structure databases are now emerging [18], [25], [26], [27], [28], [29], including a large (∼81,000 DFT calculations) alloys database by Munter et al. [30]. Given the rising interest in high-throughput DFT, we describe in this paper some of the unique challenges faced when scaling to high-throughput as well as techniques to overcome these challenges.

Fig. 1 is the data flow diagram for this work, which we expect to be typical of most high-throughput computational screening projects. Because high-throughput calculations inherently involve generating, storing, and analyzing large amounts of data, a formal data flow strategy is needed to manage data efficiently. Fig. 2 is a visual overview of the technologies and techniques we used to implement the abstract steps in the data flow diagram. In the remaining sections, we examine each of the data flow stages in more detail.
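The four data flow stages named in the section headings (selection, generation, storage, analysis) can be pictured as a chain of small functions. The following is only a minimal sketch under stated assumptions: every function name, the `Result` type, and the `fake_calculator` stand-in are hypothetical illustrations and are not the infrastructure described in this paper. The numbers it produces are placeholders, not DFT results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Result:
    formula: str
    energy_per_atom: float  # placeholder value, not a real DFT result


def select(candidates: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Data selection: keep only the candidates that pass a screening filter."""
    return [c for c in candidates if keep(c)]


def generate(formulas: Iterable[str], calculator: Callable[[str], float]) -> List[Result]:
    """Data generation: run one calculation per selected compound."""
    return [Result(f, calculator(f)) for f in formulas]


def store(results: Iterable[Result], database: Dict[str, Result]) -> None:
    """Data storage: index results by formula for later retrieval."""
    for r in results:
        database[r.formula] = r


def analyze(database: Dict[str, Result]) -> str:
    """Data analysis: return the formula with the lowest stored value."""
    return min(database.values(), key=lambda r: r.energy_per_atom).formula


def fake_calculator(formula: str) -> float:
    """Hypothetical stand-in for a DFT run; derives a made-up number from the formula length."""
    return -1.0 * len(formula)


# Chain the four stages on a toy candidate list.
database: Dict[str, Result] = {}
selected = select(["NaCl", "LiFePO4", "Hg"], keep=lambda f: "Li" in f or "Na" in f)
store(generate(selected, fake_calculator), database)
best = analyze(database)
```

Keeping each stage behind its own function boundary means that, in a sketch like this, the calculator or the storage backend could be swapped without touching the other stages.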

Section snippets

Data selection

In this paper, we do not focus on data selection for high-throughput; however, several algorithms exist to optimize data selection when screening materials for a particular application. Many of these algorithms, such as tiered screening and evolutionary algorithms [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], have been outlined previously by Bligaard et al. [47]. If the intent is to create a general-purpose database, one additional approach is to compute compounds tabulated in a

Data generation

After selecting compounds for computation, the next step in high-throughput screening (Fig. 1) is the generation of ab initio property data on these compounds. Although DFT calculations are well suited for high-throughput due to their relatively small number of adjustable parameters, automating DFT calculations is not yet trivial. As we will discuss, DFT calculations require making choices related to the accuracy of the computation, including the choice of the exchange–correlation functional,

Data storage and retrieval

Data storage and retrieval require dedicated attention when scaling up to a high-throughput project. A well-designed architecture for data storage allows researchers to explore large amounts of data intuitively and naturally, greatly enhancing the possibility of finding new and interesting compounds for an application or discovering scientific trends in the data.
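As one illustration of queryable storage, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. This is an assumption made for a self-contained example, not the database system used in this work. The table name, column names, formulas, and energies are all hypothetical placeholders.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per calculation.
conn.execute(
    """
    CREATE TABLE calculations (
        id INTEGER PRIMARY KEY,
        formula TEXT NOT NULL,
        functional TEXT NOT NULL,
        total_energy_ev REAL,
        converged INTEGER NOT NULL
    )
    """
)
# An index on the formula column keeps lookups by compound fast as the table grows.
conn.execute("CREATE INDEX idx_formula ON calculations (formula)")

# Placeholder rows; the values are illustrative, not computed results.
rows = [
    ("A2B", "GGA", -10.0, 1),
    ("AB", "GGA", -6.5, 1),
    ("AB3", "GGA", None, 0),  # an unconverged run with no energy stored
]
conn.executemany(
    "INSERT INTO calculations (formula, functional, total_energy_ev, converged) "
    "VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

# Retrieve only converged calculations, lowest energy first.
converged = conn.execute(
    "SELECT formula, total_energy_ev FROM calculations "
    "WHERE converged = 1 ORDER BY total_energy_ev"
).fetchall()
```

Recording convergence status alongside each result lets failed runs be filtered out in the query itself rather than in downstream analysis code.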

Many relational database systems for data storage are now available to researchers. These include, for example, MySQL [88] and

Data analysis

The data analysis process will depend heavily on the area of application. In Section 6, we present one example of data analysis that examines the error in binary and ternary formation enthalpies calculated with GGA. Here, we summarize several types of analyses that have broad applications:

  • (i) Structural equivalence – Many properties of materials are correlated to their crystal structure, but it can be difficult to determine equivalence between crystal structures. Raw crystal structure

Cancelation of errors in GGA and formation enthalpy dependence on reference states

No single exchange–correlation functional achieves consistent accuracy across a diversity of chemical environments. High-throughput data sets can be used to probe the accuracy of a particular exchange–correlation functional either within a single chemical class or across chemistries. Such a study was carried out by Curtarolo et al. for binary metals [16] and pure elements [99] to evaluate the accuracy of LDA and GGA predictions of the relative stabilities of crystal structures. More recently,

Conclusion

High-throughput density functional theory presents new opportunities for materials design and rapid computational screening, but also poses unique technical challenges regarding implementation and accuracy across wide chemical systems. In this paper, we demonstrated how data flow was managed in our high-throughput project and described the computational tools we used to run DFT calculations at large scale. In addition, we presented data regarding convergence for a large number of compounds,

Acknowledgements

This research was supported by the US Department of Energy through grant Nos. DE-FG02-96ER4557 and DE-FG02-97ER25308. Additional funding was provided by Umicore and Bosch. The authors would like to acknowledge discussions and contributions to the high-throughput infrastructure and methodology from Dr. Fei Zhou. In addition, we thank Dr. Shirley Meng for discussions related to high-throughput Li ion battery design, Dr. Michael Kocher for conversations on methods of interaction between the

References (103)

  • M.L. Cohen, Solid State Communications (1998)
  • A. Jain et al., Chemical Engineering Science (2010)
  • S.P. Ong et al., Electrochemistry Communications (2010)
  • W. Setyawan et al., Computational Materials Science (2010)
  • C. Ortiz et al., Computational Materials Science (2009)
  • J. Greeley, Surface Science (2007)
  • M. Andersson et al., Journal of Catalysis (2006)
  • G.A. Gazonas et al., International Journal of Solids and Structures (2006)
  • R. Giro et al., Chemical Physics Letters (2002)
  • P. Villars, Journal of Alloys and Compounds (1998)
  • G. Kresse et al., Computational Materials Science (1996)
  • F. Zhou et al., Electrochemistry Communications (2004)
  • F. Zhou et al., Solid State Communications (2004)
  • M.L. Cohen et al., Solid State Physics (1970)
  • L.A. Montoro et al., Chemical Physics Letters (1999)
  • P. Pulay, Chemical Physics Letters (1980)
  • Y. Wang et al., Calphad (2004)
  • P. Hohenberg, Physical Review (1964)
  • J. Hafner et al., MRS Bulletin (2006)
  • K. Kang et al., Science (2006)
  • J. Wang et al., Science (2003)
  • A. Kolmogorov et al., Physical Review B (2008)
  • G.K.H. Madsen, Journal of the American Chemical Society (2006)
  • Materials Genome...
  • G. Bergerhoff et al., Journal of Chemical Information and Computer Sciences (1983)
  • F. Karlsruhe, Inorganic Crystal Structure Database,...
  • G. Hautier et al., Chemistry of Materials (2010)
  • M. Chan et al., Physical Review Letters (2010)
  • J.C. Kim et al., Journal of the Electrochemical Society (2011)
  • S. Curtarolo, D. Morgan, G. Ceder, Accuracy of ab initio methods in predicting the crystal structures of metals: review...
  • J. Greeley et al., Nature Materials (2004)
  • J. Greeley et al., Nature Materials (2006)
  • J.S. Hummelshøj et al., The Journal of Chemical Physics (2009)
  • J. Greeley et al., The Journal of Physical Chemistry C (2009)
  • Japan Science and Technology Agency, Computational Electronic Structure Database (CompES),...
  • N. Tatara et al., Progress of Theoretical Physics Supplement (2000)
  • H.L. Skriver, The hls Alloy Database,...
  • M. Klintenberg, Electronic Structure Project,...
  • S. Curtarolo, AFLOW-lib databases,...
  • T.R. Munter et al., Computational Science & Discovery (2009)
  • G. Ceder et al., MRS Bulletin (2006)
  • C.C. Fischer et al., Nature Materials (2006)
  • Sun Grid Engine,...
  • The Perl Programming Language,...
  • M. Stonebraker et al., ACM SIGMOD Record (1986)
  • M. Stonebraker et al., Communications of the ACM (1991)
  • K. Mitra, International Materials Reviews (2008)
  • D. Scott et al., Journal of Chemical Information and Modeling (2008)
  • G. Johannesson et al., Physical Review Letters (2002)
  • S. Woodley, Applications of Evolutionary Computation in Chemistry (2004)