Published in: Integrating Materials and Manufacturing Innovation 2/2020

27-05-2020 | Technical Article

Benchmark AFLOW Data Sets for Machine Learning

Authors: Conrad L. Clement, Steven K. Kauwe, Taylor D. Sparks

Abstract

Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that do not refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.
Metadata
Title
Benchmark AFLOW Data Sets for Machine Learning
Authors
Conrad L. Clement
Steven K. Kauwe
Taylor D. Sparks
Publication date
27-05-2020
Publisher
Springer International Publishing
Published in
Integrating Materials and Manufacturing Innovation / Issue 2/2020
Print ISSN: 2193-9764
Electronic ISSN: 2193-9772
DOI
https://doi.org/10.1007/s40192-020-00174-4
