Published in: Integrating Materials and Manufacturing Innovation 2/2020

27.05.2020 | Technical Article

Benchmark AFLOW Data Sets for Machine Learning

Authors: Conrad L. Clement, Steven K. Kauwe, Taylor D. Sparks

Abstract

Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that do not refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.
Metadata
Title
Benchmark AFLOW Data Sets for Machine Learning
Authors
Conrad L. Clement
Steven K. Kauwe
Taylor D. Sparks
Publication date
27.05.2020
Publisher
Springer International Publishing
Published in
Integrating Materials and Manufacturing Innovation / Issue 2/2020
Print ISSN: 2193-9764
Electronic ISSN: 2193-9772
DOI
https://doi.org/10.1007/s40192-020-00174-4
