Many structural and mechanical properties of crystals, glasses, and biological macromolecules can be modeled from the local interactions between atoms. These interactions ultimately derive from the quantum nature of electrons, which can be prohibitively expensive to simulate. Machine learning has the potential to revolutionize materials modeling because it can efficiently approximate complex functions. For example, neural networks can be trained to reproduce the results of density functional theory calculations at a much lower cost. However, how neural networks reach their predictions is not well understood, which has led to their use as a “black box” tool. This lack of understanding is undesirable, especially for applications of neural networks in scientific inquiry. We argue that machine learning models trained on physical systems can be used as more than just approximations, since they had to “learn” physical concepts in order to reproduce the labels they were trained on. We use dimensionality reduction techniques to study in detail the representation of silicon atoms at different stages in a neural network, which provides insight into how a neural network learns to model atomic interactions.

Although the ultimate basis of atomic interactions in solids is the quantum mechanical behavior of valence electrons, classical interatomic potentials are widely and successfully used to describe many properties of materials. Conventional approaches to deriving interatomic potentials are based on physically motivated functional forms. For example, the embedded atom method (EAM), commonly used to model atomic interactions in metals, is based on the intuition that the electrons responsible for the interactions are spherically distributed around the atoms, and hence the charge density of a metallic solid is largely isotropic.1 These implied approximations determine both the success and the limitations of the EAM. Analogous concepts based on physical intuition are used to construct interatomic potentials for covalently or ionically bonded solids and molecules. It is not easy to generalize this approach to systems of mixed character, or to systems that contain a large number of different elements. In response to this challenge, several ML-based empirical potentials have been proposed in the last decade.2,3 Since ML algorithms use flexible functional forms, any kind of atomic interaction that is present in the data can, in principle, be “learned” with sufficient training. Through the use of clever molecular dynamics (MD) schemes4–7 and careful construction of training sets, ML-based potentials can allow for much longer and larger simulations with accuracy comparable to that of first-principles quantum mechanical approaches like density functional theory (DFT). However, due to the complexity and flexibility afforded by ML algorithms, it is difficult to extract information on how they model complicated systems. For this reason, ML-based potentials are often viewed as “black boxes” that approximate the target values on which the algorithm was trained, and not as a tool that can produce conceptual insights.

Previously, dimensionality reduction has been applied to molecular dynamics trajectories by Ceriotti and co-workers.8 The coarse-grained descriptions provided by sketch-maps can be used to analyze and accelerate the exploration of phase space.9,10 Separately, linear classifier probes have been applied to intermediate layers to visualize the role of each hidden layer of a deep neural network (NN).11 Additionally, the hidden layers of a deep NN trained on organic molecule properties have been visualized, for the properties the NN was trained on, using principal component analysis (PCA).12

Here we propose an approach to extract what a NN has “learned” in order to predict total energies as a function of the atomic positions, and to elucidate how this is accomplished. Specifically, we consider Si as a model system and explore its behavior in several different solid and liquid phases. We begin by training a NN potential to reproduce total energies, originally obtained through DFT calculations, as a function of atomic configurations. We use the NN architecture proposed by Behler and Parrinello,2 which has been employed to successfully construct empirical potentials for semiconductors,13,14 metals,15,16 and water clusters.17 The inputs to the NN are symmetry functions of the atomic positions, which describe each atom’s local environment (Fig. 1). We then explore the internal characteristics of this potential by investigating the representation of the Si crystal as it passes through the NN architecture, using dimensionality reduction algorithms. This allows us to study and visualize the energy landscape of Si in an unbiased way, since we do not impose any physical intuition in the construction of the potential. By focusing on the interaction of an individual Si atom with its local environment, we can present the complicated many-body nature of the interaction in a simple and intuitive manner.
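To make the nature of these inputs concrete, the following is a minimal sketch of a Behler-Parrinello radial symmetry function; the parameter values (eta and R_s) are illustrative choices rather than the ones used in this work, and the function names are ours.

```python
import numpy as np

def cutoff(r, r_c=6.0):
    """Cosine cutoff f_c(r): decays smoothly to zero at r = r_c."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_g2(distances, eta=0.5, r_s=0.0, r_c=6.0):
    """Radial symmetry function G2 = sum_j exp(-eta (r_ij - r_s)^2) f_c(r_ij)."""
    r = np.asarray(distances, dtype=float)
    return np.sum(np.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c))

# Example: distances (in Angstrom) from a central Si atom to nearby neighbors
print(radial_g2([2.35, 2.36, 2.37, 2.35, 3.84]))
```

Angular symmetry functions are built analogously from three-body terms; a full descriptor set evaluates many such functions with different parameters to fingerprint each local environment.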

FIG. 1.

Schematic of the NN used to construct the interatomic potential, with contributions from many atoms (two atoms are highlighted red and green), starting with atomic coordinates which are transformed to symmetry functions of radial (represented as circles) and angular (represented as triangular funnels) character.


Our training set is composed of 64-atom Si cells in four different phases: cubic-diamond (CD), β-tin, R8, and liquid (L). Data are collected by running MD at temperatures of 300, 600, and 900 K for the solid phases and up to 2500 K for the L phase. Additional data are prepared by random distortions of the crystal structures. All the unit cells are relaxed at zero pressure. The MD simulations are performed within the Vienna ab initio simulation package (VASP),18 using a projector-augmented wave (PAW) potential.19,20 We use the generalized gradient approximation proposed by Perdew, Burke, and Ernzerhof as the exchange-correlation functional.21 The k-point mesh is optimized for an energy convergence of 0.5 meV/atom and a stress convergence of 0.1 kbar. The plane wave energy cutoff is set to 300 eV. In order to reduce the correlation between data points, we thin the MD data by using one out of every 100 consecutive structures from the MD simulations at 300 K and one out of every 20 structures from higher temperature MD simulations. After the thinning, 23 000 64-atom structures from many temperatures and phases, together with their DFT energies, are used to train and validate the NN. Of these, 18 000 structures are randomly chosen as the training set and the rest are divided into a validation set and an independent test set. We use the symmetry function set reported in Ref. 22 as our descriptors, which consists of 8 radial and 22 angular symmetry functions, for an input vector of size 30 for each atomic contribution. We reduce the number of symmetry functions to 28, since 2 of the angular functions are linear combinations of other angular symmetry functions. Since each structure has 64 atoms, our data set contains a total of 23 000 × 64 ≈ 1.5 × 10^6 local neighborhoods.
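A schematic of the thinning and splitting protocol just described might look as follows; the array names, the random seed, and the exact validation/test split point are hypothetical, since the text only specifies the 18 000-structure training set.

```python
import numpy as np

rng = np.random.default_rng(0)          # hypothetical seed

# Thinning: keep every 100th frame at 300 K and every 20th at higher T,
# reducing correlation between consecutive MD snapshots, e.g.:
#   thinned = frames_300K[::100] + frames_hot[::20]

n_structures = 23_000                   # structures remaining after thinning
indices = rng.permutation(n_structures)

train_idx = indices[:18_000]            # randomly chosen training set
val_idx = indices[18_000:20_500]        # remainder split into validation ...
test_idx = indices[20_500:]             # ... and an independent test set
```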

The total energy of a system is approximated as a sum of individual atomic contributions, where each atomic contribution is represented by a NN.2 In the case of a system in which all atoms are of the same element, each NN has the same numerical weights and biases, and they are updated concurrently as the network is trained. To train the NN, total energies of Si cells containing a relatively large number of atoms, obtained from MD simulations using DFT-derived forces for the dynamics, are used as labels. The non-linear transformations applied to the symmetry functions by a trained NN reproduce the DFT energies almost exactly (within 3-6 meV/atom). The final output of an individual Si atom’s NN is the energy contribution Ei of that Si atom’s local neighborhood, which is a linear combination of the numbers that come out of the Nn nodes in the last hidden layer. Thus, these Nn numbers output by the last hidden layer can be thought of as the representation23 of a Si atom’s local neighborhood within the NN, after which a simple linear operation gives Ei, as illustrated in Fig. 1. The calculation of symmetry functions and the subsequent application of the hidden layers transform the Cartesian coordinates of atoms so that the resulting Nn numbers in the last hidden layer are a convenient representation for predicting the energy.
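This weight sharing is straightforward to express in a modern framework (the original work used Torch 7). Below is a minimal PyTorch sketch of a Behler-Parrinello-style model with 28 symmetry-function inputs and two hidden layers of Nn = 60 nodes, matching the topology quoted below; the tanh activation is our assumption.

```python
import torch
import torch.nn as nn

class BPPotential(nn.Module):
    """Total energy as a sum of identical per-atom networks (shared weights)."""
    def __init__(self, n_sym=28, n_hidden=60):
        super().__init__()
        self.atomic_net = nn.Sequential(
            nn.Linear(n_sym, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),  # last hidden layer: the representation
            nn.Linear(n_hidden, 1),                    # linear map from representation to E_i
        )

    def forward(self, sym):
        # sym: (n_atoms, n_sym) symmetry-function vectors for one cell.
        # The same network is applied to every atom, so the weights are
        # shared and updated concurrently during training.
        e_i = self.atomic_net(sym)       # per-atom contributions E_i, shape (n_atoms, 1)
        return e_i.sum()                 # total energy of the cell

model = BPPotential()
cell = torch.randn(64, 28)               # placeholder for one 64-atom structure
print(model(cell))
```

Because only the total energy carries a DFT label, the loss is computed on the summed output, and the per-atom contributions E_i emerge as an internal quantity of the model.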

The NNs are trained with the Torch 7 software package,24 using a Graphics Processing Unit (GPU). In examining different NN architectures, we explore 1–3 hidden layers with the number of hidden nodes Nn ranging from 10 to 60 in each layer and pick the topology that performs best on the validation set. The best performance is achieved using a NN with 2 hidden layers and Nn = 60 nodes in each layer. This network achieves a root mean squared error (RMSE) of 5.1 meV/atom on the training set and 5.7 meV/atom on the validation set. Applying the NN with this topology to the independent test set yields an RMSE of 5.9 meV/atom. This is comparable to the RMSEs of 4-5 meV/atom for the training set and 5-6 meV/atom for the test set reported for Si in the literature using the same network architecture.2 As a simple baseline, the mean predictor, which predicts the mean value of the energy labels in the test set for every structure, achieves an RMSE of 182 meV/atom. A separate NN was trained only on the L and CD phases, in order to focus on the structural transition between them. With less variety of neighborhoods in the data set, the NN can focus on these two phases, yielding lower RMSEs on the training and test sets: with the same topology of 2 hidden layers of Nn = 60, it achieves an RMSE of 2 meV/atom on the training set and 2.4 meV/atom on the test set.
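The topology search described above amounts to a grid search over depth and width, selected on validation RMSE. The sketch below uses random placeholder data and regresses per-sample labels directly, which simplifies the actual setup where only total energies of 64-atom cells are available as labels; everything here is illustrative.

```python
import itertools
import torch
import torch.nn as nn

def make_model(depth, width, n_sym=28):
    """Fully connected network with `depth` hidden layers of `width` nodes."""
    layers, n_in = [], n_sym
    for _ in range(depth):
        layers += [nn.Linear(n_in, width), nn.Tanh()]
        n_in = width
    layers.append(nn.Linear(n_in, 1))
    return nn.Sequential(*layers)

def rmse(model, x, y):
    with torch.no_grad():
        return torch.sqrt(torch.mean((model(x).squeeze(-1) - y) ** 2)).item()

# Random placeholder data standing in for descriptors and energy labels.
x_train, y_train = torch.randn(1000, 28), torch.randn(1000)
x_val, y_val = torch.randn(200, 28), torch.randn(200)

best = None
for depth, width in itertools.product([1, 2, 3], range(10, 70, 10)):
    model = make_model(depth, width)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):                 # short training run, for the sketch only
        opt.zero_grad()
        loss = torch.mean((model(x_train).squeeze(-1) - y_train) ** 2)
        loss.backward()
        opt.step()
    score = rmse(model, x_val, y_val)
    if best is None or score < best[0]:
        best = (score, depth, width)     # keep the topology with lowest val RMSE
print("best (val RMSE, depth, width):", best)
```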

As a first attempt to visualize atomic neighborhoods, we use PCA25 as a dimensionality reduction algorithm on the symmetry function representation and the hidden layer representation. In Fig. 2(a), we plot the symmetry function representation in the first two principal components (PC1, PC2). Each point represents a 64-atom Si cell, color coded by phase. Structures with CD order in the nearest neighbor shell are presented in three different colors that identify the degree of crystalline order: green if the four nearest neighbors of the Si atom form a distorted tetrahedron,26–29 orange if they have perfect tetrahedral order in the nearest neighbor shell, and red if they have perfect tetrahedral order in the first and second neighbor shells. In the two-dimensional projection of the symmetry function space, the different phases are already separated into clusters, Fig. 2(a). Although there are hundreds of red points on this plot, all the perfectly ordered CD structures are contained within a very narrow region of (PC1, PC2) values. PC1 can be identified with phase information, although it does not distinguish well between the different levels of order in the CD structures. β-tin (magenta) structures form an isolated island in PC1. In the two-dimensional projection of the last hidden layer representation, Fig. 2(b), PC1 distinguishes between the different phases with higher precision. The level of order present in the CD structures, from green to orange to red, is well represented in PC1. L (blue) structures have two distinct regions: the higher temperature liquid-like regions overlap with the R8 phase in PC1, whereas the lower temperature L has some overlap with the CD (green) phase. The β-tin phase is a 6-fold metallic structure, whereas the R8, CD, and L phases all represent different structures with various degrees of 4-fold tetrahedral coordination. The R8 phase can transform to the CD phase at zero pressure, whereas a pressure of 11.7 GPa is required to transform the CD phase to the β-tin phase.30 Interestingly, this is captured in the hidden layer representation, since the data collected at zero pressure present β-tin as a separate island in PC1, whereas the symmetry function representation does not present this distinction. PC2 is related to the potential energy of the structures. In the top panel of Fig. 2(a), we show the potential energy, which is correlated with the PC2 value for the different phases. The dependence is not as strong for the β-tin phase, as the energy of the β-tin structures goes down and then up as a function of PC2. In the hidden layer representation, the correlation is stronger [Fig. 2(b)]: the energy of structures in each phase is a linear function of PC2. It is interesting that the last layer of the NN projects the hidden layer representation onto PC2 to calculate the energy of a structure.
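A minimal sketch of this projection, assuming the per-structure representations have already been collected into a matrix (the aggregation of per-atom representations into per-structure points is not detailed in the text, and the placeholder data below skips that step):

```python
import numpy as np
from sklearn.decomposition import PCA

# reps: one row per 64-atom structure; d = 28 for the symmetry function
# representation or d = 60 for the last hidden layer (placeholder data here).
reps = np.random.randn(5000, 60)

pca = PCA(n_components=2)
pc = pca.fit_transform(reps)             # columns are PC1 and PC2
print(pca.explained_variance_ratio_)     # variance captured by each component
# Each row of `pc` gives the (PC1, PC2) coordinates plotted in Fig. 2.
```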

FIG. 2.

(a) PCA of the symmetry function representation. (b) PCA of the last hidden layer representation. The central panels are the representations of Si structures projected onto the first two principal components, PC1 and PC2. The side panels: right—a histogram of structures as a function of their PC1 value to facilitate visualization of overlap; top—average potential energy per atom (in eV) of structures plotted against PC2, calculated using a fixed bin size of 50 structures.


The PCA showed us that, through non-linear transformations, the NN represents the original data by emphasizing the structural phase and energy information. Not surprisingly, the PC1 of the symmetry functions is related to the phase of a structure, since phase is the main source of variance in the structure of atomic neighborhoods. It is interesting that the NN makes this relation stronger, making the PC1 of the hidden layer describe phase information more precisely. Since specific information on the various phases of the Si structures was not supplied to the NN or used at any step of the learning process, for instance, in the definition of the symmetry functions, it is quite interesting that the NN learned to emphasize these key features on its own in order to represent the energy landscape correctly. Overall, Fig. 2 shows that the symmetry function representation is a good representation for expressing the potential energy, with PC1 and PC2 being the physically relevant quantities. The NN does not change the character of the symmetry function representation, but makes the relations stronger and more precise, which produces the near-ab initio accuracy in the prediction of the total energy of each structure.

To gain a more detailed understanding of the representations, especially the non-linear and local properties of the data, we use the non-linear dimensionality reduction technique t-SNE (t-distributed stochastic neighbor embedding).31 Unlike PCA, t-SNE uses the local topology of points in the high-dimensional space to project the data onto a lower-dimensional space, typically of dimension 2 or 3, which captures how individual atomic neighborhoods are represented within the NN in relation to similar neighborhoods around them. In order to visualize several tens of thousands of points, we use the Barnes-Hut32 implementation, which substantially accelerates t-SNE for large data sets.33
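As a sketch of this step, the scikit-learn implementation of t-SNE supports the Barnes-Hut approximation; the perplexity value below is a common default and not necessarily the one used in this work.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the per-neighborhood representations (the text uses 50 000
# points; a smaller sample keeps this sketch fast).
neighborhoods = np.random.randn(5000, 28)

tsne = TSNE(n_components=2, method="barnes_hut", angle=0.5, perplexity=30.0)
embedding = tsne.fit_transform(neighborhoods)   # (n_points, 2) map coordinates
```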

In order to investigate a single Si atom’s interaction with its neighbors, we study the representation of local neighborhoods as we step through the NN, beginning with a Cartesian description and ending in the final hidden layer. Before the application of the symmetry functions and the NN, the atoms are represented in Cartesian coordinates. An atom’s neighborhood is represented by 3 × Na numbers, the coordinates of its Na neighbors relative to the central atom at the origin. The symmetry functions are calculated with a cutoff of 6 Å, which on average includes 43 neighbors. We sort the coordinates of the neighbors in order of their distance, that is, the nth set of three numbers denotes the position of the nth nearest neighbor. This representation is invariant to translations and permutations of atoms, but not to rotations. We apply t-SNE to 50 000 randomly chosen local neighborhoods in this Cartesian representation, which is 3 × 43 = 129-dimensional; the result is shown in Fig. 3(a). There is a clear separation between the β-tin phase and the other phases. The R8 and L phases are completely mixed, and the CD phase is scattered irregularly.
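A minimal sketch of this sorted Cartesian descriptor, ignoring periodic boundary conditions for brevity; the zero-padding convention for neighborhoods with fewer than 43 neighbors inside the cutoff is our assumption.

```python
import numpy as np

def cartesian_descriptor(positions, center, n_neighbors=43, r_c=6.0):
    """Relative neighbor coordinates within r_c, sorted by distance and
    flattened to 3 * n_neighbors numbers. Sorting makes the descriptor
    invariant to permutations (and relative coordinates to translations),
    but it is not invariant to rotations."""
    rel = positions - center
    dist = np.linalg.norm(rel, axis=1)
    mask = (dist > 0) & (dist < r_c)            # exclude the central atom itself
    rel, dist = rel[mask], dist[mask]
    order = np.argsort(dist)[:n_neighbors]      # nth triple = nth nearest neighbor
    out = np.zeros((n_neighbors, 3))            # zero-pad to a fixed length
    out[:len(order)] = rel[order]
    return out.ravel()                          # 3 x 43 = 129-dimensional vector

pos = np.random.rand(64, 3) * 10.0              # placeholder snapshot, no PBC
print(cartesian_descriptor(pos, pos[0]).shape)  # (129,)
```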

FIG. 3.

Representations of the Si local neighborhoods by t-SNE: (a)–(c) are maps of the Cartesian, symmetry function, and hidden layer representations, respectively. Each map has 50 000 neighborhoods. The color of each phase is the same as in Fig. 2. (d) NN hidden layer colored according to the energy contribution of neighborhoods. [(e) and (f)] Hidden layer representation of the NN trained only on the CD and L phases.


The next step of the NN pipeline is the calculation of the symmetry functions. We use t-SNE to map the 28-dimensional symmetry function space to 2 dimensions, as shown in Fig. 3(b). The symmetry function representation is invariant to rotations, in addition to the invariances of the Cartesian representation. Since the radial (angular) symmetry functions are calculated from two-body (three-body) interactions, they are more convenient for describing the contributions of atomic interactions to the energy. This is apparent in Fig. 3(b), where neighborhoods in the same phase are clustered together. CD neighborhoods with different levels of tetrahedral order are mixed together, and some of the local neighborhoods from the R8 phase are mixed with some of the L neighborhoods. This clustering of similar neighborhoods is the reason that many existing ML algorithms, even linear models (in the case of metallic atoms),34 have successfully modeled the energy landscape of atomistic systems using symmetry functions. In contrast, it would be much harder to apply ML techniques to the Cartesian representation directly.

Next, we apply t-SNE to the hidden layer representation [Fig. 3(c)]. The β-tin phase is completely separated, as it was in the first two dimensions of the PCA. L neighborhoods are connected to both R8 and CD. Neighborhoods in the CD phase that are farther from the L phase have better tetrahedral order, forming a connecting, neck-like structure color-coded from green to orange to red to indicate increasing order. This set of structures represents the region of the phase transition from the L phase to perfect CD order.

In the remaining discussion, we offer arguments for how this analysis can lead to physical insight into possible phase transformations in Si. As a first step in this direction, in Fig. 3(d) we present the same map with each point colored not by its phase label but according to the energy contribution of that local neighborhood. This energy is not defined within DFT, but it is defined as the output of each atom’s individual NN. This plot shows how the neighborhoods within a phase are distributed: the higher energy neighborhoods of the R8 phase are closer to the L region; the R8 and CD phases are both connected to the L phase, but the energy is lower for neighborhoods that are farther from the L region; finally, the β-tin neighborhoods have generally higher energies, with those on the right side of the corresponding phase region at a higher potential energy than those on the left side.

To see the structural transition from the L phase to the CD crystal in more detail, we train a separate NN on just these two phases. We apply t-SNE to the representation of this NN [Fig. 3(f)], which allows us to see the structure of the hidden layer representation for these two phases in more detail. There is a path from liquid-like Si neighborhoods to the CD crystal through a smooth transition of increasing order. This transition is also apparent in Fig. 3(e): the higher energy L neighborhoods smoothly transform into the lowest energy CD neighborhoods. To offer a visualization of this general shape in the original 60-dimensional representation space, we could describe it as a “sphere” of neighborhoods with no tetrahedral order connected by a long “neck” that extends from the “sphere” all the way to the perfect crystal. Including the potential energy as an additional dimension morphs this representation into a landscape with a single funnel extending to the global minimum. It is well known that Si crystallizes to the CD phase easily during cooling.35 This can be explained by Si having a deep global minimum at the bottom of a single funnel that is easily reached.36,37

A second insight gained by studying these maps relates to the transformations between crystalline phases of Si. For example, the fact that high external pressure is required for the transition from the CD crystalline phase to the β-tin crystalline phase is reflected in the feature of the neighborhood map that the local neighborhoods corresponding to the β-tin structure lie on a separate manifold from those of the L and CD phases. A significant energy barrier (overcome through the applied pressure) separates the two phases, as has been established both experimentally38,39 and theoretically.40

ML algorithms are gaining popularity in solid state physics and chemistry as an efficient approximation tool.13,41–50 Due to increasing computational resources, the wider availability of scientific data, and advances in ML algorithms, data science approaches can prove very effective for studying complex atomistic problems. Instead of merely serving as a “black box” approximation tool, ML can also be used to “learn” scientific concepts from computational data. For example, ML models have been used to gain insight into the kinetic energy of noninteracting electrons51,52 or the local chemical potential of a hydrogen test charge for various molecules.53 Elsewhere, we used ML representations combined with theoretical models to gain insight into the relationship between the structural and dynamical properties of atomistic systems.54–57 In this work, we have shown that the application of a ML algorithm can be interpreted as a set of physically meaningful transformations on data. We expect that these notions can be extended to address issues related to more complex materials.

See supplementary material for the training data from molecular dynamics runs at different temperatures for CD, β-tin, L, and R8 phases and also for a Torch NN file that can be used to predict the DFT energy of a unit cell with Si atoms.

We thank David Wales, Wenguang Zhu, and the Ceder group for helpful discussions. We also thank Bryce Meredig for providing useful comments on the manuscript. Neural networks were trained on the Odyssey cluster supported by the FAS Division of Science Research Computing Group at Harvard University. Density functional theory calculations were run on Stampede, Texas Advanced Computing Center as part of an XSEDE allocation, which is supported by NSF Grant No. ACI-1053575.

1. R. Johnson and D. Oh, J. Mater. Res. 4, 1195 (1989).
2. J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
3. A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Phys. Rev. Lett. 104, 136403 (2010).
4. G. Csányi, T. Albaret, M. Payne, and A. De Vita, Phys. Rev. Lett. 93, 175503 (2004).
5. Z. Li, J. R. Kermode, and A. De Vita, Phys. Rev. Lett. 114, 096405 (2015).
6. A. Waterland, E. Angelino, E. D. Cubuk, E. Kaxiras, R. P. Adams, J. Appavoo, and M. Seltzer, in Proceedings of the 6th International Systems and Storage Conference (ACM, 2013), p. 8.
7. D. Perez, E. D. Cubuk, A. Waterland, E. Kaxiras, and A. F. Voter, J. Chem. Theory Comput. 12, 18 (2016).
8. M. Ceriotti, G. A. Tribello, and M. Parrinello, Proc. Natl. Acad. Sci. U. S. A. 108, 13023 (2011).
9. G. A. Tribello, M. Ceriotti, and M. Parrinello, Proc. Natl. Acad. Sci. U. S. A. 109, 5196 (2012).
10. M. Ceriotti, G. A. Tribello, and M. Parrinello, J. Chem. Theory Comput. 9, 1521 (2013).
11. G. Alain and Y. Bengio, preprint arXiv:1610.01644 (2016).
12. G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, New J. Phys. 15, 095003 (2013).
13. R. Z. Khaliullin, H. Eshet, T. D. Kühne, J. Behler, and M. Parrinello, Nat. Mater. 10, 693 (2011).
14. J. Behler, R. Martoňák, D. Donadio, and M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008).
15. N. Artrith and J. Behler, Phys. Rev. B 85, 045439 (2012).
16. N. Artrith and A. M. Kolpak, Nano Lett. 14, 2670 (2014).
17. S. K. Natarajan, T. Morawietz, and J. Behler, Phys. Chem. Chem. Phys. 17, 8356 (2015).
18. J. Hafner, J. Comput. Chem. 29, 2044 (2008).
19. G. Kresse and J. Hafner, Phys. Rev. B 47, 558 (1993).
20. G. Kresse and J. Furthmüller, Comput. Mater. Sci. 6, 15 (1996).
21. J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
22. N. Artrith, B. Hiller, and J. Behler, Phys. Status Solidi B 250, 1191 (2013).
23. Y. Bengio, A. Courville, and P. Vincent, IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798 (2013).
24. R. Collobert, K. Kavukcuoglu, and C. Farabet, in BigLearn, NIPS Workshop, EPFL-CONF-192376 (2011).
25. S. Wold, K. Esbensen, and P. Geladi, Chemom. Intell. Lab. Syst. 2, 37 (1987).
26. E. D. Cubuk and E. Kaxiras, Nano Lett. 14, 4065 (2014).
27. V. Luchnikov, N. Medvedev, A. Appelhagen, and A. Geiger, Mol. Phys. 88, 1337 (1996).
28. E. D. Cubuk, W. L. Wang, K. Zhao, J. J. Vlassak, Z. Suo, and E. Kaxiras, Nano Lett. 13, 2011 (2013).
29. A. Ostadhossein, S.-Y. Kim, E. D. Cubuk, Y. Qi, and A. C. van Duin, J. Phys. Chem. A 120, 2114 (2016).
30. B. D. Malone, J. D. Sau, and M. L. Cohen, Phys. Rev. B 78, 035210 (2008).
31. L. van der Maaten and G. Hinton, J. Mach. Learn. Res. 9, 85 (2008).
32. J. Barnes and P. Hut, Nature 324, 446 (1986).
33. L. van der Maaten, J. Mach. Learn. Res. 15, 3221 (2014).
34. A. Seko, A. Takahashi, and I. Tanaka, Phys. Rev. B 90, 024101 (2014).
35. A. Hedler, S. L. Klaumünzer, and W. Wesch, Nat. Mater. 3, 804 (2004).
36. D. J. Wales, Philos. Trans. R. Soc. A 370, 2877 (2012).
37. D. J. Wales, M. A. Miller, and T. R. Walsh, Nature 394, 758 (1998).
39. J. Hu and I. Spain, Solid State Commun. 51, 263 (1984).
40. B. G. Pfrommer, M. Côté, S. G. Louie, and M. L. Cohen, Phys. Rev. B 56, 6662 (1997).
41. E. D. Cubuk, S. S. Schoenholz, J. M. Rieser, B. D. Malone, J. Rottler, D. J. Durian, E. Kaxiras, and A. J. Liu, Phys. Rev. Lett. 114, 108001 (2015).
42. M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
43. K. Fujimura, A. Seko, Y. Koyama, A. Kuwabara, I. Kishida, K. Shitara, C. A. Fisher, H. Moriwake, and I. Tanaka, Adv. Energy Mater. 3, 980 (2013).
44. S. Curtarolo, G. L. Hart, M. B. Nardelli, N. Mingo, S. Sanvito, and O. Levy, Nat. Mater. 12, 191 (2013).
45. C. C. Fischer, K. J. Tibbetts, D. Morgan, and G. Ceder, Nat. Mater. 5, 641 (2006).
46. L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, npj Comput. Mater. 2, 16028 (2016).
47. L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, Phys. Rev. Lett. 114, 105503 (2015).
48. J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
49. A. D. Sendek, Q. Yang, E. D. Cubuk, K.-A. N. Duerloo, Y. Cui, and E. J. Reed, Energy Environ. Sci. 10, 306 (2017).
50. F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld, preprint arXiv:1702.05532 (2017).
51. K. Vu, J. C. Snyder, L. Li, M. Rupp, B. F. Chen, T. Khelif, K.-R. Müller, and K. Burke, Int. J. Quantum Chem. 115, 1115 (2015).
52. L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller, and K. Burke, Int. J. Quantum Chem. 116, 819 (2015).
53. K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nat. Commun. 8, 13890 (2017).
54. S. S. Schoenholz, E. D. Cubuk, D. M. Sussman, E. Kaxiras, and A. J. Liu, Nat. Phys. 12, 469 (2016).
55. E. D. Cubuk, S. S. Schoenholz, E. Kaxiras, and A. J. Liu, J. Phys. Chem. B 120, 6139 (2016).
56. S. S. Schoenholz, E. D. Cubuk, E. Kaxiras, and A. J. Liu, Proc. Natl. Acad. Sci. U. S. A. 114, 263 (2017).
57. D. M. Sussman, S. S. Schoenholz, E. D. Cubuk, and A. J. Liu, preprint arXiv:1610.03401 (2016).
