Machine learning approach for structure-based zeolite classification

https://doi.org/10.1016/j.micromeso.2008.07.027

Abstract

Application of knowledge discovery methods to crystal structure databases is an emerging research area of materials science that is playing an important role in facilitating data analysis. This study aims to combine computational geometry methods with machine learning algorithms for classification of inorganic solid materials of known structure. Zeolite crystals are used for the pilot study, in which a topology-based model is developed for classification of a compound by mineral name and by zeolite framework type. The topological descriptors are derived from the Delaunay tessellation for 220 zeolites contained in the inorganic crystal structure database. This zeolite-structure-predictor (ZSP) is trained to classify this set of selected zeolite crystals into 22 different types of minerals and into 13 framework types. The ZSP is based on the random forest algorithm and contains attributes of Delaunay simplex properties such as tetrahedrality index, frequency of simplex occurrence, and site occupation probability. Across this multitude of classes, the ZSP correctly classifies more than 81% of instances by framework type. The model shows that the classification into framework types is superior, and that the classification into mineral names is not structurally unique.

Introduction

Machine learning methods are successfully used for the discovery, analysis and optimization of chemical and biochemical compounds in medicinal chemistry, drug design, computational biology and other fields [1], [2]. Several machine learning techniques such as decision trees [3], genetic algorithms [4] and neural networks [5] were instrumental in recent advances in drug discovery and optimization. Mining of inorganic solid materials data remains a daunting task because of the lack of open source data containing evaluated and validated physical and chemical information. On the other hand, the inorganic crystal structure database (ICSD), maintained jointly by the National Institute of Standards and Technology (NIST) in the US and the Fachinformationszentrum Karlsruhe (FIZ) in Germany, is a valuable source of inorganic crystal data. This database contains structural information assembled from publications mainly based on X-ray diffraction experiments. Therefore, it is an excellent data source for data mining in materials science. Although informative, the ICSD provides only partial knowledge of crystal structure, which opens a window of opportunity for machine learning implementations aimed at supplementing understanding of structure at deeper levels. A simple approach is to explore the statistical structural content of families of compounds and organize these results in a relational database that can be easily accessed by users. More fundamental questions such as structure prediction are challenging computational tasks for which there are very few tools.

The aim of this work is to explore the applicability of statistical geometry combined with machine learning methods for accurate classification of solid materials with known X-ray structures. A pilot data set of zeolite crystal structures was selected for this study, although other families of materials can be analyzed as well. In the case of zeolites, the ICSD provides the user with the mineral name, chemical name and formula, space group, and symmetry system, but does not include structure-based classifications such as the framework topology. Zeolites are porous crystalline compounds that occur naturally and can also be synthesized. The sustained interest in zeolites from the materials science and chemistry communities is based on their unique ion-exchange capabilities as well as their ability to retain substances of various sizes in their porous network. Zeolites are widely used for water softening, separation and removal of gases and solvents, molecular sieving, fuel refining, and petrochemical cracking.

Historically, zeolites were defined as aluminosilicate frameworks with loosely bonded cations and water molecules in out-of-framework positions [6], [7]. More recently, the definition has been broadened by removing the constraints on chemical composition. Currently, any crystalline substance characterized by a specific framework of linked TO4 units (T is a tetrahedrally-bonded atom in the framework) and containing cavities is considered to be a zeolite [8]. A number of minerals with frameworks based on other tetrahedrally-bonded chemical groups also display zeolitic properties [9]. Hundreds of zeolite crystals in the ICSD and millions of hypothetical zeolites [10], [11], [12] necessitate the development of robust identification schemes. Zeolite crystals are classified following three commonly used approaches. One structural classification is based on the framework topology, with distinct frameworks assigned three-letter codes [13]. A second structural classification scheme implements the “secondary building units” (SBU) concept, where an SBU is a local mutual spatial arrangement of tetrahedra [9]. The third classification method is similar to the SBU scheme, supplemented with morphology content [9]. According to the Structure Commission of the International Zeolite Association (IZA-SC) there are 176 known zeolite framework types [13] and more than 400 isotypic mineral types associated with them. The ICSD contains on the order of 1600 zeolite crystals reported with their mineral names rather than with the framework-type codes. This database is systematically used for identification of compounds and is incorporated in the software of X-ray diffraction equipment.

Computational geometry analysis based on topological features of zeolite crystal structures is an emerging research area, and the literature on this topic is fairly limited. Most geometric methodologies are aimed at facilitating design and structure optimization of hypothetical zeolites, e.g., by accommodating distortions of the TO4 units into novel frameworks [14]. A statistical geometry approach for studying the structure of molecular systems with Voronoi’s partitioning of the occupied space [15] was pioneered by Bernal in the late 1950s [16]. This approach was generalized for the study of simple liquids by defining a set of parameters describing the packing of Voronoi polyhedra [17]. Statistical geometry was further applied for studying packing and volume distributions in proteins [18], [19], [20]. A topological dual to the Voronoi partitioning of space is the Delaunay tessellation [21]. This tessellation is useful for providing an exact identification of neighboring points in molecular systems represented by extended sets of points in space. Delaunay tessellation has been applied for studying model and simple liquids [22], [23] as well as water and aqueous solutions [24], [25], [26]. The Delaunay tessellation is particularly efficient for describing the water structure, where a tetrahedral network of molecules is present in the first hydration shell [25]. The mid-1990s brought applications of the Delaunay tessellation for identification of nearest neighbor residues in proteins and for development of four-body statistical potentials [27]. Since then the approach has been widely used for analyzing protein structure in various applications [28].
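How a Delaunay tessellation identifies neighboring points can be illustrated with a minimal sketch. It is not the authors' implementation: it is a brute-force construction that keeps every four-point simplex whose circumscribed sphere contains no other point (the defining empty-sphere property). The function names are hypothetical, the points are assumed to be in general position, and the crystal's periodic boundary conditions are ignored. The toy input is a single idealized TO4 unit with the T atom at the origin.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(p0, p1, p2, p3):
    """Center and squared radius of the sphere through four points (None if coplanar)."""
    others = (p1, p2, p3)
    # Equal distance to p0 and p_i gives the linear system 2 (p_i - p0) . c = |p_i|^2 - |p0|^2
    rows = [[2.0 * (p[k] - p0[k]) for k in range(3)] for p in others]
    norm0 = sum(x * x for x in p0)
    rhs = [sum(x * x for x in p) - norm0 for p in others]
    d = det3(rows)
    if abs(d) < 1e-12:
        return None
    center = []
    for k in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = rhs[i]
        center.append(det3(m) / d)
    r2 = sum((center[k] - p0[k]) ** 2 for k in range(3))
    return center, r2


def delaunay_brute_force(points, eps=1e-9):
    """Return index 4-tuples whose circumsphere has no other point strictly inside."""
    simplices = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        empty = all(
            sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= r2 - eps
            for j in range(len(points)) if j not in quad
        )
        if empty:
            simplices.append(quad)
    return simplices


def neighbors(simplices):
    """Two points are neighbors when they share an edge of some simplex."""
    nbrs = {}
    for s in simplices:
        for a, b in combinations(s, 2):
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    return nbrs


# Idealized TO4 unit: four O atoms on a regular tetrahedron, T atom (index 4) at the center.
points = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1), (0, 0, 0)]
simplices = delaunay_brute_force(points)
nbrs = neighbors(simplices)
print(simplices)
print(nbrs[4])
```

The brute-force search scales poorly and is meant only to make the definition concrete; for real structures a dedicated computational geometry library would be the practical choice.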

In this study, we explore the use of the Delaunay space on zeolite crystals with the goal of developing a self-contained topological tool that requires only the X-ray crystallographic information, without relying on additional calculations such as the sequence of coordination numbers or energetic studies. From a topological perspective, zeolite crystals are challenging materials with huge unit cells containing a multitude of elements and the solid-state framework of TO4 units. The crystal complexity is increased by the fractional occupation probability of multiple sites. For example, a lattice site might have a probability of 0.7 of being occupied by a silicon atom and a probability of 0.3 of being occupied by an aluminum atom. This fractional site occupancy leads to overlapping of topological spaces. Studies of regularized zeolite frameworks [29], [30], in which all sites have an occupancy probability of one, lose the compositional characteristics of the zeolite crystal and decrease the system complexity.
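The effect of fractional occupancy on a single simplex can be made concrete with a small sketch. It enumerates the possible element assignments for the four vertex sites of a simplex and their probabilities, under the assumption (made here for illustration, not stated in the source) that site occupancies are statistically independent. The function name and the example sites are hypothetical.

```python
from itertools import product


def simplex_compositions(sites):
    """Map each possible composition of a simplex to its probability.

    `sites` is a list of dicts {element: occupation probability}, one per vertex.
    Occupancies are treated as independent, which is an assumption of this sketch.
    """
    result = {}
    for combo in product(*(site.items() for site in sites)):
        composition = tuple(sorted(element for element, _ in combo))
        probability = 1.0
        for _, p in combo:
            probability *= p
        result[composition] = result.get(composition, 0.0) + probability
    return result


# The mixed site from the text (0.7 Si, 0.3 Al) and a fully occupied oxygen site.
t_site = {"Si": 0.7, "Al": 0.3}
o_site = {"O": 1.0}

# A simplex with two mixed T sites and two oxygen sites.
probs = simplex_compositions([t_site, t_site, o_site, o_site])
for composition, probability in sorted(probs.items()):
    print(composition, round(probability, 4))
```

Each distinct composition corresponds to one of the overlapping topological spaces mentioned above; a regularized framework would collapse them into a single composition with probability one.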

The tool developed in this study is based on Delaunay tessellation and contains information on overlapping topological spaces. Therefore, the compositional properties of the compound are taken into consideration. Furthermore, this study demonstrates that application of the Delaunay tessellation to zeolite crystals allows accurate prediction and classification by mineral type and framework type. The paper is organized as follows: Section 2 gives results on the statistics obtained on the basis of the Delaunay tessellation of 220 zeolites. The crystal data for these zeolites come from the ICSD. Section 3 elaborates on the descriptors used in the machine learning approach and provides the classification results obtained from the random forest algorithm framed into our zeolite-structure-predictor (ZSP) model. The classification allows assignment of the zeolites to 22 mineral names or 13 framework types. Section 4 contains several predictions using the newly developed model and a discussion of the efficiency and accuracy of the approach. The paper is concluded in Section 5.
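One of the Delaunay simplex descriptors named in the abstract, the tetrahedrality index, can be sketched as follows. The formula used is the common definition from the statistical geometry literature, T = sum over edge pairs of (l_i - l_j)^2 divided by 15 times the squared mean edge length, where the l_i are the six edge lengths of the simplex. Whether the ZSP uses exactly this normalization is an assumption. T is zero for a regular tetrahedron and grows with distortion.

```python
from itertools import combinations
from math import dist


def tetrahedrality(vertices):
    """Tetrahedrality index of a simplex given as four 3D points.

    Uses T = sum_{i<j} (l_i - l_j)^2 / (15 * mean_l^2) over the six edge
    lengths; 15 is the number of distinct edge pairs.
    """
    edges = [dist(a, b) for a, b in combinations(vertices, 2)]
    mean_edge = sum(edges) / len(edges)
    spread = sum((li - lj) ** 2 for li, lj in combinations(edges, 2))
    return spread / (15 * mean_edge ** 2)


regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
distorted = [(2, 2, 2), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

t_regular = tetrahedrality(regular)
t_distorted = tetrahedrality(distorted)
print(t_regular, t_distorted)
```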

Section snippets

Structural information based on Delaunay graphical representation

In statistical geometry methods, the nearest neighbor atoms, or groups of atoms, are identified by statistical analysis of irregular polyhedra obtained as a result of a specific tessellation in three-dimensional space. Voronoi tessellation partitions the space into convex polytopes called Voronoi polyhedra [15]. In a 3D molecular system a Voronoi polyhedron has the smallest volume enclosing an atom such that all points inside the cell are closer to that atom than to any other atom in the

Zeolite-structure-predictor

Recently, ensemble learning has drawn interest in the areas of chemistry and materials science. These are methods that generate many classifiers, which are subsequently aggregated. The most common methods are bagging [36] and boosting of classification trees [37]. In bagging, trees are independent and each is built from a bootstrap sample of the available data set. In boosting, on the other hand, successive trees are weighted in such a way that prediction is achieved through a weighted vote (based on
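The bagging idea described above can be sketched in a few lines. This is a toy illustration, not the ZSP: each "tree" is a one-split decision stump, the descriptor values and class labels are invented, and all function names are hypothetical. A full random forest additionally restricts each split to a random subset of the attributes, which this sketch omits.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: choose the feature and threshold with fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def bagged_stumps(X, y, n_trees=25, seed=0):
    """Train independent stumps on bootstrap samples and predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(tree(row) for tree in trees)
        return votes.most_common(1)[0][0]

    return predict


# Invented single-descriptor data with two hypothetical classes.
X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
predict = bagged_stumps(X, y)
print(predict([0.05]), predict([0.95]))
```

Because the trees are trained independently, the vote averages out the quirks of any single bootstrap sample, which is the property that distinguishes bagging from the sequential reweighting used in boosting.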

Predictions from the ZSP model

It is not sufficient to provide a measure of model robustness without also looking at prediction capabilities. The confusion matrix can be used to evaluate performance and draw predictions. With 10 zeolites in each class, every diagonal element of the confusion matrix would be 10 for a perfect classification. Tables 5 and 6 give the confusion matrices corresponding to the ZSP with 168 attributes and data sets containing 220 zeolites (mineral name classification) and 130
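How a confusion matrix is read can be shown with a minimal sketch. The labels and predictions below are invented toy data, not results from the paper, and the function names are hypothetical. Rows index the true class and columns the predicted class, so a perfect classifier with 10 members per class puts 10 on every diagonal element.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count instances by (true class, predicted class); rows are true classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of correctly classified instances: diagonal sum over total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


classes = ["class_a", "class_b"]
truth = ["class_a"] * 10 + ["class_b"] * 10

perfect = confusion_matrix(truth, truth, classes)

# Toy prediction with one class_a instance mistaken for class_b.
predicted = ["class_b"] + ["class_a"] * 9 + ["class_b"] * 10
imperfect = confusion_matrix(truth, predicted, classes)

print(perfect, accuracy(perfect))
print(imperfect, accuracy(imperfect))
```

Off-diagonal entries show which classes are confused with each other, which is what makes the matrix useful for drawing predictions rather than only scoring the model.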

Conclusion

Delaunay tessellation is shown to be a valid descriptive tool for framework classification when using representative structures. This paper shows that the tessellation can be used on structures that have fractional site occupancies, are highly symmetric, and have very large primitive cells. We describe the methodology used for comparing crystal structures within an equal-volume Delaunay space and show the information content that this graphical analysis has for classifying zeolite

Acknowledgments

This work was supported by the National Science Foundation grant CHE-0626111. The authors acknowledge the NIST Standard Reference Data Program for making available the ICSD zeolite data set. We acknowledge interesting discussions with Prof. John Schreifels of GMU. EBB acknowledges useful discussions with Dr. V. Karen of NIST.

References (39)

  • R. Burbidge et al., Comput. Chem. (2001)
  • M.M.J. Treacy et al., Micropor. Mesopor. Mater. (2004)
  • S.A. Wells et al., Micropor. Mesopor. Mater. (2006)
  • F.M. Richards, J. Mol. Biol. (1974)
  • J.L. Finney, J. Mol. Biol. (1975)
  • K. Sugihara et al., Inform. Process. Lett. (1995)
  • W. Duch et al., Curr. Pharm. Des. (2007)
  • D.M. Hawkins et al., Act. Relationships (1997)
  • J. Devillers, Genetic Algorithms in Molecular Modeling (1999)
  • J. Devillers, Neural Networks and Drug Design (1999)
  • M. Hey, Mineralog. Mag. (1930)
  • J.V. Smith, Chem. Rev. (1988)
  • D.S. Coombs, Mineralog. Mag. (1998)
  • T. Armbruster et al., Rev. Mineral. Geochem. (2001)
  • D.J. Earl et al., Ind. Eng. Chem. Res. (2006)
  • A. Le Bail, J. Solid State Phenom. (2007)
  • Ch. Baerlocher et al., Atlas of Zeolite Framework Types (2007)
  • G.F. Voronoi, J. Reine Angew. Math. (1908)
  • J.D. Bernal, Nature (1959)