2020 | OriginalPaper | Chapter

A Study on Distance Based Representation of Molecules for Statistical Learning

Authors: Abdul Wasee, Rajib Ghosh Chaudhuri, Prakash Kumar, Eldhose Iype

Published in: New Trends in Computational Vision and Bio-inspired Computing

Publisher: Springer International Publishing

Abstract

Statistical learning of molecular structure properties is gaining interest among researchers. These methods are faster than traditional quantum mechanical (QM) methods. In addition, physical properties can be incorporated as feature sets, and a properly trained model can predict the desired properties of a molecular system. In this study, a number of machine learning regressors are used to predict the molecular energies of Si_n (n = 1, 2, …, 25) clusters and of water, methane, and ethane molecules. For the Si_n clusters, six of the eight regressors appear to predict the energies accurately. For the other data sets, the Decision Tree regressor generally gave fairly good predictions compared with the others. Adding atomic charges as an extra feature improved the performance of the other regressors but did not improve the Decision Tree regressor. Since calculating atomic charges is itself an expensive task, we conclude that the Decision Tree regressor is more suitable for predicting molecular properties than the other regressors tested here.
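
The abstract does not spell out how the distance-based representation named in the title is built. As a minimal sketch, assuming the feature vector is simply the sorted list of all pairwise interatomic distances (an illustrative choice, not necessarily the authors' construction), the idea could look like this in Python. The function name `distance_features` and the water-like coordinates are hypothetical.

```python
import itertools
import math


def distance_features(coords):
    """Return the sorted list of all pairwise interatomic distances.

    Assumption: this sorted-distance vector stands in for the chapter's
    distance-based representation, which the abstract does not define.
    """
    return sorted(math.dist(a, b) for a, b in itertools.combinations(coords, 2))


# Hypothetical water-like geometry in angstroms (O, H, H); illustrative values only.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
features = distance_features(water)

# Translating the molecule and reordering its atoms leaves the features unchanged.
shifted = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in reversed(water)]
shifted_features = distance_features(shifted)
```

A vector of this kind could then be passed to any regressor as input, with atomic charges appended as extra features where they are available.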

Metadata

Title: A Study on Distance Based Representation of Molecules for Statistical Learning
Authors: Abdul Wasee, Rajib Ghosh Chaudhuri, Prakash Kumar, Eldhose Iype
Copyright Year: 2020
Publisher: Springer International Publishing
DOI: https://doi.org/10.1007/978-3-030-41862-5_56
