
2018 | OriginalPaper | Chapter

7. Principles of Data Science: Advanced

Author: Jeremy David Curuksu

Published in: Data Driven

Publisher: Springer International Publishing


Abstract

This chapter covers advanced analytics principles and applications. Let us first back up on our objectives and progress so far. In Chap. 6, we defined the key concepts underlying the mathematical science of data analysis. The discussion was structured in two categories: descriptive and inferential statistics. In the context of a data science project, these two categories may be referred to as unsupervised and supervised modeling respectively. These two categories are ubiquitous because the objective of a data science project is always (bear with me please) to better understand some data or else to predict something. Chapter 7 thus again follows this binary structure, although some topics (e.g. computer simulation, Sect. 7.3) may be used to collect and understand data, forecast events, or both.


Footnotes
1
Note that in this example the two variables are so correlated that one could have ignored one of them from the beginning and thereby bypassed the process of coordinate mapping altogether. Coordinate mapping becomes useful when the trend is not just a 50/50 contribution of two variables (which corresponds to a 45° correlation line in the scatter plot) but some more subtle relationship where maximum variance lies along an asymmetrically weighted combination of the two variables.
 
2
This assertion is only true under certain conditions, but for most real-world applications where observations are made across a finite set of variables in a population, these conditions are fulfilled.
 
3
The word Eigen comes from the German for characteristic.
 
4
The equation used to find eigenvectors and eigenvalues for a given matrix when they exist is det(A − λI) = 0. Not surprisingly, it is referred to as the matrix’s characteristic equation.
 
5
Quantum theory teaches us that everything in the universe is periodic! But describing the dynamics of any system except small molecules at a quantum level would require several years of computations even on latest-generation supercomputers. And this is assuming we would know how to decompose the signal into a nearly exhaustive set of factors, which we generally don’t. Hence a harmonic analysis in practice requires periodic features to be detected at a scale directly relevant to the analysis in question; this is what defines macroscopic in all circumstances. For example, a survey of customer behaviors may apply Fourier analysis if a periodic feature is detected in a behavior or in any factor believed to influence a behavior.
 
6
k can be fixed in advance or refined recursively based on a distance metric threshold.
 
7
It is standard practice in calculus to truncate a Taylor expansion after the second-order derivative because higher-order terms tend to be insignificant.
 
8
The idea that stable states are exponentially more likely than unstable states extends well beyond the confines of physical systems. This Boltzmann distribution law has a different name in different fields, such as the Gibbs measure, the log-linear response or the exponential response (to name a few), but the concept is always the same: there is an exponential relationship between the notions of probability and stability.
 
9
The general concept of inference was introduced in Sect. 6.1.2. In short, it indicates a transition from a descriptive to a probabilistic point of view.
 
10
Categorical variables also include non-ordinal numbers, i.e. numbers that don’t follow a special order and instead correspond to different labels. For example, to predict whether customers will choose a product identified as #5 or one identified as #16, the response is represented by two numbers (5 and 16), but they define a qualitative variable since there is neither a unit of measure nor a zero value for this variable.
 
11
Bootstrap refers to successive sampling of the same dataset by leaving out some part of the dataset until convergence of the estimated quantity, in this case a decision tree.
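
As an illustration of the characteristic equation det(A − λI) = 0 given in footnote 4, the following is a minimal sketch for the 2×2 case only, where the determinant expands to λ² − tr(A)·λ + det(A) = 0 and can be solved with the quadratic formula. The function name `eigenvalues_2x2` and the example matrix are assumptions chosen for illustration; they do not come from the chapter.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Return the two eigenvalues of the matrix [[a, b], [c, d]].

    For a 2x2 matrix, det(A - lambda*I) = 0 expands to
    lambda^2 - trace*lambda + det = 0, a quadratic in lambda.
    """
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt keeps the function valid when the roots are complex.
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


# Hypothetical symmetric matrix, e.g. a covariance matrix of two
# equally scaled, positively correlated variables.
lam1, lam2 = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)

# Each eigenvalue should make det(A - lambda*I) vanish.
for lam in (lam1, lam2):
    residual = (2.0 - lam) * (2.0 - lam) - 1.0 * 1.0
    print(lam, abs(residual))
```

For larger matrices the characteristic polynomial is rarely solved by hand; a numerical routine such as `numpy.linalg.eig` would typically be used instead.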
 
Metadata
Title
Principles of Data Science: Advanced
Author
Jeremy David Curuksu
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-70229-2_7