2018 | OriginalPaper | Book chapter

7. Principles of Data Science: Advanced

Written by: Jeremy David Curuksu

Published in: Data Driven

Publisher: Springer International Publishing

Abstract

This chapter covers advanced analytics principles and applications. Let us first step back and review our objectives and progress so far. In Chap. 6, we defined the key concepts underlying the mathematical science of data analysis. The discussion was structured in two categories: descriptive and inferential statistics. In the context of a data science project, these two categories may be referred to as unsupervised and supervised modeling, respectively. They are ubiquitous because the objective of a data science project is always (bear with me please) either to better understand some data or to predict something. Chapter 7 thus again follows this binary structure, although some topics (e.g. computer simulation, Sect. 7.3) may be used to collect and understand data, to forecast events, or both.

Footnotes
1
Note that in this example the two variables are so strongly correlated that one could have ignored one of them from the beginning and thereby bypassed the process of coordinate mapping altogether. Coordinate mapping becomes useful when the trend is not just a 50/50 contribution of two variables (which corresponds to a 45° correlation line in the scatter plot) but some more subtle relationship where maximum variance lies along an asymmetrically weighted combination of the two variables.
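As a rough illustration of the 45° case versus an asymmetrically weighted one, the sketch below computes the direction of maximum variance for two small synthetic datasets. The function name and the data are hypothetical, and the closed-form angle used here applies only to two variables; it is a minimal sketch, not the chapter's own procedure.

```python
import math

def principal_axis_angle(xs, ys):
    """Angle in degrees of the maximum-variance direction for 2-D data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Closed form for the leading eigenvector of a 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2 * cov_xy, var_x - var_y))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]                                 # synthetic data
symmetric = principal_axis_angle(xs, [x for x in xs])          # y = x
asymmetric = principal_axis_angle(xs, [0.5 * x for x in xs])   # y = 0.5 x
print(symmetric, asymmetric)
```

With y = x the axis sits on the 45° line (equal weights), while with y = 0.5 x it tilts toward the first variable, which is the asymmetric situation the footnote describes.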
 
2
This assertion is only true under certain conditions, but for most real-world applications where observations are made across a finite set of variables in a population, these conditions are fulfilled.
 
3
The word Eigen comes from the German for characteristic.
 
4
The equation used to find eigenvectors and eigenvalues for a given matrix when they exist is det(A − λI) = 0. Not surprisingly, it is referred to as the matrix’s characteristic equation.
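For a 2 × 2 matrix the characteristic equation reduces to a quadratic, λ² − tr(A)·λ + det(A) = 0, which can be solved directly. The sketch below is a minimal illustration under that assumption; the function name and the example matrix are hypothetical, and it only handles the case of real eigenvalues.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from det(A - lambda*I) = 0."""
    trace = a + d
    det = a * d - b * c
    # det(A - lambda*I) = lambda^2 - trace*lambda + det
    disc = trace ** 2 - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

# Example matrix [[2, 1], [1, 2]] (chosen for illustration only).
lam1, lam2 = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)
print(lam1, lam2)
```

Substituting either value back into (2 − λ)(2 − λ) − 1 gives zero, confirming that both satisfy the characteristic equation.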
 
5
Quantum theory teaches us that everything in the universe is periodic! But describing the dynamics of any system except small molecules at a quantum level would require several years of computation even on the latest generation of supercomputers. And this assumes that we know how to decompose the signal into a nearly exhaustive set of factors, which we generally don’t. Hence, in practice, a harmonic analysis requires periodic features to be detected at a scale directly relevant to the analysis in question; this defines macroscopic in all circumstances. For example, a survey of customer behaviors may apply Fourier analysis if a periodic feature is detected in a behavior or in any factor believed to influence a behavior.
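To make the customer-behavior example concrete, the sketch below detects a periodic feature with a plain discrete Fourier transform. The signal is synthetic (a hypothetical daily count with a built-in 7-day cycle), and the function name is an assumption made for illustration only.

```python
import cmath
import math

def dominant_period(signal):
    """Period (in samples) of the strongest non-constant Fourier component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    best_k, best_power = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic series: 28 days with a weekly oscillation around a level of 10.
signal = [10 + 3 * math.sin(2 * math.pi * t / 7) for t in range(28)]
period = dominant_period(signal)
print(period)
```

On real survey data the spectrum would be noisier, so a peak would normally be checked against the surrounding frequencies before being treated as a genuine periodic feature.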
 
6
k can be fixed in advance or refined recursively based on a distance metric threshold.
 
7
It is standard practice in calculus to truncate a Taylor expansion after the second-order derivative because higher-order terms tend to be insignificant close to the expansion point.
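A small numerical check can show how a second-order truncation behaves. The sketch below uses the exponential function expanded around zero as an arbitrary example (this choice, and the function name, are assumptions made for illustration).

```python
import math

def exp_taylor2(x):
    """Second-order Taylor expansion of exp(x) around 0: 1 + x + x^2/2."""
    return 1.0 + x + 0.5 * x * x

# Truncation error close to the expansion point and further away from it.
near_error = abs(math.exp(0.1) - exp_taylor2(0.1))
far_error = abs(math.exp(2.0) - exp_taylor2(2.0))
print(near_error, far_error)
```

Close to the expansion point the neglected terms are of the order of x³/6 and the error is small; further away it grows quickly, so the truncation is best read as a local approximation.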
 
8
The idea that stable states are exponentially more likely than unstable states extends well beyond the confines of physical systems. This Boltzmann distribution law has a different name in different fields, such as the Gibbs measure, the log-linear response or the exponential response (to name a few), but the concept is always the same: there is an exponential relationship between the notions of probability and stability.
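The exponential relationship can be sketched as follows, assuming stability is measured by an energy-like score E (lower means more stable) and a scale parameter kT; both names and the example values are assumptions introduced here, not taken from the chapter.

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """Probabilities proportional to exp(-E / kT), normalized to sum to 1."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # normalization constant (partition function)
    return [w / z for w in weights]

# Three hypothetical states, from most stable (E = 0) to least stable (E = 2).
probs = boltzmann_probabilities([0.0, 1.0, 2.0])
print(probs)
```

Each unit increase in E divides the probability by the same factor, e^(1/kT), which is the exponential link between probability and stability that the footnote refers to.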
 
9
The general concept of inference was introduced in Sect. 6.1.2. In short, it indicates a transition from a descriptive to a probabilistic point of view.
 
10
Categorical variables also include non-ordinal numbers, i.e. numbers that don’t follow a special order and instead correspond to different labels. For example, to predict whether customers will choose a product identified as #5 or one identified as #16, the response is represented by two numbers (5 and 16), but they define a qualitative variable since there is neither a unit of measure nor a zero value for this variable.
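One common way to keep such numbers from being read as quantities is to encode them as labels. The sketch below uses a simple one-hot encoding; the function name and the sample responses are hypothetical.

```python
def one_hot(labels):
    """Encode labels as indicator vectors so that no numeric order is implied."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [[1 if index[label] == j else 0 for j in range(len(categories))]
               for label in labels]
    return categories, encoded

# Product identifiers are stored as strings: "#16" is not "more" than "#5".
choices = ["#5", "#16", "#5", "#5"]
categories, encoded = one_hot(choices)
print(categories, encoded)
```

Storing the identifiers as strings rather than integers is a deliberate choice here: it prevents a model from accidentally treating 16 as larger than 5.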
 
11
Bootstrap refers to repeatedly resampling the same dataset, each resample leaving out some part of the dataset, until the estimated quantity (in this case a decision tree) converges.
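The resampling idea can be sketched as below. Two simplifications are assumed: the estimated quantity is a sample mean rather than a decision tree, and the loop runs for a fixed number of resamples instead of testing for convergence. Sampling with replacement is the usual bootstrap convention and is an assumption about what the footnote intends; the data are made up.

```python
import random

def bootstrap_means(data, n_resamples=200, seed=0):
    """Means of resamples drawn with replacement from the same dataset."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        # Drawing with replacement leaves some observations out of each resample.
        resample = rng.choices(data, k=len(data))
        means.append(sum(resample) / len(resample))
    return means

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
means = bootstrap_means(data)
bootstrap_estimate = sum(means) / len(means)
print(bootstrap_estimate)
```

The spread of the resampled means gives a sense of how stable the estimate is; for decision trees the same loop would fit one tree per resample.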
 
Metadata
Title
Principles of Data Science: Advanced
Written by
Jeremy David Curuksu
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-70229-2_7