2018 | OriginalPaper | Book chapter

6. Principles of Data Science: Primer

Written by: Jeremy David Curuksu

Published in: Data Driven

Publisher: Springer International Publishing

Abstract

Let us face it. Statistics and mathematics deter almost everyone except those who choose to specialize in them. If you have read this far in the book, you are probably now considering skipping the chapters on Data Science and moving on to the next one on Strategy because, well, it sounds more exciting. Thus, let us start this chapter on statistics with a simple example that illustrates why it is worth reading and why consultants may increasingly use mathematics.

Footnotes
1
One may recommend “Naked Statistics” by Charles Wheelan [89], which introduces the overall field of statistics in a simple and humorous way; technical expertise is not required.
 
2
The software-hardware interface defines the field of Robotics as an application of Cybernetics, a field invented by the late Norbert Wiener and from which Machine Learning emerged as a subfield.
 
3
In loose usage, “correlation” most commonly refers to the Pearson correlation.
 
4
By “general purpose”, I mean the assumption of a linear relationship between variables, which is often what is meant by a “simple” model in mathematics.
 
5
Eq. 6.5 is formally the divergence of p2 from p1, which is one-sided. An unbiased (i.e., symmetric) degree of association according to Kullback and Leibler [157] is obtained by taking the sum of the two one-sided divergences: D(p1, p2) + D(p2, p1).
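The sum of the two one-sided divergences can be sketched in a few lines of Python. Eq. 6.5 is not reproduced here, so this sketch assumes it has the standard Kullback-Leibler form, D(p, q) = Σ p_i log(p_i / q_i); the function names and the two example distributions are illustrative only.

```python
import math


def kl_divergence(p, q):
    """One-sided divergence D(p, q) = sum_i p_i * log(p_i / q_i), in nats.

    Assumes p and q are discrete distributions over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0 by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def symmetric_divergence(p, q):
    """Sum of both one-sided divergences: D(p, q) + D(q, p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical example distributions, not taken from the chapter.
p1 = [0.5, 0.3, 0.2]
p2 = [0.4, 0.4, 0.2]

print(kl_divergence(p1, p2), kl_divergence(p2, p1))  # the two sides differ
print(symmetric_divergence(p1, p2))                  # same in either order
```

Swapping the arguments changes each one-sided value but leaves the sum unchanged, which is the property the footnote relies on.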
 
6
A single numerical value is referred to in mathematics as a scalar; multi-dimensional objects may bear different names, the most common of which are vectors, matrices and tensors.
 
7
Hyperspace is the name given to a space made of more than three dimensions (i.e. more than three variables). A plane that lies in a hyperspace is defined by more than two vectors and is called a hyperplane. It does not have a physical representation in our 3D world. Scientists present “hyper-” objects such as hyperplanes by showing consecutive 2D planes along different values of the 4th variable, the 5th variable, etc. This is why the use of functions, matrices and tensors is strictly needed to handle computations in multivariable spaces.
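The slicing idea can be illustrated with a minimal Python sketch. It assumes a hypothetical linear function of three input variables, whose graph is a hyperplane in a four-dimensional space, and stores its values in a nested list (a rank-3 tensor) that is then displayed as consecutive 2D planes, one per value of x3. The coefficients, the grid and the function name are illustrative choices, not taken from the chapter.

```python
def hyperplane_value(x1, x2, x3, coeffs=(2.0, -1.0, 0.5), intercept=1.0):
    """Value y of a linear function of three variables (hypothetical coefficients).

    The points (x1, x2, x3, y) lie on a hyperplane in a 4-dimensional space.
    """
    a1, a2, a3 = coeffs
    return intercept + a1 * x1 + a2 * x2 + a3 * x3


grid = [0.0, 1.0, 2.0]

# Rank-3 tensor indexed as tensor[x3 index][x1 index][x2 index].
tensor = [
    [[hyperplane_value(x1, x2, x3) for x2 in grid] for x1 in grid]
    for x3 in grid
]

# Show the object as consecutive 2D planes, one per value of x3.
for x3, plane in zip(grid, tensor):
    print(f"x3 = {x3}")
    for row in plane:
        print("  ", row)
```

Each printed block is an ordinary 2D table; stacking the blocks along x3 recovers the full object, which cannot be drawn directly.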
 
8
As mentioned in Sect. 6.1, the standard error is the standard deviation of the means of different sub-samples drawn from the original sample or population.
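This definition can be checked numerically with a short Python sketch. It assumes a synthetic, normally distributed population (mean 100, standard deviation 15) and compares the standard deviation of many sub-sample means with the usual analytic approximation σ/√n; the population, the sample sizes and the function name are illustrative assumptions, not taken from the chapter.

```python
import math
import random
import statistics


def empirical_standard_error(population, sample_size, n_subsamples, rng):
    """Standard deviation of the means of repeated sub-samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_subsamples)
    ]
    return statistics.stdev(means)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic population (an assumption made for illustration only).
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]

sample_size = 50
empirical = empirical_standard_error(population, sample_size, 2_000, rng)

# Analytic approximation: population standard deviation / sqrt(n).
analytic = statistics.pstdev(population) / math.sqrt(sample_size)

print(f"empirical standard error: {empirical:.3f}")
print(f"sigma / sqrt(n):          {analytic:.3f}")
```

The two values should be close; the agreement improves as the number of sub-samples grows.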
 
9
The 80/20 rule, or Pareto principle, is a principle commonly used in business and economics which states that 80% of a problem stems from only 20% of its causes. It was first suggested by the late Joseph Juran, one of the most prominent management consultants of the twentieth century.
 
References
59.
Sarkar et al (2011) Translational bioinformatics: linking knowledge across biological and clinical realms. J Am Med Inform Assoc 18:354–357
65.
89.
Siegel E (2013) Predictive analytics: the power to predict who will click, buy, lie, or die. Wiley, Hoboken
91.
Wheelan C (2013) Naked statistics. Norton, New York
154.
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 14(2):1137–1145
155.
Lee Rodgers J, Nicewander WA (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42(1):59–66
156.
Cover TM, Thomas JA (2012) Elements of information theory. Wiley, New York
157.
Kullback S (1959) Information theory and statistics. Wiley, New York
158.
Gower JC (1985) Properties of Euclidean and non-Euclidean distance matrices. Linear Algebra Appl 67:81–97
159.
Legendre A (1805) Nouvelles méthodes pour la détermination des orbites des comètes. Didot, Paris
160.
Ozer DJ (1985) Correlation and the coefficient of determination. Psychol Bull 97(2):307
161.
Nagelkerke NJ (1991) A note on a general definition of the coefficient of determination. Biometrika 78(3):691–692
162.
Aiken LS, West SG, Reno RR (1991) Multiple regression: testing and interpreting interactions. Sage, London
163.
Gibbons MR (1982) Multivariate tests of financial models: a new approach. J Financ Econ 10(1):3–27
164.
Berger JO (2013) Statistical decision theory and Bayesian analysis. Springer, New York
166.
Ott RL, Longnecker M (2001) An introduction to statistical methods and data analysis. Cengage Learning, Belmont
168.
169.
Goodman SN (1999) Toward evidence-based medical statistics: the p-value fallacy. Ann Intern Med 130(12):995–1004
170.
Lyapunov A (1901) Nouvelle forme du théorème sur la limite de probabilité. Mémoires de l'Académie de St-Petersbourg 12
171.
Baesens B (2014) Analytics in a big data world: the essential guide to data science and its applications. Wiley, New York
172.
Curuksu J (2012) Adaptive conformational sampling based on replicas. J Math Biol 64:917–931
173.
Pidd M (1998) Computer simulation in management science. Wiley, Chichester
175.
Becla J, Lim KT, Wang DL (2010) Report from the 3rd workshop on extremely large databases. Data Sci J 8:MR1–MR16
176.
Treinen W (2014) Big data value strategic research and innovation agenda. European Commission Press, New York
Metadata
Title
Principles of Data Science: Primer
Written by
Jeremy David Curuksu
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-70229-2_6
