2016 | Original Paper | Book chapter

A Machine Learning Perspective on Big Data Analysis

Authors: Nathalie Japkowicz, Jerzy Stefanowski

Published in: Big Data Analysis: New Algorithms for a New Society

Publisher: Springer International Publishing

Abstract

This chapter surveys the field of Big Data analysis from a machine learning perspective. In particular, it contrasts Big Data analysis with data mining, which is based on machine learning, reviews its achievements and discusses its impact on science and society. The chapter concludes with a summary of the book’s contributing chapters divided into problem-centric and domain-centric essays.

Footnotes
1. While this application was originally considered a success, it subsequently produced disappointing results and is now being improved [4].
2. Please note that graphs were sometimes considered in traditional data mining (e.g., as structures of chemical compounds), but those graphs were much smaller than the ones considered today.

References
1. Abiteboul, S.: Querying semi-structured data. In: ICDT ’97 Proceedings of the 6th International Conference on Database Theory, pp. 1–18 (1997)
2. An interview with Michael Jordan: Why Big Data Could Be a Big Fail. IEEE Spectrum. (Posted by Lee Gomes, 20 Oct 2014)
3. Anderson, C.: The End of Theory. The data deluge makes the scientific method obsolete. Wired Magazine, 16/07 (2008, June 23)
5. Azzara, M.: Big Data Ethics: Transparency, Privacy, and Identity. Blog cmo.com. (Retrieved 2015)
6. Barbaro, M., Zeller Jr., T.: A Face Is Exposed for AOL Searcher No. 4417749. The New York Times Magazine. (August 9, 2006)
7. Barbier, G., Liu, H.: Data Mining in Social Media. In: Aggarwal, C. (ed.) Social Network Data Analytics, pp. 327–352. Kluwer Academic Publishers, Springer (2011)
8. Bekkerman, R., Bilenko, M., Langford, J.: Scaling Up Machine Learning. Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
11. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
13. Boyd, D., Crawford, K.: Six provocations for Big Data. Presented at "A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society", Oxford Internet Institute, Sept 21 (2011)
14. Boyd, D., Crawford, K.: Critical questions for big data. Inf. Commun. Soc. 15(5), 662–679 (2012)
15. Che, D., Safran, M., Peng, Z.: From big data to big data mining: challenges, issues and opportunities. In: Hong, B., et al. (eds.) DASFAA Workshops, Springer LNCS 7827, pp. 1–15 (2013)
16. Chen, M., Mao, S., Liu, Y.: Big data: a survey. Mobile Netw. Appl. 19, 171–209 (2014)
17. Dai, C., Lin, D., Bertino, E., Kantarcioglu, M.: An approach to evaluate data trustworthiness based on data provenance. In: Proceedings of the 5th VLDB Workshop on Secure Data Management, pp. 82–98 (2008)
18. Davidson, S., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: Proceedings of the SIGMOD’08 (2008)
19. Davis, K.: Ethics of Big Data. Balancing Risk and Innovation. O’Reilly (2012)
20. De Mauro, A., Greco, M., Grimaldi, M.: What is big data? A consensual definition and a review of key research topics. In: Proceedings of 4th Conference on Integrated Information (2014)
21. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
22. Einav, L., Levin, J.D.: The data revolution and economic analysis. National Bureau of Economic Research Working Paper, no. 19035 (2013)
23. Fan, W., Bifet, A.: Mining big data: current status, and forecast to the future. SIGKDD Explor. Newsl. 12(2), 1–5 (2013)
24. Frontiers in Massive Data Analysis. The National Research Council, the National Academy of Sciences, USA (2013)
26. Gaber, M., Zaslavsky, A., Krishnaswamy, S.: Mining data streams: a review. ACM Sigmod Record 34(2), 18–26 (2005)
27. Gama, J.: Knowledge Discovery from Data Streams, 1st edn. Chapman & Hall/CRC (2010)
28. Ghoting, A., Kambadur, P., Pednault, E., Kannan, R.: NIMBLE: A toolkit for the implementation of parallel data mining and machine learning algorithms on MapReduce. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 2011, pp. 334–342 (2011)
29. Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012–1014 (19 Feb 2009)
30. Glavic, B.: Big Data provenance: challenges and implications for benchmarking. In: Specifying Big Data Benchmarks, pp. 72–80. Springer (2014)
31. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453, 779–782 (2008)
33. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
34. Harford, T.: Big data: are we making a big mistake? Financial Times, March 28 (2014)
35. Hashem, I., Yaqoob, I., Anuar, N., Mokhtar, S., Gani, A., Khan, S.: The rise of big data on cloud computing. Review and open research issues. Inf. Syst. 47, 98–115 (2015)
36. How big data analysis helped increase Walmart’s sales turnover. DeZyre Web page (23 May 2015)
37. Kang, U., Faloutsos, C.: Big graph mining: algorithms and discoveries. ACM SIGKDD Explor. Newsl. 14(2), 29–36 (2012)
38. Kraska, T., Talwalkar, A., Duchi, J.C., Griffith, R., Franklin, M.J., Jordan, M.I.: MLbase: A distributed machine-learning system. In: Proceedings of Sixth Biennial Conference on Innovative Data Systems Research (2013)
39. Krempl, G., Zliobaite, I., Brzezinski, D., Hullermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. ACM SIGKDD Explor. 16(1), 1–10 (June 2014)
41. Maimon, O., Rokach, L. (eds.): The Data Mining and Knowledge Discovery Handbook. Springer (2005)
42. Mannila, H.: Data mining: machine learning, statistics, and databases. In: Proceedings of the Eighth International Conference on Scientific and Statistical Database Management, Stockholm, June 18–20, pp. 1–8 (1996)
43. Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press (1999)
44. Marcus, G., Davis, E.: Eight (No, Nine!) Problems With Big Data. New York Times (Apr 6, 2014)
45. Matwin, S.: Privacy-preserving data mining techniques: survey and challenges. In: Custers, B., Calders, T., Schermer, B., Zarsky, T. (eds.) Discrimination and Privacy in the Information Society. Springer Series on Studies in Applied Philosophy, Epistemology and Rational Ethics, vol. 3, pp. 209–221 (2013)
46. Matwin, S.: Machine learning: four lessons and what is next? Bull. Pol. AI Soc. 2, 2–7 (2013)
47. Mayer-Schönberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. Eamon Dolan/Houghton Mifflin Harcourt (2013)
48. Morales, G., Bifet, A.: SAMOA: scalable advanced massive online analysis. J. Mach. Learn. Res. 16, 149–153 (2015)
49. Narayanan, A., Shmatikov, V.: Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset). In: Proceedings of the 2008 IEEE Symposium on Security and Privacy SP’08, pp. 111–125 (2008)
50. Piatetsky-Shapiro, G., Matheus, C. (eds.): Knowledge Discovery in Databases. AAAI/MIT Press (1991)
51. Pietsch, W.: Big Data? The New Science of Complexity. In: 6th Munich-Sydney-Tilburg Conference on Models and Decisions (Munich; 10–12 April 2013)
52. Reinventing Society in the Wake of Big Data: Edge’s interview with Alex "Sandy" Pentland (Posted August 30, 2012)
53. Ritter, D.: When to act on a correlation and when not to. Harvard Business Review, March 19 (2014)
54. Roddick, J., Hornsby, K., Spiliopoulou, M.: An updated bibliography of temporal, spatial, and spatio-temporal data mining research. Lect. Notes Comput. Sci. 2007, 147–163 (2001)
55. Rudin, C., Passonneau, R., Radeva, A., Jerome, S., Isaac, D.: 21st century data miners meet 19th century electrical cables. IEEE Comput. 103–105 (June 2011)
56. Rudin, C., et al.: Machine learning for the New York City power grid. IEEE Trans. Pattern Anal. Mach. Intell. 34(2), 328–345 (2012)
58. Simmhan, Y., Plale, B., Gannon, D.: A survey on data provenance techniques. Technical Report Indiana University, IUB-CS-TR618 (2005)
59. Singh, D., Reddy, C.: A survey on platforms for Big Data analytics. J. Big Data 1(8), 2–20 (2014)
61. Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers (2012)
63. Thompson, C.: What Is IBM’s Watson? The New York Times Magazine, June 16 (2010)
65. Venkateswara Rao, K., Govardhan, A., Chalapati Rao, K.V.: Spatiotemporal data mining: issues, tasks and applications. Int. J. Comput. Sci. & Eng. Surv. (IJCSES) 3(1) (Feb 2012)
66. Vucetic, S., Obradovic, Z.: Discovering homogeneous regions in spatial data through competition. In: Proceedings of the 17th International Conference of Machine Learning ICML, pp. 1095–1102 (2000)
67. Zhou, Z.H., Chawla, N., Jin, Y., Williams, G.: Big Data opportunities and challenges: discussions from data analytics perspectives. IEEE Comput. Intell. Mag. 9(4), 62–74 (2014)
Metadata
Title: A Machine Learning Perspective on Big Data Analysis
Authors: Nathalie Japkowicz, Jerzy Stefanowski
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-26989-4_1