Published in: Knowledge and Information Systems 3/2019

18.09.2018 | Survey Paper

The big data system, components, tools, and technologies: a survey

Authors: T. Ramalingeswara Rao, Pabitra Mitra, Ravindara Bhatt, A. Goswami


Abstract

Traditional databases cannot handle unstructured data or high volumes of real-time data. Diverse, unstructured datasets give rise to big data, and it is laborious to store, manage, process, analyze, visualize, and extract useful insights from them using traditional database approaches. However, many technical aspects are involved in refining large heterogeneous datasets in the current big data trend. This paper aims to present a generalized view of a complete big data system, including its several stages and the key components of each stage in processing big data. In particular, we compare and contrast various distributed file systems and MapReduce-supported NoSQL databases with respect to certain parameters of the data management process. Further, we present distinct distributed/cloud-based machine learning (ML) tools that play a key role in designing, developing, and deploying data models. The paper investigates case studies on distributed ML tools such as Mahout, Spark MLlib, and FlinkML. Further, we classify analytics based on the type of data, domain, and application. We distinguish various visualization tools according to three parameters: functionality, analysis capabilities, and supported development environment. Furthermore, we systematically investigate big data tools and technologies (Hadoop 3.0, Spark 2.3), including distributed/cloud-based stream processing tools, in a comparative approach. Moreover, we discuss the functionalities of several SQL query tools on Hadoop based on 10 parameters. Finally, we present some critical points relevant to research directions and opportunities according to the current trend of big data. Investigating infrastructure tools for big data together with recent developments provides a better understanding of how different tools and technologies apply to real-life applications.

Literatur
2.
Zurück zum Zitat Mattmann CA (2013) Computing: a vision for data science. Nature 493(7433):473–475CrossRef Mattmann CA (2013) Computing: a vision for data science. Nature 493(7433):473–475CrossRef
4.
Zurück zum Zitat Clavin W (2013) Managing the deluge of ‘big data’ from space. NASA Jet Propulsion Labratory Clavin W (2013) Managing the deluge of ‘big data’ from space. NASA Jet Propulsion Labratory
5.
Zurück zum Zitat Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805MATHCrossRef Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805MATHCrossRef
6.
Zurück zum Zitat SCB Intelligence (2008) Six technologies with potential impacts on us interests out to 2025. National Intelligent Concil, Tech. Rep SCB Intelligence (2008) Six technologies with potential impacts on us interests out to 2025. National Intelligent Concil, Tech. Rep
7.
Zurück zum Zitat Yu S, Liu M, Dou W, Liu X, Zhou S (2017) Networking for big data: a survey. IEEE Commun Surv Tutor 19(1):531–549CrossRef Yu S, Liu M, Dou W, Liu X, Zhou S (2017) Networking for big data: a survey. IEEE Commun Surv Tutor 19(1):531–549CrossRef
8.
Zurück zum Zitat Pouyanfar S, Yang Y, Chen S-C, Shyu M-L, Iyengar SS (2018) Multimedia big data analytics: a survey. ACM Comput Surv 51(1):10CrossRef Pouyanfar S, Yang Y, Chen S-C, Shyu M-L, Iyengar SS (2018) Multimedia big data analytics: a survey. ACM Comput Surv 51(1):10CrossRef
9.
Zurück zum Zitat Alaba FA, Othman M, Hashem IAT, Alotaibi F (2017) Internet of things security: a survey. J Netw Comput Appl 88:10–28CrossRef Alaba FA, Othman M, Hashem IAT, Alotaibi F (2017) Internet of things security: a survey. J Netw Comput Appl 88:10–28CrossRef
10.
Zurück zum Zitat Zikopoulos P, Eaton C, et al (2011) Understanding big data: analytics for enterprise class hadoop and streaming data. ISBN: 0071790535 Zikopoulos P, Eaton C, et al (2011) Understanding big data: analytics for enterprise class hadoop and streaming data. ISBN: 0071790535
11.
Zurück zum Zitat Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19(2):171–209CrossRef Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19(2):171–209CrossRef
12.
Zurück zum Zitat Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of big data on cloud computing: review and open research issues. Inf Syst 47:98–115CrossRef Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of big data on cloud computing: review and open research issues. Inf Syst 47:98–115CrossRef
13.
Zurück zum Zitat Ma C, Zhang HH, Wang X (2014) Machine learning for big data analytics in plants. Trends Plant Sci 19(12):798–808CrossRef Ma C, Zhang HH, Wang X (2014) Machine learning for big data analytics in plants. Trends Plant Sci 19(12):798–808CrossRef
14.
Zurück zum Zitat Laney D (2013) 3d data management: controlling data volume, velocity and variety. META Group Research Note 6(70), 1 Laney D (2013) 3d data management: controlling data volume, velocity and variety. META Group Research Note 6(70), 1
15.
Zurück zum Zitat Fan W, Bifet A (2013) Mining big data: current status, and forecast to the future. ACM sIGKDD Explor Newsl 14(2):1–5CrossRef Fan W, Bifet A (2013) Mining big data: current status, and forecast to the future. ACM sIGKDD Explor Newsl 14(2):1–5CrossRef
16.
Zurück zum Zitat Demchenko Y, De Laat C, Membrey P (2014) Defining architecture components of the big data ecosystem. In: Collaboration technologies and systems (CTS), 2014 international conference on, pp 104–112 Demchenko Y, De Laat C, Membrey P (2014) Defining architecture components of the big data ecosystem. In: Collaboration technologies and systems (CTS), 2014 international conference on, pp 104–112
17.
Zurück zum Zitat Fernández A, del Río S, López V, Bawakid A, del Jesus MJ, Benítez JM, Herrera F (2014) Big data with cloud computing: an insight on the computing environment, mapreduce, and programming frameworks. Wiley Interdiscip Rev: Data Min Knowl Discov 4(5):380–409 Fernández A, del Río S, López V, Bawakid A, del Jesus MJ, Benítez JM, Herrera F (2014) Big data with cloud computing: an insight on the computing environment, mapreduce, and programming frameworks. Wiley Interdiscip Rev: Data Min Knowl Discov 4(5):380–409
18.
Zurück zum Zitat Assunção MD, Calheiros RN, Bianchi S, Netto MAS, Buyya R (2015) Big data computing and clouds: trends and future directions. J Parallel Distrib Comput 79:3–15CrossRef Assunção MD, Calheiros RN, Bianchi S, Netto MAS, Buyya R (2015) Big data computing and clouds: trends and future directions. J Parallel Distrib Comput 79:3–15CrossRef
19.
20.
Zurück zum Zitat Schuelke-Leech B-A, Barry B, Muratori M, Yurkovich BJ (2015) Big data issues and opportunities for electric utilities. Renew Sustain Energy Rev 52:937–947CrossRef Schuelke-Leech B-A, Barry B, Muratori M, Yurkovich BJ (2015) Big data issues and opportunities for electric utilities. Renew Sustain Energy Rev 52:937–947CrossRef
21.
Zurück zum Zitat O’Leary DE (2015) Big data and privacy: emerging issues. IEEE Intell Syst 30(6):92–96CrossRef O’Leary DE (2015) Big data and privacy: emerging issues. IEEE Intell Syst 30(6):92–96CrossRef
22.
Zurück zum Zitat Kune R, Konugurthi PK, Agarwal A, Chillarige RR, Buyya R (2016) The anatomy of big data computing. Softw: Pract Exp 46(1):79–105 Kune R, Konugurthi PK, Agarwal A, Chillarige RR, Buyya R (2016) The anatomy of big data computing. Softw: Pract Exp 46(1):79–105
23.
Zurück zum Zitat Bello-Orgaz G, Jung JJ, Camacho D (2016) Social big data: recent achievements and new challenges. Inf Fusion 28:45–59CrossRef Bello-Orgaz G, Jung JJ, Camacho D (2016) Social big data: recent achievements and new challenges. Inf Fusion 28:45–59CrossRef
24.
Zurück zum Zitat Bajaber F, Elshawi R, Batarfi O, Altalhi A, Barnawi A, Sakr S (2016) Big data 2.0 processing systems: taxonomy and open challenges. J Grid Comput 14(3):379–405CrossRef Bajaber F, Elshawi R, Batarfi O, Altalhi A, Barnawi A, Sakr S (2016) Big data 2.0 processing systems: taxonomy and open challenges. J Grid Comput 14(3):379–405CrossRef
25.
Zurück zum Zitat Nadal S, Herrero V, Romero O, Abell A, Franch X, Vansummeren S, Valerio D (2017) A software reference architecture for semantic-aware big data systems. Inf Softw Technol 90:75–92CrossRef Nadal S, Herrero V, Romero O, Abell A, Franch X, Vansummeren S, Valerio D (2017) A software reference architecture for semantic-aware big data systems. Inf Softw Technol 90:75–92CrossRef
27.
Zurück zum Zitat Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag 35(2):137–144CrossRef Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag 35(2):137–144CrossRef
28.
Zurück zum Zitat Lee I (2017) Big data: dimensions, evolution, impacts, and challenges. Bus Horiz 60(3):293–303CrossRef Lee I (2017) Big data: dimensions, evolution, impacts, and challenges. Bus Horiz 60(3):293–303CrossRef
29.
Zurück zum Zitat Kung S-Y (2015) Visualization of big data. In: Cognitive informatics and cognitive computing (ICCI* CC), 2015 IEEE 14th international conference on, pp 447–448 Kung S-Y (2015) Visualization of big data. In: Cognitive informatics and cognitive computing (ICCI* CC), 2015 IEEE 14th international conference on, pp 447–448
30.
Zurück zum Zitat Strohbach M, Ziekow H, Gazis V, Akiva N (2015) Towards a big data analytics framework for IoT and smart city applications. In: Modeling and processing for next-generation big-data technologies. pp 257–282. ISBN: 14-9783319385006 Strohbach M, Ziekow H, Gazis V, Akiva N (2015) Towards a big data analytics framework for IoT and smart city applications. In: Modeling and processing for next-generation big-data technologies. pp 257–282. ISBN: 14-9783319385006
31.
Zurück zum Zitat Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107CrossRef Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26(1):97–107CrossRef
32.
Zurück zum Zitat Wu X, Chen H, Wu G, Liu J, Zheng Q, He X, Zhou A, Zhao Z-Q, Wei B, Ming G (2015) Knowledge engineering with big data. IEEE Intell Syst 30(5):46–55CrossRef Wu X, Chen H, Wu G, Liu J, Zheng Q, He X, Zhou A, Zhao Z-Q, Wei B, Ming G (2015) Knowledge engineering with big data. IEEE Intell Syst 30(5):46–55CrossRef
33.
Zurück zum Zitat Wu X, Chen H, Liu J, Gongqing W, Ruqian L, Zheng N (2017) Knowledge engineering with big data (bigke): a 54-month, 45-million rmb, 15-institution national grand project. IEEE Access 5:12696–12701CrossRef Wu X, Chen H, Liu J, Gongqing W, Ruqian L, Zheng N (2017) Knowledge engineering with big data (bigke): a 54-month, 45-million rmb, 15-institution national grand project. IEEE Access 5:12696–12701CrossRef
34.
Zurück zum Zitat Venner J, Wadkar S, Siddalingaiah M (2014) Pro apache hadoop. ISBN-13: 9781430248637 Venner J, Wadkar S, Siddalingaiah M (2014) Pro apache hadoop. ISBN-13: 9781430248637
35.
Zurück zum Zitat Pavlo A, Paulson E, Rasin A, Abadi DJ, DeWitt DJ, Madden S, Stonebraker M (2009) A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data, pp 165–178 Pavlo A, Paulson E, Rasin A, Abadi DJ, DeWitt DJ, Madden S, Stonebraker M (2009) A comparison of approaches to large-scale data analysis. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data, pp 165–178
37.
Zurück zum Zitat Chang L, Wang Z, Ma T, Jian L, Ma L, Goldshuv A, Lonergan L, Cohen J, Welton C, Sherry G et al (2014) HAWQ: a massively parallel processing SQL engine in hadoop. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, pp 1223–1234 Chang L, Wang Z, Ma T, Jian L, Ma L, Goldshuv A, Lonergan L, Cohen J, Welton C, Sherry G et al (2014) HAWQ: a massively parallel processing SQL engine in hadoop. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, pp 1223–1234
40.
Zurück zum Zitat Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
41.
Zurück zum Zitat Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111CrossRef Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111CrossRef
42.
Zurück zum Zitat Lenharth A, Nguyen D, Pingali K (2016) Parallel graph analytics. Commun ACM 59(5):78–87CrossRef Lenharth A, Nguyen D, Pingali K (2016) Parallel graph analytics. Commun ACM 59(5):78–87CrossRef
44.
Zurück zum Zitat Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 135–146 Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 135–146
46.
Zurück zum Zitat Zhang H, Chen G, Ooi BC, Tan K-L, Zhang M (2015) In-memory big data management and processing: a survey. IEEE Trans Knowl Data Eng 27(7):1920–1948CrossRef Zhang H, Chen G, Ooi BC, Tan K-L, Zhang M (2015) In-memory big data management and processing: a survey. IEEE Trans Knowl Data Eng 27(7):1920–1948CrossRef
47.
Zurück zum Zitat Cai Q, Zhang H, Guo W, Chen G, Ooi BC, Tan K-L, Wong WF (2018) Memepic: towards a unified in-memory big data management system. IEEE Trans Big Data Cai Q, Zhang H, Guo W, Chen G, Ooi BC, Tan K-L, Wong WF (2018) Memepic: towards a unified in-memory big data management system. IEEE Trans Big Data
48.
Zurück zum Zitat Lim H, Han D, Andersen DG, Kaminsky M (2014) Mica: a holistic approach to fast in-memory key-value storage. USENIX, pp 429–444 Lim H, Han D, Andersen DG, Kaminsky M (2014) Mica: a holistic approach to fast in-memory key-value storage. USENIX, pp 429–444
49.
Zurück zum Zitat Kuznetsov SD, Poskonin AV (2014) Nosql data management systems. Program Comput Softw 40(6):323–332CrossRef Kuznetsov SD, Poskonin AV (2014) Nosql data management systems. Program Comput Softw 40(6):323–332CrossRef
51.
Zurück zum Zitat Chen CLP, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347CrossRef Chen CLP, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347CrossRef
52.
Zurück zum Zitat Mazón J-N, Lechtenbörger J, Trujillo J (2009) A survey on summarizability issues in multidimensional modeling. Data Knowl Eng 68(12):1452–1469CrossRef Mazón J-N, Lechtenbörger J, Trujillo J (2009) A survey on summarizability issues in multidimensional modeling. Data Knowl Eng 68(12):1452–1469CrossRef
53.
Zurück zum Zitat Hu H, Wen Y, Chua T-S, Li X (2014) Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2:652–687CrossRef Hu H, Wen Y, Chua T-S, Li X (2014) Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2:652–687CrossRef
54.
Zurück zum Zitat Gantz J, Reinsel D (2011) Extracting value from chaos. IDC iview 1142:1–12 Gantz J, Reinsel D (2011) Extracting value from chaos. IDC iview 1142:1–12
55.
Zurück zum Zitat Kouzes RT, Anderson GA, Elbert ST, Gorton I, Gracio DK (2009) The changing paradigm of data-intensive computing. IEEE Comput 42(1):26–34CrossRef Kouzes RT, Anderson GA, Elbert ST, Gorton I, Gracio DK (2009) The changing paradigm of data-intensive computing. IEEE Comput 42(1):26–34CrossRef
56.
Zurück zum Zitat Labrinidis A, Jagadish HV (2012) Challenges and opportunities with big data. Proc VLDB Endow 5(12):2032–2033CrossRef Labrinidis A, Jagadish HV (2012) Challenges and opportunities with big data. Proc VLDB Endow 5(12):2032–2033CrossRef
57.
Zurück zum Zitat UN Global Pulse (2012) Big data for development: challenges and opportunities. UN Global Pulse, New York UN Global Pulse (2012) Big data for development: challenges and opportunities. UN Global Pulse, New York
58.
Zurück zum Zitat Kambatla K, Kollias G, Kumar V, Grama A (2014) Trends in big data analytics. J Parallel Distrib Comput 74(7):2561–2573CrossRef Kambatla K, Kollias G, Kumar V, Grama A (2014) Trends in big data analytics. J Parallel Distrib Comput 74(7):2561–2573CrossRef
59.
Zurück zum Zitat Chen Y, Qin X, Bian H, Chen J, Dong Z, Du X, Gao Y, Liu D, Lu J, Zhang H (2014) A study of SQL-on-hadoop systems. In: Workshop on big data benchmarks, performance optimization, and emerging hardware, pp 154–166 Chen Y, Qin X, Bian H, Chen J, Dong Z, Du X, Gao Y, Liu D, Lu J, Zhang H (2014) A study of SQL-on-hadoop systems. In: Workshop on big data benchmarks, performance optimization, and emerging hardware, pp 154–166
60.
Zurück zum Zitat Mohammed EA, Far BH, Naugler C (2014) Applications of the mapreduce programming framework to clinical big data analysis: current landscape and future trends. BioData Min 7(1):1CrossRef Mohammed EA, Far BH, Naugler C (2014) Applications of the mapreduce programming framework to clinical big data analysis: current landscape and future trends. BioData Min 7(1):1CrossRef
61.
Zurück zum Zitat Yang C, Huang Q, Li Z, Liu K, Hu F (2017) Big data and cloud computing: innovation opportunities and challenges. Int J Digit Earth 10(1):13–53CrossRef Yang C, Huang Q, Li Z, Liu K, Hu F (2017) Big data and cloud computing: innovation opportunities and challenges. Int J Digit Earth 10(1):13–53CrossRef
62.
Zurück zum Zitat Oussous A, Benjelloun F-Z, Lahcen AA, Belfkih S (2017) Big data technologies: a survey. J King Saud Univ-Comput Inf Sci Oussous A, Benjelloun F-Z, Lahcen AA, Belfkih S (2017) Big data technologies: a survey. J King Saud Univ-Comput Inf Sci
63.
Zurück zum Zitat Salloum S, Dautov R, Chen X, Peng PX, Huang JZ (2016) Big data analytics on apache spark. Int J Data Sci Anal, pp 1–20 Salloum S, Dautov R, Chen X, Peng PX, Huang JZ (2016) Big data analytics on apache spark. Int J Data Sci Anal, pp 1–20
64.
Zurück zum Zitat de Assuncao MD, da Silva Veith A, Buyya R (2018) Distributed data stream processing and edge computing: a survey on resource elasticity and future directions. J Netw Comput Appl 103:1–17CrossRef de Assuncao MD, da Silva Veith A, Buyya R (2018) Distributed data stream processing and edge computing: a survey on resource elasticity and future directions. J Netw Comput Appl 103:1–17CrossRef
65.
Zurück zum Zitat Krumm J, Davies N, Narayanaswami C (2008) User-generated content. IEEE Pervasive Comput 4(7):10–11CrossRef Krumm J, Davies N, Narayanaswami C (2008) User-generated content. IEEE Pervasive Comput 4(7):10–11CrossRef
67.
Zurück zum Zitat Shameer K, Badgeley MA, Miotto R, Glicksberg BS, Morgan JW, Dudley JT (2016) Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Briefings in Bioinformatics, bbv118 Shameer K, Badgeley MA, Miotto R, Glicksberg BS, Morgan JW, Dudley JT (2016) Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Briefings in Bioinformatics, bbv118
68.
Zurück zum Zitat Marx V (2013) Biology: the big challenges of big data. Nature 498(7453):255–260CrossRef Marx V (2013) Biology: the big challenges of big data. Nature 498(7453):255–260CrossRef
69.
Zurück zum Zitat Cook CE, Bergman MT, Cochrane G, Apweiler R, Birney E (2017) The european bioinformatics institute in 2017: data coordination and integration. Nucleic Acids Res 46(D1):D21–D29CrossRef Cook CE, Bergman MT, Cochrane G, Apweiler R, Birney E (2017) The european bioinformatics institute in 2017: data coordination and integration. Nucleic Acids Res 46(D1):D21–D29CrossRef
70.
Zurück zum Zitat Akter S, Wamba SF (2016) Big data analytics in e-commerce: a systematic review and agenda for future research. Electron Mark 26(2):173–194CrossRef Akter S, Wamba SF (2016) Big data analytics in e-commerce: a systematic review and agenda for future research. Electron Mark 26(2):173–194CrossRef
73.
Zurück zum Zitat Sun J, Reddy CK (2013) Big data analytics for healthcare. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1525–1525 Sun J, Reddy CK (2013) Big data analytics for healthcare. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1525–1525
74.
Zurück zum Zitat Ranjan R, Georgakopoulos D, Wang L (2016) A note on software tools and technologies for delivering smart media-optimized big data applications in the cloud. Computing 98(1–2):1–5MathSciNetMATHCrossRef Ranjan R, Georgakopoulos D, Wang L (2016) A note on software tools and technologies for delivering smart media-optimized big data applications in the cloud. Computing 98(1–2):1–5MathSciNetMATHCrossRef
79.
Zurück zum Zitat Rob Kitchin (2017) Big data. The International Encyclopedia of Geography Rob Kitchin (2017) Big data. The International Encyclopedia of Geography
80.
Zurück zum Zitat Gudivada VN, Baeza-Yates RA, Raghavan VV (2017) Big data: promises and problems. IEEE Comput 48(3):20–23CrossRef Gudivada VN, Baeza-Yates RA, Raghavan VV (2017) Big data: promises and problems. IEEE Comput 48(3):20–23CrossRef
81.
Zurück zum Zitat Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutor 17(4):2347–2376CrossRef Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutor 17(4):2347–2376CrossRef
82.
Zurück zum Zitat Raun J, Ahas R, Tiru M (2016) Measuring tourism destinations using mobile tracking data. Tour Manag 57:202–212CrossRef Raun J, Ahas R, Tiru M (2016) Measuring tourism destinations using mobile tracking data. Tour Manag 57:202–212CrossRef
83.
Zurück zum Zitat Kitchin R (2014) The data revolution: Big data, open data, data infrastructures and their consequences. Sage, ISBN: 13-9781446287484 Kitchin R (2014) The data revolution: Big data, open data, data infrastructures and their consequences. Sage, ISBN: 13-9781446287484
84.
Zurück zum Zitat Abiteboul S, Manolescu I, Rigaux P, Rousset M-C, Senellart P (2011) Web data management. Cambridge University Press, ISBN-13: 9781107012431 Abiteboul S, Manolescu I, Rigaux P, Rousset M-C, Senellart P (2011) Web data management. Cambridge University Press, ISBN-13: 9781107012431
85.
Zurück zum Zitat Ghemawat S, Gobioff H, Leung S-T (2003) The google file system. In: ACM SIGOPS operating systems review, vol 37, pp 29–43 Ghemawat S, Gobioff H, Leung S-T (2003) The google file system. In: ACM SIGOPS operating systems review, vol 37, pp 29–43
86.
Zurück zum Zitat Doctorow C (2008) Big data: welcome to the petacenre. Nat News 455(7209):16–21CrossRef Doctorow C (2008) Big data: welcome to the petacenre. Nat News 455(7209):16–21CrossRef
87.
Zurück zum Zitat Ovsiannikov M, Rus S, Reeves D, Sutter P, Rao S, Kelly J (2013) The quantcast file system. Proc VLDB Endow 6(11):1092–1101CrossRef Ovsiannikov M, Rus S, Reeves D, Sutter P, Rao S, Kelly J (2013) The quantcast file system. Proc VLDB Endow 6(11):1092–1101CrossRef
88.
Zurück zum Zitat Guerraoui R, Schiper A (1996) Fault-tolerance by replication in distributed systems. In: International conference on reliable software technologies, pp 38–57 Guerraoui R, Schiper A (1996) Fault-tolerance by replication in distributed systems. In: International conference on reliable software technologies, pp 38–57
89.
Zurück zum Zitat Wiesmann M, Pedone F, Schiper A, Kemme B, Alonso G (2000) Understanding replication in databases and distributed systems. In: Distributed computing systems, 2000. Proceedings of 20th international conference on, pp 464–474 Wiesmann M, Pedone F, Schiper A, Kemme B, Alonso G (2000) Understanding replication in databases and distributed systems. In: Distributed computing systems, 2000. Proceedings of 20th international conference on, pp 464–474
90.
Zurück zum Zitat Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST), pp 1–10 Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST), pp 1–10
92.
Zurück zum Zitat Schmuck FB, Haskin RL (2002) Gpfs: a shared-disk file system for large computing clusters. In: FAST, vol 2, pp 231–244 Schmuck FB, Haskin RL (2002) Gpfs: a shared-disk file system for large computing clusters. In: FAST, vol 2, pp 231–244
93.
Zurück zum Zitat Jones T, Koniges AE, Yates RK (2000) Performance of the IBM general parallel file system. In: IPDPS, pp 673–681 Jones T, Koniges AE, Yates RK (2000) Performance of the IBM general parallel file system. In: IPDPS, pp 673–681
95.
Zurück zum Zitat Thanh TD, Mohan S, Choi E, Kim SB, Kim P (2008) A taxonomy and survey on distributed file systems. In: Networked computing and advanced information management, 2008. NCM’08. Fourth international conference on 1, pp 144–149 Thanh TD, Mohan S, Choi E, Kim SB, Kim P (2008) A taxonomy and survey on distributed file systems. In: Networked computing and advanced information management, 2008. NCM’08. Fourth international conference on 1, pp 144–149
96.
Zurück zum Zitat Beaver D, Kumar S, Li HC, Sobel J, Vajgel P (2010) Finding a needle in haystack: facebook’s photo storage. OSDI 10:1–8 Beaver D, Kumar S, Li HC, Sobel J, Vajgel P (2010) Finding a needle in haystack: facebook’s photo storage. OSDI 10:1–8
97.
Zurück zum Zitat Fetterly D, Haridasan M, Isard M, Sundararaman S (2011) Tidyfs: a simple and small distributed file system. In: USENIX annual technical conference, pp 34–34 Fetterly D, Haridasan M, Isard M, Sundararaman S (2011) Tidyfs: a simple and small distributed file system. In: USENIX annual technical conference, pp 34–34
100.
Zurück zum Zitat Brewer E (2010) A certain freedom: thoughts on the cap theorem. In: Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing, pp 335–335 Brewer E (2010) A certain freedom: thoughts on the cap theorem. In: Proceedings of the 29th ACM SIGACT-SIGOPS symposium on principles of distributed computing, pp 335–335
101.
Zurück zum Zitat Lourenço JR, Cabral B, Carreiro P, Vieira M, Bernardino J (2015) Choosing the right nosql database for the job: a quality attribute evaluation. J Big Data 2(1):1–26CrossRef Lourenço JR, Cabral B, Carreiro P, Vieira M, Bernardino J (2015) Choosing the right nosql database for the job: a quality attribute evaluation. J Big Data 2(1):1–26CrossRef
102.
Zurück zum Zitat Buyya R, Calheiros RN, Dastjerdi AV (2016) Big data: principles and paradigms. Morgan Kaufmann, ISBN-13: 9780128053942 Buyya R, Calheiros RN, Dastjerdi AV (2016) Big data: principles and paradigms. Morgan Kaufmann, ISBN-13: 9780128053942
103.
Zurück zum Zitat Abadi D, Boncz P, Harizopoulos S, Idreos S, Madden S et al (2013) The design and implementation of modern column-oriented database systems. Now 5(3):197–280 Abadi D, Boncz P, Harizopoulos S, Idreos S, Madden S et al (2013) The design and implementation of modern column-oriented database systems. Now 5(3):197–280
104.
Zurück zum Zitat Matei G, Bank RC (2010) Column-oriented databases, an alternative for analytical environment. Database Syst J 1(2):3–16 Matei G, Bank RC (2010) Column-oriented databases, an alternative for analytical environment. Database Syst J 1(2):3–16
105.
Zurück zum Zitat Floratou A, Patel JM, Shekita EJ, Tata S (2011) Column-oriented storage techniques for mapreduce. Proc VLDB Endow 4(7):419–429CrossRef Floratou A, Patel JM, Shekita EJ, Tata S (2011) Column-oriented storage techniques for mapreduce. Proc VLDB Endow 4(7):419–429CrossRef
106.
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2):1–26
107.
Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system. ACM SIGOPS Oper Syst Rev 44(2):35–40
108.
Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil E et al. (2005) C-store: a column-oriented DBMS. In: Proceedings of the 31st international conference on very large data bases, pp 553–564
109.
Boncz PA, Zukowski M, Nes N (2005) Monetdb/x100: hyper-pipelining query execution. CIDR 5:225–237
110.
Idreos S, Groffen F, Nes N, Manegold S, Mullender S, Kersten M (2012) Monetdb: two decades of research in column-oriented database architectures. Bull IEEE Comput Soc Tech Comm Data Eng 35(1):40–45
111.
Sciore E (2007) Simpledb: a simple java-based multiuser system for teaching database internals. ACM SIGCSE Bull 39(1):561–565
112.
Zukowski M, Boncz P (2012) Vectorwise: beyond column stores. IEEE Data Eng Bull 35(1):21–27
113.
Edward SG, Sabharwal N (2015) Mongodb limitations. In: Practical MongoDB, pp 227–232
116.
DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: amazon’s highly available key-value store. ACM SIGOPS Oper Syst Rev 41(6):205–220
118.
Sumbaly R, Kreps J, Gao L, Feinberg A, Soman C, Shah S (2012) Serving large-scale batch computed data with project voldemort. In: Proceedings of the 10th USENIX conference on file and storage technologies, pp 18–18
119.
Gudivada VN, Rao D, Raghavan VV (2014) NoSQL systems for big data management. In: 2014 IEEE World congress on services, pp 190–197
123.
Moniruzzaman ABM, Hossain SA (2013) Nosql database: new era of databases for big data analytics-classification, characteristics and comparison. arXiv preprint arXiv:1307.0191
126.
Khetrapal A, Ganesh V (2006) Hbase and hypertable for large scale distributed storage systems. Dept. of Computer Science, Purdue University, pp 22–28
128.
Ghaffari A, Chechina N, Trinder P, Meredith J (2013) Scalable persistent storage for Erlang: theory and practice. In: Proceedings of the twelfth ACM SIGPLAN workshop on Erlang, pp 73–74
129.
141.
Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Freeman J, Tsai D, Amde M, Owen S et al (2016) Mllib: Machine learning in apache spark. JMLR 17(34):1–7
142.
Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65
146.
Carbone P, Ewen S, Haridi S, Katsifodimos A, Markl V, Tzoumas K (2015) Apache flink: stream and batch processing in a single engine. Data Eng 38:28–38
149.
Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
150.
Chappell D (2015) Introducing azure machine learning. A guide for technical professionals, sponsored by microsoft corporation
160.
Huang L, Hu G, Lu X (2009) E-business ecosystem and its evolutionary path: the case of the alibaba group in china. Pacific Asia J Assoc Inf Syst 1(4)
162.
Gupta P, Sharma A, Jindal R (2016) Scalable machine-learning algorithms for big data analytics: a comprehensive review. Wiley Interdiscip Rev: Data Min Knowl Discov 6(6):194–214
164.
Ji X, Chun SA, Cappellari P, Geller J (2017) Linking and using social media data for enhancing public health analytics. J Inf Sci 43(2):221–245
165.
Kanaujia PKM, Pandey M, Rautaray SS (2017) Real time financial analysis using big data technologies. In: I-SMAC (IoT in social, mobile, analytics and cloud)(I-SMAC), 2017 international conference on, pp 131–136
166.
Moe WW, Schweidel DA (2017) Opportunities for innovation in social media analytics. J Prod Innov Manag 34(5):697–702
167.
Psyllidis A, Bozzon A, Bocconi S, Bolivar CT (2015) A platform for urban analytics and semantic data integration in city planning. In: International conference on computer-aided architectural design futures, pp 21–36
168.
Gust G, Flath C, Brandt T, Ströhle P, Neumann D (2016) Bringing analytics into practice: evidence from the power sector
169.
Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471
170.
Baesens B, Van Vlasselaer V, Verbeke W (2015) Fraud analytics: a broader perspective. Fraud analytics using descriptive, predictive, and social network techniques: a guide to data science for fraud detection, pp 313–346
171.
Xu Z, Mei L, Chuanping H, Liu Y (2016) The big data analytics and applications of the surveillance system using video structured description technology. Cluster Comput 19(3):1283–1292
172.
Bisias D, Flood M, Lo AW, Valavanis S (2012) A survey of systemic risk analytics. Annu Rev Financ Econ 4(1):255–296
173.
Sagiroglu S, Sinanc D (2013) Big data: a review. In: Collaboration technologies and systems (CTS), 2013 international conference on, pp 42–47
174.
Rabkin A, Arye M, Sen S, Pai VS, Freedman MJ (2014) Aggregation and degradation in JetStream: streaming analytics in the wide area. In: NSDI, vol 14, pp 275–288
175.
Zhang L, Stoffel A, Behrisch M, Mittelstadt S, Schreck T, Pompl R, Weber S, Last H, Keim D (2012) Visual analytics for the big data era comparative review of state-of-the-art commercial systems. In: Visual analytics science and technology (VAST), 2012 IEEE conference on, pp 173–182
176.
Waller MA, Fawcett SE (2013) Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. J Bus Logist 34(2):77–84
177.
Chen H, Chiang RHL, Storey VC (2012) Business intelligence and analytics: from big data to big impact. MIS Q 36(4):1165–1188
178.
Raghupathi W, Raghupathi V (2013) An overview of health analytics. J Health Med Inform 4(3):1–11
181.
Xin RS, Gonzalez JE, Franklin MJ, Stoica I (2013) Graphx: a resilient distributed graph system on spark. In: First international workshop on graph data management experiences and systems 2(1–2):6
182.
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C (2011) Graphlab: A distributed framework for machine learning in the cloud. arXiv preprint arXiv:1107.0922
184.
Liu B (2007) Web data mining: exploring hyperlinks, contents, and usage data. Springer, Berlin. ISBN-13: 9783642194597
185.
Wesley R, Eldridge M, Terlecki PT (2011) An analytic data engine for visualization in tableau. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 1185–1194
186.
García M, Harmsen B (2012) Qlikview 11 for developers. Packt Publishing Ltd
190.
Abousalh-Neto NA, Kazgan S (2012) Big data exploration through visual analytics. In: Visual analytics science and technology (VAST), 2012 IEEE conference on, pp 285–286
193.
Smoot ME, Ono K, Ruscheinski J, Wang P-L, Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27(3):431–432
194.
Batagelj V, Mrvar A (1998) Pajek-program for large network analysis. Connections 21(2):47–57
195.
Smith MA, Shneiderman B, Milic-Frayling N, Mendes Rodrigues E, Barash V, Dunne C, Capone T, Perer A, Gleave E (2009) Analyzing (social media) networks with NodeXL. In: Proceedings of the fourth international conference on communities and technologies, pp 255–264
196.
Bastian M, Heymann S, Jacomy M et al (2009) Gephi: an open source software for exploring and manipulating networks. ICWSM 8:361–362
197.
Csardi G, Nepusz T (2006) The igraph software package for complex network research. Int J Complex Syst 1695(5):1–9
199.
Sakr S, Liu A, Fayoumi AG (2013) The family of mapreduce and large-scale data processing systems. ACM Comput Surv 46(1):11
200.
Lee K-H, Lee Y-J, Choi H, Chung YD, Moon B (2012) Parallel data processing with mapreduce: a survey. ACM SIGMOD Rec 40(4):11–20
201.
Chen Y, Kreulen J, Campbell M, Abrams C (2011) Analytics ecosystem transformation: a force for business model innovation. In: 2011 Annual SRII global conference, pp 11–20
202.
Venner J, Wadkar S, Siddalingaiah M (2014) Pro apache Hadoop. ISBN: 9781430248637
205.
Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S et al (2013) Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing, pp 5:1–16
208.
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. HotCloud 10:10–10
209.
Marcu O-C, Costan A, Antoniu G, Pérez-Hernández MS (2016) Spark versus flink: understanding performance in big data analytics frameworks. In: Cluster computing (CLUSTER), 2016 IEEE international conference on, pp 433–442
211.
Rensin DK (2015) Kubernetes-scheduling the future at cloud scale
212.
Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Zhang N, Antony S, Liu H, Murthy R (2010) Hive-a petabyte scale data warehouse using hadoop. In: 2010 IEEE 26th international conference on data engineering (ICDE 2010), pp 996–1005
214.
Armbrust M, Xin RS, Lian C, Huai Y, Liu D, Bradley JK, Meng X, Kaftan T, Franklin MJ, Ghodsi A et al (2015) Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1383–1394
215.
Traverso M (2013) Presto: interacting with petabytes of data at facebook. Retrieved February 4, 2014
216.
Hausenblas M, Nadeau J (2013) Apache drill: interactive ad-hoc analysis at scale. Big Data 1(2):100–104
218.
Ho L-Y, Li T-H, Wu J-J, Liu P (2013) Kylin: an efficient and scalable graph data processing system. In: Big data, 2013 IEEE international conference on, pp 193–198
219.
Lamb A, Fuller M, Varadarajan R, Tran N, Vandiver B, Doshi L, Bear C (2012) The vertica analytic database: C-store 7 years later. Proc VLDB Endow 5(12):1790–1801
220.
Chattopadhyay B, Lin L, Liu W, Mittal S, Aragonda P, Lychagina V, Kwon Y, Wong M (2011) Tenzing a SQL implementation on the mapreduce framework
221.
Floratou A, Minhas UF, Özcan F (2014) Sql-on-hadoop: full circle back to shared-nothing database architectures. Proc VLDB Endow 7(12):1295–1306
225.
van der Veen JS, van der Waaij B, Lazovik E, Wijbrandi W, Meijer RJ (2015) Dynamically scaling apache storm for the analysis of streaming data. In: Big data computing service and applications (BigDataService), 2015 IEEE first international conference on, pp 154–161
226.
Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J et al (2014) Storm@ twitter. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, pp 147–156
230.
Bockermann C (2014) A survey of the stream processing landscape. Lehrstuhl für Künstliche Intelligenz, Technische Universität Dortmund
231.
Neumeyer L, Robbins B, Nair A, Kesari A (2010) S4: distributed stream computing platform. In: Data mining workshops (ICDMW), 2010 IEEE international conference on, pp 170–177
232.
Zaharia M, Das T, Li H, Shenker S, Stoica I (2012) Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. HotCloud 12:10–10
233.
Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I (2013) Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 423–438
237.
Chandy KM, Lamport L (1985) Distributed snapshots: determining global states of distributed systems. ACM Trans Comput Syst 3(1):63–75
239.
Alexandrov A, Bergmann R, Ewen S, Freytag J-C, Hueske F, Heise A, Kao O, Leich M, Leser U, Markl V (2014) The stratosphere platform for big data analytics. VLDB J 23(6):939–964
243.
De Morales GF, Bifet A (2015) Samoa: scalable advanced massive online analysis. J Mach Learn Res 16(1):149–153
246.
Akidau T, Balikov A, Bekiroğlu K, Chernyak S, Haberman J, Lax R, McVeety S, Mills D, Nordstrom P, Whittle S (2013) Millwheel: fault-tolerant stream processing at internet scale. Proc VLDB Endow 6(11):1033–1044
247.
Kulkarni S, Bhagat N, Fu M, Kedigehalli V, Kellogg C, Mittal S, Patel JM, Ramasamy K, Taneja S (2015) Twitter heron: stream processing at scale. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 239–250
248.
Abadi D, Carney D, Cetintemel U, Cherniack M, Convey C, Erwin C, Galvez E, Hatoun M, Maskey A, Rasin A et al (2003) Aurora: a data stream management system. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, pp 666–666
252.
Fu M, Agrawal A, Floratou A, Graham B, Jorgensen A, Li M, Lu N, Ramasamy K, Rao S, Wang C (2017) Twitter heron: towards extensible streaming engines. In: Data engineering (ICDE), 2017 IEEE 33rd international conference on, pp 1165–1172
258.
Shukla A, Chaturvedi S, Simmhan Y (2017) Riotbench: a real-time iot benchmark for distributed stream processing platforms. arXiv preprint arXiv:1701.08530
259.
Dreissig F, Pollner N (2017) A data center infrastructure monitoring platform based on storm and trident. Datenbanksysteme für Business, Technologie und Web (BTW 2017)-Workshopband
260.
Saha B, Shah H, Seth S, Vijayaraghavan G, Murthy A, Curino C (2015) Apache tez: a unifying framework for modeling and building data processing applications. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp 1357–1369
265.
Sebastio S, Ghosh R, Mukherjee T (2018) An availability analysis approach for deployment configurations of containers. IEEE Trans Serv Comput
266.
Medel V, Rana O, Bañares JÁ, Arronategui U (2016) Modelling performance and resource management in kubernetes. In: Utility and cloud computing (UCC), 2016 IEEE/ACM 9th international conference on, pp 257–262
267.
Hindman B, Konwinski A, Zaharia M, Ghodsi A, Joseph AD, Katz RH, Shenker S, Stoica I (2011) Mesos: a platform for fine-grained resource sharing in the data center. In: NSDI, vol 11, pp 295–308
269.
Kreps J, Narkhede N, Rao J et al (2011) Kafka: a distributed messaging system for log processing. In: Proceedings of the NetDB, pp 1–7
273.
Lampesberger H (2016) Technologies for web and cloud service interaction: a survey. Serv Oriented Comput Appl 10(2):71–110
275.
Sangat P, Indrawan-Santiago M, Taniar D (2018) Sensor data management in the cloud: data storage, data ingestion, and data retrieval. Concurr Comput: Pract Exp 30(1)
276.
Hoffman S (2013) Apache flume: distributed log collection for hadoop. Packt Publishing Ltd
277.
Ting K, Cecho JJ (2013) Apache Sqoop Cookbook. O’Reilly Media, Inc
278.
Rabkin A, Katz RH (2010) Chukwa: a system for reliable large-scale log collection. LISA 10:1–15
280.
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) Graphlab: a new framework for parallel machine learning. arXiv preprint arXiv:1006.4990
281.
Aver C (2011) Giraph: large-scale graph processing infrastructure on hadoop. In: Proceedings of the Hadoop summit. Santa Clara 11(3), 5–9
282.
Gonzalez JE, Low Y, Haijie G, Bickson D, Guestrin C (2012) Powergraph: distributed graph-parallel computation on natural graphs. OSDI 12(1):2–2
283.
Salihoglu S, Widom J (2013) Gps: a graph processing system. In: Proceedings of the 25th international conference on scientific and statistical database management 22, pp 1–12
284.
Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) Graphx: graph processing in a distributed dataflow framework. OSDI 14:599–613
285.
Xin RS, Crankshaw D, Dave A, Gonzalez JE, Franklin MJ, Stoica I (2014) Graphx: unifying data-parallel and graph-parallel analytics. arXiv preprint arXiv:1402.2394
287.
Junghanns M, Petermann A, Gómez K, Rahm E (2015) Gradoop: scalable graph data management and analytics with hadoop. arXiv preprint arXiv:1506.00548
288.
Hunt P, Konar M, Junqueira FP, Reed B (2010) Zookeeper: Wait-free coordination for internet-scale systems. In: USENIX annual technical conference 8(9)
291.
Hu W, Qu Y (2008) Falcon-AO: a practical ontology matching system. Web Semant: Sci Serv Agents World Wide Web 6(3):237–239
293.
Islam M, Huang AK, Battisha M, Chiang M, Srinivasan S, Peters C, Neumann A, Abdelnur A (2012) Oozie: towards a scalable workflow management system for hadoop. In: Proceedings of the 1st ACM SIGMOD workshop on scalable workflow execution engines and technologies 4:1–4:10
Metadata
Title
The big data system, components, tools, and technologies: a survey
Authors
T. Ramalingeswara Rao
Pabitra Mitra
Ravindara Bhatt
A. Goswami
Publication date
18 September 2018
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-018-1248-0
