Abstract
The present work uses Hadoop as the core processing platform in a Big Data environment. Several open-source tools from the Hadoop ecosystem facilitate the processing of enormous volumes of data. In this paper, we work with Apache Flume and Apache Hive in a case study of the 2017 presidential elections in Ecuador. The analysis of data generated on the Twitter social network focuses mainly on the first round of balloting in that election. The data were obtained, stored, processed, and analyzed in a way that complies with the characteristics of information considered Big Data. The selected tools were evaluated in terms of their architecture, installation, and use. Finally, the data were evaluated against a set of data quality criteria, or dimensions.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tenesaca-Luna, GA., Imba, D., Mora-Arciniegas, MB., Segarra-Faggioni, V., Ramírez-Coronel, R.L. (2019). Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network. In: Botto-Tobar, M., Barba-Maggi, L., González-Huerta, J., Villacrés-Cevallos, P., S. Gómez, O., Uvidia-Fassler, M. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2018. Advances in Intelligent Systems and Computing, vol 884. Springer, Cham. https://doi.org/10.1007/978-3-030-02828-2_23
DOI: https://doi.org/10.1007/978-3-030-02828-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02827-5
Online ISBN: 978-3-030-02828-2
eBook Packages: Intelligent Technologies and Robotics (R0)