Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network

Tenesaca-Luna, Gladys-Alicia; Imba, Diego; Mora-Arciniegas, María-Belén; Segarra-Faggioni, Verónica; Ramírez-Coronel, Ramiro Leonardo

doi:10.1007/978-3-030-02828-2_23

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 884)

Included in the following conference series:

Conference on Information Technologies and Communication of Ecuador

Abstract

This work uses Hadoop as the core processing platform in a Big Data environment. Several open-source tools from the Hadoop ecosystem facilitate the processing of enormous volumes of data. In this paper, we work with Apache Flume and Apache Hive in a case study of the 2017 presidential elections in Ecuador. The analysis of data generated on the Twitter social network focuses mainly on the first round of balloting. These data were obtained, stored, processed, and analyzed in keeping with the characteristics that qualify information as Big Data. The selected tools were evaluated in terms of their architecture, installation, and use. Finally, the data were evaluated against a set of quality criteria, or dimensions.
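
As an illustration of what evaluating collected data against a quality dimension can look like, the following minimal Python sketch computes field-level completeness over newline-delimited JSON records. It is not the authors' method: the abstract does not name the dimensions used, so the choice of completeness, the field names in `REQUIRED_FIELDS`, and the sample records are all assumptions made for this example.

```python
import json

# Hypothetical field names, chosen for illustration only; the schema
# actually used in the paper is not given in the abstract.
REQUIRED_FIELDS = ("id", "text", "created_at", "lang")


def parse_lines(lines):
    """Yield JSON objects from newline-delimited text, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj


def completeness(records, fields=REQUIRED_FIELDS):
    """Return, for each field, the fraction of records holding a non-empty value."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            if record.get(field) not in (None, ""):
                counts[field] += 1
    return {field: counts[field] / len(records) for field in fields}


# Made-up sample lines, not data from the study.
sample = [
    '{"id": 1, "text": "hola", "created_at": "2017-01-01", "lang": "es"}',
    '{"id": 2, "text": "", "created_at": "2017-01-02", "lang": null}',
    "not json",
]

scores = completeness(parse_lines(sample))
print(scores)
```

A score of 1.0 means every parsed record carries a value for that field. Other dimensions, such as accuracy or consistency, would need their own measures and are not covered by this sketch.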

Notes

1. https://twitter.com/
2. https://hadoop.apache.org/

Author information

Authors and Affiliations

Departamento de Ciencias de la Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto y Marcelino Champagnat S/N, Loja, Ecuador
Gladys-Alicia Tenesaca-Luna, Diego Imba, María-Belén Mora-Arciniegas, Verónica Segarra-Faggioni & Ramiro Leonardo Ramírez-Coronel

Corresponding author

Correspondence to Gladys-Alicia Tenesaca-Luna.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, Noord-Brabant, The Netherlands
Miguel Botto-Tobar
Facultad de Ingeniería, Universidad Nacional de Chimborazo, Riobamba, Ecuador
Lida Barba-Maggi
Department of Software Engineering, Blekinge Tekniska Högskola, Karlskrona, Blekinge Län, Sweden
Javier González-Huerta
Facultad de Ingeniería, Universidad Nacional de Chimborazo, Riobamba, Ecuador
Patricio Villacrés-Cevallos
Escuela Superior Politécnica de Chimborazo, Riobamba, Ecuador
Omar S. Gómez
Facultad de Ingeniería, Universidad Nacional de Chimborazo, Riobamba, Ecuador
María I. Uvidia-Fassler

Cite this paper

Tenesaca-Luna, G.A., Imba, D., Mora-Arciniegas, M.B., Segarra-Faggioni, V., Ramírez-Coronel, R.L. (2019). Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network. In: Botto-Tobar, M., Barba-Maggi, L., González-Huerta, J., Villacrés-Cevallos, P., Gómez, O.S., Uvidia-Fassler, M. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2018. Advances in Intelligent Systems and Computing, vol 884. Springer, Cham. https://doi.org/10.1007/978-3-030-02828-2_23

DOI: https://doi.org/10.1007/978-3-030-02828-2_23
Published: 18 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02827-5
Online ISBN: 978-3-030-02828-2
eBook Packages: Intelligent Technologies and Robotics (R0)
