Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network

  • Conference paper
Information and Communication Technologies of Ecuador (TIC.EC) (TICEC 2018)

Abstract

The present work uses Hadoop as the core processing platform in a Big Data environment. Several open-source tools from the Hadoop ecosystem facilitate the processing of enormous volumes of data. In this paper, we worked with Apache Flume and Apache Hive for the case study of the 2017 presidential elections in Ecuador. The analysis of data generated from the Twitter social network focuses mainly on the first round of balloting. These data were obtained, stored, processed, and analyzed in a way that complies with the characteristics of information considered Big Data. The selected tools were evaluated in terms of their architecture, installation, and use. Finally, the data were evaluated under a set of quality criteria, or dimensions.
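
As an illustration only: the abstract does not name the quality dimensions that were evaluated, so the sketch below assumes one common dimension, completeness, and measures it over tweets stored as newline-delimited JSON (a typical shape for data collected from Twitter). The field list `REQUIRED_FIELDS` and both function names are hypothetical and are not taken from the paper.

```python
import json
from typing import Dict, Iterable, Iterator, Sequence

# Hypothetical set of fields every stored tweet is expected to carry.
REQUIRED_FIELDS = ("id", "text", "created_at", "user")


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON object line; skip blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj


def completeness(records: Iterable[dict],
                 fields: Sequence[str] = REQUIRED_FIELDS) -> Dict[str, float]:
    """Return, per field, the fraction of records holding a non-empty value."""
    records = list(records)
    if not records:
        return {f: 0.0 for f in fields}
    counts = {f: 0 for f in fields}
    for rec in records:
        for f in fields:
            # Missing keys, None, and empty strings/containers count as absent.
            if rec.get(f) not in (None, "", [], {}):
                counts[f] += 1
    return {f: counts[f] / len(records) for f in fields}


if __name__ == "__main__":
    sample = [
        '{"id": 1, "text": "hola", "created_at": "2017-02-19", "user": {"screen_name": "a"}}',
        '{"id": 2, "text": "", "created_at": "2017-02-19"}',
        "not json",
    ]
    print(completeness(parse_json_lines(sample)))
```

A per-field ratio of this kind is one simple way to summarize a quality dimension; other dimensions, such as consistency or timeliness, would need their own measures.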

Notes

  1. https://twitter.com/

  2. https://hadoop.apache.org/

Author information

Corresponding author

Correspondence to Gladys-Alicia Tenesaca-Luna.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tenesaca-Luna, GA., Imba, D., Mora-Arciniegas, MB., Segarra-Faggioni, V., Ramírez-Coronel, R.L. (2019). Use of Apache Flume in the Big Data Environment for Processing and Evaluation of the Data Quality of the Twitter Social Network. In: Botto-Tobar, M., Barba-Maggi, L., González-Huerta, J., Villacrés-Cevallos, P., S. Gómez, O., Uvidia-Fassler, M. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2018. Advances in Intelligent Systems and Computing, vol 884. Springer, Cham. https://doi.org/10.1007/978-3-030-02828-2_23
