Skip to main content

2018 | OriginalPaper | Buchkapitel

Appraising SPARK on Large-Scale Social Media Analysis

verfasst von : Loris Belcastro, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio

Erschienen in: Euro-Par 2017: Parallel Processing Workshops

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Software systems for social media analysis provide algorithms and tools for extracting useful knowledge from user-generated social media data. ParSoDA (Parallel Social Data Analytics) is a Java library for developing parallel data analysis applications based on the extraction of useful knowledge from social media data. This library aims at reducing the programming skills necessary to implement scalable social data analysis applications. This work describes how the ParSoDA library has been extended to execute applications on Apache Spark. Using a cluster of 12 workers, the Spark version of the library reduces the execution time of two case study applications exploiting social media data up to 42%, compared to the Hadoop version of the library.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Amer-Yahia, S., Ibrahim, N., Kengne, C.K., Ulliana, F., Rousset, M.C.: Socle: towards a framework for data preparation in social applications. Ingénierie des Systèmes d’Information 19(3), 49–72 (2014)CrossRef Amer-Yahia, S., Ibrahim, N., Kengne, C.K., Ulliana, F., Rousset, M.C.: Socle: towards a framework for data preparation in social applications. Ingénierie des Systèmes d’Information 19(3), 49–72 (2014)CrossRef
2.
Zurück zum Zitat Anstead, N., O’Loughlin, B.: Social media analysis and public opinion: the 2010 UK general election. J. Comput.-Mediated Commun. 20(2), 204–220 (2015)CrossRef Anstead, N., O’Loughlin, B.: Social media analysis and public opinion: the 2010 UK general election. J. Comput.-Mediated Commun. 20(2), 204–220 (2015)CrossRef
4.
Zurück zum Zitat Belcastro, L., Marozzo, F., Talia, D., Trunfio, P.: A parallel library for social media analytics. In: The 2017 International Conference on High Performance Computing and Simulation (HPCS 2017), Genoa, Italy, 17–21 July 2017 Belcastro, L., Marozzo, F., Talia, D., Trunfio, P.: A parallel library for social media analytics. In: The 2017 International Conference on High Performance Computing and Simulation (HPCS 2017), Genoa, Italy, 17–21 July 2017
5.
Zurück zum Zitat Cesario, E., Congedo, C., Marozzo, F., Riotta, G., Spada, A., Talia, D., Trunfio, P., Turri, C.: Following soccer fans from geotagged tweets at FIFA world Cup 2014. In: Proceedings of the 2nd IEEE Conference on Spatial Data Mining and Geographical Knowledge Services, Fuzhou, China, pp. 33–38, July 2015. ISBN 978-1- 4799-7748-2 Cesario, E., Congedo, C., Marozzo, F., Riotta, G., Spada, A., Talia, D., Trunfio, P., Turri, C.: Following soccer fans from geotagged tweets at FIFA world Cup 2014. In: Proceedings of the 2nd IEEE Conference on Spatial Data Mining and Geographical Knowledge Services, Fuzhou, China, pp. 33–38, July 2015. ISBN 978-1- 4799-7748-2
6.
Zurück zum Zitat Cesario, E., Iannazzo, A.R., Marozzo, F., Morello, F., Riotta, G., Spada, A., Talia, D., Trunfio, P.: Analyzing social media data to discover mobility patterns at expo 2015: methodology and results. In: The 2016 International Conference on High Performance Computing and Simulation (HPCS 2016), Innsbruck, Austria, 18–22 July 2016 Cesario, E., Iannazzo, A.R., Marozzo, F., Morello, F., Riotta, G., Spada, A., Talia, D., Trunfio, P.: Analyzing social media data to discover mobility patterns at expo 2015: methodology and results. In: The 2016 International Conference on High Performance Computing and Simulation (HPCS 2016), Innsbruck, Austria, 18–22 July 2016
7.
Zurück zum Zitat Chodorow, K.: MongoDB: The Definitive Guide. O’Reilly Media, Inc., Sebastopol (2013) Chodorow, K.: MongoDB: The Definitive Guide. O’Reilly Media, Inc., Sebastopol (2013)
8.
Zurück zum Zitat Chu, C., Kim, S.K., Lin, Y.A., Yu, Y., Bradski, G., Ng, A.Y., Olukotun, K.: Map-reduce for machine learning on multicore. Adv. Neural Inf. Process. Syst. 19, 281 (2007) Chu, C., Kim, S.K., Lin, Y.A., Yu, Y., Bradski, G., Ng, A.Y., Olukotun, K.: Map-reduce for machine learning on multicore. Adv. Neural Inf. Process. Syst. 19, 281 (2007)
9.
Zurück zum Zitat Cuesta, Á., Barrero, D.F., R-Moreno, M.D.: A framework for massive Twitter data extraction and analysis. Malays. J. Comput. Sci. 27, 1 (2014) Cuesta, Á., Barrero, D.F., R-Moreno, M.D.: A framework for massive Twitter data extraction and analysis. Malays. J. Comput. Sci. 27, 1 (2014)
10.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation, OSDI 2004, Berkeley, USA, p. 10 (2004) Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation, OSDI 2004, Berkeley, USA, p. 10 (2004)
11.
Zurück zum Zitat Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl. Discov. 8(1), 53–87 (2004)MathSciNetCrossRef Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl. Discov. 8(1), 53–87 (2004)MathSciNetCrossRef
13.
Zurück zum Zitat Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender Systems, New York, NY, USA, pp. 107–114 (2008) Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender Systems, New York, NY, USA, pp. 107–114 (2008)
14.
Zurück zum Zitat Miliaraki, I., Berberich, K., Gemulla, R., Zoupanos, S.: Mind the gap: large-scale frequent sequence mining. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 797–808 (2013) Miliaraki, I., Berberich, K., Gemulla, R., Zoupanos, S.: Mind the gap: large-scale frequent sequence mining. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 797–808 (2013)
15.
Zurück zum Zitat Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(12), 1–135 (2008)CrossRef Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(12), 1–135 (2008)CrossRef
16.
Zurück zum Zitat Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)CrossRef Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)CrossRef
17.
Zurück zum Zitat Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010) Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
18.
Zurück zum Zitat Talia, D., Trunfio, P., Marozzo, F.: Data Analysis in the Cloud. Elsevier, Amsterdam, October 2015 Talia, D., Trunfio, P., Marozzo, F.: Data Analysis in the Cloud. Elsevier, Amsterdam, October 2015
19.
Zurück zum Zitat White, T.: Hadoop: The Definitive Guide, 1st edn. O’Reilly Media Inc., Sebastopol (2009) White, T.: Hadoop: The Definitive Guide, 1st edn. O’Reilly Media Inc., Sebastopol (2009)
20.
Zurück zum Zitat Xin, R.S., Rosen, J., Zaharia, M., Franklin, M.J., Shenker, S., Stoica, I.: Shark: SQL and rich analytics at scale. In: Proceedings of the 2013 ACM SIGMOD Conference on Management of Data, pp. 13–24. ACM (2013) Xin, R.S., Rosen, J., Zaharia, M., Franklin, M.J., Shenker, S., Stoica, I.: Shark: SQL and rich analytics at scale. In: Proceedings of the 2013 ACM SIGMOD Conference on Management of Data, pp. 13–24. ACM (2013)
21.
Zurück zum Zitat You, L., Motta, G., Sacco, D., Ma, T.: Social data analysis framework in cloud and mobility analyzer for smarter cities. In: IEEE International Conference on Service Operations and Logistics, and Informatics, pp. 96–101, October 2014 You, L., Motta, G., Sacco, D., Ma, T.: Social data analysis framework in cloud and mobility analyzer for smarter cities. In: IEEE International Conference on Service Operations and Logistics, and Informatics, pp. 96–101, October 2014
22.
Zurück zum Zitat Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef
Metadaten
Titel
Appraising SPARK on Large-Scale Social Media Analysis
verfasst von
Loris Belcastro
Fabrizio Marozzo
Domenico Talia
Paolo Trunfio
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-75178-8_39

Neuer Inhalt