Published in: Social Network Analysis and Mining 1/2019

01.12.2019 | Original Article

ParSoDA: high-level parallel programming for social data mining

Authors: Loris Belcastro, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio


Abstract

Software systems for social data mining provide algorithms and tools for extracting useful knowledge from user-generated social media data. ParSoDA (Parallel Social Data Analytics) is a high-level library for developing parallel data mining applications that extract useful knowledge from large data sets gathered from social media. The library aims to reduce the programming skills needed to implement scalable social data analysis applications. To reach this goal, ParSoDA defines a general structure for a social data analysis application that includes a number of configurable steps, and it provides a predefined (but extensible) set of functions that can be used for each step. User applications based on the ParSoDA library can run on both Apache Hadoop and Spark clusters. The paper describes the ParSoDA library and presents two social data analysis applications to assess its usability and scalability. Concerning usability, we compare the programming effort required to code a social media application with and without the ParSoDA library. The comparison shows that ParSoDA leads to a drastic reduction (about 65%) in lines of code, since the programmer only has to implement the application logic without worrying about configuring the environment and related classes. Concerning scalability, on a cluster with 300 cores and 1.2 TB of RAM, ParSoDA reduces the execution time of such applications by up to 85% compared to a cluster with 25 cores and 100 GB of RAM.
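To put the scalability figures in perspective, the short calculation below converts the reported best-case time reduction into a relative speedup and a rough parallel efficiency. It is only an illustrative sketch: the function names are hypothetical, the baseline is the 25-core cluster rather than a single core, and the 85% figure is the upper bound quoted in the abstract, so the results are best-case values rather than measured ones.

```python
def speedup_from_reduction(time_reduction: float) -> float:
    """Relative speedup implied by a fractional reduction in execution time.

    If the new run takes (1 - r) of the old time, the speedup is 1 / (1 - r).
    """
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction)


def parallel_efficiency(speedup: float, resource_ratio: float) -> float:
    """Fraction of the added resources that turned into speedup."""
    return speedup / resource_ratio


# Figures taken from the abstract.
small_cores, large_cores = 25, 300
best_reduction = 0.85  # "up to 85%" reduction in execution time

core_ratio = large_cores / small_cores            # 12x more cores
speedup = speedup_from_reduction(best_reduction)  # relative to the 25-core run
efficiency = parallel_efficiency(speedup, core_ratio)

print(f"core ratio: {core_ratio:.0f}x")
print(f"best-case relative speedup: {speedup:.2f}x")
print(f"best-case relative efficiency: {efficiency:.0%}")
```

Under these assumptions, a 12-fold increase in cores yields at most a roughly 6.7-fold speedup, that is, a relative efficiency of about 56%.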


Metadata
Title
ParSoDA: high-level parallel programming for social data mining
Authors
Loris Belcastro
Fabrizio Marozzo
Domenico Talia
Paolo Trunfio
Publication date
01.12.2019
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2019
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-018-0547-5
