Published in: Distributed and Parallel Databases, Issue 1/2018

20.10.2017

Web-scale provenance reconstruction of implicit information diffusion on social media

Authors: Io Taxidou, Sven Lieber, Peter M. Fischer, Tom De Nies, Ruben Verborgh


Abstract

Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who need to assess specific pieces of information promptly. Social media currently provide insufficient mechanisms for provenance tracking, publication, and generation, while state-of-the-art social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). The implicit diffusion mechanisms remain understudied because they are difficult to capture and properly understand. On the technical side, the state of the art in provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence, and our analysis uncovers a significant part of implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover a truly Web-scale scenario like major events. We can process datasets consisting of up to several million messages on a single machine at rates that cover bursty behaviour, without compromising result quality. In doing so, we provide online journalists, and social media users in general, with fine-grained provenance reconstruction that sheds light on implicit interactions not captured by social media providers. These results are provided in an online fashion, which also allows for fast relevance and trustworthiness assessment.


Metadata
Title
Web-scale provenance reconstruction of implicit information diffusion on social media
Authors
Io Taxidou
Sven Lieber
Peter M. Fischer
Tom De Nies
Ruben Verborgh
Publication date
20.10.2017
Publisher
Springer US
Published in
Distributed and Parallel Databases / Issue 1/2018
Print ISSN: 0926-8782
Electronic ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-017-7211-3
