2018 | Original Paper | Book Chapter

Wiki-MID: A Very Large Multi-domain Interests Dataset of Twitter Users with Mappings to Wikipedia

Authors: Giorgia Di Tommaso, Stefano Faralli, Giovanni Stilo, Paola Velardi

Published in: The Semantic Web – ISWC 2018

Publisher: Springer International Publishing

Abstract

This paper presents Wiki-MID, an LOD-compliant multi-domain interests dataset for training and testing Recommender Systems, together with the methodology used to create the dataset from Twitter messages in English and Italian. Our English dataset includes an average of 90 multi-domain preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users tracked over six months in 2017. Preferences are either extracted from the messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their “topical” friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to categorize preferred items by exploiting available semantic resources linked to Wikipedia, such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

Metadata
Title
Wiki-MID: A Very Large Multi-domain Interests Dataset of Twitter Users with Mappings to Wikipedia
Authors
Giorgia Di Tommaso
Stefano Faralli
Giovanni Stilo
Paola Velardi
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-00668-6_3