2014 | OriginalPaper | Chapter

Vector Space Models for the Classification of Short Messages on Social Network Services

Authors: Ricardo Lage, Peter Dolog, Martin Leginus

Published in: Web Information Systems and Technologies

Publisher: Springer Berlin Heidelberg

Abstract

In this chapter we review vector space models and propose a new one based on the Jensen-Shannon divergence, with the goal of classifying ignored short messages on a social network service. We assume that ignored messages are published messages that received no interaction. Our goal is to classify messages about to be published as ignored, so that they can be discarded from the set of messages available to a recommender system. To evaluate our model, we conduct experiments comparing different models on a Twitter dataset with more than 13,000 Twitter accounts. Our best model obtained an average accuracy of 0.77, compared to 0.74 for a model from the literature. It also obtained an average precision of 0.74, compared to 0.58 for the second-best-performing model.
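The abstract does not spell out how the proposed model uses the Jensen-Shannon divergence, so the following is only a minimal sketch of the standard divergence itself, computed between two term distributions. It is not the authors' model. The function names (`term_distribution`, `kl_divergence`, `jensen_shannon_divergence`), the toy vocabulary, and the choice of base-2 logarithms are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary):
    """Relative frequency of each vocabulary term in a token list (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts[term] for term in vocabulary)
    if total == 0:
        raise ValueError("no vocabulary terms found in tokens")
    return [counts[term] / total for term in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Standard Jensen-Shannon divergence: the mean KL divergence to the midpoint m."""
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Toy vocabulary and messages, invented for this example only.
    vocabulary = ["coffee", "morning", "release", "update"]
    ignored = term_distribution(["coffee", "morning", "coffee"], vocabulary)
    interacted = term_distribution(["release", "update", "morning"], vocabulary)
    print(jensen_shannon_divergence(ignored, interacted))
```

With base-2 logarithms the divergence is symmetric and bounded between 0 (identical distributions) and 1 (distributions with no shared terms), which makes it convenient as a term-weighting or distance signal between classes of messages.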

Footnotes
2
These numbers are based on discussions in blog posts such as http://thenextweb.com/twitter/2012/01/07/interesting-fact-most-tweets-posted-are-approximately-30-characters-long/ and http://www.ayman-naaman.net/2010/04/21/how-many-characters-do-you-tweet/, but these posts do not provide an average. In our own dataset, presented in Sect. 4.1, the average number of characters in a tweet is 84.
 
3
Note that we do not normalize our tf-idf model by message length, since all messages tend to have similar lengths [18].
 
Literature
1. Bell, R., Volinsky, C., Koren, Y.: Matrix factorization techniques for recommender systems. IEEE Comput. 42(8), 30–37 (2009)
2. Chen, K., Chen, T., Zheng, G., Jin, O., Yao, E., Yu, Y.: Collaborative personalized tweet recommendation. In: Proceedings of the 35th international ACM SIGIR conference on Research and Development in Information Retrieval, pp. 661–670. SIGIR ’12, ACM, New York (2012). http://doi.acm.org/10.1145/2348283.2348372
4. Combarro, E., Montanes, E., Diaz, I., Ranilla, J., Mones, R.: Introducing a family of linear measures for feature selection in text categorization. IEEE Trans. Knowl. Data Eng. 17(9), 1223–1232 (2005)
6. Díaz, I., Ranilla, J., Montañes, E., Fernández, J., Combarro, E.: Improving performance of text categorization by combining filtering and support vector machines. J. Am. Soc. Inf. Sci. Technol. 55(7), 579–592 (2004)
11. Lewis, D.D.: Naive (Bayes) at forty: the independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998)
13. Mladenic, D., Grobelnik, M.: Feature selection for unbalanced class distribution and naive Bayes. In: Machine Learning-International Workshop then Conference, pp. 258–267. Morgan Kaufmann Publishers, Inc. (1999)
15. Robertson, S.E., Walker, S., Beaulieu, M., Willett, P.: Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive track. In: TREC, pp. 199–210 (1998)
19. Yang, Y., Pedersen, J.: A comparative study on feature selection in text categorization. In: Machine Learning-International Workshop then Conference, pp. 412–420. Morgan Kaufmann Publishers, Inc. (1997)
Metadata
Title
Vector Space Models for the Classification of Short Messages on Social Network Services
Authors
Ricardo Lage
Peter Dolog
Martin Leginus
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-44300-2_13
