Published in: Journal of Intelligent Information Systems 2/2016

01.10.2016

RedTweet: recommendation engine for reddit

Written by: Hoang Nguyen, Rachel Richards, Chien-Chung Chan, Kathy J. Liszka


Abstract

Twitter and Reddit are two of the most popular social media sites in use today. In this paper, we study the use of machine learning and WordNet-based classifiers to generate an interest profile from a user’s tweets and use it to recommend loosely related Reddit threads that the reader is most likely to be interested in. We introduce a genre classification algorithm that uses a similarity measure derived from the WordNet lexical database for English to assign genre labels to nouns in tweets. The proposed algorithm generates a user’s interest profile from their tweets based on a reference taxonomy of genres derived from the genre-tagged Brown Corpus, augmented with a technology genre. The top K genres of a user’s interest profile can then be used to recommend subreddit articles in those genres. Experiments on real-life test cases collected from Twitter compare the genre classification performance of the WordNet classifier with that of machine learning classifiers such as SVM, Random Forests, and an ensemble of Bayesian classifiers. Empirically, the two approaches yield similar results when a sufficient number of tweets is available. Both machine learning algorithms and the WordNet ontology therefore appear to be viable tools for developing a recommendation engine based on genre classification. One advantage of the WordNet approach is its simplicity: no learning is required. However, the WordNet classifier tends to have poor precision on users with very few tweets.
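
The profile-building idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, so `toy_similarity`, its lookup table, the `threshold` cutoff, and the two-genre list are all hypothetical stand-ins. In a real system the similarity would come from a WordNet-derived measure and the genres from the Brown Corpus taxonomy plus technology.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a WordNet-derived noun-to-genre similarity in [0, 1].
# These values are invented for illustration only.
TOY_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("laptop", "technology"): 0.9,
    ("laptop", "news"): 0.2,
    ("router", "technology"): 0.7,
    ("router", "news"): 0.1,
    ("election", "news"): 0.8,
    ("election", "technology"): 0.1,
}

Similarity = Callable[[str, str], float]


def toy_similarity(noun: str, genre: str) -> float:
    """Look up the toy similarity; unknown pairs score 0.0."""
    return TOY_SIMILARITY.get((noun, genre), 0.0)


def label_genre(noun: str, genres: List[str], similarity: Similarity,
                threshold: float = 0.5) -> Optional[str]:
    """Return the most similar genre, or None if no genre reaches the
    (assumed) threshold."""
    best = max(genres, key=lambda g: similarity(noun, g))
    return best if similarity(noun, best) >= threshold else None


def interest_profile(nouns: Iterable[str], genres: List[str],
                     similarity: Similarity) -> Counter:
    """Count how many of a user's nouns were labeled with each genre."""
    profile: Counter = Counter()
    for noun in nouns:
        genre = label_genre(noun, genres, similarity)
        if genre is not None:
            profile[genre] += 1
    return profile


def top_k_genres(profile: Counter, k: int) -> List[str]:
    """Return the k genres with the highest counts."""
    return [genre for genre, _ in profile.most_common(k)]


GENRES = ["technology", "news"]                      # assumed subset of genres
nouns = ["laptop", "router", "election", "banana"]   # nouns taken from tweets
profile = interest_profile(nouns, GENRES, toy_similarity)
top = top_k_genres(profile, 1)
```

With this toy data, two nouns are labeled technology, one is labeled news, and "banana" is left unlabeled, so the top-1 genre is technology. Passing the similarity as a function keeps the profile logic independent of whichever WordNet measure is eventually used.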

Footnotes
3
The numbers were extracted from the raw data downloaded from Reddit with the help of Dr. Arvind Srinivasan of ZL Technologies in San Jose, CA.
 
Metadata
Title
RedTweet: recommendation engine for reddit
Written by
Hoang Nguyen
Rachel Richards
Chien-Chung Chan
Kathy J. Liszka
Publication date
01.10.2016
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2016
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-016-0410-y
