Published in: International Journal of Machine Learning and Cybernetics 2/2016

01-04-2016 | Original Article

Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem

Authors: Christos Bouras, Vassilis Tsogkas

Abstract

Collaborative filtering systems typically need to acquire some data about a new user before they can start making personalized suggestions, a situation commonly referred to as the “new user problem”. In this work we attempt to address the new user problem via a unique personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordNet database and converges quickly to the actual user interests based on a minimal number of user ratings, which are provided during the registration process. In addition, we explore whether document clustering results, in particular for news articles from the web, can be enhanced by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach for clustering news articles derived from the web that combines keywords with n-grams extracted from the articles at an offline stage. This technique is then compared with the plain “bag-of-words” representation that our clustering algorithm, W-kmeans, previously used. Our experimentation reveals that by fine-tuning the weighting parameters between keywords and n-grams, as well as the value of n itself, a significant improvement in the clustering result metrics can be achieved.
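As a rough illustration of the weighting idea described in the abstract, the sketch below extracts word-based n-grams from a text and merges them with single-keyword frequencies into one feature map. It is not the authors' W-kmeans implementation: the function names, the whitespace tokenizer, the relative-frequency scoring and the default weights `w_keyword` and `w_ngram` are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all word-based n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_features(text, n=2, w_keyword=0.7, w_ngram=0.3):
    """Combine keyword and n-gram relative frequencies into one feature map.

    The weights are hypothetical tuning parameters, not values from the paper.
    """
    if n < 2:
        # With n == 1 the n-grams would coincide with the keywords themselves.
        raise ValueError("n must be at least 2")

    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokens = text.lower().split()
    keyword_counts = Counter(tokens)
    ngram_counts = Counter(word_ngrams(tokens, n))

    features = {}
    # Bag-of-words part: relative keyword frequency scaled by w_keyword.
    for term, count in keyword_counts.items():
        features[term] = w_keyword * count / len(tokens)

    # N-gram part: relative n-gram frequency scaled by w_ngram.
    total_ngrams = sum(ngram_counts.values())
    for gram, count in ngram_counts.items():
        features[gram] = w_ngram * count / total_ngrams

    return features


if __name__ == "__main__":
    demo = weighted_features("new user problem new user", n=2)
    for feature, weight in sorted(demo.items()):
        print(f"{feature!r}: {weight:.3f}")
```

The resulting feature map could then be fed to any vector-based clustering routine. Varying `n` and the two weights would correspond to the kind of parameter tuning the abstract refers to, although the paper's actual scoring scheme may differ.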

Metadata
Title
Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem
Authors
Christos Bouras
Vassilis Tsogkas
Publication date
01-04-2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2016
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0264-y
