short-paper

How does high dimensionality affect collaborative filtering?

Authors:
Alexandros Nanopoulos

University of Hildesheim, Hildesheim, Germany

University of Hildesheim, Hildesheim, Germany
View Profile

,
Miloš Radovanović

University of Novi Sad, Novi Sad, Serbia

University of Novi Sad, Novi Sad, Serbia
View Profile

,
Mirjana Ivanović

University of Novi Sad, Novi Sad, Serbia

University of Novi Sad, Novi Sad, Serbia
View Profile

RecSys '09: Proceedings of the third ACM conference on Recommender systemsOctober 2009Pages 293–296https://doi.org/10.1145/1639714.1639771

Published:23 October 2009Publication History

RecSys '09: Proceedings of the third ACM conference on Recommender systems

Pages 293–296

ABSTRACT

A crucial operation in memory-based collaborative filtering (CF) is determining nearest neighbors (NNs) of users/items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in high-dimensional spaces that are typical in CF applications. The first is similarity concentration and the second is the appearance of hubs (i.e. points which appear in $k$-NN lists of many other points). Through theoretical analysis and experimental evaluation we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties like sparsity, and that they can impact CF algorithms by questioning the meaning and representativeness of discovered NNs. Moreover, we show that it is not easy to mitigate the phenomena using dimensionality reduction. Studying these phenomena aims to provide a better understanding of the limitations of memory-based CF and motivate the development of new algorithms that would overcome them.

References

G. Casella and R. L. Berger. Statistical Inference, 2nd ed. Duxbury, 2002.Google Scholar
D. Francois, V. Wertz, and M. Verleysen. The concentration of fractional distances. IEEE T. Knowl. Data. En., 19(7):873--886, 2007. Google ScholarDigital Library
L. A. Goodman. On the exact variance of products. J. Am. Stat. Assoc., 55(292):708--713, 1960.Google ScholarCross Ref
M. Grcar, D. Mladenic, B. Fortuna, and M. Grobelnik. Data sparsity issues in the collaborative filtering framework. In Proc. WebKDD Workshop, pages 58--76, 2005. Google ScholarDigital Library
A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 506--515, 2000. Google ScholarDigital Library
M. Radovanovic, A. Nanopoulos, and M. Ivanovic. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. In Proc. Int. Conf. on Machine Learning (ICML), pages 865--872, 2009. Google ScholarDigital Library
B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Application of dimensionality reduction in recommender system. In Proc. WebKDD Workshop, 2000.Google Scholar
B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proc. World Wide Web Conf. (WWW), pages 285--295, 2001. Google ScholarDigital Library
J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proc. ACM Conf. on Research and Development in Information Retrieval (SIGIR), pages 501--508, 2006. Google ScholarDigital Library
K. Yu, X. Xu, M. Ester, and H.-P. Kriegel. Feature weighting and instance selection for collaborative filtering: An information--theoretic approach. Knowl. Inf. Syst., 5(2):201--224, 2003. Google ScholarDigital Library

Index Terms

How does high dimensionality affect collaborative filtering?
1. Information systems
  1. Information retrieval
  2. Information systems applications

Recommendations

Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness
ICMR '14: Proceedings of International Conference on Multimedia Retrieval

For recommending multimedia items, collaborative filtering (CF) denotes the technique of automatically predicting a user's rating or preference for an item by exploiting item preferences of a (large) group of other users. In traditional memory-based (or ...
Read More
The wisdom of the few: a collaborative filtering approach based on expert opinions from the web
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Nearest-neighbor collaborative filtering provides a successful means of generating recommendations for web users. However, this approach suffers from several shortcomings, including data sparsity and noise, the cold-start problem, and scalability. In ...
Read More
On the existence of obstinate results in vector space models
SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

The vector space model (VSM) is a popular and widely applied model in information retrieval (IR). VSM creates vector spaces whose dimensionality is usually high (e.g., tens of thousands of terms). This may cause various problems, such as susceptibility ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
RecSys '09: Proceedings of the third ACM conference on Recommender systems
October 2009
442 pages
ISBN:9781605584355
DOI:10.1145/1639714
General Chairs:
Lawrence Bergman
IBM Research, NY, USA
,
Alex Tuzhilin
New York University, NY, USA
,
Program Chairs:
Robin Burke
DePaul University, USA
,
Alexander Felfernig
Graz University of Technology, Austria
,
Lars Schmidt-Thieme
University of Hildesheim, Germany
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 October 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
collaborative filtering
cosine similarity
curse of dimensionality
hubs
nearest neighbors
similarity concentration
Qualifiers
- short-paper
Conference

Acceptance Rates
Overall Acceptance Rate254of1,295submissions,20%
Upcoming Conference
RecSys '24

Sponsor:

sigchi

18th ACM Conference on Recommender Systems

October 14 - 18, 2024

Bari , Italy
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 406
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

How does high dimensionality affect collaborative filtering?

RecSys '09: Proceedings of the third ACM conference on Recommender systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Improving Neighborhood-Based Collaborative Filtering by Reducing Hubness

The wisdom of the few: a collaborative filtering approach based on expert opinions from the web

On the existence of obstinate results in vector space models