skip to main content
10.1145/1639714.1639771acmconferencesArticle/Chapter ViewAbstractPublication PagesrecsysConference Proceedingsconference-collections
short-paper

How does high dimensionality affect collaborative filtering?

Published:23 October 2009Publication History

ABSTRACT

A crucial operation in memory-based collaborative filtering (CF) is determining nearest neighbors (NNs) of users/items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in high-dimensional spaces that are typical in CF applications. The first is similarity concentration and the second is the appearance of hubs (i.e. points which appear in $k$-NN lists of many other points). Through theoretical analysis and experimental evaluation we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties like sparsity, and that they can impact CF algorithms by questioning the meaning and representativeness of discovered NNs. Moreover, we show that it is not easy to mitigate the phenomena using dimensionality reduction. Studying these phenomena aims to provide a better understanding of the limitations of memory-based CF and motivate the development of new algorithms that would overcome them.

References

  1. G. Casella and R. L. Berger. Statistical Inference, 2nd ed. Duxbury, 2002.Google ScholarGoogle Scholar
  2. D. Francois, V. Wertz, and M. Verleysen. The concentration of fractional distances. IEEE T. Knowl. Data. En., 19(7):873--886, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. A. Goodman. On the exact variance of products. J. Am. Stat. Assoc., 55(292):708--713, 1960.Google ScholarGoogle ScholarCross RefCross Ref
  4. M. Grcar, D. Mladenic, B. Fortuna, and M. Grobelnik. Data sparsity issues in the collaborative filtering framework. In Proc. WebKDD Workshop, pages 58--76, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 506--515, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Radovanovic, A. Nanopoulos, and M. Ivanovic. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. In Proc. Int. Conf. on Machine Learning (ICML), pages 865--872, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Application of dimensionality reduction in recommender system. In Proc. WebKDD Workshop, 2000.Google ScholarGoogle Scholar
  8. B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proc. World Wide Web Conf. (WWW), pages 285--295, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proc. ACM Conf. on Research and Development in Information Retrieval (SIGIR), pages 501--508, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. K. Yu, X. Xu, M. Ester, and H.-P. Kriegel. Feature weighting and instance selection for collaborative filtering: An information--theoretic approach. Knowl. Inf. Syst., 5(2):201--224, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. How does high dimensionality affect collaborative filtering?

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        RecSys '09: Proceedings of the third ACM conference on Recommender systems
        October 2009
        442 pages
        ISBN:9781605584355
        DOI:10.1145/1639714

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 October 2009

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • short-paper

        Acceptance Rates

        Overall Acceptance Rate254of1,295submissions,20%

        Upcoming Conference

        RecSys '24
        18th ACM Conference on Recommender Systems
        October 14 - 18, 2024
        Bari , Italy

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader