ABSTRACT
We consider the problem of producing recommendations from collective user behavior while simultaneously providing guarantees of privacy for these users. Specifically, we consider the Netflix Prize data set, and its leading algorithms, adapted to the framework of differential privacy.
Unlike prior privacy work concerned with cryptographically securing the computation of recommendations, differential privacy constrains a computation in a way that precludes any inference about the underlying records from its output. Such algorithms necessarily introduce uncertainty--i.e., noise--to computations, trading accuracy for privacy.
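The standard way such calibrated noise is introduced is the Laplace mechanism of Dwork, McSherry, Nissim, and Smith: perturb a numeric query result with Laplace noise whose scale is the query's per-record sensitivity divided by the privacy parameter epsilon. The sketch below is a minimal illustration of that mechanism, not code from the paper; the function name and interface are our own.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-differential privacy.

    Adds Laplace noise with scale sensitivity/epsilon, where `sensitivity`
    bounds how much adding or removing one user's record can change
    true_value. Smaller epsilon means stronger privacy and more noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```

Because the noise has mean zero, aggregate statistics over many records remain accurate in expectation, while any single record's influence is masked.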
We find that several of the leading approaches in the Netflix Prize competition can be adapted to provide differential privacy, without significantly degrading their accuracy. To adapt these algorithms, we explicitly factor them into two parts: an aggregation/learning phase that can be performed with differential privacy guarantees, and an individual recommendation phase that uses the learned correlations and an individual's data to provide personalized recommendations. The adaptations are non-trivial, involving both careful analysis of the per-record sensitivity of the algorithms to calibrate noise and new post-processing steps to mitigate the impact of this noise.
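As a concrete example of the aggregation phase, consider releasing per-movie average ratings. The sketch below is a hypothetical illustration under assumed conventions (ratings clamped to [1, 5], the privacy budget split evenly between a noisy sum and a noisy count); it is not the paper's implementation. Clamping the ratings bounds the per-record sensitivity of the sum, and clamping the noisy average back into the valid range is an example of a post-processing step that reduces the impact of the noise without affecting the privacy guarantee.

```python
import numpy as np

def private_item_averages(ratings_by_item, epsilon, lo=1.0, hi=5.0, rng=None):
    """Per-item average ratings with epsilon-differential privacy (sketch).

    ratings_by_item: dict mapping item id -> list of that item's ratings.
    Each rating is clamped to [lo, hi], so one record changes an item's
    sum by at most `hi` and its count by at most 1; Laplace noise is
    calibrated to those sensitivities, spending epsilon/2 on each query.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = {}
    for item, rs in ratings_by_item.items():
        rs = np.clip(np.asarray(rs, dtype=float), lo, hi)
        noisy_sum = rs.sum() + rng.laplace(scale=hi / (epsilon / 2))
        noisy_cnt = len(rs) + rng.laplace(scale=1.0 / (epsilon / 2))
        # Post-processing: guard against a near-zero noisy count and clamp
        # the noisy average back into the valid rating range.
        out[item] = float(np.clip(noisy_sum / max(noisy_cnt, 1.0), lo, hi))
    return out
```

For items with many ratings the noise is negligible relative to the true sum and count, so the released averages stay close to the exact ones; for rarely rated items the noise dominates, which is precisely the accuracy-for-privacy trade-off measured in the paper.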
We measure the empirical trade-off between accuracy and privacy in these adaptations, and find that we can provide non-trivial formal privacy guarantees while still outperforming the Cinematch baseline Netflix provides.
Index Terms
- Differentially private recommender systems: Building privacy into the Netflix Prize contenders