ABSTRACT
Monitoring web browsing behavior has benefited many data mining applications, such as top-K discovery and anomaly detection. However, releasing private user data to the public raises privacy concerns among web users, especially since the AOL search log release, in which anonymization was not done correctly. In this paper, we adopt differential privacy, a strong, provable privacy definition, and show that differentially private aggregates of web browsing activity can be released in real time while preserving the utility of the shared data. Our proposed algorithms exploit the rich correlation in the time series of aggregated data and adopt a state-space approach to estimate the underlying true aggregates from the values perturbed by the differential privacy mechanism. We evaluate our algorithms on real-world web browsing data. Utility evaluations with three metrics show that the quality of the private data released by our solutions closely resembles that of the original, unperturbed aggregates.
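The abstract describes two steps: perturbing each released aggregate to satisfy differential privacy, then estimating the true aggregate from the perturbed series with a state-space model. The Python sketch that follows illustrates both steps under stated assumptions. It uses the Laplace mechanism (Dwork et al., 2006) with a per-timestamp budget `epsilon` and sensitivity 1, followed by a scalar Kalman filter (Kalman, 1960). All function names, the random-walk process model, the Gaussian treatment of the Laplace noise, the synthetic sinusoidal series, and the process variance of 25 are hypothetical choices made for illustration. This is not the authors' algorithm.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_release(
    true_counts: Sequence[float],
    epsilon: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    # Laplace mechanism applied independently at each timestamp.
    # ASSUMPTION: `epsilon` is the per-timestamp budget; a user who contributes
    # to every timestamp incurs len(true_counts) * epsilon in total
    # (sequential composition).
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in true_counts]


def kalman_filter(
    noisy: Sequence[float], process_var: float, noise_var: float
) -> List[float]:
    # Scalar Kalman filter with a HYPOTHETICAL random-walk state model:
    #   state:       x_k = x_{k-1} + w_k,  Var(w_k) = process_var
    #   observation: z_k = x_k + v_k,      Var(v_k) = noise_var
    # The Laplace noise v_k is treated as if it were Gaussian with equal variance.
    if not noisy:
        return []
    x_est, p_est = noisy[0], noise_var  # start from the first release
    estimates = [x_est]
    for z in noisy[1:]:
        # Predict: a random walk keeps the mean and inflates the uncertainty.
        x_pred, p_pred = x_est, p_est + process_var
        # Correct: weight the new release by the Kalman gain.
        gain = p_pred / (p_pred + noise_var)
        x_est = x_pred + gain * (z - x_pred)
        p_est = (1.0 - gain) * p_pred
        estimates.append(x_est)
    return estimates


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(seed: int = 0, steps: int = 300, epsilon: float = 0.05) -> Tuple[float, float]:
    rng = random.Random(seed)
    # Synthetic, slowly varying aggregate; NOT real browsing data.
    truth = [1000 + 100 * math.sin(2 * math.pi * t / steps) for t in range(steps)]
    released = laplace_release(truth, epsilon, rng=rng)
    scale = 1.0 / epsilon  # sensitivity 1
    # Laplace(0, b) has variance 2 * b**2; process_var = 25 is an untuned guess.
    filtered = kalman_filter(released, process_var=25.0, noise_var=2 * scale**2)
    return mse(released, truth), mse(filtered, truth)


if __name__ == "__main__":
    raw_err, filtered_err = demo()
    print(f"MSE of raw release: {raw_err:.1f}; after filtering: {filtered_err:.1f}")
```

The filter reads only the released values, so it is post-processing and consumes no additional privacy budget. Choosing `process_var` by inspecting the raw aggregates, however, would leak information. In this sketch it is therefore fixed in advance rather than tuned on the true series.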
REFERENCES
- M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, Aug. 2006.
- A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 609--618, New York, 2008. ACM.
- L. Bonomi, L. Xiong, and J. J. Lu. Linkit: privacy preserving record linkage and integration via transformations. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 1029--1032, New York, NY, USA, 2013. ACM.
- I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White. Visualization of navigation patterns on a web site using model-based clustering. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '00, pages 280--284, New York, NY, USA, 2000. ACM.
- D. Canali and D. Balzarotti. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web. In NDSS 2013, 20th Annual Network and Distributed System Security Symposium, San Diego, CA, USA, February 24--27, 2013.
- T.-H. Chan, M. Li, E. Shi, and W. Xu. Differentially private continual monitoring of heavy hitters from distributed streams. In S. Fischer-Hübner and M. Wright, editors, Privacy Enhancing Technologies, volume 7384 of Lecture Notes in Computer Science, pages 140--159. Springer Berlin Heidelberg, 2012.
- T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. In Proceedings of the 37th International Colloquium Conference on Automata, Languages and Programming: Part II, pages 405--417, Heidelberg, 2010. Springer-Verlag.
- E. Chlebus and J. Brazier. Nonstationary Poisson modeling of web browsing session arrivals. Inf. Process. Lett., 102(5):187--190, May 2007.
- R. Cooley, B. Mobasher, and J. Srivastava. Web mining: information and pattern discovery on the world wide web. In Proceedings of the Ninth IEEE International Conference on Tools with Artificial Intelligence, pages 558--567, 1997.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265--284, Heidelberg, 2006. Springer-Verlag.
- C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential privacy under continual observation. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 715--724, New York, 2010. ACM.
- S. Egelman, L. F. Cranor, and J. Hong. You've been warned: an empirical study of the effectiveness of web browser phishing warnings. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, pages 1065--1074, New York, NY, USA, 2008. ACM.
- M. Eirinaki and M. Vazirgiannis. Web mining for web personalization. ACM Trans. Internet Technol., 3(1):1--27, Feb. 2003.
- L. Fan and L. Xiong. Real-time aggregate monitoring with differential privacy. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 2169--2173, New York, 2012. ACM.
- L. Fan and L. Xiong. An adaptive approach to real-time aggregate monitoring with differential privacy. IEEE Transactions on Knowledge and Data Engineering, 99(PrePrints):1, 2013.
- M. Götz, S. Nath, and J. Gehrke. Maskit: privately releasing user context streams for personalized mobile applications. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 289--300, New York, NY, USA, 2012. ACM.
- R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35--45, 1960.
- A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 171--180, New York, NY, USA, 2009. ACM.
- R. Kosala and H. Blockeel. Web mining research: a survey. SIGKDD Explor. Newsl., 2(1):1--15, June 2000.
- R. Kumar and A. Tomkins. A characterization of online browsing behavior. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 561--570, New York, NY, USA, 2010. ACM.
- F. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. Volume 53, pages 89--97, New York, 2010. ACM.
- S. Papadimitriou, F. Li, G. Kollios, and P. S. Yu. Time series compressibility and privacy. In VLDB '07, pages 459--470. VLDB Endowment, 2007.
- V. Rastogi and S. Nath. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 735--746, New York, 2010. ACM.
- J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan. Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor. Newsl., 1(2):12--23, Jan. 2000.
- D. Wang, Y. He, E. Rundensteiner, and J. F. Naughton. Utility-maximizing event stream suppression. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 589--600, New York, NY, USA, 2013. ACM.
- O. Williams and F. McSherry. Probabilistic inference and differential privacy. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2451--2459, 2010.
- J. Xu, Z. Zhang, X. Xiao, Y. Yang, and G. Yu. Differentially private histogram publication. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pages 32--43, Washington, DC, 2012. IEEE Computer Society.
- J. Yan, D. Yuan, X. Xing, and Q. Jia. Kalman filtering parameter optimization techniques based on genetic algorithm. In Proceedings of the 2008 IEEE International Conference on Automation and Logistics (ICAL 2008), pages 1717--1720, 2008.
- H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng. Understanding user behavior in large-scale video-on-demand systems. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys '06, pages 333--344, New York, NY, USA, 2006. ACM.
Index Terms
- Monitoring web browsing behavior with differential privacy