DOI: 10.1145/2566486.2568038
research-article

Monitoring web browsing behavior with differential privacy

Published: 07 April 2014

ABSTRACT

Monitoring web browsing behavior has benefited many data mining applications, such as top-K discovery and anomaly detection. However, releasing private user data to the public raises privacy concerns among web users, especially after the AOL search log release, in which the data were not properly anonymized. In this paper, we adopt differential privacy, a strong, provable privacy definition, and show that differentially private aggregates of web browsing activities can be released in real time while preserving the utility of the shared data. Our proposed algorithms exploit the rich correlation in the time series of aggregated data and adopt a state-space approach to estimate the underlying true aggregates from the values perturbed by the differential privacy mechanism. We evaluate our algorithms with real-world web browsing data. Utility evaluations with three metrics demonstrate that the quality of the private data released by our solutions closely resembles that of the original, unperturbed aggregates.
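
The perturb-then-estimate idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithms themselves, so everything concrete here is an assumption: Laplace noise as the perturbation mechanism, an even split of the privacy budget across timestamps, sensitivity 1 per timestamp, a scalar random-walk state model, and a Gaussian approximation of the Laplace noise inside a Kalman-style filter. The names `laplace_noise`, `perturb`, and `kalman_smooth` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(counts, epsilon, rng):
    """Add Laplace noise to each aggregate count.

    Assumption: each user changes a count by at most 1 per timestamp, and the
    total budget epsilon is split evenly over len(counts) timestamps.
    """
    scale = len(counts) / epsilon
    return [c + laplace_noise(scale, rng) for c in counts], scale


def kalman_smooth(noisy, process_var, noise_var):
    """Scalar Kalman filter over a random-walk model x_k = x_{k-1} + w_k.

    Assumption: the measurement noise is treated as Gaussian with variance
    noise_var, which only approximates Laplace noise.
    """
    estimate, variance = noisy[0], noise_var
    estimates = [estimate]
    for z in noisy[1:]:
        prior_var = variance + process_var            # predict step
        gain = prior_var / (prior_var + noise_var)    # Kalman gain in (0, 1)
        estimate = estimate + gain * (z - estimate)   # correct with observation
        variance = (1.0 - gain) * prior_var
        estimates.append(estimate)
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    true_counts = [100.0, 110.0, 120.0, 115.0, 118.0, 125.0]  # made-up series
    noisy, scale = perturb(true_counts, epsilon=1.0, rng=rng)
    # The variance of Laplace(0, b) is 2 * b**2.
    filtered = kalman_smooth(noisy, process_var=25.0, noise_var=2.0 * scale ** 2)
    for t, (z, x) in enumerate(zip(noisy, filtered)):
        print(f"t={t} noisy={z:.1f} filtered={x:.1f}")
```

Because the filter reads only the already-perturbed values, it is post-processing and does not consume additional privacy budget. The `process_var` value is an arbitrary illustrative choice; in practice it would need to be tuned or estimated.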


        • Published in

          WWW '14: Proceedings of the 23rd international conference on World wide web
          April 2014
          926 pages
          ISBN: 9781450327442
          DOI: 10.1145/2566486

          Copyright © 2014. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

          WWW '14 Paper Acceptance Rate: 84 of 645 submissions, 13%
          Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
