skip to main content
10.1145/2939672.2939806acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Privacy-preserving Class Ratio Estimation

Published:13 August 2016Publication History

ABSTRACT

In this paper we present learning models for the class ratio estimation problem, which takes as input an unlabeled set of instances and predicts the proportions of instances in the set belonging to the different classes. This problem has applications in social and commercial data analysis. Existing models for class-ratio estimation however require instance-level supervision. Whereas in domains like politics, and demography, set-level supervision is more common. We present a new method for directly estimating class-ratios using set-level supervision. Another serious limitation in applying these techniques to sensitive domains like health is data privacy. We propose a novel label privacy-preserving mechanism that is well-suited for supervised class ratio estimation and has guarantees for achieving efficient differential privacy, provided the per-class counts are large enough. We derive learning bounds for the estimation with and without privacy constraints, which lead to important insights for the data-publisher. Extensive empirical evaluation shows that our model is more accurate than existing methods and that the proposed privacy mechanism and learning model are well-suited for each other.

Skip Supplemental Material Section

Supplemental Material

kdd2016_iyer_ratio_estimation_01-acm.mp4

mp4

360.1 MB

References

  1. R. Bassily, A. Smith, and A. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464--473. IEEE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: The sulq framework. In PODS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. K. Chaudhuri and D. Hsu. Sample complexity bounds for differentially private learning. In COLT, 2011.Google ScholarGoogle Scholar
  4. K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069--1109, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Chen, N. Mohammed, B. C. M. Fung, B. C. Desai, and L. Xiong. Publishing set-valued data via differential privacy. PVLDB, 4(11), 2011.Google ScholarGoogle Scholar
  6. R. Chen, Q. Xiao, Y. Zhang, and J. Xu. Differentially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 129--138. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Privacy aware learning. In NIPS. 2012.Google ScholarGoogle Scholar
  8. C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3--4):211--407, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723--773, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Gretton, A. J. Smola, J. Huang, M. Schmittfull, K. M. Borgwardt, and B. Schölkopf. Covariate shift and local learning by distribution matching, pages 131--160. MIT Press, Cambridge, MA, USA, 2 2009.Google ScholarGoogle Scholar
  11. C. Guttmann, X. Sun, C. Rao, C. Queiroz, and B. I. Rubinstein. On the challenges of balancing privacy and utility of open health data. In Joint Proceedings of the Workshop on AI Problems and Approaches for Intelligent Environments and Workshop on Semantic Cities, pages 43--47. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. Differential privacy: An economic method for choosing epsilon. In IEEE 27th Computer Security Foundations Symposium, CSF, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Iyer, S. Nath, and S. Sarawagi. Maximum mean discrepancy for class ratio estimation: Convergence bounds and kernel selection. In ICML, pages 530--538, 2014.Google ScholarGoogle Scholar
  14. P. Jain and A. Thakurta. Differentially private learning with kernels. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 118--126, 2013.Google ScholarGoogle Scholar
  15. Z. Ji, Z. C. Lipton, and C. Elkan. Differential privacy and machine learning: a survey and review. CoRR, 2014.Google ScholarGoogle Scholar
  16. D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. JMLR, 1:41, 2012.Google ScholarGoogle Scholar
  17. S. Le. Learning via Hilbert space embedding of distributions. PhD thesis, University of Sydney, School of Information Technologies, 2008.Google ScholarGoogle Scholar
  18. J. Lee, Y. Wang, and D. Kifer. Maximum likelihood postprocessing for differential privacy under consistency constraints. In ACM SIGKDD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Li, K. Swersky, and R. S. Zemel. Generative moment matching networks. In ICML, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Machanavajjhala, D. Kifer, J. M. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In ICDE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Nock, G. Patrini, and A. Friedman. Rademacher observations, private data, and boosting. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 948--956, 2015.Google ScholarGoogle Scholar
  22. G. Patrini, R. Nock, T. Caetano, and P. Rivera. (almost) no label no cry. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 190--198, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le. Estimating labels from label proportions. Journal of Machine Learning Research, 10:2349--2374, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. Learning in a large function space: Privacy-preserving mechanisms for svm learning. Journal of Privacy and Confidentiality, 4(1), 2012.Google ScholarGoogle ScholarCross RefCross Ref
  25. S. Rüping. SVM classifier estimation from group probabilities. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 911--918, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D.-T. Vo and Y. Zhang. Target-dependent twitter sentiment classification with rich automatic features. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 1347--1353. AAAI Press, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Xin and T. Jaakkola. Controlling privacy in recommender systems. In Advances in Neural Information Processing Systems, pages 2618--2626, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. F. X. Yu, D. Liu, S. Kumar, T. Jebara, and S. Chang. ∝SVM for learning with label proportions. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16--21 June 2013, pages 504--512, 2013.Google ScholarGoogle Scholar
  29. K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In ICML, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Zhou, J. D. Lafferty, and L. A. Wasserman. Compressed and privacy-sensitive sparse regression. IEEE Transactions on Information Theory, 55(2):846--866, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Privacy-preserving Class Ratio Estimation

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2016
        2176 pages
        ISBN:9781450342322
        DOI:10.1145/2939672

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 August 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '16 Paper Acceptance Rate66of1,115submissions,6%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader