ABSTRACT
The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring.
With ESA, the privacy of monitored users' data is guaranteed by its processing in a three-step pipeline. First, the data is encoded to control scope, granularity, and randomness. Second, the encoded data is collected in batches subject to a randomized threshold, and blindly shuffled, to break linkability and to ensure that individual data items get "lost in the crowd" of the batch. Third, the anonymous, shuffled data is analyzed by a specific analysis engine that further prevents statistical inference attacks on analysis results.
ESA extends existing best-practice methods for sensitive-data analytics, by using cryptography and statistical techniques to make explicit how data is elided and reduced in precision, how only common-enough, anonymous data is analyzed, and how this is done for only specific, permitted purposes. As a result, ESA remains compatible with the established workflows of traditional database analysis.
Strong privacy guarantees, including differential privacy, can be established at each processing step to defend against malice or compromise at one or more of those steps. Prochlo develops new techniques to harden those steps, including the Stash Shuffle, a novel scalable and efficient oblivious-shuffling algorithm based on Intel's SGX, and new applications of cryptographic secret sharing and blinding. We describe ESA and Prochlo, as well as experiments that validate their ability to balance utility and privacy.
Supplemental Material
- Abadi, M. Trusted Computing, Trusted Third Parties, and Verified Communications. In Security and Protection in Information Processing Systems (2004), pp. 291--308.Google ScholarCross Ref
- Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. Tensorflow: A system for large-scale machine learning. In Proc. of 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2016), pp. 265--283. Google ScholarDigital Library
- Abadi, M., Boneh, D., Mironov, I., Raghunathan, A., and Segev, G. Message-Locked Encryption for Lock-Dependent Messages. In Advances in Cryptology--CRYPTO (2013), pp. 374--391.Google Scholar
- Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep Learning with Differential Privacy. In Proc. of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2016), pp. 308--318. Google ScholarDigital Library
- Abadi, M., Erlingsson, Ú., Goodfellow, I., McMahan, H. B., Papernot, N., Mironov, I., Talwar, K., and Zhang, L. On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches. In Proc. of IEEE 30th Computer Security Foundations Symposium (CSF) (2017), pp. 1--6.Google Scholar
- Avent, B., Korolova, A., Zeber, D., Hovden, T., and Livshits, B. BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model. In Proc. of the 26th USENIX Security Symposium (2017), pp. 747--764.Google Scholar
- Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. Practical Locally Private Heavy Hitters. CoRR abs/1707.0498 (2017). http://arxiv.org/abs/1707.04982.Google Scholar
- Batcher, K. E. Sorting Networks and their Applications. In AFIPS Spring Joint Computer Conference (1968), vol. 32, pp. 307--314. Google ScholarDigital Library
- Bellare, M., Keelveedhi, S., and Ristenpart, T. Message-Locked Encryption and Secure Deduplication. In Advances in Cryptology---EUROCRYPT (2013), pp. 296--312.Google Scholar
- Bindschaedler, V., Shokri, R., and Gunter, C. A. Plausible Deniability for Privacy-Preserving Data Synthesis. PVLDB 10, 5 (2017), 481--492. Google ScholarDigital Library
- Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. Practical Secure Aggregation for Privacy Preserving Machine Learning. In Proc. of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2017).Google ScholarDigital Library
- Bulck, J. V., Weichbrodt, N., Kapitza, R., Piessens, F., and Strackx, R. Telling Your Secrets without Page Faults: Stealthy Page Table-Based Attacks on Enclaved Execution. In Proc. of 26th USENIX Security Symposium (2017), pp. 1041--1056.Google Scholar
- Buse, R. P. L., and Zimmermann, T. Information Needs for Software Development Analytics. In Proc. of the 34th International Conference on Software Engineering (ICSE) (2012), pp. 987--996. Google ScholarDigital Library
- Chaudhry, G., Hamon, E. A., and Cormen, T. H. Relaxing the Problem-Size Bound for Out-Of-Core Column-Sort. In Proc. of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA) (2003), pp. 250--251. Google ScholarDigital Library
- Chen, R., Reznichenko, A., Francis, P., and Gehrke, J. Towards Statistical Queries over Distributed Private User Data. In Proc. of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI) (2012), pp. 169--182. Google ScholarDigital Library
- Chromium Projects. RAPPOR (Randomized Aggregatable Privacy Preserving Ordinal Responses). https://www.chromium.org/developers/design-documents/rappor, 2017.Google Scholar
- Cito, J., Leitner, P., Fritz, T., and Gall, H. C. The Making of Cloud Applications: An Empirical Study on Software Development for the Cloud. In Proc. of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE) (2015), pp. 393--403. Google ScholarDigital Library
- Corrigan-Gibbs, H., and Boneh, D. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. In Proc. of the 14th USENIX Symposium on Networked Systems Design and Implementation (NDSI) (2017), pp. 259--282. Google ScholarDigital Library
- Corrigan-Gibbs, H., Boneh, D., and Mazières, D. Riposte: An Anonymous Messaging System Handling Millions of Users. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 321--338. Google ScholarDigital Library
- Dang, H., Dinh, T. T. A., Chang, E.-C., and Ooi, B. C. Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute. Proceedings on Privacy Enhancing Technologies, 3 (2017), 18--35.Google Scholar
- Demetriou, S., Merrill, W., Yang, W., Zhang, A., and Gunter, C. A. Free for All! Assessing User Data Exposure to Advertising Libraries on Android. In Proc. of the 23nd Annual Network and Distributed System Security Symposium (NDSS) (2016).Google ScholarCross Ref
- Denning, D. E. R. Cryptography and Data Security. Addison-Wesley, Boston, MA, USA, 1982.Google ScholarDigital Library
- Dinh, T. T. A., Saxena, P., Chang, E.-C., Ooi, B. C., and Zhang, C. M2R: Enabling Stronger Privacy in MapReduce Computation. In Proc. of the 24th USENIX Security Symposium (2015), pp. 447--462. Google ScholarDigital Library
- Douceur, J. R. The Sybil attack. In The First International Workshop on Peer-to-Peer Systems (IPTPS) (2002), pp. 251--260. Google ScholarDigital Library
- Dwork, C. Differential Privacy: A Survey of Results. In International Conference on Theory and Applications of Models of Computation (TAMC) (2008), pp. 1--19. Google ScholarDigital Library
- Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Proc. of the Third Conference on Theory of Cryptography (TCC) (2006), pp. 265--284. Google ScholarDigital Library
- Dwork, C., and Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9, 3--4 (Aug. 2014), 211--407. Google ScholarDigital Library
- Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proc. of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2014), pp. 1054--1067. Google ScholarDigital Library
- Evfimievski, A., Gehrke, J., and Srikant, R. Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proc. of the Twenty-second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) (2003), pp. 211--222. Google ScholarDigital Library
- Fanti, G., Pihur, V., and Erlingsson, U. Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. Proceedings on Privacy Enhancing Technologies, 3 (2016), 41--61.Google ScholarCross Ref
- Gaboardi, M., Honaker, J., King, G., Nissim, K., Ullman, J., and Vadhan, S. P. PSI (Ψ): A Private data Sharing Interface. CoRR abs/1609.04340 (2016). http://arxiv.org/abs/1609.04340.Google Scholar
- Ganta, S. R., Kasiviswanathan, S. P., and Smith, A. Composition Attacks and Auxiliary Information in Data Privacy. In Proc. of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2008), pp. 265--273. Google ScholarDigital Library
- Gehrke, J., Hay, M., Lui, E., and Pass, R. Crowd-Blending Privacy. In Advances in Cryptology---CRYPTO (2012), pp. 479--496. Google ScholarDigital Library
- Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D., Loihle, G., and Hunt, G. Debugging in the (Very) Large: Ten Years of Implementation and Experience. In Proc. of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP) (2009), ACM, pp. 103--116. Google ScholarDigital Library
- Google. gRPC: A High Performance, Open-Source Universal RPC Framework. http://grpc.io, 2017.Google Scholar
- Intel® Software Guard Extensions Programming Reference. https://software.intel.com/sites/default/files/managed/48/88/329298-002.pdf, Oct. 2014.Google Scholar
- Intel® Software Guard Extensions (Intel® SGX) SDK. https://software.intel.com/en-us/sgx-sdk, 2017.Google Scholar
- Johnson, N., Near, J. P., and Song, D. Towards Practical Differential Privacy for SQL Queries. CoRR abs/1706.09479 (2017). https://arxiv.org/abs/1706.09479.Google Scholar
- Jovic, M., Adamoli, A., and Hauswirth, M. Catch Me if You Can: Performance Bug Detection in the Wild. In Proc. of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA) (2011), pp. 155--170. Google ScholarDigital Library
- Klonowski, M., and Kutyłowski, M. Provable Anonymity for Networks of Mixes. In Information Hiding: 7th International Workshop (2005), pp. 26--38. Google ScholarDigital Library
- Koerner, B. I. Inside the Cyberattack That Shocked the US Government. Wired (Oct. 2016). https://www.wired.com/2016/10/inside-cyberattack-shocked-us-government/.Google Scholar
- Laurie, B. Certificate Transparency. Commun. ACM 57, 10 (Oct. 2014), 40--46. https://github.com/google/certificate-transparency. Google ScholarDigital Library
- Lee, S., Shih, M.-W., Gera, P., Kim, T., Kim, H., and Peinado, M. Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing. In Proc. of the 26th USENIX Security Symposium (2017), pp. 557--574.Google Scholar
- Leighton, T. Tight Bounds on the Complexity of Parallel Sorting. IEEE Trans. Comput. 34, 4 (Apr. 1985), 344--354. Google ScholarDigital Library
- Li, N., Qardaji, W., and Su, D. On Sampling, Anonymization, and Differential Privacy or, k-anonymization Meets Differential Privacy. In Proc. of the 7th ACM Symposium on Information, Computer and Communications Security (ASIACCS) (2012), pp. 32--33. Google ScholarDigital Library
- Liblit, B., Naik, M., Zheng, A. X., Aiken, A., and Jordan, M. I. Scalable Statistical Bug Isolation. In Proc. of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (2005), pp. 15--26. Google ScholarDigital Library
- Lie, D., and Maniatis, P. Glimmers: Resolving the Privacy/Trust Quagmire. In Proc. of the 16th Workshop on Hot Topics in Operating Systems (HotOS) (2017), pp. 94--99. Google ScholarDigital Library
- Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. l-diversity: Privacy Beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 1 (Mar. 2007). Google ScholarDigital Library
- Maniatis, P., and Baker, M. Secure History Preservation Through Timeline Entanglement. In Proc. of the 11th USENIX Security Symposium (2002), pp. 297--312. Google ScholarDigital Library
- Maniatis, P., Mironov, I., and Talwar, K. Oblivious Stash Shuffle. CoRR abs/1709.07553 (2017). https://arxiv.org/abs/1709.07553.Google Scholar
- McSherry, F., and Mahajan, R. Differentially-Private Network Trace Analysis. SIGCOMM Comput. Commun. Rev. 40, 4 (Aug. 2010), 123--134. Google ScholarDigital Library
- McSherry, F. D. Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis. In Proc. of the 2009 ACM SIGMOD International Conference on Management of Data (2009), pp. 19--30. Google ScholarDigital Library
- Mohan, P., Thakurta, A., Shi, E., Song, D., and Culler, D. GUPT: Privacy Preserving Data Analysis Made Easy. In Proc. of the 2012 ACM SIGMOD International Conference on Management of Data (2012), pp. 349--360. Google ScholarDigital Library
- Narayanan, A., and Shmatikov, V. Robust De-anonymization of Large Sparse Datasets. In Proc. of the 2008 IEEE Symposium on Security and Privacy (2008), pp. 111--125. Google ScholarDigital Library
- National Institute of Standards and Technology. Digital Signature Standard (DSS). FIPS PUB 186-2, Jan. 2000.Google Scholar
- Newman, L. H. How to Protect Yourself from that Massive Equifax Breach. Wired (Sept. 2017). https://www.wired.com/story/how-to-protect-yourself-from-that-massive-equifax-breach/.Google Scholar
- Ohrimenko, O., Costa, M., Fournet, C., Gkantsidis, C., Kohlweiss, M., and Sharma, D. Observing and Preventing Leakage in MapReduce. In Proc. of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS) (2015), pp. 1570--1581. Google ScholarDigital Library
- Ohrimenko, O., Goodrich, M. T., Tamassia, R., and Upfal, E. The Melbourne Shuffle: Improving Oblivious Storage in the Cloud. In Automata, Languages, and Programming: 41st International Colloquium, ICALP 2014, Part II (2014), pp. 556--567.Google Scholar
- Oliner, A. J., Iyer, A. P., Stoica, I., Lagerspetz, E., and Tarkoma, S. Carat: Collaborative Energy Diagnosis for Mobile Devices. In Proc. of the 11th Conference on Embedded Networked Sensor Systems (SenSys) (2013), ACM, pp. 10:1--10:14. Google ScholarDigital Library
- Papadopoulos, E. P., Diamantaris, M., Papadopoulos, P., Petsas, T., Ioannidis, S., and Markatos, E. P. The Long-Standing Privacy Debate: Mobile Websites vs Mobile Apps. In Proc. of the 26th International Conference on World Wide Web (WWW) (2017), pp. 153--162. Google ScholarDigital Library
- Ravindranath, L., Padhye, J., Agarwal, S., Mahajan, R., Obermiller, I., and Shayandeh, S. AppInsight: Mobile App Performance Monitoring in the Wild. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI) (2012), pp. 107--120. Google ScholarDigital Library
- Reis, C., Barth, A., and Pizano, C. Browser Security: Lessons from Google Chrome. Commun. ACM 52, 8 (Aug. 2009), 45--49. Google ScholarDigital Library
- Roy, I., Setty, S. T. V., Kilzer, A., Shmatikov, V., and Witchel, E. Airavat: Security and Privacy for MapReduce. In Proc. of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2010), pp. 297--312. Google ScholarDigital Library
- Saltzer, J. H., and Schroeder, M. D. The Protection of Information in Computer Systems. Proc. of the IEEE 63, 9 (1975), 1278--1308.Google Scholar
- Samarati, P. Protecting Respondents' Identities in Micro-data Release. IEEE Trans. on Knowl. and Data Eng. 13, 6 (Nov. 2001), 1010--1027. Google ScholarDigital Library
- Samarati, P., and Sweeney, L. Generalizing Data to Provide Anonymity when Disclosing Information (Abstract). In Proc. of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS) (1998), p. 188. Google ScholarDigital Library
- Schuster, F., Costa, M., Fournet, C., Gkantsidis, C., Peinado, M., Mainar-Ruiz, G., and Russinovich, M. VC3: Trustworthy Data Analytics in the Cloud Using SGX. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 38--54. Google ScholarDigital Library
- Shamir, A. How to Share a Secret. Comm. of the ACM 22, 11 (1979), 612--613. Google ScholarDigital Library
- Spensky, C., Stewart, J., Yerukhimovich, A., Shay, R., Trachtenberg, A., Housley, R., and Cunningham, R. K. SoK: Privacy on Mobile Devices---It's Complicated. Proc. on Privacy Enhancing Technologies, 3 (2016), 96--116.Google ScholarCross Ref
- Tang, J., Korolova, A., Bai, X., Wang, X., and Wang, X. Privacy Loss in Apple's Implementation of Differential Privacy on macOS 10.12. arXiv:1709.02753 (2017). https://arxiv.org/abs/1709.02753.Google Scholar
- Vallina-Rodriguez, N., Sundaresan, S., Razaghpanah, A., Nithyanand, R., Allman, M., Kreibich, C., and Gill, P. Tracking the Trackers: Towards Understanding the Mobile Advertising and Tracking Ecosystem. CoRR abs/1609.07190 (2016). http://arxiv.org/abs/1609.07190.Google Scholar
- van den Hooff, J., Lazar, D., Zaharia, M., and Zeldovich, N. Vuvuzela: Scalable Private Messaging Resistant to Traffic Analysis. In Proc. of the 25th Symposium on Operating Systems Principles (SOSP) (2015), pp. 137--152. Google ScholarDigital Library
- Viega, J., Chandra, P., and Messier, M. Network Security with OpenSSL, 1st ed. O'Reilly & Associates, Inc., Sebastopol, CA, USA, 2002. Google ScholarDigital Library
- Wang, T., Blocki, J., Li, N., and Jha, S. Locally Differentially Private Protocols for Frequency Estimation. In Proc. of the 26th USENIX Security Symposium (2017), pp. 729--745.Google Scholar
- Warner, S. L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. of the American Statistical Association 60, 309 (1965), 63--69.Google Scholar
- Xu, Y., Cui, W., and Peinado, M. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 640--656. Google ScholarDigital Library
- Zhang, L., Bild, D. R., Dick, R. P., Mao, Z. M., and Dinda, P. Panappticon: Event-Based Tracing to Measure Mobile Application and Platform Performance. In Proc. of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) (2013), pp. 33:1--33:10. Google ScholarDigital Library
- Zheng, W., Dave, A., Beekman, J. G., Popa, R. A., Gonzalez, J. E., and Stoica, I. Opaque: An Oblivious and Encrypted Distributed Analytics Platform. In Proc. of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2017), pp. 283--298. Google ScholarDigital Library
Recommendations
Personalised anonymity for microdata release
Individual privacy protection in the released data sets has become an important issue in recent years. The release of microdata provides a significant information resource for researchers, whereas the release of person‐specific data poses a threat to ...
A new unpredictability-based radio frequency identification forward privacy model and a provably secure construction
The privacy model of radio frequency identification RFID systems is for formalizing the adversarial capabilities and the security requirements of RFID anonymity and untraceability. Existing unpredictability-based privacy models such as unp-privacy, eunp-...
An efficient privacy mechanism for electronic health records
Electronic health records (EHRs), digitization of patients' health record, offer many advantages over traditional ways of keeping patients' records, such as easing data management and facilitating quick access and real-time treatment. EHRs are a rich ...
Comments