Abstract
We consider the potential for network trace analysis while providing the guarantees of "differential privacy." While differential privacy provably obscures the presence or absence of individual records in a dataset, it has two major limitations: analyses must (presently) be expressed in a higher-level declarative language, and the analysis results are randomized before being returned to the analyst.
We report on our experiences conducting a diverse set of analyses in a differentially private manner. We are able to express all of our target analyses, though for some of them an approximate expression is required to keep the error low. By running these analyses on real datasets, we find that the error introduced for the sake of privacy is often (but not always) low, even at high levels of privacy. We factor our learnings into a toolkit that will likely be useful for other analyses. Overall, we conclude that differential privacy shows promise for a broad class of network analyses.
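To make the second limitation concrete, the following minimal sketch (not the paper's PINQ-based implementation) illustrates the standard Laplace mechanism behind differentially private counting: the analyst receives the true count perturbed by noise with scale 1/epsilon, so smaller epsilon (stronger privacy) means larger expected error. The record format and predicate here are hypothetical.

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. Exponential(1/scale) draws
    # is distributed as Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon):
    """Differentially private count: a count query has sensitivity 1
    (adding or removing one record changes the count by at most 1),
    so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: count packets destined to port 80.
random.seed(42)  # seeded only to make this sketch reproducible
trace = [{"dst_port": 80}] * 30 + [{"dst_port": 22}] * 5
estimate = noisy_count(trace, lambda r: r["dst_port"] == 80, epsilon=1.0)
```

At epsilon = 1.0 the noise has standard deviation about 1.4, so the estimate is typically within a few packets of the true count of 30; the paper's observation is that for many network analyses this error is small relative to the quantities being measured.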
Index Terms
- Differentially-private network trace analysis