ABSTRACT
Research in data sanitization (including anonymization) emphasizes ways to prevent an adversary from desanitizing data. Most work focuses on using mathematical mappings to sanitize data. A few papers examine the incorporation of privacy requirements, in the guise of either templates or prioritization. Essentially, these approaches reduce the information that can be gleaned from a data set. In contrast, this paper considers both the need to "desanitize" and the need to support privacy. We consider conflicts between privacy requirements and the needs of analysts examining the redacted data. Our goal is to enable an informed decision about the effects of redacting, and of failing to redact, data. We begin with relationships among the data being examined, including relationships with a known data set and with additional external data. Capturing these relationships makes it possible to identify the desanitization techniques that exploit them and to determine the information that must be concealed in order to thwart those techniques. Given that knowledge, a realistic assessment of whether the information and relationships are already widely known or available will enable the sanitizers to judge whether irreversible sanitization is possible and, if so, what to conceal to prevent desanitization.
- Relationships and data sanitization: a study in scarlet