
2015 | Book

Advanced Research in Data Privacy

About this book

This book provides an overview of the research work on data privacy and privacy enhancing technologies carried out by the participants of the ARES project. ARES (Advanced Research in Privacy and Security, CSD2007-00004) has been one of the most important research projects funded by the Spanish Government in the fields of computer security and privacy. It is part of the now-defunct CONSOLIDER INGENIO 2010 program, a highly competitive program which aimed to advance knowledge and open new research lines among top Spanish research groups. The project started in 2007 and will finish in 2014. Composed of 6 research groups from 6 different institutions, it has gathered a significant number of researchers during its lifetime.

Among the work produced by the ARES project, one specific work package has been related to privacy. This book gathers works produced by members of the project related to data privacy and privacy enhancing technologies. The presented works not only summarize important research carried out in the project but also serve as an overview of the state of the art in current research on data privacy and privacy enhancing technologies.

Table of contents

Frontmatter

Introduction

Advanced Research on Data Privacy in the ARES Project
Abstract
Privacy has become an important concern in today’s society. The advancement and pervasiveness of information and communication technologies have a great positive impact on our society; they greatly affect how we socialize, the way we do business, and even our individual and social freedom.
Guillermo Navarro-Arribas, Vicenç Torra
Selected Privacy Research Topics in the ARES Project: An Overview
Abstract
This chapter gives an overview of some of the data privacy research carried out by the team at Universitat Rovira i Virgili within the ARES project. Topics reviewed include query profile privacy, location privacy, differential privacy and anti-discrimination.
Jesús A. Manjón, Josep Domingo-Ferrer
Data Privacy: A Survey of Results
Abstract
In this paper we present an overview of the results obtained by our research group within the area of data privacy. Results focus on data-driven problems (respondent and owner privacy with an unknown use) and user privacy. We have developed some new masking methods, developed methodologies for parameter selection, and developed some information loss and disclosure risk measures. We have also obtained important results on reidentification methods (record linkage) when used for disclosure risk assessment.
Vicenç Torra, Guillermo Navarro-Arribas

Respondent Privacy: SDC and PPDM

A Review of Attribute Disclosure Control
Abstract
Attribute disclosure occurs when the adversary can infer some sensitive information about an individual without identifying the individual’s record in the published data set. To address this issue, several privacy models have been proposed with the goal of increasing the uncertainty of the adversary in deriving sensitive information from published data. In this chapter, we first review the underlying scenario used in Statistical Disclosure Control (SDC) and Privacy-Preserving Data Mining (PPDM). We then describe the scenario underlying attribute disclosure, the different forms of background knowledge the adversary may have, and the potential privacy attacks. Finally, we review the approaches introduced in the literature to tackle attribute disclosure attacks.
Stan Matwin, Jordi Nin, Morvarid Sehatkar, Tomasz Szapiro
Data Privacy with \(R\)
Abstract
Privacy Preserving Data Mining (PPDM) is an application field which is becoming very relevant. Its goal is the study of new mechanisms which allow the dissemination of confidential data for data mining tasks while preserving individual private information. Additionally, due to the relevance of the \(R\) language in the statistics and data mining communities, it is a good environment in which to research, develop and test privacy techniques aimed at data mining. In this chapter we present several PPDM protection techniques, together with the process for evaluating their information loss and disclosure risk, and outline some helpful tools in \(R\) to introduce practitioners to this field.
Daniel Abril, Guillermo Navarro-Arribas, Vicenç Torra
Optimisation-Based Study of Data Privacy by Using PRAM
Abstract
Dissemination of data with sensitive information has an implicit risk of unauthorised disclosure. Several masking methods have been developed in order to protect the data without losing too much information. One of these methods is the Post Randomisation Method (PRAM), which is based on perturbations according to a Markov probability transition matrix. However, the method has the drawback that it is difficult to find an optimal transition matrix to perform perturbations which maximise data utility. In this paper we present a study of data privacy from the point of view of optimisation, using evolutionary algorithms to generate optimal probability transition matrices. Optimality is with respect to a pre-defined fitness function which aims to preserve several data protection properties such as data utility and disclosure risk. We also provide experimental results using real datasets in order to illustrate and empirically evaluate the application of this technique.
Jordi Marés, Vicenç Torra, Natalie Shlomo
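As an illustration of the perturbation step that PRAM relies on, the following is a minimal sketch, not the chapter's implementation. It assumes a single categorical attribute and a hand-written transition matrix; the function name `pram`, the category labels and the matrix values are all hypothetical. The evolutionary search for an optimal matrix described in the abstract is not shown.

```python
import random

def pram(values, categories, transition, seed=0):
    """Perturb a categorical column with a Markov transition matrix.

    transition[i][j] is the probability that categories[i] is
    published as categories[j]; every row is assumed to sum to 1.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    index = {c: i for i, c in enumerate(categories)}
    return [
        rng.choices(categories, weights=transition[index[v]])[0]
        for v in values
    ]

# Hypothetical example: three categories, each kept with probability 0.8.
categories = ["A", "B", "C"]
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
original = ["A", "B", "C", "A", "B"]
masked = pram(original, categories, transition, seed=42)
print(masked)
```

A matrix with larger diagonal entries keeps more values unchanged (higher utility, higher disclosure risk), which is the trade-off a fitness function would have to balance.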

Respondent Privacy: Semantic Related Respondent Privacy Protection

Semantic Anonymisation of Categorical Datasets
Abstract
The exploitation of microdata compiled by statistical agencies is of great interest for the data mining community. However, such data often include sensitive information that can be directly or indirectly related to individuals. Hence, an appropriate anonymisation process is needed to minimise the risk of disclosing identities and/or confidential data. In the past, many anonymisation methods have been developed to deal with numerical data, but approaches tackling the anonymisation of non-numerical values (e.g. categorical, textual) are scarce and shallow. Since the utility of this kind of information is closely related to the preservation of its meaning, in this work, the notion of semantic similarity is used to enable a semantically coherent anonymisation. The knowledge modelled in ontologies is used as the basic pillar to propose semantic operators that enable an accurate management and transformation of categorical attributes. These operators are then used in three anonymisation mechanisms: Semantic Recoding, Semantic and Adaptive Microaggregation and Semantic Resampling. The three algorithms are compared in terms of semantic utility, privacy disclosure risk and runtime, with encouraging results.
Sergio Martínez, Aida Valls, David Sánchez
Contributions on Semantic Similarity and Its Applications to Data Privacy
Abstract
Semantic similarity aims at quantifying the resemblance between the meanings of textual terms. Thus, it represents the cornerstone of textual understanding. Given the increasing availability and importance of textual sources within the current context of Information Societies, a lot of attention has been paid in recent years to the development of mechanisms to automatically measure semantic similarity and to apply them to tasks dealing with textual inputs (e.g. document classification, information retrieval, question answering, privacy protection, etc.). This chapter describes and discusses recent findings and proposals published by the authors on semantic similarity. Moreover, it also details recent works applying semantic similarity to privacy protection of textual data.
Montserrat Batet, David Sánchez
An Information Retrieval Approach to Document Sanitization
Abstract
In this paper we use information retrieval metrics to evaluate the effect of a document sanitization process, measuring information loss and risk of disclosure. In order to sanitize the documents we have developed a semi-automatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration. It embodies two main and independent steps: (i) identifying and anonymizing specific person names and data, and (ii) concept generalization based on WordNet categories, in order to identify words categorized as classified. Finally, we manually revise the text from a contextual point of view to eliminate complete sentences, paragraphs and sections, where necessary. For empirical tests, we use a subset of the Wikileaks Cables, made up of documents relating to five key news items which were revealed by the cables.
David F. Nettleton, Daniel Abril

Respondent Privacy: Location Privacy

Privacy for LBSs: On Using a Footprint Model to Face the Enemy
Abstract
User privacy in Location Based Services (LBSs) is still in need of effective solutions. A new privacy model for LBSs has recently been proposed based on users’ footprints, these being a representation of the amount of time a user spends in a given area. The model is claimed to be independent of the specific knowledge of the adversary about users’ footprints. Despite this claim, we show in this chapter that when the adversary has knowledge that differs from the one considered for the anonymization procedure, the model is not valid. Further, we generalize this weakness of the model and show that it is highly probable that the footprint model provides either (i) a privacy level lower than the expected one, or (ii) LBS information coarser than what would be required for anonymization purposes. We support our claim via analysis, modeling the footprint data as a hypercube; with a simple example to grasp the main problem; and with the study of a real data set of traces of mobile users. Finally, we also investigate which properties must hold for both the anonymiser and the adversary knowledge in order to guarantee an effective level of user privacy.
Mauro Conti, Roberto Di Pietro, Luciana Marconi
Privacy in Spatio-Temporal Databases: A Microaggregation-Based Approach
Abstract
Technologies able to track moving objects such as GPS, GSM, and RFID, have been well-adopted worldwide since the end of the 20th century. As a result, companies and governments manage and control huge spatio-temporal databases, whose publication could lead to previously unknown knowledge such as human behaviour patterns or new road traffic trends (e.g., through Data Mining). Aimed at properly balancing data utility with users’ privacy rights, several microaggregation-based methods for publishing movement data have been proposed. These methods are reviewed in this book chapter. We highlight challenges in the three stages of the microaggregation process namely, clustering, obfuscation, and privacy and utility evaluation. We also address some of these challenges by presenting yet another microaggregation-based method for privacy-preserving publication of spatio-temporal databases.
Rolando Trujillo-Rasua, Josep Domingo-Ferrer
A Prototype for Anonymizing Trajectories from a Time Series Perspective
Abstract
The evolution and expansion of location tracking technologies such as GPS and RFID, and their integration with handheld devices, have created a new trend of services and applications based on location information. However, location data is sensitive data that could seriously compromise the privacy of individuals. There is a large body of research in the area of location privacy, where researchers try to tackle this privacy problem. In this article we describe one of the systems implemented in the ARES project, a prototype that anonymizes car trajectories following an approach based on time series.
Sergi Martínez-Bea

Respondent Privacy: Social Networks

A Summary of \(k\)-Degree Anonymous Methods for Privacy-Preserving on Networks
Abstract
In recent years there has been a significant rise in the use of graph-formatted data. For instance, social and healthcare networks present relationships among users, revealing interesting and useful information for researchers and other third parties. When someone wants to publicly release this information, it is necessary to preserve the privacy of the users who appear in these networks. Therefore, it is essential to apply an anonymization process to the data in order to preserve users’ privacy. Anonymization of graph-based data is a problem which has been widely studied in recent years, and several anonymization methods have been developed. In this chapter we summarize some methods for privacy preservation on networks, focusing on methods based on the \(k\)-anonymity model. We also compare the results of some \(k\)-degree anonymous methods on our experimental setup, by evaluating the data utility and the information loss on real networks.
Jordi Casas-Roma, Jordi Herrera-Joancomartí, Vicenç Torra
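For readers unfamiliar with the property these methods target, the sketch below checks whether a graph is \(k\)-degree anonymous, using the usual definition that every degree value occurring in the graph is shared by at least \(k\) vertices. It is an illustrative check only, not one of the anonymization methods compared in the chapter; the function names and the toy edge lists are hypothetical, and vertices without edges are ignored because the graph is given as an edge list.

```python
from collections import Counter

def degrees(edges):
    """Return a mapping vertex -> degree for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def is_k_degree_anonymous(edges, k):
    """True if every degree value is shared by at least k vertices."""
    degree_counts = Counter(degrees(edges).values())
    return all(count >= k for count in degree_counts.values())

# Hypothetical toy graphs.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # all degrees are 2
path = [("a", "b"), ("b", "c")]                           # degrees 1, 2, 1

print(is_k_degree_anonymous(cycle, 4))  # every vertex has the same degree
print(is_k_degree_anonymous(path, 2))   # the middle vertex has a unique degree
```

An anonymization method would add or remove edges until such a check passes, which is where the data utility and information loss mentioned in the abstract come into play.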
Evaluating Privacy Risks in Social Networks from the User’s Perspective
Abstract
Determining privacy risks when publishing information on social networks often presents a challenge for users. A measure of how much sensitive information users share with others on a social network website would help them understand whether they individually share too much. We survey existing measures that evaluate privacy from the user’s perspective or help the user with privacy risks and related decisions in social networks. We present Privacy Scores, a measurement of how much sensitive information a user has made available to others on a social network website, discuss some of their shortcomings, and discuss research directions for their extension. In particular, we present our proposal for an extension that takes the privacy score metric from a single, closed social network system to one that includes auxiliary background knowledge. Our examples and experimental results show the need to include publicly available background knowledge in the computation of privacy scores in order to get scores that reflect the privacy risks of users more truthfully. We add background knowledge about users by combining several social networks or by using simple web search to detect publicly known information about the evaluated users. This is a revision and extension of our former paper.
Michal Sramka

Respondent Privacy: Other Respondent Privacy Enhancing Technologies

Trustworthy Video Surveillance: An Approach Based on Guaranteeing Data Privacy
Abstract
Thousands of video files are stored in surveillance databases. Pictures of individuals are considered personal data and, thus, their disclosure must be prevented. Although video surveillance is done for the sake of security, the privacy of individuals could be endangered if the proper measures are not taken. In this chapter we claim that a video-surveillance system could protect our safety and, at the same time, guarantee our privacy. Most literature on privacy in video surveillance systems concentrates on the goal of detecting faces and other regions of interest, and in proposing different methods to protect them. However, the trustworthiness of those systems and, by extension, of the privacy they provide are neglected. Hence, we define the concept of Trustworthy Video Surveillance System (T-VSS), which tackles the issue of protecting the privacy of the individuals. In this chapter, we assess the techniques proposed in the literature according to their suitability in a T-VSS. Moreover, we describe a privacy-aware video-surveillance platform that fulfils those properties and we detail all its components. We have implemented and tested the proposed platform to show the feasibility of our proposal.
Antoni Martínez-Ballesté, Agusti Solanas, Hatem A. Rashwan
Electronic Ticketing: Requirements and Proposals Related to Transport
Abstract
The use of electronic tickets (e-tickets) on mobile devices allows customers to book anywhere and use e-tickets immediately, and allows companies to save resources and speed up management processes. Transport is one of the main sectors that use tickets in their standard activity. A wide variety of transport systems can benefit from the use of e-tickets. However, the use of e-tickets can lead to various privacy abuses, since the anonymity of users is not always guaranteed and, therefore, users can be traced and profiles of their usual movements can be created. In this chapter, we focus especially on the properties related to user privacy, and we review and classify the main proposals in this area.
M. Magdalena Payeras-Capellà, Macià Mut-Puigserver, Josep-Lluís Ferrer-Gomila, Jordi Castellà-Roca, Arnau Vives-Guasch
Security and Privacy Concerns About the RFID Layer of EPC Gen2 Networks
Abstract
RFID systems are composed of tags (also known as electronic labels) storing an identification sequence which can be wirelessly retrieved by an interrogator and transmitted to the network through middleware and database information systems. In the case of the EPC Gen2 technology, RFID tags are not provided with on-board batteries. They are passively powered through the radio frequency waves of the interrogators. Tags are also assumed to be low-cost, meaning that they shall be available at a very reduced price (predicted in the literature to be under 10 US dollar cents). The passive and low-cost nature of EPC Gen2 tags imposes several challenges in terms of power consumption and integration of defense countermeasures. Like many other pervasive technologies, EPC Gen2 might lead to security and privacy violations if not handled properly. In this chapter, we provide an in-depth presentation of the RFID layer of the EPC Gen2 standard. We also describe security and privacy threats that can affect this layer, and survey some representative countermeasures that could be used to handle the reported threats. Some of the reported efforts were conducted within the scope of the ARES project.
Joaquin Garcia-Alfaro, Jordi Herrera-Joancomartí, Joan Melià-Seguí
Privacy on Mobile Coupons Booklets
Abstract
Electronic coupons booklets are the equivalent of paper-based coupons booklets, offered to customers as an opportunity to obtain a better offer from merchants. In this book chapter, the authors describe the main coupons booklet scenarios and identify the basic and additional security requirements. They review the state of the art of coupons booklet solutions and discuss the main challenges: security, privacy and efficiency. In order to address all these challenges, they present a coupons booklet scheme for the mobile scenario. They analyze their proposal to show that it meets all security and privacy requirements, and provide some performance results to show that it is a viable solution.
M. Francisca Hinarejos, Andreu Pere Isern-Deyà, Josep-Lluís Ferrer-Gomila
Smart User Authentication for an Improved Data Privacy
Abstract
Market analysis predicts that in a few years companies, universities and government agencies, as well as ordinary people in their daily life, will increasingly adopt mobile computing systems, thus increasingly enjoying the benefits of online, Internet-based services. However, such a scenario will also expose user data privacy to severe attacks. This situation has led to the development of authentication approaches aimed at preventing unauthorized access to user data. Many different authentication approaches have been proposed over recent years, from basic passwords to more complex biometric solutions, but all of them have proven to suffer from the same weaknesses. This issue drove us to design a solution based upon hardware-intrinsic security features and aimed at guaranteeing a high level of data privacy while providing a user-friendly authentication process. Our solution offers advanced features for defining data privacy policies, making it a good candidate for the construction of future data privacy policies.
Vanesa Daza, Matteo Signorini

User Privacy: Web Search Engines

Multi-party Methods for Privacy-Preserving Web Search: Survey and Contributions
Abstract
Web search engines (WSEs) locate keywords on websites and retrieve contents from the World Wide Web. To be successful among its users, a WSE must return the results that best match their interests. For this purpose, WSEs collect and analyze users’ search history and build profiles. Although this brings immediate benefits to the user, it is also a threat to her privacy in the long term. Profiles are built from past queries and other related data that may contain private and personal information. Consequently, researchers in this field have developed different approaches whose objective is to avoid this privacy threat and protect users of WSEs. One way to classify the existing alternatives is into single-party and multi-party approaches. The former allow users to protect their privacy against the WSE without requiring the cooperation of others. The latter require that a group of users or entities collaborate in order to protect the privacy of each member of the group. This work focuses on multi-party schemes. First, current solutions in this field are surveyed, their differences are analyzed and their advantages (and shortcomings) are stressed. Finally, our own contributions to this area are presented and evaluated.
Cristina Romero-Tris, Alexandre Viejo, Jordi Castellà-Roca
DisPA: An Intelligent Agent for Private Web Search
Abstract
Search queries can be used to infer preferences and interests of users. While search engines use this information for, among other things, targeted advertising and personalization, these tasks can violate users’ privacy. Since 2006, when AOL disclosed the search queries of 650,000 users and some of them were re-identified, many Privacy Enhancement Technologies (PETs) have sought to solve this problem. The Dissociating Privacy Agent (DisPA) is a browser extension that acts as a proxy between the user and the search engine and semantically dissociates queries in real time. We show that DisPA increases the privacy of the user and hinders re-identification. We also propose an algorithm to measure and evaluate the privacy properties offered by DisPA.
Marc Juarez, Vicenç Torra
A Survey on the Use of Combinatorial Configurations for Anonymous Database Search
Abstract
The peer-to-peer user-private information retrieval (P2P UPIR) protocol is an anonymous database search protocol in which the users collaborate in order to protect their privacy. This collaboration can be modelled by a combinatorial configuration. This chapter surveys currently available results on how to choose combinatorial configurations for P2P UPIR.
Klara Stokes, Maria Bras-Amorós

User Privacy: Recommender and Personalized Systems

Privacy-Enhancing Technologies and Metrics in Personalized Information Systems
Abstract
In recent times we are witnessing the emergence of a wide variety of information systems that tailor the information-exchange functionality to meet the specific interests of their users. Most of these personalized information systems capitalize on, or lend themselves to, the construction of user profiles, either directly declared by a user, or inferred from past activity. The ability of these systems to profile users is therefore what enables such intelligent functionality, but at the same time, it is the source of serious privacy concerns. The purpose of this paper is twofold. First, we survey the state of the art in privacy-enhancing technologies for applications where personalization comes in. In particular, we examine the assumptions upon which such technologies build, and then classify them into five broad categories, namely, basic anti-tracking technologies, cryptography-based methods from private information retrieval, approaches relying on trusted third parties, collaborative mechanisms and data-perturbative techniques. Secondly, we review several approaches for evaluating the effectiveness of those technologies. Specifically, our study of privacy metrics explores the measurement of the privacy of user profiles in the still emergent field of personalized information systems.
Javier Parra-Arnau, David Rebollo-Monedero, Jordi Forné
Managing Privacy in the Internet of Things: DocCloud, a Use Case
Abstract
In this chapter, we describe how nodes in the Internet of Things can configure themselves automatically and offer personalized services to users while protecting their privacy. We show how privacy protection can be achieved by means of a use case. We describe DocCloud, a recommender system where users get content recommended by other users based on their personal affinities. To do this, their things connect together based on the affinities of their owners, creating a social network of similar things, and then provide the recommender system on top of this network. We present the architecture of DocCloud and analyze the security mechanisms that the system includes. Specifically, we study the properties of plausible deniability and anonymity of the recommenders and intermediate nodes. In this way, nodes can recommend products to customers while denying any knowledge about the product they are recommending or their participation in the recommendation process.
Juan Vera del Campo, Josep Pegueroles, Juan Hernández Serrano, Miguel Soriano
Metadata
Title
Advanced Research in Data Privacy
Edited by
Guillermo Navarro-Arribas
Vicenç Torra
Copyright year
2015
Electronic ISBN
978-3-319-09885-2
Print ISBN
978-3-319-09884-5
DOI
https://doi.org/10.1007/978-3-319-09885-2