Skip to main content

2021 | Buch

Algorithms for Data and Computation Privacy

insite
SUCHEN

Über dieses Buch

This book introduces the state-of-the-art algorithms for data and computation privacy. It mainly focuses on searchable symmetric encryption algorithms and privacy preserving multi-party computation algorithms. This book also introduces algorithms for breaking privacy, and gives intuition on how to design algorithm to counter privacy attacks. Some well-designed differential privacy algorithms are also included in this book.

Driven by lower cost, higher reliability, better performance, and faster deployment, data and computing services are increasingly outsourced to clouds. In this computing paradigm, one often has to store privacy sensitive data at parties, that cannot fully trust and perform privacy sensitive computation with parties that again cannot fully trust. For both scenarios, preserving data privacy and computation privacy is extremely important. After the Facebook–Cambridge Analytical data scandal and the implementation of the General Data Protection Regulation by European Union, users are becoming more privacy aware and more concerned with their privacy in this digital world.

This book targets database engineers, cloud computing engineers and researchers working in this field. Advanced-level students studying computer science and electrical engineering will also find this book useful as a reference or secondary text.

Inhaltsverzeichnis

Frontmatter

Privacy Preserving Queries

Frontmatter
Chapter 1. Range Queries over Encrypted Data
Abstract
Privacy has been the key road block to cloud computing as clouds may not be fully trusted. In this chapter, we concern the problem of privacy preserving range query processing on clouds. ss Prior schemes are weak in privacy protection as they cannot achieve index indistinguishability, and therefore allow the cloud to statistically estimate the values of data and queries using domain knowledge and history query results. In this chapter, we propose the first range query processing scheme that achieves index indistinguishability under the indistinguishability against chosen keyword attack (IND-CKA). Our key idea is to organize indexing elements in a complete binary tree called PBtree, which satisfies structure indistinguishability (i.e., two sets of data items have the same PBtree structure if and only if the two sets have the same number of data items) and node indistinguishability (i.e., the values of PBtree nodes are completely random and have no statistical meaning). We prove that our scheme is secure under the widely adopted IND-CKA security model. We propose two algorithms, namely PBtree traversal width minimization and PBtree traversal depth minimization, to improve query processing efficiency. We prove that the worst case complexity of our query processing algorithm using PBtree is \(O(|R|\log n)\), where n is the total number of data items and R is the set of data items in the query result. We implemented and evaluated our scheme on a real world data set with 5 million items. For example, for a query whose results contain ten data items, it takes only 0.17 ms.
Alex X. Liu, Rui Li
Chapter 2. Fast and Scalable Range and Keyword Query Processing Over Encrypted Data with Provable Adaptive Security
Abstract
In this chapter, we concern the fundamental problem of processing conjunctive queries that contain both keyword conditions and range conditions on public clouds in a privacy preserving manner. No prior Searchable Symmetric Encryption (SSE) based privacy-preserving conjunctive query processing scheme satisfies the three requirements of adaptive security, efficient query processing, and scalable index size. In this chapter, we propose the first privacy preserving conjunctive query processing scheme that satisfies the above requirements. To achieve adaptive security, we propose an Indistinguishable Bloom Filter (IBF) data structure for indexing. To achieve efficient query processing and structure indistinguishability, we propose a highly balanced binary tree data structure called Indistinguishable Binary Tree (IBtree). To optimize searching efficiency, we propose a traversal width minimization algorithm and a traversal depth minimization algorithm. To achieve scalable and compact index size, we propose an IBtree space compression algorithm to remove redundant information in IBFs. We formally prove that our scheme is adaptive secure using a random oracle model. The key contribution of this chapter is on achieving conjunctive query processing with both strong privacy guarantee and practical efficiency in terms of both speed and space. We implemented our scheme in C++, evaluated and compared its performance with KRB for keyword queries and PBtree for range queries on two real-world data sets. Experimental results show that our scheme is fast and scalable (in milliseconds).
Alex X. Liu, Rui Li
Chapter 3. Nearest Neighbor Queries over Encrypted Data
Abstract
Nearest neighbor query processing is a fundamental problem that arises in many fields such as spatial databases and machine learning. ASPE, which uses invertible matrices to encrypt data, is a widely adopted Secure Nearest Neighbor (SNN) query scheme. Encrypting data by matrices is actually a linear combination of the multiple dimensions of the data, which is completely consistent with the relationship between the source signals and observed signals in the signal processing. By viewing dimensions of the data and the encrypted data as source signals and observed signals, respectively, we formally prove and experimentally demonstrate that ASPE is actually insecure against even ciphertext only attacks, using signal processing theory. Prior work proved that it is impossible to construct an SNN scheme even in much relaxed standard security models, we invalidate this hardness understanding by pointing out the incorrectness of the hardness proof.
Alex X. Liu, Rui Li
Chapter 4. K-Nearest Neighbor Queries Over Encrypted Data
Abstract
Nowadays, location-based services are proliferating and being widely deployed. For example, a Yelp user can obtain a list of the recommended restaurants near his/her current location. For some small or medium location service providers, they may rely on commercial cloud services, e.g., Dropbox, to store the tremendous geospatial data and deal with a number of user queries. However, it is challenging to achieve a secure and efficient location-based query processing over encrypted geospatial data stored on the cloud. In this chapter, we propose the Secure and Efficient Query Processing (SecEQP) scheme to address the secure k nearest neighbor (SkNN) query problem. SecEQP employs the projection function-based approach to code neighbor regions of a given location. Given the codes of two locations, the cloud server only needs to compare whether codes equal or not to check the proximity of the two locations. The codes are further embedded into an indistinguishable Bloom filter tree to build a secure and efficient index. The security of SecEQP is formally proved in the random oracle model. We further prototype SecEQP scheme and evaluate its performance on both real-world and synthetic datasets. Our evaluation results show that SecEQP is a highly efficient approach, e.g., top-10 NN query over 1 million datasets only needs less than 40 ms to get queried results.
Alex X. Liu, Rui Li
Chapter 5. Top-k Queries for Two-Tiered Sensor Networks
Abstract
Privacy and integrity have been the main road block to the applications of two-tiered sensor networks. The storage nodes, which act as a middle tier between sensors and the sink, could be compromised and allow attackers to learn sensitive data and manipulate query results. Prior schemes on secure query processing are weak because they reveal non-negligible information and therefore, attackers can statistically estimate the data values using domain knowledge and history of query results. In this chapter, we propose the first top-k query processing scheme that protects the privacy of sensor data and the integrity of query results. To preserve privacy, we build an index for each sensor collected data item using pseudo-random hash function and Bloom filters and transform top-k queries into top-range queries. To preserve integrity, we propose a data partition algorithm to partition each data item into an interval and attach the partition information with the data. The attached information ensures that the sink can verify the integrity of query results. We formally prove that our scheme is secure under IND-CKA security model. Our experimental results on real-life data show that our approach is accurate and practical for large network sizes.
Alex X. Liu, Rui Li

Privacy Preserving Computation

Frontmatter
Chapter 6. Collaborative Enforcement of Firewall Policies in Virtual Private Networks
Abstract
The widely deployed Virtual Private Network (VPN) technology allows roaming users to build an encrypted tunnel to a VPN server, which henceforth allows roaming users to access some resources as if that computer were residing on their home organization’s network. Although VPN technology is very useful, it imposes security threats on the remote network because its firewall does not know what traffic is flowing inside the VPN tunnel. To address this issue, we propose VGuard, a framework that allows a policy owner and a request owner to collaboratively determine whether the request satisfies the policy without the policy owner knowing the request and the request owner knowing the policy. We first present an efficient protocol, called Xhash, for oblivious comparison, which allows two parties, where each party has a number, to compare whether they have the same number, without disclosing their numbers to each other. Then, we present the VGuard framework that uses Xhash as the basic building block. The basic idea of VGuard is to first convert a firewall policy to non-overlapping numerical rules and then use Xhash to check whether a request matches a rule. Comparing with the Cross-Domain Cooperative Firewall (CDCF) framework, which represents the state-of-the-art, VGuard is not only more secure but also orders of magnitude more efficient. On real-life firewall policies, for processing packets, our experimental results show that VGuard is three to four orders of magnitude faster than CDCF.
Alex X. Liu, Rui Li
Chapter 7. Privacy Preserving Quantification of Cross-Domain Network Reachability
Abstract
Network reachability is an important characteristic for understanding end-to-end network behavior and helps in detecting violations of security policies across the network. While quantifying network reachability within one administrative domain is a difficult problem in itself, performing the same computation across a network spanning multiple administrative domains presents a novel challenge. The problem of quantifying network reachability across multiple administrative domains is more difficult because the privacy of security policies of individual domains is a serious concern and needs to be protected through this process. In this chapter, we propose the first cross-domain privacy-preserving protocol for quantifying network reachability. Our protocol constructs equivalent representations of the Access Control List (ACL) rules and determines network reachability while preserving the privacy of the individual ACLs. This protocol can accurately determine the network reachability along a network path through different administrative domains. We have implemented and evaluated our protocol on both real and synthetic ACLs. The experimental results show that the on-line processing time of an ACL containing thousands of rules is less than 25 s. Given two ACLs, each containing thousands of rules, the comparison time is less than 6 s, and the total communication cost is less than 2100 KB.
Alex X. Liu, Rui Li
Chapter 8. Cross-Domain Privacy-Preserving Cooperative Firewall Optimization
Abstract
Firewalls have been widely deployed on the Internet for securing private networks. A firewall checks each incoming or outgoing packet to decide whether to accept or discard the packet based on its policy. Optimizing firewall policies is crucial for improving network performance. Prior work on firewall optimization focuses on either intra-firewall or inter-firewall optimization within one administrative domain where the privacy of firewall policies is not a concern. This chapter explores inter-firewall optimization across administrative domains for the first time. The key technical challenge is that firewall policies cannot be shared across domains because a firewall policy contains confidential information and even potential security holes, which can be exploited by attackers. In this chapter, we propose the first cross-domain privacy-preserving cooperative firewall policy optimization protocol. Specifically, for any two adjacent firewalls belonging to two different administrative domains, our protocol can identify in each firewall the rules that can be removed because of the other firewall. The optimization process involves cooperative computation between the two firewalls without any party disclosing its policy to the other. We implemented our protocol and conducted extensive experiments. The results on real firewall policies show that our protocol can remove as many as 49% of the rules in a firewall whereas the average is 19.4%. The communication cost is less than a few hundred KBs. Our protocol incurs no extra online packet processing overhead and the offline processing time is less than a few hundred seconds.
Alex X. Liu, Rui Li
Chapter 9. Privacy Preserving String Matching for Cloud Computing
Abstract
Cloud computing has become indispensable in providing highly reliable data services to users. But, there are major concerns about the privacy of the data stored on cloud servers. While encryption of data provides sufficient protection, it is challenging to support rich querying functionality, such as string matching, over the encrypted data. In this work, we present the first ever symmetric key based approach to support privacy preserving string matching in cloud computing. We describe an efficient and accurate indexing structure, the PASStree, which can execute a string pattern query in logarithmic time complexity over a set of data items. The PASStree provides strong privacy guarantees against attacks from a semi-honest adversary. We have comprehensively evaluated our scheme over large real-life data, such as Wikipedia and Enron documents, containing up to 100,000 keywords, and show that our algorithms achieve pattern search in less than a few milliseconds with 100% accuracy. Furthermore, we also describe a relevance ranking algorithm to return the most relevant documents to the user based on the pattern query. Our ranking algorithm achieves 90% + above precision in ranking the returned documents.
Alex X. Liu, Rui Li
Chapter 10. Privacy Preserving Information Hub Identification in Social Networks
Abstract
This chapter addresses the problem of identifying the top-k information hubs in a social network. Identifying top-k information hubs is crucial for many applications such as advertising in social networks where advertisers are interested in identifying hubs to whom free samples can be given. Existing solutions are centralized and require time stamped information about pair-wise user interactions and can only be used by social network owners as only they have access to such data. Existing distributed algorithms suffer from poor accuracy. In this chapter, we propose a new algorithm to identify information hubs that preserves user privacy. Our method can identify hubs without requiring a central entity to access the complete friendship graph. We achieve this by fully distributing the computation using the Kempe-McSherry algorithm, while addressing user privacy concerns. We evaluate the effectiveness of our proposed technique using three real-world data set; The first two are Facebook data sets containing about six million users and more than 40 million friendship links. The third data set is from Twitter and comprises of a little over two million users. The results of our analysis show that our algorithm is up to 50% more accurate than existing algorithms. Results also show that the proposed algorithm can estimate the rank of the top-k information hubs users more accurately than existing approaches.
Alex X. Liu, Rui Li

Differential Privacy

Frontmatter
Chapter 11. Publishing Social Network Data with Privacy Guarantees
Abstract
Online Social Networks (OSNs) often refuse to publish their social network graphs due to privacy concerns. Recently, differential privacy has become the widely accepted criteria for privacy preserving data publishing. Although some work has been done on publishing matrices with differential privacy, they are computationally unpractical as they are not designed to handle large matrices such as adjacency matrices of OSN graphs. In this chapter, we propose a random matrix approach to OSN data publishing, which achieves storage and computational efficiency by reducing dimensions of adjacency matrices and achieves differential privacy by adding a small amount of noise. Our key idea is to first project each row of an adjacency matrix into a low dimensional space using random projection, and then perturb the projected matrix with random noise, and finally publish the perturbed and projected matrix. In this chapter, we first prove that random projection plus random perturbation preserve differential privacy, and also that the random noise required to achieve differential privacy is small. We validate the proposed approach and evaluate the utility of the published data for three different applications, namely node clustering, node ranking, and node classification, using publicly available OSN graphs of Facebook, Live Journal, and Pokec.
Alex X. Liu, Rui Li
Chapter 12. Predictable Privacy-Preserving Mobile Crowd Sensing
Abstract
The rise of mobile crowd sensing has brought privacy issues into a sharp view. In this chapter, our goal is to achieve the predictable privacy-preserving mobile crowd sensing, which we envision to have the capability to quantify the privacy protections, while also allowing application users to predict the utility loss at the same time. The Salus algorithm is first proposed to protect the private data against data reconstruction attacks. To understand the privacy protection, we quantify the privacy risks in terms of the private data leakage under reconstruction attacks. To predict the utility, we provide accurate utility predictions for various crowd sensing applications using Salus. The risk assessments can be generally applied to different type of sensors on the mobile platform, and the utility prediction can also be used to support various applications that use data aggregators such as average, histogram, and classifiers. Finally, we propose and implement the P 3 application framework. Both measurement results using online datasets and real-world case studies show that the P 3 provides accurate risk assessments and utility estimations, which makes it a promising framework to support future privacy-preserving mobile crowd sensing applications.
Alex X. Liu, Rui Li
Chapter 13. Differentially Private and Budget Limited Bandit Learning over Matroids
Abstract
We propose the first Budget-limited Multi-Armed Bandit (BMAB) algorithm subject to a union of matroid constraints in arm-pulling, while at the same time achieving differential privacy. Our model generalizes the arm-pulling models studied in prior BMAB schemes, and it can be used to address many practical problems such as network backbone construction and dynamic pricing in crowdsourcing. We handle the exploitation vs. exploration tradeoff in our BMAB problem by exploiting the combinatorial structures of matroids, and reduce the searching complexity of arm-selection based on a divide-and-conquer approach. Our algorithm achieves a uniform logarithmic regret bound with respect to B and 𝜖-differential privacy, where B is the budget for pulling the arms with random costs. Without differential privacy, our algorithm achieves a uniform logarithmic regret bound with respect to B, which advances the asymptotic regret bounds achieved by prior BMAB algorithms. We performed side-by-side comparisons with prior schemes in our experiments. Experimental results show that our algorithm not only achieves significantly better regret performance, but also is more than 20 times faster than prior BMAB schemes, as they use LP-solving techniques which is time-consuming, whereas our algorithm is purely combinatorial.
Alex X. Liu, Rui Li

Breaking Privacy

Frontmatter
Chapter 14. Breaching Privacy in Encrypted Instant Messaging Networks
Abstract
We present a novel attack on relayed instant messaging (IM) traffic that allows an attacker to infer who’s talking to whom with high accuracy. This attack only requires collection of packet header traces between users and IM servers for a short time period, where each packet in the trace goes from a user to an IM server or vice-versa. The specific goal of the attack is to accurately identify a candidate set of top-k users with whom a given user possibly talked to, while using only the information available in packet header traces (packet payloads cannot be used because they are mostly encrypted). Towards this end, we propose a wavelet-based scheme, called COmmunication Link De-anonymization (COLD), and evaluate its effectiveness using a real-world Yahoo! Messenger data set. The results of our experiments show that COLD achieves a hit rate of more than 90% for a candidate set size of 10. For slightly larger candidate set size of 20, COLD achieves almost 100% hit rate. In contrast, a baseline method using time series correlation could only achieve less than 5% hit rate for similar candidate set sizes.
Alex X. Liu, Rui Li
Metadaten
Titel
Algorithms for Data and Computation Privacy
verfasst von
Prof. Alex X. Liu
Assoc. Prof. Rui Li
Copyright-Jahr
2021
Electronic ISBN
978-3-030-58896-0
Print ISBN
978-3-030-58895-3
DOI
https://doi.org/10.1007/978-3-030-58896-0