Search pattern leakage in searchable encryption: Attacks and new construction

doi:10.1016/j.ins.2013.11.021

Information Sciences

Volume 265, 1 May 2014, Pages 176-188

https://doi.org/10.1016/j.ins.2013.11.021 Get rights and content

Abstract

Searching on remote encrypted data (commonly known as searchable encryption) has become an important issue in secure data outsourcing, since it allows users to outsource encrypted data to an untrusted third party while maintains the capability of keyword search on the data.

Searchable encryption can be achieved using the classical method called oblivious RAM, but the resultant schemes are too inefficient to be applied in the real-world scenarios (e.g., cloud computing). Recently, a number of efficient searchable encryption schemes have been proposed under weaker security guarantees. Such schemes, however, still leak statistical information about the users’ search pattern.

In this paper, we first present two concrete attack methods to show that the search pattern leakage will result in such a situation: an adversary who has some auxiliary knowledge can uncover the underlying keywords of user queries. To address this issue, we then develop a grouping-based construction (GBC) to transform an existing searchable encryption scheme to a new scheme hiding the search pattern. Finally, experiments based on the real-world dataset demonstrate the effectiveness of our attack methods and the feasibility of our construction.

Introduction

Cloud computing has been increasingly applied to outsource data by cloud tenants to diminish the high overhead of data storage and management. In many cases, cloud tenants need to encrypt data before outsourcing them to prevent cloud administrators from accessing sensitive information, such as government documents, hospital records and personal emails. However, data encryption invalidates many data query functions, which will lead to inefficient data utilization. For instance, the cloud service provider cannot respond keyword search query over encrypted data. The purpose of searchable encryption is allowing a user to outsource data to a third party in a secure manner, and then retrieve documents containing the queried keyword. Therefore, searchable encryption plays an important role in cloud computing scenario [16].

Index, an auxiliary structure that accelerates the search process, has been widely studied in information retrieval fields. In general, each entry of the index is formed as a 〈keyword, document identifiers〉 tuple, so that all the documents containing the queried keyword can be easily located. We describe the general scenario of searchable encryption as follows: a data owner (e.g., Alice) has a set of documents to outsource to an untrusted third party (e.g., Carol). Alice first builds an index of all the keywords appeared in the documents, and then encrypts both the documents and the index. After that, she outsources the encrypted documents and index to Carol. An authorized data user (e.g., Bob) has a secret key, so that he is able to generate keyword search queries (a.k.a. trapdoors or tokens) by calling a trapdoor function. Once Carol receives a query from Bob, she can search in the index and return the (encrypted) search result¹ to Bob. Then Bob is able to decrypt the search result and retrieve needed documents. Note that if the secret key is shared appropriately (such as the Multi-user SSE scheme in [11]), there can be multiple data users. For example, all employees of a company shared a secret key to perform keyword search on their data.

It has been widely accepted in the literature that both outsourced data and search query should leak as little information as possible to the third party. We note that searchable encryption can be achieved using the classical method called oblivious RAM [19], [14], which attains the optimal security (i.e., nothing leaked to the third party). However, this kind of approaches is impractical due to the poly-logarithmic computation and communication overheads. Therefore a number of searchable symmetric encryption (SSE) schemes (e.g., [22], [13], [9], [11], [23], [18], [17]) have been proposed under weaker security guarantees for efficiency. However, the access pattern and the search pattern are leaked in their schemes. Informally, the access pattern is the information about which documents contain the queried keyword (i.e., the search result) for each of the user queries, while the search pattern is the information about whether any two queries are generated from the same keyword or not. In recent studies, [15] has discussed a concrete attack exploiting the access pattern to disclose the underlying keywords of user queries. However, the potential risks of the search pattern leakage are rarely studied in the literature. As it is noted in [17], a limitation of most known SSE schemes is the leakage of the search pattern. In fact, this limitation also exists in searchable asymmetric encryption schemes (e.g., [8], [6]). For the reasons above, we are motivated to investigate on the search pattern leakage issue and develop a new searchable encryption construction under more rigorous security requirements.

Let’s begin with analyzing why search pattern is leaked in SSE schemes. To the best of our knowledge, the query algorithms of existing SSE schemes in the literature are mostly deterministic, which means the same keyword will always generate the same query. In this sense, an adversary can easily judge whether any two queries are generated from the same keyword or not, so as to obtain the users’ search pattern. One may apply probabilistic query algorithms to solve this problem. However, simply making use of probabilistic query algorithms [21] still cannot hide the search pattern, because the entry touched in each search process discloses the search pattern as well. In other words, for the same keyword, its corresponding entry in the index must be the same, so that the adversary can disclose users’ search pattern just by observing the entries touched in the index during the search process. For this reason, the search pattern is also leaked in searchable asymmetric encryption (a.k.a. public key encryption with keyword search, PEKS), whose query algorithms are probabilistic [8], [6], [20], [12]. Therefore, randomizing query algorithm only contributes to defend outer adversaries but not inner adversaries (e.g., cloud administrators). It is also worthwhile to point out that the access pattern might leak the search pattern in the same way (i.e., the same search result might mean the same queried keyword). We refer the reader to [15] for the details on how to hide the access pattern. The work of this paper focuses on the search pattern leakage caused by the queries and index. A delicate approach to hide the search pattern is launching several fake queries along with the real query, so that the adversary cannot identify the real query. Based on this idea, we develop a new searchable encryption construction where the number of sub-queries (i.e., the real query and the fake queries) for each query is parametrized by a confusion parameter k. We will give detailed description in Section 5.

We outline the contributions of this paper as the following:

1.
We address the search pattern leakage issue and demonstrate its potential risks in the practical applications by giving two concrete attack methods. In particular, with our attack methods, an adversary who has obtained users’ search pattern can effectively uncover the underlying keywords of the user queries under the help of some public available knowledge.
2.
We present a grouping-based construction (GBC) which transforms an existing index-based searchable encryption scheme to a new scheme hiding the search pattern. GBC is designed to be independent from the underlying searchable encryption scheme, so that most previous schemes [13], [9], [11], [23], [18], [17] can be used in GBC.
3.
We prove that the resultant scheme of GBC satisfies a stronger security guarantee than any existing searchable encryption scheme. GBC reduces the search pattern leakage to the group pattern leakage.
4.
Based on the real-world dataset [5], we test the performance of the proposed attack methods and the proposed construction. The experiment results indicate the effectiveness and feasibility of proposed attack methods and construction.

The remainder of the paper is organized as follows: Section 2 briefly surveys the motivations of hiding the search pattern. Section 3 introduces some preliminaries. We formalize two attack methods in Section 4. In Section 5 we will describe the grouping-based construction. Section 6 shows experimental studies for evaluating the performance of the proposed attack methods and construction. We review related work in Section 7 and conclude the paper in Section 8.

Section snippets

Motivations

Throughout this paper, we treat the third party Carol as the adversary, which is consistent with most previous work [22], [13], [8], [9], [11], [23], [18], [17]. Carol behaves “honest-but-curious” in the searchable encryption scheme. On one hand, Carol follows the operations required in the scheme. On the other hand, Carol tries to deduce as much private information as possible by utilizing all kinds of attack methods. Carol has the ability of eavesdropping the network, so that she has access

Preliminaries

Definition 1 defines the index-based searchable encryption, which covers both searchable symmetric encryption and searchable asymmetric encryption.

Definition 1 Searchable Encryption

An index-based Searchable Encryption (SE) scheme is a tuple of 6 algorithms SE = (KeyGen, BuildIndex, Encryption, Query, Search, Decryption):

1.
KeyGen( $1^{λ}$ ): The key generation algorithm takes a security parameter $λ$ as input, it outputs a secret key K.
2.
BuildIndex( $D$ ): The index building algorithm takes a document collection $D = {D_{1}, \dots, D_{n}}$ as input, it outputs an

Two attack methods

In this section, we formalize two attack methods which exploit the search pattern to uncover the underlying keywords of user queries. Let q be a specific query that Carol wants to attack, Carol records the occurrence frequency of q in each period of time (e.g., day, week, month, etc.), which we denote as $f_{q}^{i}$ for $1 ⩽ i ⩽ p$ where p is the total number of record items (e.g., $p = 50$ in Fig. 1(a)). Thus Carol gets a frequency vector of q denoted as $V_{q} = (f_{q}^{1}, \dots, f_{q}^{p})$ . Let $K$ denote a dictionary of keywords and

Our construction

In this section, we present new searchable encryption constructions to defend the previous attacks. For better understanding, we will first introduce a straightforward construction, and then our main construction. Both constructions are based on an existing index-based searchable encryption scheme. Before describing our constructions in detail, we introduce some additional notations. Let $W$ ( $W \subseteq K$ ) be the list of all distinct keywords contained in document collection $D$ in alphabetical order and $| W |$

Experiments

In this section, the proposed attack methods and construction are tested. Our test programs were implemented by Python-2.7.3. All experiments were performed on a computer with Intel Core i5-2400 CPU 3.10 GHz and 4.00 GB RAM. The operation system was Windows 7 Professional. Each data point presented in the experiment results was the mean of 10 executions.

In our experiments, all keywords were selected from Enron email dataset [5], which is a real-world dataset containing a total of about 500,000

Related work

The problems of searching on remote encrypted data have been widely studied in the literature, most of which focus on enhancing privacy guarantees and optimizing efficiency. The classical method proposed by Goldreich [19] and Goldreich and Ostrovsky [14], which called oblivious RAM, can resolve the problem without leaking any information to the third party. However, standing in the perspective of practical applications, such schemes are unacceptable due to their poly-logarithmic computation and

Conclusion

In this paper, we review searchable encryption schemes in the literature and point out the search pattern leakage issue. By giving two concrete attack methods, we demonstrate that the search pattern can be utilized to attack the underlying queried keywords. Motivated by this threat, we present a new searchable encryption construction based on the idea of generating fake queries. Finally, to clarify the performance of proposed attack methods and constructions, we give detailed experiments which

Acknowledgements

The authors are grateful to Yuan Zhang and Seny Kamara for the discussions of this work. This work is supported by National Natural Science Foundation of China No. 61272512, National 863 Programme No. 2013AA01A212 and Beijing Municipal Natural Science Foundation No. 4121001.

References (25)

L. Fang et al.
Public key encryption with keyword search secure against keyword guessing attacks without random oracle
Inform. Sci.
(2013)
H.S. Rhee et al.
Generic construction of designated tester public-key encryption with keyword search
Inform. Sci.
(2012)
AOL Search-log....
Google Trends....
Sogou Search-log....
Stopword List....
Enron Email Dataset, 2009....
M. Abdalla et al.
Searchable encryption revisited: consistency properties, relation to anonymous ibe, and extensions
B.H. Bloom
Space/time trade-offs in hash coding with allowable errors
Commun. ACM
(1970)
D. Boneh et al.
Public key encryption with keyword search

Y.-C. Chang et al.

Privacy preserving keyword searches on remote encrypted data

M. Chase et al.

Structured encryption and controlled disclosure

Cited by (177)

Updatable searchable symmetric encryption: Definitions and constructions
2024, Theoretical Computer Science
Searchable symmetric encryption (SSE) allows a client to search over encrypted data. To address the real threat of key compromise in practice, this work initiates the study of key rotation for SSE and introduces the notion of updatable SSE (USSE). In USSE, a client can issue a single update token that permits the server to convert existing encrypted data from the old key to the new key. In particular,
- •
  we formalize the syntax of USSE and define the security model that captures the inference of key, search token, update token and encrypted data with bi-/uni-/no-directional key updates and bi-/uni-directional encrypted data updates.
- •
  we present a USSE scheme that supports conjunctive queries with sub-linear complexity, and prove its security with no-directional key update and bi-directional encrypted data update. We also give extensions for concerning different key/encrypted data updates.
- •
  we implement our USSE schemes and evaluate the performance with real-world dataset, which illustrates that our schemes achieve practically acceptable computational overhead and communication cost.
Technically, our formalization of USSE is inspired by updatable encryption (UE); our USSE schemes are obtained by a semi-generic transformation from Cash et al.'s SSE and UE. The transformation itself only relies on DL and DBDH assumptions. We believe that the transformation is of independent interest and applicable to other scenarios where the SSE systems follow the structure of Cash et al.'s work.
Efficient and privacy-preserving similar electronic medical records query for large-scale ehealthcare systems
2024, Computer Standards and Interfaces
The advancements and adoption of cloud-assisted ehealthcare systems have enabled the storage of massive electronic medical records (EMRs) in the cloud for efficient and easy access. A direct benefit of EMRs is the ability of patients to search for EMRs that are similar to their own in the cloud for use as references. These similar EMRs can help a patient find appropriate medical services quickly. However, for large-scale ehealthcare systems, challenges remain with respect to ensuring the efficiency and privacy of these queries. In this study, we construct an efficient and privacy-preserving similar EMR query scheme to help patients find similar EMRs to reference in a large-scale ehealthcare system. Specifically, we propose a coarse-grained query method based on a binary decision tree to find a set of EMRs corresponding to the patient’s set of medical-symptom keywords. We also design a fine-grained query method to find similar EMRs that meet the threshold set by the patient. A detailed security analysis shows that the proposed scheme is secure. The efficiency of the proposed method in a large-scale ehealthcare system is verified experimentally.
Refined statistical attacks against searchable symmetric encryption using non-indexed documents
2023, Journal of Information Security and Applications
Searchable symmetric encryption(SSE) schemes allow clients to search encrypted data stored on remote servers but ensure efficient retrieval by revealing specific information about the queries, such as access patterns. Honest but curious servers can then use these leakages to infer keywords queried by users. However, most attack schemes can only achieve considerable recovery accuracy(over 30%) under favorable conditions. When the auxiliary information is weak, the inference attack schemes using statistical information can achieve the same or even better recovery accuracy. In this paper, an attack based on statistical information is proposed. Our attack iteratively solves the quadratic recovery query problem using a linear optimization solver. With a tiny number of known queries(which can be deleted by the server), the query recovery accuracy can reach about 95% without knowing the exact background knowledge of the documents stored by the client. This process not only makes our scheme outperforms other attack schemes in accuracy but also makes our attack execution more efficient than other schemes, up to a maximum difference of 1000 seconds. The attack can still achieve considerable accuracy even when the defense is applied to the SSE scheme; defenses can help the SSE scheme obfuscate the pattern leakages.
Blockchain-based public key encryption with keyword search for medical data sharing in cloud environment
2023, Journal of Information Security and Applications
The cloud-based sharing of electronic health data (EHDs) has great significance in researching diseases, doctors’ diagnoses and various fields. In this paper, we applied the potentiality of both blockchain and public key searchable encryption to build a medical data-sharing platform. In recent years, blockchain has emerged as a promising solution for medical data sharing with security and privacy preservation due to its immutability, decentralization, anonymity and verifiability advantages. Two kinds of blockchains, private blockchain and consortium blockchain, are used in our model. In our model, we have designed a consortium blockchain of preselected users. To achieve data security, access control, privacy preservation and secure search, public key encryption with keyword search (PEKS) technology is used here. According to the security proof, our proposed scheme is indistinguishable under the keyword guessing attack. The outcomes of the performance analysis and comparison simulations reveal that the proposed scheme performs better than other related schemes.
Provably secure public key encryption with keyword search for data outsourcing in cloud environments
2023, Journal of Systems Architecture
In recent days, the application of cloud computing has been gaining significant popularity among people. A considerable amount of data are being stored in the cloud server. However, data owners outsource their encrypted data to the cloud for various security reasons. Unfortunately, encrypted data cannot be searched, like plaintext data. So how to search encrypted data is an interesting problem in this era. Many public key encryption with keyword search (PEKS) schemes have been designed in the literature. However, most of them cannot prevent keyword-guessing attacks. In this paper, we develop a provably secure PEKS scheme in the random oracle model. This scheme may be used for secure email access from an email server containing a list of encrypted keywords. The proposed scheme can resist keyword-guessing attacks, and offer ciphertext and trapdoor indistinguishability properties. We use the data owner’s private key during encryption to prevent keyword-guessing attacks. Using the data owner’s public key in the verification phase ensures the resilience of keyword-guessing attacks. Finally, the proposed scheme has been tested on a real testbed, and the results show that it can be used in the cloud computing scenario to search for keywords on encrypted data.
Query Recovery from Easy to Hard: Jigsaw Attack against SSE
2024, arXiv

View all citing articles on Scopus

View full text