Abstract

The fast growth of Internet-of-Things (IoT) strategies has actually presented the generation of huge quantities of information. There should exist a method to conveniently gather, save, refine, and also provide such information. On the other hand, IoT data is sensitive and private information; it must not be available to potential attackers. We propose a robust scheme to guarantee both secure IoT data storage and retrieval from the untrusted cloud servers. The proposed scheme is based on Private Information Retrieval (PIR). It stores the data onto different servers and retrieves the requested data slice without disclosing its identity. In our scheme, the information is encrypted before sending to the cloud servers. It is also divided into slices of a specific size class. The experimental analysis on many different configurations supported efficiency and the efficacy of the proposed scheme, which demonstrated compatibility and exceptional performance.

1. Introduction

With the huge revolution of Internet-of-Things (IoT) and cloud computing as its storage environment, the user requests a query to a part of information and should receive that part without disclosing its identity. A great number of researches have been dedicated to defend the database from curious users. There are approaches that enable questions to be asked by an individual into a database by reconstructing the worth of entities in a manner that prevents him. If the user would like to maintain his privacy (in the information-theoretic sense), then he could request a copy of the entire database. This can cause a huge communication overhead, making it unacceptable.

Before going further let us make the problem more tangible. Let a binary string of length . server stores copies of this binary string. The user has some indicator and he is interested in getting this little . To attain this aim, the little could be calculated, the user queries each of the servers and receives responses. The query to each server is distributed separately of and each server gains no information about . A strategy with these properties is called a Private Information Retrieval (PIR) [1, 2].

Within this paper we introduce encrypted PIR that offers a great privacy. That is, unbounded servers should not obtain any information about the requested piece of information. One does require at least two servers to achieve privacy. These servers do not need to store the whole database; they could store portions of it. We show that when those pieces are encrypted, instead of duplicated, the memory overhead can be decreased.

1.1. Motivation

In addition to reducing the storage overhead caused by replicating the data to reduce the communication cost in the traditional PIR protocols [2] and achieving the information-theoretic privacy, in big data, the user who reconstructs data is distinct from the user who distributes them. Also, the user who distributes data should encrypt it using different keys and distribute the ciphertext.

Moreover, querying information over big data where no one can get the identity of the parts you are querying or the responses obtained is a big challenge and sounds like science fiction. But it is actually the PIR science. In modern data storage systems, data are usually stored at multiple storage nodes in the cloud. The privacy of information retrieval has to be shielded. One naive method to attain PIR is by simply downloading each document in the system regardless of the user requirements. The drawback of that strategy is the very large restoration cost, which further increases with (the range of saved documents). Thus, there is an urgent requirement to present a strong PIR strategy for storage and recovery.

1.2. Contributions

One of the main contributions in this paper is integrating PIR [1] with cloud computing, to guarantee the efficiency and security of encrypted storage and retrieval of information for big data queries from the untrusted cloud servers. Our scheme first divides the data into slices, then encrypts each of them, and stores those encrypted slices using the Swift service on the untrusted cloud servers. In the reconstruction stage, the requested data slice is reconstructed without disclosing any information about that requested slice, with retrieval cost independent of the number of stored slices. The main contributions can be summarized as follows:(i)We have integrated the PIR scheme with cloud computing, to secuerly store the personal information as well as effiently and smothly reterive it.(ii)We have implemented our scheme on the top of our private cloud environment, by initializing a set of virtual instances that are divided into three categories based on their configuration and their role in the proposed scheme.(iii)We have tested the proposed scheme under two different scenarios, client situation and overlay situation.(iv)The experimental results have suggested that the decryption and retrieval cost are separate from the amount of saved slices and harmonious with all reasonable scenarios for applying our proposed scheme.

Ultimately, the throughput for a workload caused by the implementation of mixing get and also put_fragment requests on Swift is sensible and accepted by the operator and user.

1.3. Organization

The rest of this paper is organized as follows: Section 2 presents the related work and smooth comparison between the currently proposed information hiding and extraction approaches while not disclosing the queried information. Section 3 presents our proposed -private and -out-of- PIR scheme and its three stages. The proposed scheme implementation is introduced in Section 4. The comprehensive performance analysis is presented in Section 5. This is followed by the conclusion in Section 6.

The notion of earning the extraction of this data content of a huge data source should consider quite rough security requirements to be able to do not disclose the queried data. Reconstructing a part of discerning information from an encrypted source is determined by the availability of the whole source, which is introduced by Rivest [3], by proposing all-or-nothing transform (AONT). In this scheme any modification on the encrypted message limits the ability to decrypt the resource. Thus, the AONT scheme works well for the scenarios where the user who wants to decrypt the resource has never accessed the key before. This is not realistic, because the cloud users frequently access the resources and request to decrypt their ciphertext. In our scheme, the user can decrypt the resource as long as he has the appropriate key and the sufficient data slices that can generate the requested resource. Moreover, the requesting user can only decrypt the ciphertext if he has successfully generated the decryption token. Another direction for securely uploading the data to cloud is introduced in [4], which ensures that the cloud validates the data integrity while avoiding malicious home gateways that monitor and modify the data. In our scheme the data is not stored as one block, but it is sliced into several slices based on its size. Then, it is first hidden using a permutation hash function. After that, those data slices are encrypted using and encryption algorithm and an access structure that is implicitly included in the ciphertext. In this manner we are reducing the storage time overhead. The authors in [5] formalize the notion of verifiable database with incremental updates (Inc-VDB). As well as in [6] pointing out to Catalano-Fiore’s VDB framework from vector commitment is vulnerable to the so-called forward automatic update (FAU) attack.

For information hiding and securely extracting that information, there are multiple contributions based on different schemes, such as [7]. The authors introduced a T-private PIR which is a generalization of PIR to include the requirement that even if any T of the N databases colludes, the identity of the retrieved message remains completely unknown to them. But in our scheme whatever was the number of colluding databases, the user will only receive the encrypted data slices that are only sufficient for extracting the required information not more. Also, the authors in [8] utilized a new cryptographic primitive, called conditional disclosure of secrets, which we believe that it may be a useful building block for the design of other cryptographic protocols. In our scheme, we have considered slicing the data into multiple slices before encrypting and storing it on the cloud storage servers. The authors in [9] proposed a new symmetric encryption for mobile devices. The authors in [10] have introduced a new operative scheme called RoughDorid for detecting malware applications directly on the smartphone. A Dynamic Fully Homomorphic Encryption-based Merkle Tree is proposed in [11]. An outsourced revocation has been introduced in [12] based on identity-based encryption in cloud computing. A new privacy preserving response scheme has been proposed in [13] based on adaptive key evolution in smart grid. A novel dynamic structure for cloud data has been proposed in [14] for efficient public auditing.

Other strategies for applying access control in the cloud via encryption are developed together two research lines: attribute-based encryption (ABE) and discerning encryption procedures. ABE approaches (e.g., [1518]) supply access control authorities by making sure that the key used to protect a source could be derived exclusively by the consumers that meet a specified condition in their characteristics (e.g., era, function). An efficient and secure data outsourcing with check-ability has been proposed in [19] for cloud computing based on ABE. Also, a novel lightweight encryption mechanism for database is introduced in [20]. The authors in [21] introduced a new identity-based signcryption on lattice without trapdoor. For the multiple sources, the authors in [22] proposed a new homomorphic signature scheme based on network coding and applied it to IoT. The privacy is preserved in IoT using centralized duplicate removal video storage system [23]. Also, a new identity-based antiquantum blind authentication for privacy preserving in wireless sensor networks has been proposed in [24]. For the blind storage, the authors in [25] have introduced efficient multikeyword ranked search for mobile cloud data. In [26] the authors proposed the personalized search in mobile clouds over encrypted data with efficient updates. Reference [27] proposed a new block design-based key agreement for data sharing in cloud computing.

Procedures based on discerning encryption (e.g., [28, 29]) suppose to encrypt every single source using a secret that only licensed users understand or may derive. Within this situation, the information owner either manages policy upgrades, with overhead, or is assigned to the server. Though overencryption is shown to provide functionality that is decent and promises a prompt authorities of policy upgrades, it needs trust premises that are more powerful on the machine, which has to offer support. Also, another set of contributions [30, 31] have implemented access control for a private cloud computing evironment, by proposing the confidence notation for denying or granting access. In the event the host is oblivious of its adoption our scheme may be used. The authors in [32] proposed a flexible EHR sharing scheme supporting offline encryption of EHR and outsourced decryption of EHR ciphertexts in mobile cloud computing. Reference [33] proposed the first lattice-based linearly homomorphic signature in the standard model, which settles this open problem. The authors in [34] proposed a dependable distributed WSN framework for SHM. Then the authors in [35] proposed a new biometrics-based authentication scheme for the multiserver environment.

3. PIR with Cloud Computing

The challenge yet is the way to look for an efficient and protected PIR scheme (regarding costs for information storage and recovery). Our proposed scheme is -private and -out-of- PIR, which means that if only of servers is required to respond and even though servers collude together, the queried information will not be revealed. Our scheme consists of three basic stages: Data Storage and Encryption, User Authorization and Query, and Data Decryption and Reconstruction. We have employed the basic PIR scheme definition [36].

3.1. Data Storage and Encryption

The data will be stored on cloud server after encrypting it with our encryption algorithm. We consider three types of servers in the data storage side, as shown in our system model Figure 1. The Storage Manager Sever receives the encrypted data slices and distributes them on the Data Storage Servers under its own control, and the Key-Server is responsible for the key management that is distributed using the PIR Protocol.

The owner encrypts all the information bits using a content key using symmetric encryption procedures. The content essential is then encrypted by the owner. It requires as inputs the public key handled by the PIR Protocol, the content key , and an access structure . Let be a matrix, where denotes the total number of all the attributes. The function associates rows of to attributes.where each element of the vector represents the data to be stored on each server respectively. Each is the corresponding column after applying on each piece of data. is our hash permutation function that is used to hide the index of each piece of the stored data. More precisely, a hash function maps bit strings of arbitrary finite length to strings of fixed length, say bits. For a domain and range with and , the function is many-to-one, implying that the existence of collisions (pairs of inputs with identical output) is unavoidable.

Definition 1 (Hash Permutation Function ). A hash function is a function and maps bit strings of arbitrary finite length to strings of fixed length, which has the following properties: (1)Compression: maps an input of arbitrary finite bit length, to an output of fixed bit length .(2)Ease of computation: given and an input , is easy to compute.(3)Preimage resistance: for essentially all prespecified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage such that when given any for which a corresponding input is not known.(4)2nd-preimage resistance: it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given , to find a 2nd-preimage such that .(5)Collision resistance: it is computationally infeasible to find any two distinct inputs which hash to the same output, i.e., such that . We can find that the collision resistance implies 2nd-preimage resistance of hash functions, but collision resistance does not guarantee preimage resistance.

3.2. User Authorization and Query

The user can issue a query to receive a file which is partitioned and stored on different server . Considering that the information is hosted on both cloud servers. Then there has to be a decryption token for every user to have the ability to synchronize the information. The user has to issue a query to the PIR protocol including the user’s secret key, attributes, and certificates. The PIR protocol will issue k-queries (one for each server) for the access manager server, where and is a randomly chosen by flipping coins. Each server will respond with a single encrypted answer; the user will have an encrypted answer set , where . The access manager server will reply with the servers’ answers and the decryption token that is sent to the user. Then, it will have the ability to decrypt and rebuild the requested information.

3.2.1. Decryption Token Generation

The decryption token generation algorithm (Algorithm 2) is run by the access manager server. It requires as inputs the ciphertext that implicitly includes an access structure , user’s public key generated by the PIR protocol, user’s secret key , and user’s set of attributes . When suits the accessibility construction , the algorithm could successfully calculate the right decryption token of this ciphertext. Thus, the PIR protocol can transform the user’s query to a set of k-queries for the Access Manager.

3.3. Data Decryption and Reconstruction

Once the user has received the encrypted answers set and its decryption token , he can decrypt the answers with the help of its own secret key and get the answer set that will be used for reconstruction. Since the permutation function can make some collusions probability , the success probability is , and the reconstruction threshold . Based on the value of the user will be able to reconstruct the requested slice. Once the data has been reconstructed, the user can use its given decryption token to decrypt the data.

4. Implementation

Our model is implemented on the top of our private cloud environment, which is based OpenStack [37]. We have built our proposed scheme by initializing a set of virtual instances that are divided into three categories based on their configuration:(i)Category 1:   virtual instances working as storage servers. The configuration of those servers is 4 VCPUS, 4 GB RAM, and 80 GB disk. We have considered ; thus, the IP addresses for those servers are : with subnet mask . Those storage servers have two basic tasks, storing the encrypted data slices submitted to them through the Storage Manager. The second task is sending the encrypted data slices back as answers to the Access Manager server to be delivered to the requesting user.(ii)Category 2: two virtual instances. The configuration for each of them is 4 VCPUS, 8 GB RAM, and 20 GB disk. They are working as PIR Protocol server with IP address and Key Manger server with IP address . After the are received by the PIR Protocol from the Access Manager server, the user must not be given so much data slices to guarantee the data secrecy. Also, the user should not be given less data slices than the required slices that can generate the requested data. Thus, the PIR Protocol server runs the PIR protocol to ensure hiding the requested data slices identity from the cloud storage servers and also ensuring granting the user the appropriate data slices to be able to recover the requested data. The Key Manager server is responsible for assigning the keys for both the data owners and requesting users.(iii)Category 3: two virtual instances. The configuration for each of them is 8 VCPUS, 16 GB RAM, and 40 GB disk. They are working as Storage Manager server with IP address and Access Manager server with IP address . The Storage Manager receives the encrypted data slices and distributes them on the appropriate cloud storage instances. The Access Manager receives the encrypted data slices from the cloud storage instances and sends them to the PIR Protocol.

It should be mentioned that all of those virtual instances are cooperating together to construct the proposed scheme. Each of them has its own task that can perfectly execute it.

5. Performance Analysis

The performance analysis of our proposed scheme is introduced based on two mandatory scenarios.

5.1. Client Situation

Our scheme requires the user to perform a more intricate decryption compared to using Attribute Encryption Scheme (AES) using a conventional encryption manner, by introducing the Token Generation Algorithm 2. In our scheme the decryption is parallelized on several core VCPUs, which makes the user processing much more effective. In our experiments, we have considered five different size categories: 32:127 bits; 128:511 bits; 512:2047 bits; 2048:8191 bits; and 8192:32768 bits.

Figure 2 shows that the decryption cost can be used with all acceptable scenarios for the use of our scheme. Specifically, the figure illustrates the throughput obtained by changing the number of slices, through implementing our scheme in various configurations defined by five size groups (32:127 bits, 128:511 bits, 512:2047 bits, 2048:8191 bits, and 8192:32768 bits). Based on the execution of our encryption protocol (Algorithm 1), we notice that even the largest size category (8192:32768 bits) offers a throughput that is approximately 85 MB/s, while considering 16 slices of that size category. The throughput for the smallest size category (32:127 bits) is about 140 MB/s, while considering 16 slices of that size category. The figure also reveals that decreasing the amount of slices, we achieve the performance level that is 1.5 times that obtained from the 8192:32768 bits size group and 2 times the one obtained by 32:127 bits size group. It should be noted that the experimental results for that part are the average of 25 trails.

Input:(i) The content key for each
(ii) The data public key managed by Shamir secret sharing
(iii) The data access structure, which will be
implicitly included in the ciphertext
Choose The random secrets of each data slice
(2) Choose A random encryption exponent
(3) Choose A random vector, where
are used to share the random encryption exponent
(4) for   to do
(5) Compute is the vector corresponding to the j-th row of
the matrix
(6) end for
(7) Randomly choose
(8) Compute: and
(9) for to do
(10) Compute
(11) Compute and
(12) end for
(13) Compute the ciphertext:
Output:The ciphertext
Input:(i) The ciphertext related to the file
(ii) The user’s public key given by PIR protocol
(iii) The user’s secret key
(iv) The user’s set of attributes
Let: The set of attributes involved in
(2) Choose a set of constants ,
(3) for to do
(4) ifthen are valid shares of the secret according to
the data
(5) Reconstruct the encryption exponent:
(6) end if
(7) end for
(8) Let are random constants
(9) ifthen The user’s attributes satisfies
(10) else cannot successfully computed
(11) will get a random Hexadecimal value
(12) end if
Output:The Decryption Token
5.2. Overlay Situation

In our proposed scheme, the overlay situation is analyzed by using the Swift (it organizes objects within containers) service as a reference. We have adopted the DLO support offered by Swift to implement the get and put_fragment methods that characterize our scheme. With our experiments at the overlay scenario, we have assembled a Swift user program in Python that implements the get and also put_fragment techniques that describe our strategy. We have followed two strategies: one is by fragmenting an object as atomic objects; the second is by using DLO introduced by Swift.

Figure 3 contrasts the period required for the implementation of get requests based on different amounts of contemplated fragments (1, 4, 16, 64, 256, and 1024) to a specific object. The operation is droved dependent on the network bandwidth and also the overhead imposed by the management of every get request.

Specifically for get requests, the overhead introduced into handling one portion for every fragment predominates in the event of little costs, whereas the growth in object's size exhausts the system's bandwidth that causes a great bottleneck. By contemplating an item of size 1 GB, the time required for implementing the get requests is roughly 92 seconds for 1 considered fragment, and the time is 1000 seconds for 1024 considered fragments for the same object size. In case of considering an object of size 64 KB, the time required for executing the get requests is about 0.08 seconds for 1 considered fragment, and the time is 50 seconds for 1024 considered fragments for the same object size.

Figure 4 reports the throughput for a workload caused by the implementation of blending get and also put_fragment requests on Swift by changing the object size and the amount of contemplated fragments for the exact same object. We have assessed the behavior of our scheme based on a selection of 2048 objects, where following every put_fragment request, a succession of 100 get requests were implemented on objects in precisely the exact same group and are all of the exact same size. These configurations using fragments’ throughput will be orders of magnitude greater already. The figure also demonstrates that the very best number of fragments is dependent upon the resource size. The identification of this value needs to think about the setup of the workload and this machine.

6. Conclusion

We presented a robust PIR scheme for efficiently and securely storing and retrieving private information from untrusted cloud servers. Our scheme lets the data owners to effectively divide and encrypt their own data into small slices of five different size categories. Our implementation and experimental analysis confirm the efficacy and efficacy of our proposed scheme, which appreciates orders of magnitude of improvement in throughput concerning source protection and decryption. For an object of size 1 GB, the time required for implementing the get requests is roughly 92 seconds for 1 considered fragment; the time is 1000 seconds for 1024 considered fragments for the same object size. Considering an object of size 64 KB, the time required for executing the get requests is about 0.08 seconds for 1 considered fragment, and the time is 50 seconds for 1024 considered fragments for the same object size. The proposed scheme also supports its compatibility with all present cloud storage environments, which makes it also relevant to a lot of application domains.

Data Availability

The data used to support the findings of this study are available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.