Background
Background on searchable encryption
Searchable encryption: architecture
Searchable encryption: security requirements
- Retrieved data: The server should not be able to distinguish between documents or determine the search contents.
- Search query: The server should not learn anything about the keyword being searched for. Given a token, the server can retrieve nothing other than pointers to the encrypted content that contains the keyword.
- Query generation: The server should not be able to generate a coded query. Only users who hold the relevant secret key can generate a query.
- Search query outcome: The server should not learn anything about the contents of the search outcome.
- Access patterns: The server should not learn the sequence or frequency of the documents accessed by the user.
- Query patterns: The server should not learn whether two tokens were intended for the same query.

Searchable encryption: design approaches
Related work
Proposed solution
Adaptively secure searchable symmetric encryption

Content/document | Content ID | Keywords |
---|---|---|
\(D_{1}\) | 1 | \(w_{1}\), \(w_{5}\) |
\(D_{2}\) | 2 | \(w_{1}\), \(w_{3}\), \(w_{8}\) |
\(D_{3}\) | 3 | \(w_{2}\), \(w_{9}\), \(w_{5}\) |
\(D_{4}\) | 4 | \(w_{6}\), \(w_{7}\), \(w_{9}\) |
\(D_{5}\) | 5 | \(w_{1}\), \(w_{4}\), \(w_{10}\) |


Keyword | Content IDs |
---|---|
\(w_{1}\) | 1, 2, 5 |
\(w_{2}\) | 3 |
\(w_{3}\) | 2 |
\(w_{4}\) | 5 |
\(w_{5}\) | 1, 3 |
\(w_{6}\) | 4 |
\(w_{7}\) | 4 |
\(w_{8}\) | 2 |
\(w_{9}\) | 3, 4 |
\(w_{10}\) | 5 |

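
The step from the document table to the keyword table is an index inversion. A minimal Java sketch of that step follows; the class and method names (`InvertedIndexSketch`, `invert`, `exampleForwardIndex`) are hypothetical and not taken from the PrivCloud code, and keywords are written as plain strings such as `"w1"` purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {

    // Inverts a forward index (content ID -> keywords) into
    // an inverted index (keyword -> sorted set of content IDs).
    static Map<String, SortedSet<Integer>> invert(Map<Integer, List<String>> forward) {
        Map<String, SortedSet<Integer>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : forward.entrySet()) {
            for (String keyword : entry.getValue()) {
                inverted.computeIfAbsent(keyword, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return inverted;
    }

    // The five example documents from the document table, keyed by content ID.
    static Map<Integer, List<String>> exampleForwardIndex() {
        Map<Integer, List<String>> forward = new TreeMap<>();
        forward.put(1, List.of("w1", "w5"));
        forward.put(2, List.of("w1", "w3", "w8"));
        forward.put(3, List.of("w2", "w9", "w5"));
        forward.put(4, List.of("w6", "w7", "w9"));
        forward.put(5, List.of("w1", "w4", "w10"));
        return forward;
    }

    public static void main(String[] args) {
        // Keys print in lexicographic order, so "w10" appears before "w2".
        invert(exampleForwardIndex())
            .forEach((keyword, ids) -> System.out.println(keyword + " -> " + ids));
    }
}
```

Using a sorted set per keyword keeps the content IDs in ascending order, which matches how the keyword table lists them.
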

Encrypted keyword | Content ID |
---|---|
\(ENC_{K_{1}}\left( w_{1}||1\right)\) | 1 |
\(ENC_{K_{1}}\left( w_{1}||2\right)\) | 2 |
\(ENC_{K_{1}}\left( w_{1}||5\right)\) | 5 |
\(ENC_{K_{1}}\left( w_{2}||3\right)\) | 3 |
\(ENC_{K_{1}}\left( w_{3}||2\right)\) | 2 |
\(ENC_{K_{1}}\left( w_{4}||5\right)\) | 5 |
\(ENC_{K_{1}}\left( w_{5}||1\right)\) | 1 |
\(ENC_{K_{1}}\left( w_{5}||3\right)\) | 3 |
\(ENC_{K_{1}}\left( w_{6}||4\right)\) | 4 |
\(ENC_{K_{1}}\left( w_{7}||4\right)\) | 4 |
\(ENC_{K_{1}}\left( w_{8}||2\right)\) | 2 |
\(ENC_{K_{1}}\left( w_{9}||3\right)\) | 3 |
\(ENC_{K_{1}}\left( w_{9}||4\right)\) | 4 |
\(ENC_{K_{1}}\left( w_{10}||5\right)\) | 5 |

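
The encrypted keyword table can be sketched in Java as a sorted map from \(ENC_{K_{1}}(w||id)\) to the content ID. The sketch rests on several assumptions that the text does not confirm: AES in ECB mode with PKCS5 padding is used only because a lookup needs the same input to produce the same ciphertext (ECB reveals equal plaintext blocks, and a keyed PRF such as HMAC is a common alternative), the literal string `"||"` stands in for concatenation, and ciphertexts are Base64-encoded so they can serve as map keys. All class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class EncryptedIndexSketch {

    // Generates a fresh 128-bit AES key; stands in for K1.
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Computes ENC_K1(keyword || contentId) as a Base64 string.
    // ASSUMPTION: deterministic AES/ECB; the source does not state the mode.
    static String encryptedKeyword(SecretKey k1, String keyword, int contentId) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, k1);
            byte[] plaintext = (keyword + "||" + contentId).getBytes(StandardCharsets.UTF_8);
            return Base64.getEncoder().encodeToString(cipher.doFinal(plaintext));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds one index entry per (keyword, content ID) pair, as in the table.
    static TreeMap<String, Integer> buildIndex(SecretKey k1, Map<Integer, List<String>> forward) {
        TreeMap<String, Integer> index = new TreeMap<>();
        forward.forEach((contentId, keywords) -> {
            for (String keyword : keywords) {
                index.put(encryptedKeyword(k1, keyword, contentId), contentId);
            }
        });
        return index;
    }

    public static void main(String[] args) {
        SecretKey k1 = newAesKey();
        TreeMap<String, Integer> index = buildIndex(k1,
            Map.of(1, List.of("w1", "w5"), 2, List.of("w1", "w3", "w8")));

        // A token computed with the same key and inputs finds its entry.
        System.out.println(index.get(encryptedKeyword(k1, "w1", 2)));
        // A pair that was never indexed has no entry, so get() returns null.
        System.out.println(index.get(encryptedKeyword(k1, "w3", 1)));
    }
}
```

Because every entry is keyed by an opaque ciphertext, whoever holds the map can answer a lookup without seeing the keyword itself, but only a party holding \(K_{1}\) can produce a matching token.
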

Content ID | Encrypted document |
---|---|
1 | \(ENC_{K_{2}}\left( D_{1}\right)\) |
2 | \(ENC_{K_{2}}\left( D_{2}\right)\) |
3 | \(ENC_{K_{2}}\left( D_{3}\right)\) |
4 | \(ENC_{K_{2}}\left( D_{4}\right)\) |
5 | \(ENC_{K_{2}}\left( D_{5}\right)\) |

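
The encrypted document table pairs each content ID with \(ENC_{K_{2}}(D_{i})\). The following Java sketch shows one way such a value could be produced and reversed. It assumes AES-GCM with a random 12-byte IV stored in front of the ciphertext; the source states only that AES is used, so the mode, the IV handling, and the names `DocumentEncryptionSketch`, `encrypt`, and `decrypt` are illustrative choices rather than the PrivCloud implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class DocumentEncryptionSketch {

    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Returns IV || ciphertext || tag for the given document bytes under K2.
    static byte[] encrypt(SecretKey k2, byte[] document) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, k2, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(document);
            byte[] stored = new byte[IV_BYTES + ciphertext.length];
            System.arraycopy(iv, 0, stored, 0, IV_BYTES);
            System.arraycopy(ciphertext, 0, stored, IV_BYTES, ciphertext.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the stored bytes and recovers the document.
    static byte[] decrypt(SecretKey k2, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, k2,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey k2 = generator.generateKey();

        byte[] d1 = "contents of D1".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(k2, d1);
        System.out.println(new String(decrypt(k2, stored), StandardCharsets.UTF_8));
    }
}
```

Unlike the keyword entries, document ciphertexts never need to be matched against a token, so a randomized mode is a natural fit here: encrypting the same document twice yields different stored bytes.
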
Implementation of PrivCloud system
Framework of the PrivCloud system
Implementation details
- generate(): This method generates two 128-bit AES keys.
- store(): This method converts the generated keys into encoded strings, writes them into a Key object, and stores that object at a location on the user device.
- read(): This method reads the keys back on the user device. It takes the Key object as input and reads the key strings. It then converts each key string to an array of bytes and constructs the corresponding secret key from that byte array.
- Document ID allocation: Each document is assigned a unique document ID. IDs start at 0 and increase sequentially.
- Keyword extraction: The keyword extraction process searches for consecutive sequences of non-blank, non-punctuation characters. It can easily be tweaked to search for consecutive sequences of ASCII characters, which gives adequate results for binary files containing uncompressed English text. To read the characters of an input document, we first convert it to a text file using Apache Tika. The keyword extractor pulls characters from the stream, checks whether each one is acceptable, and accumulates consecutive acceptable characters into a keyword. The total number of keywords can also be limited by setting a minimum keyword length. For example, with a minimum keyword length of 4 characters, the extraction algorithm returns only those keywords that have 4 or more ASCII characters.
- encrypt(): This function takes as input all the files under a user-given path, encrypts them, and creates an encrypted index. For each input file, the function first retrieves the keywords, and for each keyword it computes the 128-bit AES encryption of the keyword and the document ID. It then creates an entry in the encrypted index table that lists the encrypted keyword and the corresponding document ID. It uses the \(put \left( key, value\right)\) method of the Java TreeMap class to associate the specified value (document ID) with the specified key (encrypted keyword). Finally, it saves the encrypted index in the database. The input documents are ready for encryption once indexing is finished. To encrypt a document, this method first creates a new document in the database and writes the encrypted stream to it. The document ID of each encrypted document is used as its new filename.
- writeDocIDFileNameMapping(): This method creates and stores a filename mapping object (the filename index), which records the ID of each encrypted document and the corresponding original filename. It uses a hash table to map the keys (document IDs) to values (filenames). This object is stored at the client and is used during decryption to look up the original filename and extension of a document from its document ID.
- searchToken(): This method takes the keyword and document ID as input and computes the search token on the client device. It uses the keyword encryption key to compute the encrypted value of the user-given keyword and returns the result as the search token. The search token is then sent to the server.
- search(): This method runs at the server when it receives a search request from the user. It takes the search token and the user database as input and finds the document IDs that match the search. It uses get(), defined in the Java TreeMap class, which returns the value to which the specified key is mapped, or null if the map contains no mapping for the key. Finally, the server retrieves the corresponding document and sends it back to the client for decryption.
- decrypt(): This method recovers the plaintext of a document by using the user decryption key stored on the client device. Once the user has supplied the encrypted document, the program reads the document ID from the filename of the encrypted document. It then uses the filename index stored on the user device to obtain the original filename and extension for that ID. Next, it reads the input stream from the encrypted document and decrypts it with the corresponding decryption key. Lastly, it creates a new document with the original filename and extension, writes the decrypted plaintext stream into it, and stores it on the user device.
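
Two of the client-side steps listed above, the string round trip behind store() and read() and the character-by-character keyword extractor, can be sketched in Java as follows. This is a minimal illustration under stated assumptions: Base64 is assumed as the string encoding for keys, "acceptable" characters are taken to be ASCII letters and digits, keywords are neither lowercased nor deduplicated, and the input is already plain text (the Apache Tika conversion step is omitted). The names `ClientSideSketch`, `keyToString`, `keyFromString`, and `extractKeywords` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

final class ClientSideSketch {

    // store() direction: encode the raw key bytes as a printable string.
    // ASSUMPTION: Base64; the source only says the keys are encoded as strings.
    static String keyToString(SecretKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    // read() direction: string -> byte array -> AES secret key.
    static SecretKey keyFromString(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new SecretKeySpec(raw, "AES");
    }

    // ASSUMPTION: a character is acceptable if it is an ASCII letter or digit.
    static boolean isAcceptable(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Pulls characters from the stream and accumulates runs of acceptable
    // characters; runs shorter than minLength are discarded.
    static List<String> extractKeywords(Reader in, int minLength) {
        List<String> keywords = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (isAcceptable(c)) {
                    current.append((char) c);
                } else {
                    finishRun(current, minLength, keywords);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        finishRun(current, minLength, keywords); // handle a run that ends at EOF
        return keywords;
    }

    // Emits the current run if it is non-empty and long enough, then resets it.
    private static void finishRun(StringBuilder current, int minLength, List<String> out) {
        if (current.length() > 0 && current.length() >= minLength) {
            out.add(current.toString());
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        Reader text = new StringReader("Searchable encryption keeps the index private.");
        // "the" has only 3 characters, so a minimum length of 4 drops it.
        System.out.println(extractKeywords(text, 4));
        // prints [Searchable, encryption, keeps, index, private]
    }
}
```

Raising the minimum length shrinks the index at the cost of making short words unsearchable, which is the trade-off the minimum-length option exposes.
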