Searchable symmetric encryption (SSE) Lab
SSE lets you store information on an untrusted server and later search it, while protecting the privacy of both the data and your queries. In this Lab, you will work with two implementations of this technique to understand the foundations on which it is built, as well as the data structures and security primitives needed to implement it. To do so, we will use a library called Clusion, which implements several variants of SSE.
Clusion is an easy-to-use software library for searchable symmetric encryption
(SSE). Its goal is to provide modular implementations of various
state-of-the-art SSE schemes. Clusion includes constructions that handle
single, disjunctive, conjunctive and (arbitrary) boolean keyword search. All
the implemented schemes have sub-linear asymptotic search complexity in the
worst case.
Clusion is provided as-is under the GNU General Public License v3 (GPLv3).
Indexing. The indexer takes as input a folder that can contain PDF files,
Microsoft files such as .doc and .ppt, media files such as pictures and videos, as
well as raw text files such as .html and .txt. The indexing step outputs two
lookup tables. The first associates keywords to document filenames while the
second associates filenames to keywords. For the indexing, we use Lucene to
tokenize the keywords and get rid of noisy words. For this phase, Apache
Lucene, PDFBox and POI are required. For our data structures, we use Google
Guava.
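The two lookup tables can be pictured with a small sketch. This is not Clusion's indexer: the class name ToyIndexer, the stop-word list and the regular-expression tokenizer are assumptions made for illustration, standing in for Lucene's tokenization and Guava's data structures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: NOT Clusion's indexer. It uses java.util
// collections where Clusion relies on Lucene and Guava.
class ToyIndexer {
    // Hypothetical stop-word list standing in for Lucene's noise-word removal.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // First table: keyword -> filenames containing it.
    final Map<String, Set<String>> keywordToFiles = new TreeMap<>();
    // Second table: filename -> keywords it contains.
    final Map<String, Set<String>> fileToKeywords = new TreeMap<>();

    // Tokenizes the text naively and records each keyword in both tables.
    void add(String filename, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            keywordToFiles.computeIfAbsent(token, k -> new TreeSet<>()).add(filename);
            fileToKeywords.computeIfAbsent(filename, k -> new TreeSet<>()).add(token);
        }
    }

    public static void main(String[] args) {
        ToyIndexer idx = new ToyIndexer();
        idx.add("a.txt", "The encrypted index");
        idx.add("b.txt", "Index of the server");
        System.out.println(idx.keywordToFiles);
        System.out.println(idx.fileToKeywords);
    }
}
```

In this sketch, the keyword-to-filenames table is what a search scheme consumes, while the filename-to-keywords table is the kind of structure that helps when documents are added or removed.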
Cryptographic primitives. All the implementations make use of the Bouncy
Castle library. The code is modular, and all cryptographic primitives are
gathered in the CryptoPrimitives.java file. The file contains AES-CTR,
HMAC_SHA256/512, AES-CMAC, key generation based on PBE PKCS1 and random string
generation based on SecureRandom. It also contains a synthetic-IV AES
encryption and AES-based authenticated encryption, as well as an
implementation of the HCB1 online cipher from [BBKN07].
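To get a feel for three of these primitives (random bytes from SecureRandom, HMAC-SHA256 and AES-CTR), here is a minimal sketch. It uses the standard javax.crypto API rather than Bouncy Castle, and the class and method names (ToyPrimitives, hmacSha256, encryptCtr, decryptCtr) are assumptions made for illustration; they are not the signatures found in CryptoPrimitives.java.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Illustrative sketch only: NOT the API of CryptoPrimitives.java.
class ToyPrimitives {
    private static final SecureRandom RNG = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] out = new byte[n];
        RNG.nextBytes(out);
        return out;
    }

    // Keyed hash, typically used as a pseudo-random function in SSE schemes.
    static byte[] hmacSha256(byte[] key, byte[] msg) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(msg);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-CTR with a fresh random 16-byte IV; returns IV || ciphertext.
    static byte[] encryptCtr(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = randomBytes(16);
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            byte[] ct = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits IV || ciphertext and decrypts with the same key.
    static byte[] decryptCtr(byte[] key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(ivAndCiphertext, 0, 16));
            return cipher.doFinal(ivAndCiphertext, 16, ivAndCiphertext.length - 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = randomBytes(16); // 128-bit AES key
        byte[] msg = "hello SSE".getBytes(StandardCharsets.UTF_8);
        byte[] ct = encryptCtr(key, msg);
        System.out.println(new String(decryptCtr(key, ct), StandardCharsets.UTF_8));
        System.out.println(hmacSha256(key, msg).length + " byte tag");
    }
}
```

Note that AES-CTR on its own provides confidentiality but no integrity, which is one reason a library would also offer authenticated encryption.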
In this Lab, you will test the following SSE schemes:
2Lev: a static and I/O-efficient SSE scheme [CJJJKRS14].
Dyn2Lev: a dynamic variant of [CJJJKRS14] that comes with two instantiations: the first handles only add operations, and the second also handles delete operations. Both instantiations have forward-security guarantees, but at the cost of more interaction and, in the case of delete, non-optimality.
Run the commands below to build the jar:
cd SSELab
mvn clean compile assembly:single
cd target
ls SSELab-1.0-SNAPSHOT-jar-with-dependencies.jar
If the file above exists, the build was successful and the jar contains all dependencies.
In order to test the previously introduced schemes, follow the next steps:
Create another directory to hold the key and index files of both implementations.
Run the previously generated .jar by executing the command below:
java -jar SSELab-1.0-SNAPSHOT-jar-with-dependencies.jar
Start by testing the first option of the main menu (the static implementation). This option offers two commands: 1. Test indexing and query, 2. Test files encryption and query over those files. The first command creates a secure index, but the files themselves are kept in plaintext. The second command also encrypts the files.
Notice that in this case the operation is static: first, you provide a set of documents; second, the associated index is created; third, you can search for keywords of your choice. However, you cannot update the index afterwards. Study the associated implementation and the library calls used in the generateKey(), buildIndex() and query() methods.
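The three static steps (key generation, index building, querying) can be sketched with a toy encrypted dictionary. This is a deliberately simplified construction, not the 2Lev scheme and not Clusion's API: the class ToyStaticSse, the token() helper, the method signatures and the key-derivation prefixes "1" and "2" are all assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy static SSE sketch: NOT 2Lev and NOT Clusion's API.
class ToyStaticSse {
    static byte[] generateKey() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return key;
    }

    // HMAC-SHA256 used as a pseudo-random function.
    static byte[] prf(byte[] key, byte[] msg) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(msg);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] aesCtr(int mode, byte[] key16, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key16, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: per-keyword token = {label key, 16-byte encryption key}.
    static byte[][] token(byte[] key, String keyword) {
        byte[] labelKey = prf(key, ("1" + keyword).getBytes(StandardCharsets.UTF_8));
        byte[] encKey = Arrays.copyOf(prf(key, ("2" + keyword).getBytes(StandardCharsets.UTF_8)), 16);
        return new byte[][] { labelKey, encKey };
    }

    // Pseudo-random dictionary label for the counter-th result of a keyword.
    static String label(byte[] labelKey, int counter) {
        byte[] ctr = ByteBuffer.allocate(4).putInt(counter).array();
        return Base64.getEncoder().encodeToString(prf(labelKey, ctr));
    }

    // Client side: maps each (keyword, position) to an encrypted filename.
    static Map<String, byte[]> buildIndex(byte[] key, Map<String, List<String>> keywordToFiles) {
        Map<String, byte[]> encryptedIndex = new HashMap<>();
        SecureRandom rng = new SecureRandom();
        for (Map.Entry<String, List<String>> e : keywordToFiles.entrySet()) {
            byte[][] t = token(key, e.getKey());
            int counter = 0;
            for (String file : e.getValue()) {
                byte[] iv = new byte[16];
                rng.nextBytes(iv);
                byte[] ct = aesCtr(Cipher.ENCRYPT_MODE, t[1], iv, file.getBytes(StandardCharsets.UTF_8));
                byte[] entry = Arrays.copyOf(iv, 16 + ct.length); // IV || ciphertext
                System.arraycopy(ct, 0, entry, 16, ct.length);
                encryptedIndex.put(label(t[0], counter++), entry);
            }
        }
        return encryptedIndex;
    }

    // Server side: needs only the token, never the master key or the keyword.
    static List<String> query(byte[][] token, Map<String, byte[]> encryptedIndex) {
        List<String> results = new ArrayList<>();
        for (int counter = 0; ; counter++) {
            byte[] entry = encryptedIndex.get(label(token[0], counter));
            if (entry == null) {
                return results; // first missing label ends the result list
            }
            byte[] iv = Arrays.copyOfRange(entry, 0, 16);
            byte[] ct = Arrays.copyOfRange(entry, 16, entry.length);
            results.add(new String(aesCtr(Cipher.DECRYPT_MODE, token[1], iv, ct), StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch, the index is fixed once buildIndex() returns, which mirrors the static behaviour described above: adding a document would require rebuilding the whole dictionary.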
If you choose the second command (Test files encryption and query over those files), verify after the index is built that the files were properly encrypted. To do this, try to open them in your preferred editor and confirm that their content is not readable.
Then, when you perform some queries, you will have the option to decrypt the returned files. Choose this option and verify that your files were properly decrypted (check that their content is accurate and complete).
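Beyond inspecting the files by eye, one way to check that a decrypted file is accurate and complete is to compare its SHA-256 digest with that of the original. The helper below is a hypothetical sketch and not part of the lab code; the class name DecryptionCheck and the two command-line arguments (path of the original file, path of the decrypted file) are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the lab code.
class DecryptionCheck {
    // Reads the whole file and returns its SHA-256 digest.
    static byte[] sha256(Path file) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when both files have byte-for-byte identical content
    // (up to SHA-256 collisions).
    static boolean sameContent(Path original, Path decrypted) {
        return MessageDigest.isEqual(sha256(original), sha256(decrypted));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java DecryptionCheck <original> <decrypted>");
            return;
        }
        boolean ok = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "match" : "MISMATCH");
    }
}
```

Running the same comparison between an original file and its encrypted version should report a mismatch, which is a quick sanity check that encryption actually took place.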