Code to simulate Score and Refined Score attacks on Searchable Symmetric Encryption (SSE)
This repository contains a Python implementation of the attacks presented in:

M. Damie, F. Hahn, A. Peter, “A Highly Accurate Query-Recovery Attack against Searchable Encryption using Non-Indexed Documents” (USENIX 2021)

This repository contains everything necessary to simulate the attacks (Score attack, Refined Score attack, with clustering, generalized version). This code was used to obtain all the results presented in the paper mentioned above. It allows simulations with varying parameters (choice of dataset, countermeasures, document set sizes, etc.).
In the folder `score_attacks`, you will find 3 elements:

- `src`: folder containing all the utility code used to simulate the attacks.
- `simulate_attack.py`: Python script to simulate an attack with parameters specified by the user (dataset, countermeasure, vocabulary size, query set size, number of known queries).
- `generate_results.py`: launch script that was used to obtain the vast majority of the results presented in the paper.

To simulate a simple attack, you just have to follow these instructions:
1. Have `pip` installed. NB: you can use a Python virtual environment if you want to keep a clean Python environment (the setup script will install a few dependencies).
2. Run `bash setup.sh`. The dataset download and decompression may take a while (~1h) depending on your connection and your computer. Dataset sizes: the Apache emails take 252 MB and the Enron dataset takes 2.6 GB!
3. Go to the `score_attacks` folder: `cd score_attacks`
4. Launch `python3 simulate_attack.py`. One attack simulation (including the keyword extraction) takes a few minutes.

To explore the attack further, you can change the parameters:
```
Score attack simulator

optional arguments:
  -h, --help            show this help message and exit
  --similar-voc-size SIMILAR_VOC_SIZE
                        Size of the vocabulary extracted from similar
                        documents.
  --server-voc-size SERVER_VOC_SIZE
                        Size of the 'queryable' vocabulary.
  --queryset-size QUERYSET_SIZE
                        Number of queries which have been observed.
  --nb-known-queries NB_KNOWN_QUERIES
                        Number of queries known by the attacker. Known
                        Query=(Trapdoor, Corresponding Keyword)
  --countermeasure COUNTERMEASURE
                        Which countermeasure will be applied.
  --attack-dataset ATTACK_DATASET
                        Dataset used for the attack
```
You can find all this information using `python3 simulate_attack.py --help`. For example, you can launch the following command: `python3 simulate_attack.py --queryset-size 500 --nb-known-queries 5`. At the start of each simulation, the parameters are systematically printed. Moreover, we developed some logging so you can easily follow the simulation progress. At the end, the accuracies with and without refinement are printed.
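If you want to explore several parameter settings in a row, one simple approach is to call the script repeatedly from Python. The sketch below is only an illustration, not part of the repository: the flag names come from the help output above, while the parameter grid and variable names are arbitrary examples.

```python
import subprocess

# Hypothetical parameter sweep: each pair is (query set size, number of known queries).
# The flags are the ones documented by `python3 simulate_attack.py --help`;
# the values below are arbitrary examples.
PARAMETER_GRID = [(500, 5), (1000, 10), (1000, 15)]

for queryset_size, nb_known_queries in PARAMETER_GRID:
    command = [
        "python3", "simulate_attack.py",
        "--queryset-size", str(queryset_size),
        "--nb-known-queries", str(nb_known_queries),
    ]
    print("Running:", " ".join(command))
    # Each run prints its parameters, progress logs and the final accuracies to stdout.
    subprocess.run(command, check=True)
```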
The attack code is in the file `src/attackers.py`. The rest of the files are only used to set up a simulation (keyword extraction, email cleaning, query generation, etc.).
The file `src/attackers.py` contains two classes: `ScoreAttacker` (for the Score and Refined Score attacks) and `GeneralizedScoreAttacker` (for their “generalized” version, cf. the paper). Both provide the methods `predict` and `predict_with_refinement`, which take a list of trapdoors as parameter and return a keyword for each trapdoor. You can also specify a `cluster_max_size` in `predict_with_refinement` to perform a refinement with clustering (cf. the paper appendices).
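For illustration, here is a minimal usage sketch. The constructor arguments and variable names are placeholders (the exact inputs, e.g. occurrence data extracted from the similar and indexed documents plus the known queries, are built by the utilities in `src/`; see `simulate_attack.py` for the real call). Only the class and method names and the `cluster_max_size` option come from the repository.

```python
from src.attackers import ScoreAttacker

# Placeholder inputs: in the real simulation these are produced by the keyword
# extraction and query generation utilities in src/ (see simulate_attack.py for
# the exact constructor arguments expected by ScoreAttacker).
attacker = ScoreAttacker(similar_documents_data, server_documents_data, known_queries)

# Trapdoors whose underlying keywords the attacker wants to recover.
unknown_trapdoors = [td for td in observed_trapdoors if td not in known_queries]

# Score attack: one keyword prediction per trapdoor.
predictions = attacker.predict(unknown_trapdoors)

# Refined Score attack; cluster_max_size enables the clustering variant
# described in the paper appendices.
refined_predictions = attacker.predict_with_refinement(unknown_trapdoors, cluster_max_size=10)
```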
If you want to understand how to assess the accuracy of these attackers, you can look at either `simulate_attack.py` or `result_procedures.py` (this second file is much messier).
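For intuition, the accuracy is simply the fraction of trapdoors mapped to their true keyword. Here is a minimal sketch with hypothetical variable names; in a simulation the ground truth is known because the queries are generated by the simulator itself.

```python
def recovery_accuracy(predictions: dict, ground_truth: dict) -> float:
    """Fraction of trapdoors for which the predicted keyword is the true one.

    Both arguments map a trapdoor to a keyword; the ground truth is available in
    a simulation because the query generation step knows which keyword produced
    each trapdoor.
    """
    if not predictions:
        return 0.0
    correct = sum(
        1 for trapdoor, keyword in predictions.items()
        if ground_truth.get(trapdoor) == keyword
    )
    return correct / len(predictions)


# Hypothetical usage, pairing the trapdoor list given to predict() with its output:
# predicted = dict(zip(unknown_trapdoors, attacker.predict(unknown_trapdoors)))
# print(f"Accuracy: {recovery_accuracy(predicted, true_mapping):.2%}")
```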
If you want to reproduce most of the results presented in the paper, you have to execute `generate_results.py`. The complete execution, with 50 repetitions of each experiment, may take several days depending on your machine specifications. If you want to reduce the number of repetitions, you can change the variable `NB_REP` in `src/result_procedures.py`. For convenience, this script also logs to a file (`results.log`) so you can analyze the logs later.
Before the publication of this repo, we tried to simplify our functions as much as possible to keep only what is relevant to the paper. We also added docstrings and typing so it is easier for a reader to understand what the functions/classes do and how to use them.
Feel free to contact me (marc [at] damie [dot] eu) if you have any questions about the code or the attack itself (or if you spot a bug in our code).
If you get the error `setup.sh: 1: set: Illegal option -` when executing `setup.sh`, it is probably due to the file's line endings. This happens when a file is sent (via scp) from a Windows computer to a Linux server (Windows editors use different hidden end-of-line characters than Unix ones). To solve this issue, install dos2unix on the Linux server (via apt on Debian and Ubuntu) and run `dos2unix setup.sh`. The script should then have the correct format and execute successfully.
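If you would rather not install dos2unix, the same line-ending conversion can be done with a few lines of Python; this is only a convenience sketch, not a script shipped with the repository:

```python
# Minimal dos2unix replacement: rewrite setup.sh with Unix (LF) line endings.
with open("setup.sh", "rb") as f:
    content = f.read()

with open("setup.sh", "wb") as f:
    f.write(content.replace(b"\r\n", b"\n"))
```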
The installation and the code have been tested on Debian and Ubuntu, so some issues may occur on other operating systems.