Project author: RishabhMaheshwary

Project description:
Natural Language Attacks in a Hard Label Black Box Setting.
Primary language: Python
Repository: git://github.com/RishabhMaheshwary/hard-label-attack.git
Created: 2020-12-07T03:12:43Z
Project page: https://github.com/RishabhMaheshwary/hard-label-attack

License:


Generating Natural Language Attacks in a Hard Label Black Box Setting

This repository contains source code for the research work described in our AAAI 2021 paper:

Generating Natural Language Attacks in a Hard Label Black Box Setting
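In the hard-label black-box setting the attacker observes only the model's final predicted label — no class probabilities, scores, or gradients. The toy sketch below illustrates that constraint with a made-up keyword classifier and a hypothetical synonym table; it is a greedy caricature of the setting, not the paper's actual optimization procedure:

```python
# Toy hard-label attack: the attacker may only call predict(text) -> label,
# and greedily swaps words for synonyms until the predicted label flips.
# The classifier and synonym table below are made up for illustration.

def predict(text):
    # Stand-in target model: "pos" iff the text contains a positive keyword.
    positive = {"great", "fine", "excellent"}
    return "pos" if any(w in positive for w in text.split()) else "neg"

SYNONYMS = {  # hypothetical top-k synonym table (counter-fitted style)
    "great": ["fine", "decent"],
    "movie": ["film", "picture"],
}

def hard_label_attack(text):
    words = text.split()
    orig = predict(text)  # one query: the original hard label
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            cand = words[:i] + [syn] + words[i + 1:]
            if predict(" ".join(cand)) != orig:  # label flipped: success
                return " ".join(cand)
    return None  # attack failed

adv = hard_label_attack("a great movie")
```

The real attack explores many candidate substitutions jointly and optimizes semantic similarity under the same label-only query budget; this sketch only shows why every step must be driven by label flips alone.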

The hard label attack has also been implemented in the TextAttack library.

Follow these steps to run the attack from the library:

  1. Fork and clone the TextAttack repository.

  2. Run the following command to install it.

    ```bash
    $ cd TextAttack
    $ pip install -e ".[dev]"
    ```

  3. Run the following command to attack bert-base-uncased trained on the Movie Review (MR) dataset.

    ```bash
    $ textattack attack --recipe hard-label-attack --model bert-base-uncased-mr --num-examples 100
    ```

Take a look at the models directory in TextAttack to run the attack across any dataset and any target model.

Instructions for running the attack from this repository.

Requirements

  • PyTorch >= 0.4
  • TensorFlow 2.1.0
  • TensorFlow Hub
  • NumPy
  • Python >= 3.6

Download Dependencies

  • Download the pretrained target models (BERT, LSTM, CNN) for each dataset and unzip them.

  • Download the counter-fitted vectors from here and place them in the main directory.

  • Download the top-50 synonym file from here and place it in the main directory.

  • Download the GloVe 200-dimensional vectors from here and unzip them.
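The top-50 synonym file is precomputed from the counter-fitted vectors by cosine similarity. The sketch below shows how such a file can be derived, using tiny made-up 2-d vectors and top-2 neighbors in place of the real 300-d vectors and top 50 (the vocabulary and vectors here are illustrative assumptions, not the actual file format):

```python
import numpy as np

# Toy counter-fitted-style embeddings; the real file maps tens of
# thousands of words to 300-d vectors. k=2 stands in for the top 50.
vocab = ["good", "great", "bad", "terrible", "movie"]
vecs = np.array([
    [1.0, 0.1], [0.9, 0.2],    # good, great: similar directions
    [-1.0, 0.1], [-0.9, 0.2],  # bad, terrible: similar directions
    [0.0, 1.0],                # movie: unrelated
])

# Normalize rows so the dot product equals cosine similarity.
norm = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
sim = norm @ norm.T  # full cosine-similarity matrix

k = 2
top_k = {}
for i, w in enumerate(vocab):
    order = np.argsort(-sim[i])  # most similar first; index 0 is the word itself
    top_k[w] = [vocab[j] for j in order[1:k + 1]]
```

The precomputed matrix/file simply stores, for every vocabulary word, its k nearest neighbors under this cosine metric, so the attack never has to recompute similarities at query time.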

How to Run:

Use the following command to get the results.

For the BERT model:

```bash
python3 classification_attack.py \
    --dataset_path path_to_data_samples_to_attack \
    --target_model type_of_target_model (bert, wordCNN, wordLSTM) \
    --counter_fitting_cos_sim_path path_to_top_50_synonym_file \
    --target_dataset dataset_to_attack (imdb, ag, yelp, yahoo, mr) \
    --target_model_path path_to_pretrained_target_model \
    --USE_cache_path " " \
    --max_seq_length 256 \
    --sim_score_window 40 \
    --nclasses classes_in_the_dataset_to_attack
```

Example of attacking BERT on the IMDB dataset:

```bash
python3 classification_attack.py \
    --dataset_path data/imdb \
    --target_model bert \
    --counter_fitting_cos_sim_path mat.txt \
    --target_dataset imdb \
    --target_model_path bert/imdb \
    --USE_cache_path " " \
    --max_seq_length 256 \
    --sim_score_window 40 \
    --nclasses 2
```

Example of attacking BERT on the SNLI dataset:

```bash
python3 nli_attack.py \
    --dataset_path data/snli \
    --target_model bert \
    --counter_fitting_cos_sim_path mat.txt \
    --target_dataset snli \
    --target_model_path bert/snli \
    --USE_cache_path "nli_cache" \
    --sim_score_window 40
```
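The --sim_score_window argument restricts the semantic-similarity check to a window of words centered on the substituted position rather than the whole input. The sketch below shows only the windowing step; the actual code scores the two windows with Universal Sentence Encoder embeddings, for which a simple token-overlap similarity stands in here:

```python
# Extract a window of `window_size` words centered on the changed index,
# then compare the original vs. perturbed windows. Token overlap (Jaccard)
# is a stand-in for the USE cosine similarity used by the real attack.

def window(words, idx, window_size):
    half = window_size // 2
    lo = max(0, idx - half)
    return words[lo: lo + window_size]

def sim(a, b):  # stand-in similarity; the attack uses USE embeddings
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

orig = "the movie was great and the acting was superb".split()
pert = "the movie was fine and the acting was superb".split()
changed_idx = 3  # "great" -> "fine"

s = sim(window(orig, changed_idx, 4), window(pert, changed_idx, 4))
```

Scoring a fixed-size window instead of the full text keeps long inputs (e.g. IMDB reviews at --max_seq_length 256) from drowning out a single-word change.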

Results

The results will be available in the results_hard_label directory for the classification task and in the results_nli_hard_label directory for the entailment task.
To attack other target models, look at the commands folder.

Training target models

To train BERT on a particular dataset, use the commands provided in the BERT directory. To train the LSTM and CNN models, run python3 train_classifier.py --<model_name> --<dataset>.

If you find our repository helpful, please consider citing our work.

```bibtex
@article{maheshwary2020generating,
  title={Generating Natural Language Attacks in a Hard Label Black Box Setting},
  author={Maheshwary, Rishabh and Maheshwary, Saket and Pudi, Vikram},
  journal={arXiv preprint arXiv:2012.14956},
  year={2020}
}
```