Bayesian Optimization implementation for text classification
This is a simple application of an LSTM to a text classification task in PyTorch, using Bayesian Optimization for hyperparameter tuning.
The dataset used is the Yelp 2014 review data[1], which can be downloaded from here.
Detailed instructions are explained below.
You can set various hyperparameters in the `src/constants.py` file. The description of each variable is as follows.
Note that for Bayesian Optimization, each hyperparameter to be tuned should be passed in the form of a tuple. So you can set an argument either as a tuple or as a single value: the former means the argument will be included in the Bayesian Optimization search, and the latter means it will be kept fixed. (See the sketch after the table below for an example.)
Argument | Type | Description | Default |
---|---|---|---|
`device` | `torch.device` | The device type (CUDA or CPU). | `torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')` |
`learning_rates` | `tuple (float, float)` or `float` | The range of learning rates (or a single value). | `(0.0001, 0.001)` |
`batch_sizes` | `tuple (int, int)` or `int` | The range of batch sizes (or a single value). | `(16, 128)` |
`seq_len` | `tuple (int, int)` or `int` | The range of maximum sequence lengths (or a single value). | `512` |
`d_w` | `tuple (int, int)` or `int` | The range of word embedding dimensions (or a single value). | `256` |
`d_h` | `tuple (int, int)` or `int` | The range of hidden state dimensions in the LSTM (or a single value). | `256` |
`drop_out_rate` | `tuple (float, float)` or `float` | The range of dropout rates (or a single value). | `0.5` |
`layer_num` | `tuple (int, int)` or `int` | The range of LSTM layer counts (or a single value). | `3` |
`bidirectional` | `bool` | Whether the LSTM is bidirectional or not. | `True` |
`class_num` | `int` | The number of classes. | `5` |
`epoch_num` | `tuple (int, int)` or `int` | The range of total epoch counts (or a single value). | `10` |
`ckpt_dir` | `str` | The path for saved checkpoints. | `../saved_model` |
`init_points` | `int` | The number of initial points to start Bayesian Optimization. | `2` |
`n_iter` | `int` | The number of iterations for Bayesian Optimization. | `8` |
Install all required packages.

```
pip install -r requirements.txt
```
Download the dataset and extract it. Of course, you can use another text classification dataset, but make sure that the formats and names of the files are the same as those of the Yelp 2014 review dataset. (See the next step.)
Make a directory named `data`. Get the files named `train.txt`, `test.txt`, `dev.txt`, and `wordlist.txt` from yelp14 and put them into `data`.
The directory structure should be as follows.
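The tree below is a plausible layout inferred from the paths mentioned in this README; the actual repository may contain additional files.

```
.
├── data
│   ├── train.txt
│   ├── dev.txt
│   ├── test.txt
│   └── wordlist.txt
├── requirements.txt
└── src
    ├── constants.py
    └── main.py
```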
Execute the command below to train the model.

```
python src/main.py --mode='train'
```
- `--mode`: This specifies the running mode, which can be either `train` or `test`.

Bayesian Optimization is used for hyperparameter tuning in this task. You can add to or modify the list of hyperparameters to tune in `main.py`.
```python
# Search space: each entry maps a hyperparameter name to the (min, max)
# bounds tuple defined in src/constants.py.
self.pbounds = {
    'learning_rate': learning_rates,
    'batch_size': batch_sizes
}

# BayesianOptimization comes from the bayes_opt package; the objective
# function to maximize is the training routine itself.
self.bayes_optimizer = BayesianOptimization(
    f=self.train,
    pbounds=self.pbounds,
    random_state=777
)
```
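The search itself is presumably launched with the `init_points` and `n_iter` values from `src/constants.py`, along the lines of the following sketch (the actual call in `main.py` may differ slightly):

```python
# Run init_points random probes first, then n_iter Bayesian Optimization
# steps, maximizing the value returned by self.train.
self.bayes_optimizer.maximize(init_points=init_points, n_iter=n_iter)

# The best score and hyperparameters found are available afterwards.
print(self.bayes_optimizer.max)
```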
Currently, only the batch size and the learning rate are subject to tuning. If you want to modify `self.pbounds`, add the desired hyperparameter there and change its value in `src/constants.py` into a tuple of two values, the minimum and the maximum. Then you should add that hyperparameter as an additional parameter of the `train` function, like `batch_size` and `learning_rate`.
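For example, to additionally tune the dropout rate, the changes might look like the sketch below. The exact signature of `train` in `main.py` is an assumption here; adapt the parameter list to the actual code.

```python
# In src/constants.py: turn the fixed value into (min, max) bounds.
drop_out_rate = (0.1, 0.5)

# In main.py: register the new hyperparameter in the search space...
self.pbounds = {
    'learning_rate': learning_rates,
    'batch_size': batch_sizes,
    'drop_out_rate': drop_out_rate
}

# ...and accept it in the objective function. Note that bayes_opt samples
# continuous values, so integer hyperparameters must be rounded inside train.
def train(self, learning_rate, batch_size, drop_out_rate):
    ...
```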
After training, you can test the model with the test data using the following command.

```
python src/main.py --mode='test' --model_name=MODEL_NAME --inference_batch_size=BATCH_SIZE
```

- `--model_name`: This is the file name of the trained model you want to test. The model is located in the `saved_model` directory if you didn't change the checkpoint directory setting. (default: `None`)
- `--inference_batch_size`: This is the batch size for the inference step. It is independent of `batch_size` in `src/constants.py`, since that argument might be subject to the Bayesian Optimization process; this lets you set a separate batch size just for inference. (default: `128`)
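For example, with a hypothetical checkpoint name (use whatever file name was actually saved during your training run):

```
# 'best_model.pth' is a placeholder, not a name produced by this repository.
python src/main.py --mode='test' --model_name=best_model.pth --inference_batch_size=64
```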
[1] Yelp Open Dataset. (https://www.yelp.com/dataset)