Implements an MLP for VQA
This code implements the VQA MLP baseline from [Revisiting Visual Question Answering Baselines](https://arxiv.org/abs/1606.08390).
| Features/Methods | VQA Val Accuracy | VQA Test-dev Accuracy |
|---|---|---|
| MCBP | - | 66.4 |
| Baseline - MLP | - | 64.9 |
| Imagenet - MLP | 63.62 | 65.9 |
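At its core, the baseline concatenates an image feature with bag-of-words question and answer embeddings and scores the triple with a one-hidden-layer MLP. Below is a minimal sketch of that kind of architecture in Torch; the dimensions, the 8192-unit hidden layer, and the binary output are illustrative readings of the paper, not this repo's exact configuration.

```lua
-- Minimal sketch of the MLP baseline (sizes are illustrative).
require 'nn'

local im_dim = 2048  -- e.g. ResNet-152 image feature
local q_dim  = 300   -- averaged word2vec question embedding
local a_dim  = 300   -- averaged word2vec answer embedding
local hidden = 8192

local mlp = nn.Sequential()
mlp:add(nn.JoinTable(2))  -- concatenate {image, question, answer} along dim 2
mlp:add(nn.Linear(im_dim + q_dim + a_dim, hidden))
mlp:add(nn.ReLU(true))
mlp:add(nn.Dropout(0.5))
mlp:add(nn.Linear(hidden, 2))  -- binary score: is this answer correct?
mlp:add(nn.LogSoftMax())

-- Usage: local scores = mlp:forward({imFeats, qFeats, aFeats})
-- where each input is a (batch x dim) tensor.
```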
This README is a work in progress.
The MLP is implemented in Torch, and depends on the following packages:

- torch/nn
- torch/nngraph
- torch/cutorch
- torch/cunn
- torch/image
- torch/tds
- lua-cjson
- nninit
- torch-word-emb
- torch-hdf5
- torchx
After installing torch, you can install / update these dependencies by running the following:

```
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install tds
luarocks install cutorch
luarocks install cunn
luarocks install lua-cjson
luarocks install nninit
luarocks install torch-word-emb
luarocks install torchx
```
Install torch-hdf5 by following the instructions here.
Then clone this repository:

```
git clone --recursive https://github.com/arunmallya/simple-vqa.git
```
Create a data/ folder and symlink or place the following datasets in it:

- vqa -> VQA dataset root
- coco -> COCO dataset root (needed only if you plan to extract and use your own features; not required if using the cached features below)
Download the Word2Vec model file from here. This is needed to encode sentences into vectors. Place the .bin file in the data/models folder.
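Questions and answers are encoded as bag-of-words averages of word2vec vectors. Below is a minimal sketch of that averaging; the `lookup` closure, which maps a word to its embedding (or nil if unknown), is a hypothetical stand-in for whatever accessor torch-word-emb provides over the loaded .bin model.

```lua
-- Sketch: encode a sentence as the mean of its words' word2vec vectors.
-- `lookup(word)` -> FloatTensor-or-nil is a hypothetical accessor to be
-- wired up to the model loaded by torch-word-emb.
require 'torch'

local function encode_sentence(sentence, lookup, dim)
  local sum, n = torch.FloatTensor(dim):zero(), 0
  for word in sentence:lower():gmatch('%S+') do
    local vec = lookup(word)
    if vec then
      sum:add(vec)
      n = n + 1
    end
  end
  if n > 0 then sum:div(n) end  -- average; zero vector if no word is known
  return sum
end
```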
Download cached ResNet-152 ImageNet features for the VQA dataset splits and place them in data/feats: features
Download VQA lite annotations and place them in data/vqa/Annotations/. These are required because the original VQA annotations do not fit within luajit's 2GB memory limit.
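For context, the usual way around the luajit limit is to decode the JSON once and move it into tds containers, which live off the luajit heap. A hedged sketch follows; the function is illustrative, not this repo's actual loader, and the field names are the standard VQA annotation fields.

```lua
-- Sketch: decode VQA annotations with lua-cjson, then store them in a
-- tds.Hash so the data lives outside the 2GB luajit heap.
local cjson = require 'cjson'
local tds = require 'tds'

local function load_answers(path)
  local f = assert(io.open(path, 'r'))
  local parsed = cjson.decode(f:read('*a'))
  f:close()
  local answers = tds.Hash()
  for _, ann in ipairs(parsed.annotations) do
    -- question_id / multiple_choice_answer are standard VQA fields
    answers[ann.question_id] = ann.multiple_choice_answer
  end
  return answers
end
```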
Download MLP models trained on the VQA train set and place them in checkpoint/: models
At this point, your data/ folder should contain models/, feats/, coco/, and vqa/ folders.
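For reference, the expected layout looks like this (annotations in parentheses restate the steps above):

```
data/
├── models/   (Word2Vec .bin file)
├── feats/    (cached ResNet-152 features)
├── coco/     (COCO root; optional, only for extracting your own features)
└── vqa/      (VQA root, including Annotations/)
```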
For example, to run the model trained on the VQA train set with Imagenet features, on the VQA val set:
```
th eval.lua -eval_split val \
            -eval_checkpoint_path checkpoint/MLP-imagenet-train.t7
```
In general, the command is:
```
th eval.lua -eval_split (train/val/test-dev/test-final) \
            -eval_checkpoint_path <model-path>
```
This will dump the results into checkpoint/ as a .json file, as well as a results.zip file in the case of test-dev and test-final. The results.zip can be uploaded to CodaLab for evaluation.
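The .json follows the format the VQA evaluation server expects: a JSON array of {question_id, answer} records. A hedged sketch of writing one with lua-cjson (the entries and the file name are illustrative):

```lua
-- Sketch: the VQA server's result format, one record per question.
local cjson = require 'cjson'

local results = {
  { question_id = 1, answer = 'yes' },  -- illustrative entries
  { question_id = 2, answer = '2' },
}

local f = assert(io.open('checkpoint/results_example.json', 'w'))
f:write(cjson.encode(results))
f:close()
```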
To train your own model on the VQA train set, for example an MLP using Imagenet features of dimension 2048:

```
th train.lua -im_feat_types imagenet -im_feat_dims 2048
```