A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
A trained model can be loaded into the decoder for caption generation; pass its checkpoint path to `generate_caption.py` with `--model` (see the command below).
BLEU scores for VGG19 (Orange) and ResNet152 (Red), trained with teacher forcing.

BLEU Score | Graph | Top-K Accuracy | Graph |
---|---|---|---|
BLEU-1 | ![]() | Training Top-1 | ![]() |
BLEU-2 | ![]() | Training Top-5 | ![]() |
BLEU-3 | ![]() | Validation Top-1 | ![]() |
BLEU-4 | ![]() | Validation Top-5 | ![]() |
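For reference, BLEU-1 through BLEU-4 are corpus-level n-gram precision scores. A minimal sketch of how such scores can be computed with NLTK is below; the tokenized captions are placeholder data and this is not necessarily the evaluation code used in this repository.

```python
from nltk.translate.bleu_score import corpus_bleu

# Placeholder data: one generated caption and its reference captions,
# both tokenized into word lists.
references = [[["a", "dog", "runs", "on", "the", "grass"],
               ["a", "dog", "running", "through", "a", "field"]]]
hypotheses = [["a", "dog", "runs", "on", "grass"]]

# The weights select the n-gram orders: BLEU-1 uses unigrams only,
# BLEU-4 averages unigram through 4-gram precision.
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
print(f"BLEU-1: {bleu1:.3f}  BLEU-4: {bleu4:.3f}")
```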
This project was written in Python 3, so it may not work with Python 2.

Download the COCO training and validation images and put them in `data/coco/imgs/train2014` and `data/coco/imgs/val2014` respectively. Put the COCO dataset split JSON file from Deep Visual-Semantic Alignments in `data/coco/`; it should be named `dataset.json`.
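After placing `dataset.json`, a quick sanity check can confirm the splits look right. The field names below assume the standard Karpathy-split format from Deep Visual-Semantic Alignments; this snippet is not part of the repository.

```python
import json
from collections import Counter

# Load the split file placed in data/coco/ (field names assume the
# standard "Deep Visual-Semantic Alignments" split format).
with open("data/coco/dataset.json") as f:
    data = json.load(f)

# Count the images assigned to each split and peek at one caption.
print(Counter(img["split"] for img in data["images"]))

first = data["images"][0]
print(first["filename"], "->", first["sentences"][0]["raw"])
```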
Run the preprocessing to create the needed JSON files:
python generate_json_data.py
Start the training by running:
python train.py
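As noted above, the reported models were trained with teacher forcing: at each time step the decoder receives the ground-truth previous word rather than its own prediction. A minimal sketch of that idea is below; the decoder interface (`init_hidden`, the `(word, hidden, features)` call) is hypothetical and not the actual classes in `train.py`.

```python
import torch
import torch.nn as nn

def teacher_forced_loss(decoder: nn.Module, features: torch.Tensor,
                        captions: torch.Tensor, criterion: nn.Module):
    """One teacher-forced pass over a batch (illustrative only).

    features: (batch, num_regions, feat_dim) encoder output
    captions: (batch, max_len) ground-truth token ids, starting with <start>
    """
    _, max_len = captions.shape
    hidden = decoder.init_hidden(features)   # hypothetical helper
    loss = torch.zeros(())
    for t in range(max_len - 1):
        # Teacher forcing: feed the ground-truth word at step t,
        # not the word the decoder predicted at step t - 1.
        logits, hidden, _alpha = decoder(captions[:, t], hidden, features)
        loss = loss + criterion(logits, captions[:, t + 1])
    return loss / (max_len - 1)
```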
The models will be saved in `model/` and the training statistics in `runs/`. To see the training statistics, use:
tensorboard --logdir runs
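The files in `runs/` are standard TensorBoard event files. A short sketch of how such scalars are typically written with `torch.utils.tensorboard` follows; the tag name and values are placeholders, not the repository's actual logging code.

```python
from torch.utils.tensorboard import SummaryWriter

# Write to the same directory that `tensorboard --logdir runs` reads.
writer = SummaryWriter("runs/example")

for step in range(100):
    placeholder_loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("train/loss", placeholder_loss, step)

writer.close()
```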
To generate a caption for an image, run:

python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>
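At a high level, caption generation loads the saved decoder parameters, encodes the image with the CNN, and decodes word by word. A rough sketch of a greedy decoding loop is below; the preprocessing is the standard ImageNet recipe, while the encoder/decoder interfaces and the `word_map` vocabulary are illustrative rather than the script's actual code.

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet preprocessing for a VGG19/ResNet152 encoder.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def generate_caption(encoder, decoder, img_path, word_map, max_len=20):
    """Greedy decoding sketch; the actual script may use beam search."""
    img = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
    words = []
    with torch.no_grad():
        features = encoder(img)                  # (1, num_regions, feat_dim)
        hidden = decoder.init_hidden(features)   # hypothetical helper
        word = torch.tensor([word_map["<start>"]])
        for _ in range(max_len):
            logits, hidden, _alpha = decoder(word, hidden, features)
            word = logits.argmax(dim=-1)         # greedy: most likely next word
            if word.item() == word_map["<end>"]:
                break
            words.append(word.item())
    return words
```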
Original Theano Implementation
Neural Machine Translation By Jointly Learning to Align And Translate
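The second reference above is the source of the additive (soft) attention the decoder uses: a small network scores every encoder region against the current hidden state, and the softmax of those scores weights the region features into a context vector. A minimal PyTorch sketch of that mechanism (dimensions and names are illustrative, not the repository's classes):

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Additive attention over encoder regions (illustrative sketch)."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features: torch.Tensor, hidden: torch.Tensor):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        energy = torch.tanh(self.feat_proj(features)
                            + self.hidden_proj(hidden).unsqueeze(1))
        scores = self.score(energy).squeeze(-1)      # (batch, num_regions)
        alpha = torch.softmax(scores, dim=1)         # attention weights
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)  # weighted sum
        return context, alpha
```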