Keras implementation of the Neural Image Caption generator (NIC, "Show and Tell")
Related paper: Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Computer Vision and Pattern Recognition (CVPR), 2015: 3156-3164.
Trained on: Flickr8k
Inspired by:
The code works under the following settings (other versions may also work):
python 3.6.8
keras 2.2.4 tensorflow backend
numpy 1.15.4
nltk 3.4
PIL 5.4.1
tensorflow 1.12.0 (only used to define the TensorBoardCaption callback, which monitors the captions generated during training via TensorBoard; if you don't need it, you can skip the tensorflow module)
current_best.h5: current best model weights. Place it in ./model-params
features_dict.pkl: all Flickr8k image features extracted by Inception_v3. Place it in ./datasets
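The README does not spell out the layout of features_dict.pkl; a reasonable assumption is a plain dict mapping each Flickr8k image filename to its Inception_v3 feature vector (2048-dimensional pool output). A minimal sketch of writing and loading such a file (the filename and zero-valued vector below are purely illustrative):

```python
import pickle

# Assumed structure: image filename -> 2048-d Inception_v3 feature vector.
# The key below is an example Flickr8k-style filename; the all-zero vector
# is a placeholder standing in for real extracted features.
features_dict = {"1000268201_693b08cb0e.jpg": [0.0] * 2048}

# Save the dictionary the way ./datasets/features_dict.pkl would be produced.
with open("features_dict.pkl", "wb") as f:
    pickle.dump(features_dict, f)

# Load it back, as the training/inference code presumably does.
with open("features_dict.pkl", "rb") as f:
    loaded = pickle.load(f)
```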
(Scores below use beam search with beam size 5.)
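The scores above are obtained with beam-size-5 decoding. The repo's own decoder is not shown here; the following is a generic sketch of beam search over a per-step token distribution, where `step_fn`, the token names, and the toy model in the comments are all illustrative assumptions:

```python
import heapq
import math

def beam_search(step_fn, start_token, end_token, beam_size=5, max_len=20):
    """Generic beam search.

    step_fn(seq) must return a list of (token, probability) pairs for the
    next step given the partial sequence seq. Returns the highest-scoring
    complete sequence (by summed log-probability).
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq):
                new_seq = seq + [token]
                new_score = score + math.log(prob)
                if token == end_token:
                    completed.append((new_seq, new_score))
                else:
                    candidates.append((new_seq, new_score))
        if not candidates:
            break
        # Keep only the beam_size best partial sequences.
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[1])
    completed.extend(beams)
    return max(completed, key=lambda b: b[1])[0]

# Toy step function standing in for the caption model's softmax output.
def toy_step(seq):
    if seq[-1] == "<start>":
        return [("a", 0.9), ("b", 0.1)]
    return [("<end>", 1.0)]

best = beam_search(toy_step, "<start>", "<end>", beam_size=5)
```

In the real model, `step_fn` would run the LSTM one step on the image features plus the partial caption and return the top-k softmax probabilities.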
Evaluation by evaluate.py
BLEU-1 0.608
BLEU-2 0.419
BLEU-3 0.272
BLEU-4 0.184
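The repo lists nltk 3.4 as a dependency, so evaluate.py presumably computes these corpus-level BLEU scores with nltk. A minimal sketch of how BLEU-1 through BLEU-4 can be computed with `nltk.translate.bleu_score.corpus_bleu` (the captions below are made-up examples, not Flickr8k data):

```python
from nltk.translate.bleu_score import corpus_bleu

# Illustrative data: each image has a list of reference captions
# (Flickr8k provides 5 per image) and one generated hypothesis,
# all tokenized into lowercase words.
references = [
    [["a", "dog", "runs", "across", "the", "grass"],
     ["a", "brown", "dog", "is", "running", "outside"]],
]
hypotheses = [["a", "dog", "runs", "across", "the", "grass"]]

# The weights tuple selects the n-gram orders: BLEU-1 uses unigram
# precision only; BLEU-4 averages 1- to 4-gram log-precisions.
bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0.0, 0.0, 0.0))
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
```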
Evaluation by pycocoevalcap
Bleu_1: 0.625
Bleu_2: 0.442
Bleu_3: 0.305
Bleu_4: 0.210
CIDEr: 0.532
ROUGE_L: 0.460
METEOR: 0.202
SPICE: 0.144
Examples: