Project author: veb-101

Project description:
Mini project on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Primary language: Jupyter Notebook
Repository: git://github.com/veb-101/Image-Captioning.git
Created: 2020-04-03T14:06:34Z
Project page: https://github.com/veb-101/Image-Captioning

License: The Unlicense



Image Captioning with Visual Attention


Open In Colab

  • View Notebook
  • Details about different runs of the project can be found on Weights & Biases
  • To run in Colab, you need to add your Kaggle API token file (kaggle.json)
  • Final architecture used (a minimal attention sketch follows this list):
    • Encoder: InceptionV3
    • Attention: Bahdanau's soft attention
    • Decoder: LSTM unit
    • Embeddings: GloVe (glove6b300d)
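
For reference, here is a minimal sketch of the Bahdanau-style soft attention used above, assuming TensorFlow 2.x; the class, layer, and variable names are illustrative rather than taken from this repository.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects encoder features
        self.W2 = tf.keras.layers.Dense(units)  # projects decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # scores each spatial location

    def call(self, features, hidden):
        # features: (batch, 64, embedding_dim) -- InceptionV3's 8x8 grid, flattened
        # hidden:   (batch, hidden_size)       -- LSTM decoder state
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        attention_weights = tf.nn.softmax(scores, axis=1)  # soft attention over locations
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights
```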

  • Some outputs from the final run:
  1. Real caption: a man on snow skis who is performing a jump
     Predicted caption: a man flying through the sky
     (attention plot image)

  2. Real caption: a couple of elephants that are by the pond
     Predicted caption: a group of elephants relax along water in a body of water
     (attention plot image)


  • ToDo
    • Applying beam search (see the sketch after this list)
    • Applying LearningRateScheduler
    • Making an interface
    • Tuning different hyperparameters
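
For the beam-search item, a framework-agnostic sketch is shown below; `decode_step` is a hypothetical callable standing in for one forward pass of the trained decoder and is not part of this repository.

```python
import numpy as np

def beam_search(decode_step, start_id, end_id, beam_width=3, max_len=40):
    """Return the highest-scoring token sequence.

    decode_step(tokens) -> log-probabilities over the vocabulary for the
    next token; it stands in for one forward pass of the trained decoder.
    """
    beams = [([start_id], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end_id:          # finished beams carry over as-is
                candidates.append((tokens, score))
                continue
            log_probs = decode_step(tokens)   # shape: (vocab_size,)
            for tok in np.argsort(log_probs)[-beam_width:]:
                candidates.append((tokens + [int(tok)], score + float(log_probs[tok])))
        # keep only the beam_width best partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_id for seq, _ in beams):
            break
    return beams[0][0]
```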

  • Comments:
    • Code for ExponentialDecay was added but not used in the run, as evaluation takes 3 hours on Colab (a usage sketch follows this list).
    • I have added manual early stopping and save the weights for each epoch (all.zip).
    • Try decreasing vocab_size and increasing the number of images used.
    • I couldn't find any resources for dynamically caching images and loading them directly at run time to save storage space; NumPy's memmap seems like a good starting point (see the sketch below).
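
For reference, a minimal sketch of wiring Keras's built-in ExponentialDecay schedule into an optimizer follows; the hyperparameter values here are illustrative, not the settings from any run of this project.

```python
import tensorflow as tf

# lr = initial_learning_rate * decay_rate ** (step / decay_steps)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,  # illustrative values, not the run's settings
    decay_steps=1000,
    decay_rate=0.9,
    staircase=True)              # decay in discrete steps rather than smoothly
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```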
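
And a minimal sketch of the memmap idea: cache precomputed encoder features in one on-disk array and read rows back lazily during training. The file name, dataset size, and feature shape are assumptions, not values from this repository.

```python
import numpy as np

# Assumed layout: one (64, 2048) InceptionV3 feature map per image.
num_images = 1000                      # hypothetical dataset size
features = np.memmap("features.dat",   # hypothetical cache file
                     dtype="float32", mode="w+",
                     shape=(num_images, 64, 2048))
# Fill the cache once after feature extraction, e.g.:
# features[i] = encoder(preprocess(image_i))
features.flush()

# Reopen read-only during training; only the rows actually indexed
# are paged in from disk, so RAM usage stays small.
features_ro = np.memmap("features.dat", dtype="float32", mode="r",
                        shape=(num_images, 64, 2048))
batch = features_ro[:32]               # loads just these pages lazily
```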