Author: chaitanya100100

Description:
Implementation of a Variational Autoencoder generative model in Keras for image generation and latent space visualization on the MNIST and CIFAR10 datasets
Language: Python
Repository: git://github.com/chaitanya100100/VAE-for-Image-Generation.git
Created: 2017-11-03T07:33:48Z

License: MIT License


VAE for Image Generation

Variational Autoencoder: Keras implementation on the MNIST and CIFAR10 datasets


Dependencies

  • keras
  • tensorflow / theano (the current implementation targets tensorflow; it can be used with theano after a few code changes)
  • numpy, matplotlib, scipy

Implementation details

The code is heavily inspired by the Keras VAE example.
(The source files contain some code duplication.)


MNIST dataset

  • images are flattened and treated as 1D vectors
  • the encoder and decoder are both ordinary feed-forward neural networks
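The flattening step described above can be sketched with numpy. This is a minimal illustration, not the repository's code: the helper name `flatten_images` is hypothetical, and scaling pixel values to [0, 1] is an assumption that the text does not state.

```python
import numpy as np

def flatten_images(images):
    """Scale uint8 images to [0, 1] (assumed) and flatten each one to a 1D vector."""
    images = np.asarray(images, dtype="float32") / 255.0
    # Keep the batch axis and collapse all remaining axes into one.
    return images.reshape(len(images), -1)

# Stand-in batch with MNIST's 28x28 grayscale shape; real data would come from the dataset loader.
batch = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
flat = flatten_images(batch)
print(flat.shape)  # (4, 784)
```

Each row of `flat` is then a 784-element vector that a fully connected encoder can take as input.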
(figure: network architecture)

src/mnist_train.py
  • trains the VAE model with the hyperparameters defined in src/mnist_params.py
  • saves the full VAE model, the encoder, and the decoder as Keras models in the models directory as ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_<vae/encoder/decoder>.h5, where <latent_dim> is the number of latent dimensions, <intermediate_dim> is the number of neurons in the hidden layer, and <epochs> is the number of training epochs
  • after training, the saved models can be used to analyse the latent distribution and to generate new images
  • also stores the training history in ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_history.pkl

src/mnist_2d_latent_space_and_generate.py
  • works only for a 2-dimensional latent space
  • loads the trained model matching the hyperparameters defined in mnist_params.py
  • displays the latent space distribution, then generates images from latent variables entered by the user (see the code, which is largely self-explanatory)
  • can also generate images from latent vectors randomly sampled from the 2D latent space (comment out the user input lines) and display them in a grid

src/mnist_3d_latent_space_and_generate.py
  • same as mnist_2d_latent_space_and_generate.py, but for a 3D latent space

src/mnist_general_latent_space_and_generate.py
  • loads the trained model matching the hyperparameters defined in mnist_params.py
  • displays the latent space if it is 2D or 3D
  • displays a grid of images generated from randomly sampled latent vectors
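As a rough sketch of the flow described above (find the saved decoder from the hyperparameters, sample latent vectors, lay the decoded images out in a grid), the following may help. It is not the repository's code. The helpers `model_path`, `sample_latent` and `tile_images` are hypothetical, the hyperparameter values are made up, and sampling from a standard normal prior is an assumption. Random pixels stand in for the decoder output so the example runs without a trained model.

```python
import numpy as np

def model_path(latent_dim, intermediate_dim, epochs, part, models_dir="models"):
    """Build a path following the ld_<..>_id_<..>_e_<..>_<part>.h5 naming pattern."""
    return "{}/ld_{}_id_{}_e_{}_{}.h5".format(
        models_dir, latent_dim, intermediate_dim, epochs, part)

def sample_latent(n, latent_dim, seed=0):
    """Draw n latent vectors from a standard normal prior (assumed)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, latent_dim))

def tile_images(flat_images, rows, cols, side):
    """Arrange rows*cols flattened side x side images into one 2D canvas."""
    imgs = np.asarray(flat_images).reshape(rows, cols, side, side)
    # Put each image's row axis next to the grid row axis, then merge the pairs.
    return imgs.transpose(0, 2, 1, 3).reshape(rows * side, cols * side)

# Hypothetical hyperparameters; the real values live in mnist_params.py.
latent_dim, intermediate_dim, epochs = 8, 512, 50
decoder_file = model_path(latent_dim, intermediate_dim, epochs, "decoder")
# With Keras installed: decoder = keras.models.load_model(decoder_file)

z = sample_latent(12, latent_dim)
# Random pixels stand in for decoder.predict(z).
decoded = np.random.default_rng(1).random((12, 28 * 28))
canvas = tile_images(decoded, rows=3, cols=4, side=28)
print(decoder_file, z.shape, canvas.shape)
```

The resulting `canvas` can be passed to `matplotlib.pyplot.imshow` to display the whole grid at once.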


2D latent space
(figures: latent space, uniform sampling)

3D latent space
(figure: 3D latent space)

3D latent space results
(figures: uniform sampling, random sampling)

  • more results are in the images directory
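One plausible reading of "uniform sampling" in the results above is that latent vectors are taken from an evenly spaced grid and decoded one by one. The sketch below builds such a grid for a 2D latent space. The helper name `latent_grid`, the grid size and the range [-limit, limit] are all assumptions made for illustration.

```python
import numpy as np

def latent_grid(n=15, limit=3.0):
    """Return an (n*n, 2) array of latent vectors evenly spaced over [-limit, limit]^2."""
    axis = np.linspace(-limit, limit, n)
    z1, z2 = np.meshgrid(axis, axis)
    return np.stack([z1.ravel(), z2.ravel()], axis=1)

z = latent_grid(n=5, limit=2.0)
print(z.shape)  # (25, 2)
# With a trained decoder loaded, decoder.predict(z) would return one image per row of z.
```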


CIFAR10 dataset

  • images are treated as 2D inputs
  • the encoder is a convolutional neural network and the decoder is a deconvolutional network
  • network architecture of the encoder and decoder: (figures not reproduced)
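The exact layers are not listed in this text, so the following is only a shape-arithmetic sketch of how a convolutional encoder and a deconvolutional decoder can mirror each other. It assumes a hypothetical stack of two stride-2 layers with 'same' padding in each direction. The layer count and strides are illustrative and are not taken from the repository.

```python
import math

def conv_same_out(size, stride):
    """Spatial size after a strided convolution with 'same' padding."""
    return math.ceil(size / stride)

def deconv_same_out(size, stride):
    """Spatial size after a strided transposed convolution with 'same' padding."""
    return size * stride

# Hypothetical stack: two stride-2 layers down, two stride-2 layers back up.
size = 32  # CIFAR10 images are 32x32
down = [size]
for stride in (2, 2):
    down.append(conv_same_out(down[-1], stride))
up = [down[-1]]
for stride in (2, 2):
    up.append(deconv_same_out(up[-1], stride))
print(down, up)  # [32, 16, 8] [8, 16, 32]
```

Under these assumptions the decoder recovers the 32x32 input size, which is why the upsampling strides are usually chosen to mirror the encoder's.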
src/cifar10_train.py, src/cifar10_generate.py

The implementation structure is the same as for the MNIST files.

Results with 16 latent dimensions
(figures: results after 25, 50, 75, and 600 epochs)


Caltech101 dataset

  • caltech101_<sz>_train.py and caltech101_<sz>_generate.py (where <sz> is the input image size; training was done for two sizes, 92*92 and 128*128) follow the same structure as the CIFAR10 files
  • because the images are larger, more computation power is needed to train the model
  • results obtained with limited training are qualitatively poor
  • src/caltech101_preprocess.py is provided to preprocess the dataset in the dataset directory
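To give a rough sense of why the larger images need more computation, the arithmetic below compares the number of input values per image at each size. It assumes 3-channel RGB inputs, which the text does not state, and the helper name `values_per_image` is hypothetical.

```python
def values_per_image(side, channels=3):
    """Number of input values for a square image (3 channels assumed)."""
    return side * side * channels

cifar = values_per_image(32)
for side in (92, 128):
    n = values_per_image(side)
    print("{0}x{0}: {1} values, {2:.1f}x a 32x32 CIFAR10 image".format(side, n, n / cifar))
# 92x92: 25392 values, 8.3x a 32x32 CIFAR10 image
# 128x128: 49152 values, 16.0x a 32x32 CIFAR10 image
```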