code for the paper: Character-Preserving Coherent Story Visualization
Author: @yunzhusong, @theblackcat102, @redman0226, Huiao-Han Lu, Hong-Han Shuai
Code implementation for Character-Preserving Coherent Story Visualization
Objects in pictures should so be arranged as by their very position to tell their own story.
- Johann Wolfgang von Goethe (1749-1832)
In this paper we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges in story visualization: generating a sequence of images that emphasizes preserving the global consistency of characters and scenes across different story pictures.
CP-CSV effectively learns to visualize the story by three critical modules: story and context encoder (story and sentence representation learning), figure-ground segmentation (auxiliary task to provide information for preserving character and story consistency), and figure-ground aware generation (image sequence generation by incorporating figure-ground information). Moreover, we propose a metric named Frechet Story Distance (FSD) to evaluate the performance of story visualization. Extensive experiments demonstrate that CP-CSV maintains the details of character information and achieves high consistency among different frames, while FSD better measures the performance of story visualization. The FVD evaluation metric is from here.
PORORO images and segmentation images can be downloaded here. Pororo, original pororo datasets with self labeled segmentation mask of the character.
CLEVR with segmentation mask, 13755 sequence of images, generate using Clevr-for-StoryGAN
images/
CLEVR_new_013754_1.png
CLEVR_new_013754_1_mask.png
CLEVR_new_013754_2.png
CLEVR_new_013754_2_mask.png
CLEVR_new_013754_3.png
CLEVR_new_013754_3_mask.png
CLEVR_new_013754_4.png
CLEVR_new_013754_4_mask.png
Download link
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
Download the Pororo dataset and put at DATA_DIR, downloaded. The dataset should contain SceneDialogues/ ( where gif files reside ) and *.npy files.
Modify the DATA_DIR in ./cfg/final.yml
The dafault hyper-parameters in ./cfg/final.yml are set to reproduce the paper results. To train from scratch:
./script.sh
./script_inference.sh
Pretrained model can be download here.
Download the Pororo dataset and put at DATA_DIR, downloaded. The dataset should contain SceneDialogues/ ( where gif files reside ) and *.npy files.
Modify the DATA_DIR in ./cfg/final.yml
To evaluate FID, FSD of the pretrained model:
./script_inference.sh
Use the tensorboard to check the results.
tensorboard --logdir output/ --host 0.0.0.0 --port 6009
The slide and the presentation video can be found in slides.
@inproceedings{song2020CPCSV,
title={Character-Preserving Coherent Story Visualization},
author={Song, Yun-Zhu and Tam, Zhi-Rui and Chen, Hung-Jen and Lu, Huiao-Han and Shuai, Hong-Han},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}