CPCStoryVisualization-Pytorch (ECCV 2020)

Author: @yunzhusong, @theblackcat102, @redman0226, Huiao-Han Lu, Hong-Han Shuai

Paper Link (ECCV 2020)

Code implementation for Character-Preserving Coherent Story Visualization

Objects in pictures should so be arranged as by their very position to tell their own story.
           - Johann Wolfgang von Goethe (1749-1832)

In this paper we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges in story visualization: generating a sequence of images that emphasizes preserving the global consistency of characters and scenes across different story pictures.

CP-CSV effectively learns to visualize the story by three critical modules: story and context encoder (story and sentence representation learning), figure-ground segmentation (auxiliary task to provide information for preserving character and story consistency), and figure-ground aware generation (image sequence generation by incorporating figure-ground information). Moreover, we propose a metric named Frechet Story Distance (FSD) to evaluate the performance of story visualization. Extensive experiments demonstrate that CP-CSV maintains the details of character information and achieves high consistency among different frames, while FSD better measures the performance of story visualization. The FVD evaluation metric is from here.

Datasets

PORORO images and segmentation images can be downloaded here. Pororo, original pororo datasets with self labeled segmentation mask of the character.
CLEVR with segmentation mask, 13755 sequence of images, generate using Clevr-for-StoryGAN

images/
    CLEVR_new_013754_1.png
    CLEVR_new_013754_1_mask.png
    CLEVR_new_013754_2.png
    CLEVR_new_013754_2_mask.png
    CLEVR_new_013754_3.png
    CLEVR_new_013754_3_mask.png
    CLEVR_new_013754_4.png
    CLEVR_new_013754_4_mask.png

Download link

Setup environment

    virtualenv -p python3 env
    source env/bin/activate
    pip install -r requirements.txt

Train CPCSV

Steps

Download the Pororo dataset and put at DATA_DIR, downloaded. The dataset should contain SceneDialogues/ ( where gif files reside ) and *.npy files.
Modify the DATA_DIR in ./cfg/final.yml
The dafault hyper-parameters in ./cfg/final.yml are set to reproduce the paper results. To train from scratch:

./script.sh

To run the evaluation, specify the —cfg to ./output/yourmodelname/setting.yml, e.g.,:

./script_inference.sh

Evaluate CPCSV

Pretrained model can be download here.

Steps

Download the Pororo dataset and put at DATA_DIR, downloaded. The dataset should contain SceneDialogues/ ( where gif files reside ) and *.npy files.
Modify the DATA_DIR in ./cfg/final.yml
To evaluate FID, FSD of the pretrained model:

./script_inference.sh

Tensorboard

Use the tensorboard to check the results.

    tensorboard --logdir output/ --host 0.0.0.0 --port 6009

The slide and the presentation video:

The slide and the presentation video can be found in slides.

Cite

@inproceedings{song2020CPCSV, 
    title={Character-Preserving Coherent Story Visualization},  
    author={Song, Yun-Zhu and Tam, Zhi-Rui and Chen, Hung-Jen and Lu, Huiao-Han and Shuai, Hong-Han},  
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},  
    year={2020} 
}