PROVIDE: A Probabilistic Framework for Unsupervised Video Decomposition (UAI 2021)
This is the code repository complementing the paper “PROVIDE: A Probabilistic Framework for Unsupervised Video Decomposition”. The pretrained models are included in the repo.
[Figures: GIF showing results for the CLEVRER dataset; scene decomposition experiment]
It might take a while.
python bb_binary.py
python slice_videos_from_annotations.py
You will need at least one GPU to run the tests. We used a GeForce GTX 10 series GPU and Python 3.6.9. Use the following commands to test the models:
Here and everywhere below, use the --max_num_frames flag to set the number of frames per video. The default is 30.
python scripts/test.py --batch_size 1 --datapath /path/to/bb_binary/ --gt_datapath /path/to/bb_color --model_name bb_binary --T 6 --K 5
To enable frame prediction, use the --predict_frames flag followed by the desired number of predicted frames.
To compute the scores for this experiment, make sure that you have prepared both bb_binary and bb_color, since bb_color is the ground truth (GT) for bb_binary. If you only want to visualise the results without computing the scores, you can set --gt_datapath to /path/to/bb_binary/.
To visualise the outputs or generate latent walks, add the batch numbers to batch_to_print and batch_to_print_latent in test.py.
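The optional flags described above can be combined in a small wrapper. This is a minimal sketch, not part of the repository: `build_test_args` is a hypothetical helper, and the value 5 passed to `--predict_frames` is only an illustration. It uses only flags that appear in this README, and it falls back to the input data as the ground-truth path for the visualisation-only case.

```python
def build_test_args(datapath, model_name, T, K, gt_datapath=None,
                    predict_frames=None, max_num_frames=None, batch_size=1):
    """Assemble an argument list for scripts/test.py (hypothetical helper)."""
    # Visualisation only: no separate ground truth, so reuse the input data path.
    if gt_datapath is None:
        gt_datapath = datapath
    args = [
        "python", "scripts/test.py",
        "--batch_size", str(batch_size),
        "--datapath", datapath,
        "--gt_datapath", gt_datapath,
        "--model_name", model_name,
        "--T", str(T),
        "--K", str(K),
    ]
    # Optional flags are only appended when a value is given,
    # so the script's own defaults apply otherwise.
    if max_num_frames is not None:
        args += ["--max_num_frames", str(max_num_frames)]
    if predict_frames is not None:
        args += ["--predict_frames", str(predict_frames)]
    return args


# Visualise bb_binary results without scores, predicting 5 frames (illustrative value).
cmd = build_test_args("/path/to/bb_binary/", "bb_binary", T=6, K=5, predict_frames=5)
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the test from Python rather than from the shell.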
python scripts/test.py --batch_size 1 --datapath /path/to/bb_color/ --gt_datapath /path/to/bb_color --model_name bb_color --T 6 --K 5
Bouncing Balls 4-8 balls/binary
python scripts/test.py --batch_size 1 --datapath /path/to/bb_binary678/ --gt_datapath /path/to/bb_color678/ --model_name bb_binary --T 6 --K 9 --max_num_frames 10
Bouncing Balls 4-8 balls/colored/ colors
python scripts/test.py --batch_size 1 --datapath /path/to/bb_color678/ --gt_datapath /path/to/bb_color678/ --model_name bb_color --T 6 --K 9 --max_num_frames 10
Bouncing Balls 4-8 balls/colored/4 colors
python scripts/test.py --batch_size 1 --datapath /path/to/bb_color678_4_colors/ --gt_datapath /path/to/bb_color678/ --model_name bb_color --T 6 --K 9 --max_num_frames 10
CLEVRER 3-5 objects
python scripts/test.py --batch_size 1 --datapath /path/to/clevrer345/ --gt_datapath /path/to/clevrer345masks/ --model_name clevrer --T 5 --K 6
python scripts/test.py --batch_size 1 --datapath /path/to/clevrer6/ --gt_datapath /path/to/clevrer6masks/ --model_name clevrer --T 5 --K 6
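Since each test command needs both a data directory and a ground-truth directory, it can help to check that they exist before launching a run. The sketch below is an assumption-laden illustration rather than repository code: the `CONFIGS` keys and the `missing_dirs` helper are hypothetical names, and the paths are the placeholders from the commands above, which you would replace with your own.

```python
import os

# A few test configurations copied from the commands above:
# (datapath, gt_datapath, model_name, T, K). The dictionary keys are hypothetical labels.
CONFIGS = {
    "bb_binary": ("/path/to/bb_binary/", "/path/to/bb_color", "bb_binary", 6, 5),
    "bb_color": ("/path/to/bb_color/", "/path/to/bb_color", "bb_color", 6, 5),
    "clevrer345": ("/path/to/clevrer345/", "/path/to/clevrer345masks/", "clevrer", 5, 6),
    "clevrer6": ("/path/to/clevrer6/", "/path/to/clevrer6masks/", "clevrer", 5, 6),
}


def missing_dirs(paths):
    """Return the subset of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


for name, (data, gt, model, T, K) in CONFIGS.items():
    missing = missing_dirs([data, gt])
    status = "ready" if not missing else "missing: " + ", ".join(missing)
    print(f"{name}: model={model} T={T} K={K} -> {status}")
```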
For training models we used 8 GeForce GTX 10 series GPUs.
python scripts/train.py --batch_size 32 --max_num_frames 4 --datapath /path/to/bb_binary/ --model_name bb_binary --T 6 --K 5
Bouncing balls color
python scripts/train.py --batch_size 32 --max_num_frames 4 --param_schedule --datapath /path/to/bb_color/ --model_name bb_color_train --T 6 --K 5
CLEVRER
python scripts/train.py --batch_size 32 --max_num_frames 4 --param_schedule --datapath /path/to/clevrer/ --gt_datapath /path/to/clevrer/ --model_name clevrer --T 5 --K 6
We thank Michael Kelly for allowing us to use his implementation of the IODINE[1] paper that has served as a backbone for developing this model.
[1] Greff, K., Kaufmann, R.L., Kabra, R., Watters, N., Burgess, C., Zoran, D., Matthey, L., Botvinick, M., Lerchner, A.: Multi-object representation learning with iterative variational inference. https://arxiv.org/pdf/1903.00450.pdf.
[2] Van Steenkiste, S., Chang, M., Greff, K., Schmidhuber, J.: Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. https://arxiv.org/pdf/1802.10353.pdf.
[3] Yi, K., Gan, C., Li, Y., Kohli, P., Wu, J., Torralba, A., Tenenbaum, J.B.: CLEVRER: Collision events for video representation and reasoning. https://arxiv.org/pdf/1910.01442.pdf.