Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching using pytorch-lightning
Unofficial implementation of Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching using pytorch-lightning
Official implementation: CasMVSNet
Reference MVSNet implementation: MVSNet_pl
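Update 1: to use the Gwc variant reported in the results table, set --num_groups 8 in training.
Use conda create -n casmvsnet_pl python=3.7 to create a conda environment, and activate it with conda activate casmvsnet_pl.
Install the Python requirements with pip install -r requirements.txt.
Install Inplace-ABN with pip install inplace-abn.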
Please see each subsection for training on different datasets.
Available training datasets: DTU and BlendedMVS.
Download the preprocessed DTU training data and Depth_raw from the original MVSNet repo and unzip them. Then replace the Depths folder in mvs_training/dtu with the Depths/scanXX folders from Depth_raw. For a description of how the data was created, please refer to the original paper.
Run (example)
python train.py \
--dataset_name dtu \
--root_dir $DTU_DIR \
--num_epochs 16 --batch_size 2 \
--depth_interval 2.65 --n_depths 8 32 48 --interval_ratios 1.0 2.0 4.0 \
--optimizer adam --lr 1e-3 --lr_scheduler cosine \
--exp_name exp
Note that the model consumes a lot of GPU memory, so the batch size generally has to be small.
See opt.py for all configurations.
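As a rough illustration of how the multi-valued flags in the command above (--n_depths 8 32 48, --interval_ratios 1.0 2.0 4.0) could be parsed, here is a minimal argparse sketch. It is not the repository's opt.py: the function name build_parser, the argument types and the defaults are assumptions made for this example, and only the flags shown in the example command are covered.

```python
import argparse


def build_parser():
    # Hypothetical parser: covers only the flags from the example command.
    # The real opt.py may define these differently and has more options.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_name", type=str, default="dtu")
    parser.add_argument("--root_dir", type=str, required=True)
    parser.add_argument("--num_epochs", type=int, default=16)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--depth_interval", type=float, default=2.65)
    # nargs="+" collects one value per cascade level into a list.
    parser.add_argument("--n_depths", nargs="+", type=int, default=[8, 32, 48])
    parser.add_argument("--interval_ratios", nargs="+", type=float,
                        default=[1.0, 2.0, 4.0])
    parser.add_argument("--optimizer", type=str, default="adam")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--lr_scheduler", type=str, default="cosine")
    parser.add_argument("--exp_name", type=str, default="exp")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
example = "--root_dir /path/to/dtu --n_depths 8 32 48 --interval_ratios 1.0 2.0 4.0"
args = build_parser().parse_args(example.split())
print(args.n_depths, args.interval_ratios)
```

The point of the sketch is only that the per-level values arrive as lists with one entry per cascade level, so --n_depths and --interval_ratios should be given the same number of values.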
The metrics are collected on the DTU val set.
| | resolution | n_views | abs_err | acc_1mm | acc_2mm | acc_4mm | GPU mem in GB (train*/val) |
|---|---|---|---|---|---|---|---|
| Paper | 1152x864 | 5 | N/A | N/A | 82.6% | 88.8% | 10.0 / 5.3 |
| This repo (same as paper) | 640x512 | 3 | 4.524mm | 72.33% | 84.35% | 90.52% | 8.5 / 2.1 |
| This repo (gwc**) | 640x512 | 3 | 4.242mm | 73.99% | 85.85% | 91.57% | 6.5 / 2.1 |
*Training memory is measured with batch size=2 and resolution=640x512.
**Gwc with num_groups=8, trained with --depth_interval 2.0 --interval_ratios 1.0 2.5 5.5 --num_epochs 50; see update 1. This implementation aims to keep the concept of the cascade cost volume while building new operations that further increase accuracy or decrease inference time and GPU memory.
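To make the column names in the table concrete, here is a minimal sketch of how such metrics could be computed. It assumes that abs_err is the mean absolute depth error over valid pixels and that acc_Nmm is the fraction of valid pixels whose error is below N mm. The function name depth_metrics, the flat-list inputs and the strict comparison are assumptions for illustration; the repository works on tensors and may differ in detail.

```python
def depth_metrics(pred, gt, mask, thresholds_mm=(1.0, 2.0, 4.0)):
    """Compute abs_err and acc_Nmm over valid pixels.

    pred, gt: flat sequences of depths in mm.
    mask: flat sequence of booleans, True where the ground truth is valid.
    Hypothetical helper, not the repository's implementation.
    """
    errors = [abs(p - g) for p, g, m in zip(pred, gt, mask) if m]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    metrics = {"abs_err": sum(errors) / len(errors)}
    for t in thresholds_mm:
        # Assumption: a pixel counts as accurate when its error is strictly below t.
        metrics[f"acc_{t:g}mm"] = sum(e < t for e in errors) / len(errors)
    return metrics


# Toy example: five pixels, the last one has no valid ground truth.
pred = [500.5, 602.5, 703.0, 810.0, 123.0]
gt = [500.0, 600.0, 700.0, 800.0, 0.0]
mask = [True, True, True, True, False]
result = depth_metrics(pred, gt, mask)
print(result)
```

In this toy example the valid errors are 0.5, 2.5, 3.0 and 10.0 mm, so abs_err is 4.0 mm, acc_1mm and acc_2mm are both 0.25, and acc_4mm is 0.75. A single large outlier dominates abs_err while barely moving the accuracy columns, which is why the table reports both.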
Download the pretrained model and training log in release.
The metrics above for "This repo (same as paper)" correspond to this training run, using the model saved at the 10th epoch (lowest val_loss, but not the best on the other metrics).
Run
python train.py \
--dataset_name blendedmvs \
--root_dir $BLENDEDMVS_LOW_RES_DIR \
--num_epochs 16 --batch_size 2 \
--depth_interval 192.0 --n_depths 8 32 48 --interval_ratios 1.0 2.0 4.0 \
--optimizer adam --lr 1e-3 --lr_scheduler cosine \
--exp_name exp
The --depth_interval 192.0 is the product of the coarsest-level --n_depths value and the coarsest-level --interval_ratios value: 192.0 = 48 x 4.0.
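The product described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name coarsest_product is hypothetical, and it assumes the coarsest level is the last entry of each list, which matches the 48 and 4.0 used in the text.

```python
def coarsest_product(n_depths, interval_ratios):
    """Multiply the coarsest-level depth count by the coarsest-level interval ratio.

    Assumption: both lists are ordered so that the coarsest level comes last,
    as in `--n_depths 8 32 48 --interval_ratios 1.0 2.0 4.0`.
    """
    if len(n_depths) != len(interval_ratios):
        raise ValueError("expected one value per cascade level in both lists")
    return n_depths[-1] * interval_ratios[-1]


depth_interval = coarsest_product([8, 32, 48], [1.0, 2.0, 4.0])
print(depth_interval)  # 192.0
```

If you change --n_depths or --interval_ratios for BlendedMVS, recomputing this product gives the matching --depth_interval under the rule stated above.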
Since BlendedMVS contains outdoor and indoor scenes with a large variety of depth ranges (some from 0.1 to 2 and some from 10 to 200; note that these numbers are not absolute distances in mm but are in some unknown unit), it is difficult to evaluate absolute accuracy: an error of 2 might be good for a scene with depth range 10 to 200, but terrible for a scene with depth range 0.1 to 2. Therefore, I decided to scale the depth ranges to roughly the same scale (about 100 to 1000). It is done here. This way, the depth ranges of all scenes in BlendedMVS become approximately the same as DTU's (425 to 935), so we can continue to use the same metrics (acc_1mm, etc.) to evaluate the predicted depth maps.
Another advantage of this scaling trick is that a model pretrained on DTU gives better results when applied to BlendedMVS, since the depth ranges are now roughly the same; without scaling, the model yields very bad results when the original depth range is, for example, 0.1 to 2.
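The idea of per-scene depth scaling can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the helper names scale_depth_range and apply_scale are hypothetical, and mapping each scene's near bound to 100 is just one plausible choice, so the exact target range used in the repository may differ.

```python
def scale_depth_range(depth_min, depth_max, target_min=100.0):
    """Return (scale, scaled_min, scaled_max) so that the near bound maps to target_min.

    Hypothetical helper: mapping the near bound to 100 is an assumption of this
    sketch, not necessarily what the repository does.
    """
    if depth_min <= 0 or depth_max <= depth_min:
        raise ValueError("expected 0 < depth_min < depth_max")
    scale = target_min / depth_min
    return scale, depth_min * scale, depth_max * scale


def apply_scale(depths, cam_translation, scale):
    """Scale depth values and the camera translation by the same factor.

    Scaling the whole scene uniformly keeps the geometry consistent: rotations
    and intrinsics are unchanged, only lengths are multiplied by `scale`.
    """
    return [d * scale for d in depths], [t * scale for t in cam_translation]


# The two example ranges mentioned in the text.
for near, far in [(0.1, 2.0), (10.0, 200.0)]:
    scale, new_near, new_far = scale_depth_range(near, far)
    print(f"{near}-{far}: scale {scale:.1f}, scaled range {new_near:.0f}-{new_far:.0f}")
```

With this choice both example scenes end up on the same numeric scale, so a fixed error threshold means the same relative precision in each of them. The camera translation has to be scaled together with the depths, otherwise the views would no longer be consistent with each other.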
Download the pretrained model and training log in release.
Since MVS models consume a lot of GPU memory, some code tricks are indispensable to reduce GPU memory consumption. I tried the following:
- Replace BatchNorm+Relu with Inplace-ABN: reduces memory by ~15%!
- del a tensor once it is never accessed again: only helps a little.
- Use a = a+b in training and a += b in testing: saves about 300MB (I don't know the reason).
For a depth prediction example, see test.ipynb.
For point cloud fusion from depth prediction, please go to evaluations to see the general depth fusion method description, then go to dataset subdirectories for detailed results (qualitative and quantitative).
A video showing the point cloud for scan9 in DTU from different angles and me (click to link to YouTube):
You can follow this great post to convert the point cloud into a mesh file. Poisson reconstruction turns out to be a good choice. Here's what I get after tuning some parameters (the parameters are scene-dependent, so you need to experiment yourself):