# Catalyst.Segmentation
Catalyst is a PyTorch framework for Deep Learning research and development. It was developed with a focus on reproducibility, fast experimentation, and code/idea reuse, so that you can research and develop something new rather than write yet another regular train loop. Break the cycle - use Catalyst!
Catalyst.Segmentation is part of the PyTorch Ecosystem and of the Catalyst Ecosystem. See the project manifest for details.

Note: this repo uses the advanced Catalyst Config API and could be a bit out of date right now. Please use Catalyst's minimal examples section as a starting point and for up-to-date use cases.
You will learn how to build an image segmentation pipeline with transfer learning using the Catalyst framework.
Install the requirements in your local environment:

```bash
pip install -r requirements/requirements.txt
```

Or build a Docker image `catalyst-segmentation` with all the necessary libraries:

```bash
make docker-build
```
Download and unpack one of the open datasets, ISBI for binary segmentation or VOC2012 for semantic segmentation:

```bash
export DATASET="isbi"
rm -rf data/
mkdir -p data

if [[ "$DATASET" == "isbi" ]]; then
    # binary segmentation
    # http://brainiac2.mit.edu/isbi_challenge/
    download-gdrive 1uyPb9WI0t2qMKIqOjFKMv1EtfQ5FAVEI isbi_cleared_191107.tar.gz
    tar -xf isbi_cleared_191107.tar.gz &>/dev/null
    mv isbi_cleared_191107 ./data/origin
elif [[ "$DATASET" == "voc2012" ]]; then
    # semantic segmentation
    # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
    tar -xf VOCtrainval_11-May-2012.tar &>/dev/null
    mkdir -p ./data/origin/images/; mv VOCdevkit/VOC2012/JPEGImages/* $_
    mkdir -p ./data/origin/raw_masks; mv VOCdevkit/VOC2012/SegmentationClass/* $_
fi
```
#### Data structure
Make sure that the final data folder has the required structure:

```bash
/path/to/your_dataset/
    images/
        image_1
        image_2
        ...
        image_N
    raw_masks/
        mask_1
        mask_2
        ...
        mask_N
```
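As a quick sanity check, you can verify that every image has a matching mask. This is only a hedged sketch, assuming the layout above has been placed under `./data/origin`:

```bash
# count files in images/ and raw_masks/ and warn if the counts differ
DATADIR=./data/origin

n_images=$(find "$DATADIR/images" -type f | wc -l)
n_masks=$(find "$DATADIR/raw_masks" -type f | wc -l)

echo "images: $n_images, masks: $n_masks"
if [[ "$n_images" -ne "$n_masks" ]]; then
    echo "WARNING: number of images and masks differ" >&2
fi
```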
#### Data location
The easiest way is to move your data:

```bash
mv /path/to/your_dataset/ /catalyst.segmentation/data/origin
```

In that way you can run the pipeline with the default settings.

* If you prefer to leave the data in `/path/to/your_dataset/`:
  * In a local environment:
    * Link the directory:

      ```bash
      ln -s /path/to/your_dataset $(pwd)/data/origin
      ```

    * Or just set the path to your dataset with `DATADIR=/path/to/your_dataset` when you start the pipeline.
  * Using Docker, you need to set:

    ```bash
    -v /path/to/your_dataset:/data \  # instead of the default $(pwd)/data/origin:/data
    ```

    in the `docker run` command below to start the pipeline.
The pipeline will automatically guide you from raw data to a production-ready model. We will initialize a U-Net model with a pre-trained ResNet-18 encoder; during the pipeline, the model will be trained sequentially in two stages.
#### Run in local environment (binary segmentation)

```bash
CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
WORKDIR=./logs \
DATADIR=./data/origin \
IMAGE_SIZE=256 \
CONFIG_TEMPLATE=./configs/templates/binary.yml \
NUM_WORKERS=4 \
BATCH_SIZE=256 \
bash ./bin/catalyst-binary-segmentation-pipeline.sh
```
#### Run in docker (binary segmentation)

```bash
export LOGDIR=$(pwd)/logs
docker run -it --rm --shm-size 8G --runtime=nvidia \
    -v $(pwd):/workspace/ \
    -v $LOGDIR:/logdir/ \
    -v $(pwd)/data/origin:/data \
    -e "CUDA_VISIBLE_DEVICES=0" \
    -e "USE_WANDB=1" \
    -e "LOGDIR=/logdir" \
    -e "CUDNN_BENCHMARK='True'" \
    -e "CUDNN_DETERMINISTIC='True'" \
    -e "WORKDIR=/logdir" \
    -e "DATADIR=/data" \
    -e "IMAGE_SIZE=256" \
    -e "CONFIG_TEMPLATE=./configs/templates/binary.yml" \
    -e "NUM_WORKERS=4" \
    -e "BATCH_SIZE=256" \
    catalyst-segmentation ./bin/catalyst-binary-segmentation-pipeline.sh
```
#### Run in local environment (semantic segmentation)

```bash
CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
WORKDIR=./logs \
DATADIR=./data/origin \
IMAGE_SIZE=256 \
CONFIG_TEMPLATE=./configs/templates/semantic.yml \
NUM_WORKERS=4 \
BATCH_SIZE=256 \
bash ./bin/catalyst-semantic-segmentation-pipeline.sh
```
#### Run in docker (semantic segmentation)

```bash
export LOGDIR=$(pwd)/logs
docker run -it --rm --shm-size 8G --runtime=nvidia \
    -v $(pwd):/workspace/ \
    -v $LOGDIR:/logdir/ \
    -v $(pwd)/data/origin:/data \
    -e "CUDA_VISIBLE_DEVICES=0" \
    -e "USE_WANDB=1" \
    -e "LOGDIR=/logdir" \
    -e "CUDNN_BENCHMARK='True'" \
    -e "CUDNN_DETERMINISTIC='True'" \
    -e "WORKDIR=/logdir" \
    -e "DATADIR=/data" \
    -e "IMAGE_SIZE=256" \
    -e "CONFIG_TEMPLATE=./configs/templates/semantic.yml" \
    -e "NUM_WORKERS=4" \
    -e "BATCH_SIZE=256" \
    catalyst-segmentation ./bin/catalyst-semantic-segmentation-pipeline.sh
```
The pipeline is now running and you don't have to do anything else; it remains only to wait for the best model!

You can use a W&B account for visualization right after `pip install wandb`:

```bash
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
```
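To enable W&B logging before launching, here is a minimal sketch (it relies on the `USE_WANDB=1` flag already shown in the Docker commands above; `wandb login` is the standard W&B CLI command):

```bash
# install the W&B client and authenticate once before starting the pipeline
pip install wandb
wandb login   # paste your API key when prompted
```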
TensorBoard can also be used for visualization:

```bash
tensorboard --logdir=/catalyst.segmentation/logs
```
All results of all experiments can be found locally in `WORKDIR`, by default `catalyst.segmentation/logs`. The results of an experiment, for instance `catalyst.segmentation/logs/logdir-191107-094627-2f31d790`, contain the `best.pth` and `last.pth` checkpoints, which can also be found in the corresponding experiment in your W&B account.
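For example, to locate the saved checkpoints after a run, here is a small sketch (the exact subdirectory layout inside each logdir may differ):

```bash
# list experiment logdirs (newest first) and find their saved checkpoints
ls -dt ./logs/logdir-*
find ./logs -name "best.pth" -o -name "last.pth"
```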
For your future experiments, the framework provides powerful configs that allow you to optimize the configuration of the whole segmentation pipeline in a controlled and reproducible way.

Common settings for the training stages and model parameters can be found in `catalyst.segmentation/configs/_common.yml`:

* `model_params`: detailed configuration of the model, including:
    * the model, for instance `ResnetUnet`
    * a detailed architecture description
    * whether to use a pretrained model
* `stages`: you can configure training or inference in several stages with different hyperparameters; in our example:
    * optimizer params
    * first learn the head(s), then train the whole network
The `CONFIG_TEMPLATE` with the other experiment hyperparameters, such as `data_params`, is here: `catalyst.segmentation/configs/templates/binary.yml`. The config allows you to define:

* `data_params`: path, batch size, number of workers and so on
* `callbacks_params`: callbacks are used to execute code during training, for example to compute metrics or save checkpoints. Catalyst provides a wide variety of helpful callbacks, and you can also use custom ones.

You can find many more options for configuring experiments in the Catalyst documentation.
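For example, to run your own experiment with a modified template, a hedged sketch reusing the local run command above (`my_binary.yml` is a hypothetical copy of the binary template with edited `data_params`/`callbacks_params`, and `BATCH_SIZE=64` is just an example override):

```bash
# copy the binary template and adjust it for your experiment
cp ./configs/templates/binary.yml ./configs/templates/my_binary.yml
# ... edit data_params / callbacks_params in my_binary.yml ...

CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
WORKDIR=./logs \
DATADIR=./data/origin \
IMAGE_SIZE=256 \
CONFIG_TEMPLATE=./configs/templates/my_binary.yml \
NUM_WORKERS=4 \
BATCH_SIZE=64 \
bash ./bin/catalyst-binary-segmentation-pipeline.sh
```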