Project author: dennisbappert

Project description:
Pretrained PyTorch license plate segmentation model (DeepLabV3 with ResNet-101 backbone)
Primary language: Python
Project URL: git://github.com/dennisbappert/pytorch-licenseplate-segmentation.git
Created: 2020-05-02T08:53:03Z
Project community: https://github.com/dennisbappert/pytorch-licenseplate-segmentation

Open-source license: MIT License

License Plate Segmentation

Preview

Quick start

  1. Install all dependencies:

    # With conda - best to start in a fresh environment:
    conda install --yes pytorch torchvision cudatoolkit=10.2 -c pytorch
    conda install --yes opencv
    conda install --yes matplotlib
    conda install --yes -c conda-forge tensorboard
    pip install mmcv

    # Clone this repo, using '_' instead of '-' in the directory name to allow Python imports:
    git clone https://github.com/dennisbappert/pytorch-licenseplate-segmentation pytorch_licenseplate_segmentation

    # Get started with the sample notebooks
  2. Making predictions:

    import numpy as np
    import torch
    from PIL import Image
    from torchvision import transforms

    # create_model() is provided by this repository.
    # `weights` is the path to a downloaded checkpoint,
    # `filename` is the path to the input image.

    # Load the model:
    model = create_model()
    checkpoint = torch.load(weights, map_location='cpu')
    model.load_state_dict(checkpoint['model'])
    _ = model.eval()

    if torch.cuda.is_available():
        model.to('cuda')

    # Prediction pipeline
    def pred(image, model):
        preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

        input_tensor = preprocess(image)
        input_batch = input_tensor.unsqueeze(0)

        if torch.cuda.is_available():
            input_batch = input_batch.to('cuda')

        with torch.no_grad():
            output = model(input_batch)['out'][0]
            return output

    # Loading an image
    img = Image.open(filename).convert('RGB')

    # Defining a threshold for predictions
    threshold = 0.1  # 0.1 seems appropriate for the pre-trained model

    # Predict
    output = pred(img, model)
    output = (output > threshold).type(torch.IntTensor)
    output = output.cpu().numpy()[0]

    # Extracting (row, column) coordinates of all predicted plate pixels
    result = np.where(output > 0)
    coords = list(zip(result[0], result[1]))

    # Overlay the original image: paint every plate pixel red
    for cord in coords:
        img.putpixel((int(cord[1]), int(cord[0])), (255, 0, 0))

Pretrained models

The following models have been pretrained (with links to download the PyTorch state_dicts):

Model name                            mIOU    Training dataset
model.pth (477 MB)                    79.51   Handcrafted
model_v2.pth - new version (477 MB)   83.17   Handcrafted
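
The mIOU column reports mean intersection over union. As a rough illustration of what that number measures, here is a minimal sketch that assumes integer-labelled masks with two classes (0 = background, 1 = plate) and an unweighted mean of the per-class IoU values. The function name `mean_iou` is hypothetical, and the training script may compute the metric differently (for example from a confusion matrix accumulated over the whole validation set).

```python
import numpy as np

def mean_iou(pred_mask, true_mask, num_classes=2):
    """Unweighted mean of per-class IoU for integer-labelled masks."""
    pred_mask = np.asarray(pred_mask)
    true_mask = np.asarray(true_mask)
    ious = []
    for cls in range(num_classes):
        pred_cls = pred_mask == cls
        true_cls = true_mask == cls
        union = np.logical_or(pred_cls, true_cls).sum()
        if union == 0:
            continue  # class absent from both masks: leave it out of the mean
        intersection = np.logical_and(pred_cls, true_cls).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 2x4 example: the prediction misses one plate pixel.
true_mask = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1]])
pred_mask = np.array([[0, 0, 0, 1],
                      [0, 0, 1, 1]])
print(mean_iou(pred_mask, true_mask))  # background 4/5, plate 3/4
```

The values in the table are on a 0-100 scale, so the toy result above would correspond to 77.5.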

The base weights are the pretrained DeepLabV3 ResNet-101 weights from PyTorch Hub (see reference 1).

The model was trained via transfer learning on a small hand-crafted dataset of 130 images. Several augmentations were applied during each epoch to help the model generalize. However, some classes are underrepresented (motorcycles, buses, American trucks).

The training procedure was adapted from the PyTorch reference training script (see reference 3) with the following changes:

  • The initial learning rate was lowered to 0.001
  • Lovasz Softmax was used instead of cross-entropy loss
  • More augmentations were added to the pipeline
  • SWA (Stochastic Weight Averaging) was used to fine-tune the model after training
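
The core idea behind SWA is an equal-weight running average over weight snapshots taken at several points late in training. The following is a minimal sketch of that averaging step only, using NumPy arrays as stand-ins for model parameters; the function name `update_swa` and the snapshot values are made up for illustration, and this is not the author's training code. Recent PyTorch versions ship `torch.optim.swa_utils` for the real thing; which implementation was used here is not stated.

```python
import numpy as np

def update_swa(swa_weights, new_weights, num_averaged):
    """Fold one more weight snapshot into the running equal-weight average."""
    return {
        name: (swa_weights[name] * num_averaged + new_weights[name]) / (num_averaged + 1)
        for name in swa_weights
    }

# Three hypothetical snapshots of a single 2-element parameter.
snapshots = [
    {"w": np.array([1.0, 4.0])},
    {"w": np.array([2.0, 5.0])},
    {"w": np.array([3.0, 6.0])},
]

# Start from the first snapshot, then fold in the remaining ones.
swa = {name: value.copy() for name, value in snapshots[0].items()}
for count, snapshot in enumerate(snapshots[1:], start=1):
    swa = update_swa(swa, snapshot, count)

print(swa["w"])  # mean of the three snapshots
```

With a real network, batch normalization statistics generally need to be recomputed for the averaged weights before evaluation.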

The model was trained on an RTX 2060 Super with a batch size of 2 for 250 epochs (8 hours of training time) on a Windows machine with an eGPU. After training, the model was fine-tuned using Stochastic Weight Averaging to generalize better (with great success: most of the initial bugs disappeared).

The attached YouTube video shows the strengths and weaknesses of the current model:

  • It works quite well overall, even with yellow plates (it never saw them during training; the color jitter transformation did a good job)
  • It works in blurry areas (Gaussian transformation applied during training)
  • It doesn’t work with trucks and vans (I should modify my training dataset so that these classes are represented in a more balanced way)
  • It seems to be much better at detecting license plates on the front of cars; I guess my dataset is not balanced properly

Example notebooks

This notebook demonstrates how you can leverage the model to make predictions on single images.

This notebook demonstrates how you can leverage the model to overlay license plates in videos. It is just an example and loads all frames into memory, which is not optimal for large videos. Also, no resizing is applied, so you need a powerful GPU with at least 8GB of memory for HD videos.
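
One way to avoid holding a whole video in memory is to read frames lazily and hand them to the model in small batches. The sketch below shows only the batching part, on a generic iterable; the helper name `batched` and the fake frame source are assumptions for illustration and are not taken from the notebook.

```python
from itertools import islice

def batched(frames, batch_size):
    """Yield lists of at most batch_size items from any iterable of frames."""
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for a lazy frame source, e.g. a generator wrapping a video reader.
fake_frames = (f"frame-{i}" for i in range(5))

batch_sizes = []
for batch in batched(fake_frames, batch_size=2):
    # In a real pipeline, each batch would be preprocessed and passed to the model here.
    batch_sizes.append(len(batch))

print(batch_sizes)  # [2, 2, 1]
```

Only one batch of frames is alive at a time, so peak memory depends on the batch size rather than on the length of the video.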

Another obvious use case is to extract the license plates and use the segmentation results for further license plate recognition. This notebook shows how to extract the results.

Example:
example_advanced_processing
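
As a rough idea of what such an extraction can look like, here is a minimal sketch that crops an image to the bounding box of the predicted mask. It assumes the image is an H x W x C NumPy array and the mask an H x W array with non-zero plate pixels; the function name `crop_to_mask` and its `padding` parameter are hypothetical, and the notebook may take a different approach.

```python
import numpy as np

def crop_to_mask(image, mask, padding=0):
    """Crop image (H x W x C) to the bounding box of the non-zero mask pixels (H x W).

    Returns None when the mask is empty.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    top = max(rows.min() - padding, 0)
    bottom = min(rows.max() + 1 + padding, mask.shape[0])
    left = max(cols.min() - padding, 0)
    right = min(cols.max() + 1 + padding, mask.shape[1])
    return image[top:bottom, left:right]

# Toy data: a 6x8 "image" with a 2x4 plate region.
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:4, 3:7] = 1

plate = crop_to_mask(image, mask)
print(plate.shape)  # (2, 4, 3)
```

This produces a single box around all plate pixels; an image with several plates would first need the mask split into connected regions.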

Training results

[Training curves: loss, IoU, mIoU]

Motivation and future plans

I started this project out of curiosity, to improve my skill set. In particular, I wanted to examine whether it is possible to solve an image segmentation problem with a very small dataset, and I think I have been somewhat successful so far. I’m pretty new to this domain and appreciate all feedback; that said, feel free to open as many issues as required.

I’m planning to do the following things in the future:

  • Publish the code and all the Notebooks
  • Write a proper README
  • Upload the model
  • Implement license plate recognition (I’m planning to evaluate CRAFT)
  • Build a simple command-line tool to blur license plates in videos (it seems that there is no open-source tool available)

Train with your own data

By default the dataloader expects the following structure:

  • train.py
  • ./dataset
    • ./val
      • images
        • {filename}.jpg
      • masks
        • {filename}.jpg.png
    • ./train
      • images
        • {filename}.jpg
      • masks
        • {filename}.jpg.png

Feel free to change the dataloader to your needs. The masks are expected to be black-and-white PNG files with 8-bit pixels. The training writes a TensorBoard summary to ./runs/.
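
The naming convention above (the mask for `{filename}.jpg` is `{filename}.jpg.png`) can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `list_pairs` is hypothetical and is not part of the repository's dataloader.

```python
from pathlib import Path
import tempfile

def list_pairs(dataset_root, split):
    """Return sorted (image_path, mask_path) pairs for one split ('train' or 'val')."""
    images_dir = Path(dataset_root) / split / "images"
    masks_dir = Path(dataset_root) / split / "masks"
    pairs = []
    for image_path in sorted(images_dir.glob("*.jpg")):
        mask_path = masks_dir / (image_path.name + ".png")
        if mask_path.exists():
            pairs.append((image_path, mask_path))
    return pairs

# Build a tiny throwaway dataset to show the convention.
with tempfile.TemporaryDirectory() as root:
    for sub in ("images", "masks"):
        (Path(root) / "train" / sub).mkdir(parents=True)
    (Path(root) / "train" / "images" / "car1.jpg").touch()
    (Path(root) / "train" / "masks" / "car1.jpg.png").touch()
    (Path(root) / "train" / "images" / "car2.jpg").touch()  # no mask: skipped
    pair_names = [(img.name, msk.name) for img, msk in list_pairs(root, "train")]

print(pair_names)  # [('car1.jpg', 'car1.jpg.png')]
```

Running such a check over ./dataset before training makes it easy to spot images that have no matching mask.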

The data should look like this:

training_data

References

  1. PyTorch DeepLabV3 ResNet-101 implementation and weights: https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/

  2. Lovasz Softmax loss function: https://github.com/bermanmaxim/LovaszSoftmax

  3. PyTorch reference implementation to train the model: https://github.com/pytorch/vision/tree/master/references/segmentation

  4. Example videos: https://www.pexels.com/