Project author: freedomtan

Project description: DeepLab V3 TFLite models that can be fully delegated to NNAPI

Project URL: git://github.com/freedomtan/deeplabv3_tflite_for_nnapi_delegate.git
Created: 2020-03-15T04:15:34Z
Project home: https://github.com/freedomtan/deeplabv3_tflite_for_nnapi_delegate

License: BSD 3-Clause "New" or "Revised" License

deeplabv3_tflite_for_nnapi_delegate

DeepLab V3 TFLite models that can be fully delegated to NNAPI.

When running `./benchmark_model --graph=frozen_inference_graph.tflite --use_nnapi=1`, where `frozen_inference_graph.tflite` comes from the quantized model published by Google, we can see that the graph is divided into 8 nodes, because AVERAGE_POOL_2D, RESIZE_BILINEAR, and ARG_MAX cannot be delegated.

```
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteNnapiDelegate 0.000 11.665 11.422 21.939% 21.939% 0.000 1 [MobilenetV2/expanded_conv_16/project/add_fold, aspp0/Relu]:71
AVERAGE_POOL_2D 11.422 0.075 0.080 0.153% 22.092% 0.000 1 [AvgPool2D/AvgPool]:61
TfLiteNnapiDelegate 11.503 0.549 0.654 1.256% 23.348% 0.000 1 [image_pooling/Relu]:72
RESIZE_BILINEAR 12.157 0.733 0.731 1.404% 24.753% 0.000 1 [ResizeBilinear]:63
TfLiteNnapiDelegate 12.888 1.468 1.386 2.663% 27.416% 0.000 1 [logits/semantic/BiasAdd]:73
RESIZE_BILINEAR 14.275 0.093 0.095 0.182% 27.597% 0.000 1 [ResizeBilinear_1]:68
RESIZE_BILINEAR 14.370 22.826 22.369 42.965% 70.562% 0.000 1 [ResizeBilinear_2]:69
ARG_MAX 36.740 15.440 15.326 29.438% 100.000% 0.000 1 [ArgMax]:70
```
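The 8-node split in the profile above can be modeled with a short sketch. This is a simplification (the real delegate partitioner works on the actual graph in C++, and `partition` is just an illustrative name): consecutive NNAPI-supported ops collapse into a single TfLiteNnapiDelegate node, while each unsupported op stays on the CPU as its own node.

```python
# Ops the NNAPI delegate rejects for this model, per the profile above.
UNSUPPORTED = {"AVERAGE_POOL_2D", "RESIZE_BILINEAR", "ARG_MAX"}

def partition(ops):
    """Collapse runs of supported ops into one delegate node each;
    every unsupported op becomes its own CPU node."""
    nodes = []
    for op in ops:
        if op in UNSUPPORTED:
            nodes.append(op)
        elif not nodes or nodes[-1] != "TfLiteNnapiDelegate":
            nodes.append("TfLiteNnapiDelegate")
    return nodes

# Op sequence matching the run order above (supported runs abbreviated):
ops = (["CONV_2D"] * 3 + ["AVERAGE_POOL_2D"] + ["CONV_2D"] * 2 +
       ["RESIZE_BILINEAR"] + ["CONV_2D"] * 2 +
       ["RESIZE_BILINEAR", "RESIZE_BILINEAR", "ARG_MAX"])
print(len(partition(ops)))  # 8, matching the profile
```

Note that the two adjacent RESIZE_BILINEAR ops each count as a separate node, which is why three unsupported op types produce eight partitions here rather than seven.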
| model name | dataset | input size | output stride | note |
| --- | --- | --- | --- | --- |
| deeplabv3_mnv2_pascal_513_os8_quant | pascal_voc_2012 | 513x513 | 8 | |
| deeplabv3_mnv2_pascal_513_os16_quant | pascal_voc_2012 | 513x513 | 16 | |
| deeplabv3_mnv2_pascal_257_os8_quant | pascal_voc_2012 | 257x257 | 8 | |
| deeplabv3_mnv2_pascal_257_os16_quant | pascal_voc_2012 | 257x257 | 16 | |
| deeplabv3_mnv2_cityscapes_513_os8_quant | cityscapes | 513x513 | 8 | dummy quant |
| deeplabv3_mnv2_cityscapes_513_os16_quant | cityscapes | 513x513 | 16 | dummy quant |
| deeplabv3_mnv2_ade20k_513_os8_quant | ade20k | 513x513 | 8 | dummy quant |
| deeplabv3_mnv2_ade20k_513_os16_quant | ade20k | 513x513 | 16 | dummy quant |

Note that “dummy quant” ones are generated using dummy-quantization.
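For reference, the relation between the input (crop) size and the encoder's internal feature-map size can be sketched as below, assuming the usual DeepLab convention of crop sizes of the form n * stride + 1 (`feature_map_size` is an illustrative name, not a function in the DeepLab codebase):

```python
def feature_map_size(crop_size, output_stride):
    # With crop sizes of the form n * stride + 1, the encoder output
    # is (crop_size - 1) / output_stride + 1 on each side.
    return (crop_size - 1) // output_stride + 1

print(feature_map_size(513, 8))   # 65
print(feature_map_size(513, 16))  # 33
print(feature_map_size(257, 16))  # 17
```

This also shows why the image-level average pool in these models is large (e.g. 65x65 for the 513x513, OS = 8 models), which matters for the NNAPI pooling constraint discussed below.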

How to get a fully delegatable .tflite, and why

TFLite models from Google, such as those in mobilenetv2_coco_voc_trainaug_8bit, run from the MobilenetV2 input to ArgMax. As noted above, there are 3 types of ops preventing them from being fully delegated to NNAPI.

  1. RESIZE_BILINEAR: `align_corners=True` is not supported by NNAPI. How to fix it: change it to `False` in the Python source code.
  2. ARG_MAX: `output_type=int64`, which is the default value, is not supported. How to fix it: change it to `int32` in the source.
  3. AVERAGE_POOL_2D: a filter with H*W > 256 is rejected. How to fix it: either (a) pass something like `--disable_nnapi_cpu=1` or `--nnapi_accelerator_name=neuron-ann` to `benchmark_model`, or (b) simply skip the constraint in the NNAPI delegate source code.
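The third constraint can be illustrated with a small check mirroring the rule described above (the real check lives in TFLite's NNAPI delegate C++ code; `nnapi_accepts_avg_pool` is just an illustrative name for this sketch):

```python
def nnapi_accepts_avg_pool(filter_h, filter_w):
    # Per the constraint above, AVERAGE_POOL_2D ops whose filter area
    # exceeds 256 are rejected and fall back to the CPU.
    return filter_h * filter_w <= 256

print(nnapi_accepts_avg_pool(16, 16))  # True: 256 is still within the limit
print(nnapi_accepts_avg_pool(65, 65))  # False: 4225 > 256
```

A 65x65 filter is exactly what the image-level pooling of the 513x513, OS = 8 models uses, which is why the AvgPool2D/AvgPool node shows up undelegated in the profile above.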

Generating / exporting the .pb files:

513x513, OS = 8, quant:

```
PYTHONPATH=`pwd`:`pwd`/slim python deeplab/export_model.py \
  --checkpoint_path=/tmp/deeplabv3_mnv2_pascal_train_aug_8bit/model.ckpt \
  --export_path=/tmp/deeplab_export/deeplabv3_mnv2_pascal_513_os8_quant.pb \
  --model_variant="mobilenet_v2" \
  --quantize_delay_step=0
```

513x513, OS = 16, quant:

```
PYTHONPATH=`pwd`:`pwd`/slim python deeplab/export_model.py \
  --checkpoint_path=/tmp/deeplabv3_mnv2_pascal_train_aug_8bit/model.ckpt \
  --export_path=/tmp/deeplab_export/deeplabv3_mnv2_pascal_513_os16_quant.pb \
  --model_variant="mobilenet_v2" \
  --quantize_delay_step=0 \
  --output_stride=16
```

257x257, OS = 8, quant:

```
PYTHONPATH=`pwd`:`pwd`/slim python deeplab/export_model.py \
  --checkpoint_path=/tmp/deeplabv3_mnv2_pascal_train_aug_8bit/model.ckpt \
  --export_path=/tmp/deeplab_export/deeplabv3_mnv2_pascal_257_os8_quant.pb \
  --model_variant="mobilenet_v2" \
  --quantize_delay_step=0 \
  --crop_size=257 \
  --crop_size=257
```

257x257, OS = 16, quant:

```
PYTHONPATH=`pwd`:`pwd`/slim python deeplab/export_model.py \
  --checkpoint_path=/tmp/deeplabv3_mnv2_pascal_train_aug_8bit/model.ckpt \
  --export_path=/tmp/deeplab_export/deeplabv3_mnv2_pascal_257_os16_quant.pb \
  --model_variant="mobilenet_v2" \
  --quantize_delay_step=0 \
  --crop_size=257 \
  --crop_size=257 \
  --output_stride=16
```