keras-tensorrt-jetson

nVidia’s Jetson platform is arguably the most powerful family of devices for deep learning at the edge. In order to achieve the full benefits of the platform, a framework called TensorRT drastically reduces inference time for supported network architectures and layers. However, nVidia does not currently make it easy to take your existing models from Keras/Tensorflow and deploy them on the Jetson with TensorRT. One reason for this is the python API for TensorRT only supports x86 based architectures. This leaves us with no real easy way of taking advantage of the benefits of TensorRT. However, there is a harder way that does work: To achieve maximum inference performance we can export and convert our model to .uff format, and then load it in TensorRT’s C++ API.

1. Training and exporting to .pb

Train your model
If using Jupyter, restart the kernel you trained your model in to remove training layers from the graph
Reload the models weights
Use an export function like the one in this notebook to export the graph to a .pb file

2. Converting .pb to .uff

I suggest using the chybhao666/cuda9_cudnn7_tensorrt3.0:latest Docker container to access the script needed for converting a .pb export from Keras/Tensorflow to .uff format for TensorRT import.

cd /usr/lib/python2.7/dist-packages/uff/bin
# List Layers and manually pick out the output layer
# For most networks it will be dense_x/BiasAdd, the last one that isn't a placeholder or activation layer
python convert_to_uff.py tensorflow --input-file /path/to/graph.pb -l
# Convert to .uff, replace dense_1/BiasAdd with the name of your output layer
python convert_to_uff.py tensorflow -o /path/to/graph.uff --input-file /path/to/graph.pb -O dense_1/BiasAdd

More information on the .pb export and .uff conversion is available from nVidia

3. Loading the .uff into TensorRT C++ Inference API

I have create a generic class which can load the graph from a .uff file and setup TensorRT for inference. It supports any number of inputs and outputs and is available on my Github. It can be built with nVidia nSight Eclipse Edition using a remote toolchain (instructions here)

Caveats

Keep in mind that many layers are not supported by TensorRT 3.0. The most obvious omission is Keras particular implementation of BatchNorm.
Concatenate only works on the channel axis and if and only if the other dimensions are the same. If you have multiple paths for convolution, you are limited to concatenating them only when they have the same dimensions.

nVidia DIGITS TensorRT Inference Nodes for ROS

Support for DetectNet, ImageNet, and soon SegNet!
Includes a flexible abstraction layer for TensorRT
Github