An image recognition/object detection model that detects handwritten digits and simple math operators. The output of the predicted objects (numbers & math operators) is then evaluated and solved.
The open-source TensorFlow Object Detection API makes it feasible to construct, train, and deploy a custom object detection model with ease. The detection model shown above uses this API to detect handwritten digits and simple math operators. The predicted objects (numbers & math operators) are then evaluated and solved. Currently, the model is limited to basic math and linear algebra.
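The evaluate-and-solve step could look roughly like the sketch below. It is an illustration only, not the project's actual code: the `(label, xmin)` detection pairs, the label names (`plus`, `minus`, `times`, `divide`), and the `evaluate_detections` helper are all hypothetical. It assumes the expression is written on a single line, so sorting detections by their left edge recovers reading order, and it evaluates the result with a small whitelist instead of `eval`.

```python
import ast
import operator

# Hypothetical label names; the project's real class labels may differ.
LABEL_TO_CHAR = {str(d): str(d) for d in range(10)}
LABEL_TO_CHAR.update({"plus": "+", "minus": "-", "times": "*", "divide": "/"})

# Only these arithmetic operations are allowed during evaluation.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node):
    """Recursively evaluate a parsed expression made of numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")


def evaluate_detections(detections):
    """detections: iterable of (label, xmin) pairs, in any order."""
    ordered = sorted(detections, key=lambda d: d[1])  # left to right
    expression = "".join(LABEL_TO_CHAR[label] for label, _ in ordered)
    tree = ast.parse(expression, mode="eval")
    return expression, _eval_node(tree.body)


# Detections arrive unordered; xmin places them as "32+4".
example = [("plus", 0.40), ("3", 0.10), ("4", 0.70), ("2", 0.25)]
print(evaluate_detections(example))  # ('32+4', 36)
```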
Model: ssd_mobilenet_v1_coco_2017_11_17
Config: ssd_mobilenet_v1_coco
Create an image library - The pre-existing ssd_mobilenet_v1_coco model was trained with a custom image library, created from scratch, of handwritten numbers & math operators. This image library can be substituted with any object or objects of choice. Due to time constraints, the model above was trained on a total of 345 images, of which 10% were allocated for test validation.
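A 90/10 split like the one described can be sketched as follows. The `split_dataset` helper, the file names, and the fixed seed are assumptions for illustration; the source does not say how the split was made or whether 10% of 345 was taken as 34 or 35 images (this sketch rounds down).

```python
import random


def split_dataset(filenames, test_percent=10, seed=0):
    """Shuffle deterministically and hold out test_percent of the files."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) * test_percent // 100  # integer math, rounds down
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the 345 labeled images.
images = ["img_%03d.jpg" % i for i in range(345)]
train_files, test_files = split_dataset(images)
print(len(train_files), len(test_files))  # 311 34
```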
Box & label each class - In order to train and test the model, TensorFlow requires that a box is drawn around each object and labeled with its class. To be more specific, it needs the coordinates (ymin, xmin, ymax, xmax) of the box in relation to the image. The y coordinates are divided by the height of the image and the x coordinates by its width, and each result is stored as a float. An example of the process is shown below. (Note: the current model contains 23 classes) Thanks to tzutalin for creating labelImg, a GUI that makes this process easy.
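The normalization amounts to the small calculation sketched here. The `normalize_box` helper and the sample pixel values are hypothetical; the output order (ymin, xmin, ymax, xmax) follows the order given above.

```python
def normalize_box(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel coordinates to fractions of the image size.

    Returns (ymin, xmin, ymax, xmax), each as a float between 0 and 1.
    """
    return (
        ymin / image_height,
        xmin / image_width,
        ymax / image_height,
        xmax / image_width,
    )


# A 256x128 pixel box inside a 512x256 image.
print(normalize_box(xmin=128, ymin=64, xmax=384, ymax=192,
                    image_width=512, image_height=256))
# (0.25, 0.25, 0.75, 0.75)
```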
Convert files - Once the labeling process is complete, the folder will be full of XML files. However, TensorFlow cannot use these directly for training and testing. Instead, the XML files need to be converted into a CSV file, and the CSV file is then converted into a tfrecords file for training.
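The XML-to-CSV half of the conversion can be sketched with the standard library alone. This assumes the Pascal VOC style XML that labelImg writes; the CSV column layout, the helper names, and the sample annotation are assumptions for illustration. The CSV-to-tfrecords half needs TensorFlow and is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layout; adjust to whatever the tfrecords converter expects.
CSV_COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]

# Hypothetical annotation in the Pascal VOC layout.
SAMPLE_XML = """
<annotation>
  <filename>img1.jpg</filename>
  <size><width>512</width><height>256</height><depth>3</depth></size>
  <object>
    <name>plus</name>
    <bndbox><xmin>128</xmin><ymin>64</ymin><xmax>384</xmax><ymax>192</ymax></bndbox>
  </object>
</annotation>
"""


def xml_to_rows(xml_text):
    """Return one CSV row per labeled box in a single annotation file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize rows, with a header line, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(CSV_COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(xml_to_rows(SAMPLE_XML)))
```

In practice the same function would be applied to every XML file in the label folder and the rows concatenated before writing.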
Create pbtxt - Create a pbtxt file by assigning an ID and a name (label) to each class. This file will be used with the finished model as a category_index.
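A label map in this format can be generated rather than typed by hand. The sketch below is a minimal illustration: `build_label_map` and the two class names are hypothetical. IDs start at 1 because the Object Detection API reserves 0 for the background class.

```python
def build_label_map(class_names):
    """Return label map text with one item block per class, IDs starting at 1."""
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(entries)


# Hypothetical class names; the real model has 23 classes.
print(build_label_map(["zero", "plus"]))
```

The returned text would be saved to a `.pbtxt` file next to the training config.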
Train the model - (See model above)
Summary: input layer -> 3x3 CNN -> batch normalization -> activation function: ReLU -> 1x1 CNN -> batch normalization -> activation function: ReLU -> output layer.
After the output layer, the model compares its output to the intended output -> cost function (weighted_sigmoid) -> optimization function (optimizer) -> minimize cost (rms_prop_optimizer, learning rate = 0.004)
1 cycle of the summary above = 1 global step
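To make "minimize cost" and "global step" concrete, here is a toy RMSProp loop in plain Python. It is a sketch of the general technique, not the training code: only the learning rate of 0.004 comes from the settings above, while the `decay` and `epsilon` values, the one-parameter cost function, and the helper names are assumptions. Each pass through the loop stands in for one global step.

```python
import math


def rmsprop_step(param, grad, mean_square, learning_rate=0.004, decay=0.9, epsilon=1e-8):
    """One RMSProp update: scale the gradient by a running RMS of past gradients."""
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    param = param - learning_rate * grad / (math.sqrt(mean_square) + epsilon)
    return param, mean_square


def minimize_toy_cost(steps=200, target=3.0):
    """Minimize cost(w) = (w - target)**2, one update per step."""
    w, mean_square = 0.0, 0.0
    costs = []
    for _ in range(steps):
        costs.append((w - target) ** 2)   # compare output to intended output
        grad = 2.0 * (w - target)         # gradient of the cost
        w, mean_square = rmsprop_step(w, grad, mean_square)
    costs.append((w - target) ** 2)
    return w, costs


w, costs = minimize_toy_cost()
print(costs[0], costs[-1])  # the cost after 200 steps is lower than at the start
```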
This process requires heavy computing power. Due to hardware constraints (CPU only), it took approximately 4 days & 7 hours to complete 50k global steps.
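For a sense of scale, the figures above work out to a little over seven seconds per global step. The arithmetic, using a hypothetical `seconds_per_step` helper:

```python
def seconds_per_step(days, hours, steps):
    """Average wall-clock seconds per training step."""
    total_seconds = (days * 24 + hours) * 3600
    return total_seconds / steps


# 4 days 7 hours = 103 hours = 370,800 seconds over 50,000 steps.
print(round(seconds_per_step(4, 7, 50_000), 2))  # 7.42
```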
Export inference graph - As the model trains, it creates a checkpoint file at each set milestone. Once the model reaches an acceptable loss, the user stops training manually. The latest checkpoint file is then converted into an inference graph, which is used for deployment/serving.
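Before exporting, the most recent checkpoint has to be identified. The sketch below assumes the usual TensorFlow 1.x naming, where each checkpoint is saved as `model.ckpt-<step>.index` alongside `.meta` and `.data-*` files; the helper name and the sample file list are hypothetical. The returned prefix is what would typically be handed to the Object Detection API's export script.

```python
import re

# Matches e.g. "model.ckpt-50000.index" and captures the prefix and step.
_CHECKPOINT_PATTERN = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest step, or None if absent."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _CHECKPOINT_PATTERN.match(name)
        if match and int(match.group(2)) > best_step:
            best_step, best_prefix = int(match.group(2)), match.group(1)
    return best_prefix


# Hypothetical contents of a training directory.
files = [
    "checkpoint",
    "graph.pbtxt",
    "model.ckpt-48211.index",
    "model.ckpt-50000.data-00000-of-00001",
    "model.ckpt-50000.index",
    "model.ckpt-50000.meta",
]
print(latest_checkpoint_prefix(files))  # model.ckpt-50000
```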
Google’s object detection (Link)
Tensorflow detection model zoo (Link)
Tensorflow config files (Link)
Raccoon's object detection (Link)
Label images with labelImg (Link)
To Harrison Kinsley (Sentdex)