Project author: danieljaensch

Project description: Project code for Kaggle's Digit Recognizer
Language: Python
Project URL: git://github.com/danieljaensch/kaggle-DigitRecognizer.git
Created: 2018-11-21T18:35:33Z
Project community: https://github.com/danieljaensch/kaggle-DigitRecognizer

License: MIT License

[Kaggle] Project: Digit Recognizer

Project code for Kaggle’s Digit Recognizer competition 2018.

Getting Started

This project is the result of a personal desire to develop an image classifier for handwritten digits. After a lot of research I found a good approach on the Kaggle website. The project data also come from the competition mentioned above.

Prerequisites

Things you have to install (or already have installed) on your working machine:

  • Python 3.7
  • Numpy (win-64 v1.15.4)
  • Pandas (win-64 v0.23.4)
  • Matplotlib (win-64 v3.0.2)
  • Torchvision (win-64 v0.2.1)
  • PyTorch (win-64 v0.4.1)

Installing

Use the package manager pip,
Miniconda, or Anaconda to install the packages.
A step-by-step guide to install all necessary components with Anaconda on a Windows 64-bit system:

    conda install -c conda-forge numpy
    conda install -c conda-forge pandas
    conda install -c conda-forge matplotlib
    pip install torchvision
    conda install -c pytorch pytorch
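
A quick way to verify the installation, and to check whether a GPU is available (relevant for the param_gpu setting used later):

    import numpy, pandas, matplotlib, torch, torchvision

    print(numpy.__version__, pandas.__version__, matplotlib.__version__)
    print(torch.__version__, torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())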

Image Data Description (Kaggle)

The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.

The training data set (train.csv) has 785 columns. The first column, called “label”, is the digit that was drawn by the user. The rest of the columns contain the pixel values of the associated image.

Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix, (indexing by zero).

For example, pixel31 indicates the pixel that is in the fourth column from the left, and the second row from the top, as in the ASCII diagram below.

Visually, if we omit the “pixel” prefix, the pixels make up the image like this:

    000 001 002 003 ... 026 027
    028 029 030 031 ... 054 055
    056 057 058 059 ... 082 083
     |   |   |   |  ...  |   |
    728 729 730 731 ... 754 755
    756 757 758 759 ... 782 783
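
To make the index arithmetic concrete, here is a quick sanity check in Python (nothing beyond the formula above is assumed):

    # decode the flat pixel index x into (row i, column j) of the 28x28 image
    x = 31
    i, j = divmod(x, 28)   # x = i * 28 + j
    print(i, j)            # -> 1 3: second row, fourth column (zero-indexed)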

The test data set (test.csv) is the same as the training set, except that it does not contain the “label” column.

Your submission file should be in the following format: for each of the 28000 images in the test set, output a single line containing the ImageId and the digit you predict. For example, if you predict that the first image is of a 3, the second image is of a 7, and the third image is of an 8, then your submission file would look like:

    ImageId,Label
    1,3
    2,7
    3,8
    (27997 more lines)
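
A minimal sketch of producing such a file with pandas; the predictions variable is hypothetical and stands for your model's 28000 predicted digits:

    import pandas as pd

    # predictions: a sequence of 28000 predicted digits, one per test image
    submission = pd.DataFrame({
        "ImageId": range(1, len(predictions) + 1),
        "Label": predictions,
    })
    submission.to_csv("submission.csv", index=False)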

The evaluation metric for this contest is the categorization accuracy, or the proportion of test images that are correctly classified. For example, a categorization accuracy of 0.97 indicates that you have correctly classified all but 3% of the images.
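
In code, this metric is a one-liner; a sketch assuming labels and predictions are NumPy arrays of the same length:

    import numpy as np

    # proportion of test images whose predicted digit matches the true label
    accuracy = np.mean(predictions == labels)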

Running the project

The original train.csv file contains 42000 rows of raw image data, each describing a 28x28-pixel grayscale image of a digit, with the label of each image in the first column.

First, you have to convert the raw image data from the CSV file to real grayscale images.
During this conversion step, the input training data is split into a training set (4/6), a validation set (1/6), and a test set (1/6).

Convert train.csv file to real grayscale images

First, you need to convert the raw image data from the file train.csv to real grayscale images:

    python convert_data_to_MNIST.py
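
The core of such a conversion is reshaping each 784-value row into a 28x28 array and saving it as an image file. A hedged sketch of that idea, not the project's actual code (Pillow is assumed here as the image writer):

    import os
    import numpy as np
    import pandas as pd
    from PIL import Image

    train_data = pd.read_csv("./data/train.csv")

    # first column is the label, the remaining 784 columns are pixel values
    row = train_data.iloc[0]
    label = int(row["label"])
    pixels = row.drop("label").to_numpy(dtype=np.uint8).reshape(28, 28)

    # save as a real grayscale image, sorted into one folder per digit
    os.makedirs(f"./MNIST_data/train/{label}", exist_ok=True)
    Image.fromarray(pixels, mode="L").save(f"./MNIST_data/train/{label}/train_image_0.jpg")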

Parameters of conversion

To change the input and output folders, you can change the following constants inside the Python file:

    CONST_OUTPUT_TRAIN_FOLDER = "./MNIST_data/train/"
    CONST_OUTPUT_TEST_FOLDER = "./MNIST_data/test/"
    CONST_OUTPUT_VALID_FOLDER = "./MNIST_data/valid/"
    CONST_DATA_NN_FOLDER = "./data/"

Output of conversion to real grayscale images

    ---------------------------------------------------------------------
    loading train-data: train.csv ... done.
    train_data (lines x columns) : (42000, 785)
    ---------------------------------------------------------------------
    percent: 100 % ------ file: ./MNIST_data/test/9/test_image_41999.jpg
    ----------------- converting training complete ----------------------

After this you will find the grayscale images separated into train, test, and valid sets in the folder ./MNIST_data/.

Train the model

To train the neural network (CNN), run the following Python file:

    python main_train.py

All needed parameters are preset, so you can simply run it and get the results.

In general, the training function is configured so that the model state is saved at every step
where the current validation loss has decreased below the best value seen so far.
This save file is stored under the name given by the global constant param_save_filename_validation_loss.

After the whole training run, the complete model together with some parameters is saved in
a checkpoint file. The file name is configured by the global constant
param_save_filename.
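
For orientation, this save-on-improvement logic typically looks like the following sketch in PyTorch (variable names are placeholders, not the project's exact code):

    import torch

    best_validation_loss = float("inf")   # best validation loss seen so far

    # inside the validation step of the training loop:
    if validation_loss < best_validation_loss:
        best_validation_loss = validation_loss
        # save only the weights, so the file can later restore the model state
        torch.save(model.state_dict(), param_save_filename_validation_loss)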

In case you want to re-train the model after some previous training and don't want to
start from scratch, you can load the best-validation-loss save file after the model
is created by uncommenting the line that starts with model_nn.load_state_dictionary(...):

    # --------- create model --------
    model_nn = cnn.CNNNetwork( param_data_directory, param_save_filename_validation_loss, param_output_size, param_hidden_units, param_learning_rate, param_gpu )
    # load previous save model from the validation loss file
    #model_nn.load_state_dictionary( param_save_filename_validation_loss )

In this case, your best-validation-loss save file is loaded and the model continues training from that point.

ATTENTION: Make sure your network parameters, such as output size and hidden units, are the same as when you saved the validation-loss file.

Parameters of training

To change the input folder, the output size, and some other parameters of the neural network, you can adapt these global constants inside the Python file:

    # ---- set parameters ---------------
    param_data_directory = "MNIST_data" # default: MNIST_data
    param_output_size = 10 # 10 - original
    param_save_filename = "checkpoint.pth" # checkpoint.pth
    param_save_filename_validation_loss = "checkpoint_validation_model.pth"
    param_save_directory = "./" # ./
    param_learning_rate = 0.001 # 0.001
    param_hidden_units = 512 # 512
    param_epochs = 3 # 3
    param_print_every_steps = 20 # 20
    param_gpu = True # True or False
    # -----------------------------------

Output of training

    ----- running with params -----
    data directory: MNIST_data
    save directory: ./
    learning rate: 0.001
    hidden units: 512
    epochs: 3
    gpu: True
    -------------------------------
    cnn neural network ...
    load image data ... done
    create model ... done
    initialized.
    start training in -gpu- mode ...
    adjust learning rate in epoch 1 to 0.001
    epoch: 1/3.. training loss: 1.9140.. validation loss: 2.1263.. validation accuracy: 0.4642 saving model ...
    epoch: 1/3.. training loss: 1.4975.. validation loss: 1.4882.. validation accuracy: 0.6012 saving model ...
    epoch: 1/3.. training loss: 1.2465.. validation loss: 1.1469.. validation accuracy: 0.6630 saving model ...
    epoch: 1/3.. training loss: 1.0826.. validation loss: 0.9405.. validation accuracy: 0.7030 saving model ...
    epoch: 1/3.. training loss: 1.0147.. validation loss: 0.7908.. validation accuracy: 0.7678 saving model ...
    epoch: 1/3.. training loss: 0.8401.. validation loss: 0.7744.. validation accuracy: 0.7470 saving model ...
    epoch: 1/3.. training loss: 0.7997.. validation loss: 0.5522.. validation accuracy: 0.8356 saving model ...
    epoch: 1/3.. training loss: 0.8026.. validation loss: 0.5758.. validation accuracy: 0.8297
    epoch: 1/3.. training loss: 0.7686.. validation loss: 1.2378.. validation accuracy: 0.6027
    epoch: 1/3.. training loss: 0.6253.. validation loss: 0.6213.. validation accuracy: 0.8003
    epoch: 1/3.. training loss: 0.6962.. validation loss: 0.4294.. validation accuracy: 0.8720 saving model ...
    epoch: 1/3.. training loss: 0.6369.. validation loss: 0.4598.. validation accuracy: 0.8718
    epoch: 1/3.. training loss: 0.6764.. validation loss: 0.4090.. validation accuracy: 0.8826 saving model ...
    ...
    epoch: 3/3.. training loss: 0.2202.. validation loss: 0.1290.. validation accuracy: 0.9620 saving model ...
    epoch: 3/3.. training loss: 0.2302.. validation loss: 0.1323.. validation accuracy: 0.9603
    epoch: 3/3.. training loss: 0.2535.. validation loss: 0.1291.. validation accuracy: 0.9620
    epoch: 3/3.. training loss: 0.2277.. validation loss: 0.1312.. validation accuracy: 0.9615
    epoch: 3/3.. training loss: 0.2648.. validation loss: 0.1263.. validation accuracy: 0.9618 saving model ...
    epoch: 3/3.. training loss: 0.2106.. validation loss: 0.1259.. validation accuracy: 0.9628 saving model ...
    -- done --
    duration: 00:13:49
    save model to: ./checkpoint.pth ... done
    calculate accuracy on test ... done.
    accuracy of the network on the 10000 test images: 96.369 %
    duration: 00:00:56

After 3 epochs, I got a result of 96.369%.
The best result I got was 98.321% after around 20 epochs with this configuration.
I'm getting closer to the target: 99.9%.

Get predictions from the previously trained model

To get predictions from the previously trained neural network (CNN), run the following Python file:

    python main_predict.py

Parameters of prediction

To change the input folder, the output size, and some other parameters of the neural network, you can adapt these global constants inside the Python file:

    # ---- set parameters ---------------
    param_data_directory = "MNIST_data" # default: MNIST_data
    param_output_size = 10 # 10 - original
    param_save_filename_validation_loss = "checkpoint_validation_model.pth"
    param_save_directory = "./" # ./
    param_learning_rate = 0.001 # 0.001
    param_hidden_units = 512 # 512
    param_gpu = True # True or False
    # -----------------------------------
    param_image_file = "./MNIST_data/test/6/test_image_1914.jpg" # default: ./MNIST_data/test/6/test_image_1914.jpg
    param_load_file_name = "checkpoint.pth" # default: checkpoint.pth
    param_top_k = 5 # 5
    # -----------------------------------
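
For orientation, a top-k readout over the class probabilities typically looks like this in PyTorch (a sketch with assumed variable names such as logits, not the project's exact code):

    import torch.nn.functional as F

    # logits: the raw model output for a single image, shape (1, 10)
    probabilities = F.softmax(logits, dim=1)
    top_probs, top_classes = probabilities.topk(param_top_k, dim=1)
    for rank, (p, c) in enumerate(zip(top_probs[0], top_classes[0]), start=1):
        print(f"{rank} with {p.item():.3f} is {c.item()}")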

Output of prediction

    ----- running with params -----
    data directory: MNIST_data
    save directory: ./
    learning rate: 0.001
    hidden units: 512
    gpu: True
    -------------------------------
    ----- running with params -----
    image file: ./MNIST_data/test/6/test_image_1914.jpg
    load file: checkpoint.pth
    top k: 5
    -------------------------------
    cnn neural network ...
    load image data ... done
    create model ... done
    initialized.
    load model state dict ... done.
    --- prediction ---
    load image data ... done
    get prediction ... done.
    1 with 0.898 is 6
    2 with 0.077 is 8
    3 with 0.020 is 5
    4 with 0.004 is 2
    5 with 0.000 is 0
    ------------------
    load image data ... done

predicted image (./MNIST_data/test/6/test_image_1914.jpg):

[image of the handwritten digit 6]

The model predicted the digit image of a 6 with 89.8% probability.
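
To look at the input image yourself, a short matplotlib snippet is enough (using the same path as param_image_file above):

    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    image = mpimg.imread("./MNIST_data/test/6/test_image_1914.jpg")
    plt.imshow(image, cmap="gray")
    plt.show()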

Improvements

This is my first version of a digit recognizer for these 28x28 grayscale images.
The next steps will be:

  • experiment with the neural network's hyperparameters
  • port the digit-recognizer algorithm to Keras
  • port the digit-recognizer algorithm to fast.ai
  • I will see what's coming next …

Authors

  • Daniel Jaensch

License

MIT