ETH Zurich Fall 2017
This task asked us to use parallel Stochastic Gradient Descent (SGD) and MapReduce to solve a Large-Scale Image Classification problem. The submitted solution implements an SGD classifier. First, the training data is read from handour_train.txt as (label, image) pairs, where each image has 400 features. This large dataset (16K) is then transformed with Random Fourier Features, which approximate an RBF kernel with an explicit, finite-dimensional feature map. Doing the real computation in that feature space instead of the kernel's implicit one cuts the training cost down enough to fit the given time constraint.

However, with this transform alone the classifier cannot even pass the easy baseline, so I also implemented Adam. Adam adapts the step size per parameter, which keeps the training cost low for both dense and sparse features and is especially helpful on the dense features here. With Adam added, the evaluation score rises drastically from around 0.6 to 0.83, and this eventually let me pass the hard baseline.
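Below is a minimal sketch of the Random Fourier Feature transform described above, assuming numpy; the kernel bandwidth gamma, the random seed, and the default number of (omega, b) pairs are illustrative choices, not values taken from the report.

    import numpy as np

    def make_rff_transform(n_features=400, n_pairs=27000, gamma=1.0, seed=0):
        """Return a map z(x) whose inner products approximate the RBF kernel
        k(x, y) = exp(-gamma * ||x - y||^2)."""
        rng = np.random.RandomState(seed)
        # For this RBF kernel, the random frequencies are drawn from N(0, 2*gamma*I).
        omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_pairs))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_pairs)

        def transform(X):
            # z(x) = sqrt(2 / n_pairs) * cos(x @ omega + b), so z(x).z(y) ~ k(x, y)
            return np.sqrt(2.0 / n_pairs) * np.cos(X @ omega + b)

        return transform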
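The next sketch shows how the Adam update can be combined with SGD on the transformed features, assuming a linear classifier with hinge loss and labels in {-1, +1}; the hyperparameters are the standard Adam defaults rather than the exact values tuned for the submission.

    import numpy as np

    def adam_sgd(Z, y, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, epochs=5):
        """SGD with Adam updates on the hinge loss, over RFF-transformed data Z."""
        n, d = Z.shape
        w = np.zeros(d)
        m, v = np.zeros(d), np.zeros(d)   # first- and second-moment estimates
        t = 0
        for _ in range(epochs):
            for i in np.random.permutation(n):
                t += 1
                margin = y[i] * Z[i].dot(w)
                # Subgradient of the hinge loss max(0, 1 - y * w.z).
                grad = -y[i] * Z[i] if margin < 1 else np.zeros(d)
                m = beta1 * m + (1 - beta1) * grad
                v = beta2 * v + (1 - beta2) * grad ** 2
                m_hat = m / (1 - beta1 ** t)   # bias correction
                v_hat = v / (1 - beta2 ** t)
                w -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return w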
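Since the report does not spell out the parallel part, the sketch below only illustrates the usual MapReduce pattern for parallel SGD: each mapper trains an independent model on its shard and the reducer averages the weight vectors. The input line format and the reuse of make_rff_transform and adam_sgd from the sketches above are assumptions for illustration.

    import numpy as np

    def parse_line(line):
        # Assumed format: label followed by the 400 feature values on each line.
        values = np.array(line.split(), dtype=float)
        return values[0], values[1:]

    def mapper(lines, transform):
        # Train one model on this shard and emit its weights under a single key.
        ys, xs = zip(*(parse_line(l) for l in lines))
        w = adam_sgd(transform(np.vstack(xs)), np.array(ys))
        print('w\t' + ' '.join(map(str, w)))

    def reducer(lines):
        # Average the per-shard weight vectors into the final model.
        weights = [np.array(l.split('\t', 1)[1].split(), dtype=float) for l in lines]
        print(' '.join(map(str, np.mean(weights, axis=0))))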
In repeated test runs, the best configuration used 31000 random feature pairs, reaching an evaluation score of 0.832 within 1.16 seconds. However, that setting is very memory-hungry and slowed down noticeably when run on the server, so I used 27000 pairs instead, which still reaches the hard baseline within the time constraint.