Making an MNIST-like Database, and a Model to Recognize It

Introduction

After building a few machine learning (ML) models that could recognize handwritten characters from the English alphabet, I realized I was only getting half the experience of creating them. The dataset I had used to make those models had been given to me, and I never needed to preprocess it, or even consider manipulating it, while I was making the models. In doing this, I had missed out on a huge part of machine learning: gathering the data! Thus, I decided to create my own dataset to use with my new models, to experience manipulating the data into a processable form. In order to have something to compare it to, and to test my models against a larger database to prove their efficacy, I made my dataset imitate the MNIST database.

I began crafting my dataset by taking examples of my own handwritten characters, as well as some from my peers. I edited the images and permuted them, essentially changing the shape of the images so that the pixel information differs, in order to grow the sample size. I ended up with around 1,000 samples, so around 40 of each character in the training set. To reduce the noise in the pictures, I converted them from RGB color to grayscale. Otherwise the Convolutional Neural Networks (CNNs) would have to take in red, green, and blue data for every pixel; with grayscale, it is merely one channel per pixel. After resizing the images (from HD resolution down to around 28×28) to reduce the data needed for processing, I used Python in combination with PyTorch and Google Colaboratory to make the models and process the data. To test the accuracy of these models, I created a training set and a much smaller validation set.
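To illustrate the kind of preprocessing involved, here is a minimal sketch using torchvision transforms; the file name is hypothetical, and my actual pipeline may have differed:

```python
from PIL import Image
from torchvision import transforms

# Convert an RGB photo of a handwritten character into a small
# grayscale tensor, mirroring the preprocessing described above.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # RGB -> one channel
    transforms.Resize((28, 28)),                  # HD resolution -> 28x28
    transforms.ToTensor(),                        # PIL image -> [1, 28, 28] tensor
])

img = Image.open("sample_character.png")  # hypothetical file name
x = preprocess(img)
print(x.shape)  # torch.Size([1, 28, 28])
```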

To avoid a lengthy and convoluted explanation, my goal is to give the intuition behind this project. At the end of this article I've included some important terms that may help with your understanding of this content.

What is a Convolutional Neural Network?

In this project, Convolutional Neural Networks (CNNs) are used to process images, recognize whether the characteristics of one image are similar to those of another image, and label them accordingly. This is the goal of computer vision.

The advantage of using a CNN over another neural net is that it can more easily assign importance to images, or aspects of images, beyond just the array of pixels that an image is. It does this by essentially analyzing an image, finding patterns, and then assigning importance to those patterns. So, if there were a certain grouping of pixels, the CNN could record this pattern and, with the validation set, could perhaps recognize that this grouping is a car, for example.

The truly great thing about CNNs, though, is that they don't simply process an image in the way that a Feed-Forward Network would. They are also able to recognize the spatial traits of an image, like shading and other intricacies that make the image unique, and this leads to better identification of that image. This allows for more meaningful weights that seemingly understand the nature of what makes an image of a dog look like a dog.

What do the Layers of a Convolutional Neural Network actually do?

The layers of the CNN are where the actual convolution occurs. For example: convolution takes, let's say, a 5×5 grid of pixels and, using a filter of size 3×3, converts the 5×5 into a 3×3 by aggregating the values of certain pixels, as determined by the filter while it slides over the 5×5. This cuts the information that needs to be processed down considerably, but one must be careful in choosing the filter size, as too much convolution may lose important image data. The 3×3 created after applying the filter is then used to analyze the data.

While the convolution itself is a more complicated process, this is the most basic interpretation of its function, to build one's understanding of it. Below is a worked example of this situation.
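Here is a minimal sketch in PyTorch of the 5×5 case above; the pixel values and the averaging filter are made up for illustration:

```python
import torch
import torch.nn.functional as F

# A 5x5 "image" (batch of 1, single channel) with arbitrary values.
image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# A 3x3 filter; a simple averaging kernel, purely for illustration.
kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)

# Sliding the 3x3 filter over the 5x5 input (stride 1, no padding)
# aggregates each 3x3 neighborhood into one value, giving a 3x3 output.
out = F.conv2d(image, kernel)
print(out.shape)  # torch.Size([1, 1, 3, 3])
```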

What is a Multilayer Perceptron?

The idea of using a Multilayer Perceptron (MLP), sometimes referred to as a Feed-Forward Network (FFN/FFNN), in computer vision is relatively simple. For this reason it is considered one of the faster models to train and use, but it is often found to be somewhat inaccurate because of the way the model is trained.

The perceptrons within the MLP take an image, convert it into only its pixels, and analyze these to determine weights to place on certain areas, so that images can be differentiated using those weights. The upside to using an MLP is that it is quick and gives results significantly faster; however, it is often less accurate, and more preprocessing of the data is required. Therefore, while it is not ideal for use in computer vision, I created this model purely for comparing and contrasting the other models I made.
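As a sketch of this idea (the hidden-layer size and the 26-letter output are assumptions for illustration, not my exact architecture), an MLP simply flattens the image into a vector of pixels and passes it through fully connected layers:

```python
import torch.nn as nn

# A minimal MLP for 28x28 grayscale images: the image is flattened
# into 784 pixel values, and the weights learned by the linear layers
# are what differentiate one character from another.
mlp = nn.Sequential(
    nn.Flatten(),             # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(28 * 28, 128),  # hidden size is an assumption
    nn.ReLU(),
    nn.Linear(128, 26),       # one output per letter of the alphabet
)
```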

Models Created

In total, I created three models to test the efficacy of the two-convolutional-layer model. Each of these models was trained for 20 epochs with a learning rate of 0.001. After training a model on the training set, it would then try to classify the validation (control) set. With these models I used accuracy as the metric to assess them. Since the validation set already had labels associated with each image, we would know the number of images that each model was able to classify correctly, and we could divide that number by the total number of images to calculate the accuracy.
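As a hedged sketch of this setup (the filter counts, kernel sizes, and 26-class output below are assumptions, not my exact code), here is what a two-convolutional-layer model and the accuracy metric might look like:

```python
import torch
import torch.nn as nn

# A hypothetical two-convolutional-layer network for 28x28 grayscale
# characters; the exact layer sizes are illustrative.
tlcnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(),   # 28x28 -> 26x26
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),  # 26x26 -> 24x24
    nn.Flatten(),
    nn.Linear(32 * 24 * 24, 26),                  # one output per letter
)

def accuracy(model, loader):
    """Images classified correctly, divided by total images."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:  # validation set with known labels
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```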

Conclusions

The real star of this project was the two-layer convolutional neural net (TLCNN), which was the bulk of the project and what I really wanted to test. I created the MLP and the one-layer convolutional neural net (OLCNN) in order to get metrics to compare against the TLCNN. The reasoning behind this is that a CNN takes longer to train than an MLP does.

So, naturally, you would want to collect evidence that the extended training time is worth it. When the models were all created and tested against the validation set from my own data, the accuracy of the TLCNN was consistently ~93%, the accuracy of the OLCNN was consistently ~83%, and the accuracy of the MLP fell in the range of ~50-63%. This data supports my hypothesis that a TLCNN classifies images more accurately than either an OLCNN or an MLP.

It is worth noting, as an aside, that I also tested my models on the vastly larger MNIST database: both the TLCNN and OLCNN were able to achieve an accuracy of 98%, while the MLP was around 95%. This data suggests that the TLCNN could be more valuable when using a smaller dataset, which in itself would be an interesting hypothesis to experiment with further.

Moving Forward

In the future I would like to compare the accuracy of a three-layer CNN against the two-layer version. Additionally, I would like to see how adjusting hyperparameters such as the optimizer, activation function, learning rate, and number of epochs affects these models.

Ultimately, I would like to create a program that uses my TLCNN to classify images drawn by a person in real time. Imagine a window in which you could draw a letter and submit it, and the program would then use the model to accurately classify the letter you drew.

Additional Important Terms

ReLU

The activation function used in the convolutional layers. ReLU stands for rectified linear unit. ReLU is responsible for transforming the outputs of the layer it is attached to. The function itself is piecewise, meaning it has two parts and is described by two different equations. This sounds a lot more complicated than it really is, since ReLU really just returns its input if the input is positive, and returns 0 otherwise.
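In code, the whole piecewise function is a single comparison (a plain-Python sketch, not the PyTorch version the models use):

```python
def relu(x: float) -> float:
    # Return the input if it is positive; otherwise return 0.
    return x if x > 0 else 0.0

print(relu(2.5), relu(-1.3))  # 2.5 0.0
```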

PyTorch

PyTorch is the framework that I used to make my models. While I could have implemented these components myself, this project was created on a limited timeline, and I decided it would be best to use a framework.

MNIST Database

A large database of handwritten digits intended for use with ML models. I used this lightly with my models to be certain that they would also work on a large dataset, but for most of this project I used my own database.

Adam Optimizer

This optimizer is based on stochastic gradient descent. I decided to use Adam because it is quick, but also works well. CNNs are notorious for taking a very long time to process data, but with Adam as the optimizer the processing time was relatively short. If you are interested in learning more about this optimizer, see the link below to the article I used to research it.
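For reference, here is a minimal sketch of attaching Adam in PyTorch with the 0.001 learning rate used above; the placeholder model is only for illustration:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(28 * 28, 26)  # placeholder model for illustration
# Adam over the model's parameters, with the 0.001 learning rate
# used to train each model in this project.
optimizer = optim.Adam(model.parameters(), lr=0.001)
```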

References

https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://medium.com/@danqing/a-practical-guide-to-relu-b83ca804f1f7
https://machinelearningmastery.com/neural-networks-crash-course/
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
