Making an MNIST-like Database, and a Model to Recognize It

Introduction

After building a few machine learning (ML) models that could recognize handwritten characters from the English alphabet, I realized I was only getting half the experience of creating them. The dataset I had used to make those models had been given to me, and I never needed to preprocess it, or even consider manipulating it, while I was making the models. In doing this, I had missed out on a huge part of machine learning: gathering the data! Thus, I decided to create my own dataset to use with my new models, to experience manipulating the data into a processable form. In order to have something to compare it to, and to test my models against a larger database to prove their efficacy, I made my dataset imitate the MNIST database.

I began crafting my dataset by taking examples of my own handwritten characters, as well as some from my peers. I edited the images and permuted them, essentially changing the shape of the images so the pixel information differs, in order to grow the sample size. I ended up with around 1000 samples, so around 40 of each character in the training set. In order to reduce the noise of the pictures, I converted them from RGB color to grayscale. Otherwise, the Convolutional Neural Networks (CNNs) would have to take in red, green, and blue values for each pixel; with grayscale, there is only a single value per pixel. After resizing the images (making them smaller, from HD resolution down to around 28×28) to reduce the data needed for processing, I used Python in combination with PyTorch and Google Colaboratory to make the models and process the data. To test the accuracy of these models, I created a training set and a much smaller validation set.
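As a rough sketch of that preprocessing step, here is how a single image could be grayscaled, resized to 28×28, and scaled to the [0, 1] range with Pillow and NumPy. The synthetic `photo` below is a stand-in for a real handwriting photo; this is illustrative, not my exact pipeline.

```python
from PIL import Image
import numpy as np

def preprocess(img: Image.Image) -> np.ndarray:
    """Grayscale an image, shrink it to 28x28, and scale pixels to [0, 1]."""
    gray = img.convert("L")          # "L" mode = a single channel per pixel
    small = gray.resize((28, 28))    # HD resolution down to MNIST-style 28x28
    return np.asarray(small, dtype=np.float32) / 255.0

# A synthetic stand-in for one of the handwriting photos
photo = Image.new("RGB", (1920, 1080), color=(210, 40, 40))
sample = preprocess(photo)
print(sample.shape)  # (28, 28)
```

Each processed image ends up as a small array of single values instead of a large grid of RGB triples, which is exactly the reduction described above.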

To avoid a lengthy and convoluted explanation, my goal is to give the intuition behind this project. At the end of this article, I've included some important terms that may help with your understanding of this content.

What is a Convolutional Neural Network?

In this project, Convolutional Neural Networks (CNNs) are used to process images, recognize whether the characteristics of one image are similar to those of another, and label them accordingly. This is the goal of computer vision.

The advantage of using a CNN over another neural net is that it can more easily assign importance to aspects of an image that go beyond the raw array of pixels. It does this by essentially analyzing an image, finding patterns, and then assigning importance to them. So, if there were a certain grouping of pixels, the CNN could record this pattern and, with the validation set, perhaps recognize that this grouping is a car, for example.

The truly great thing about CNNs, though, is that they don't simply process an image the way a feed-forward network would. They are also able to recognize the spatial traits of an image, like shading and other intricacies that make the image unique, and this leads to better identification of that image. This allows for more meaningful weights that seemingly understand the nature of what makes an image of a dog look like a dog.

What do the Layers of a Convolutional Neural Network actually do?

The layers of the CNN are where the actual convolution occurs. For example: convolution takes, let's say, a 5×5 grid of pixels and, using a filter of size 3×3, converts the 5×5 into a 3×3 by aggregating the values of certain pixels, as determined by the filter while sliding over the 5×5. This cuts down the information that needs to be processed, but one must be careful in choosing the filter size, as too much convolution may lose important image data. The 3×3 created after applying the filter is then used to analyze the data.

While the convolution itself is a more complicated process, this is the most basic interpretation of its function, to build one's understanding of it. Below is a visual example of this situation.
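As a concrete sketch of the 5×5-to-3×3 example above, here is a minimal, hand-rolled convolution in NumPy (stride 1, no padding; the averaging filter is just an illustrative choice):

```python
import numpy as np

def convolve_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a filter over the image and sum the element-wise products
    at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the pixels under the filter
            out[i, j] = np.sum(patch * kernel)  # aggregate them into one value
    return out

image = np.arange(25).reshape(5, 5)   # a 5x5 grid of pixel values
kernel = np.ones((3, 3)) / 9.0        # a simple 3x3 averaging filter
result = convolve_valid(image, kernel)
print(result.shape)  # (3, 3) — the 5x5 shrinks to a 3x3
```

Real CNN layers learn the filter values during training rather than fixing them in advance, but the sliding-and-aggregating mechanics are the same.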

What is a Multilayer Perceptron?

The idea of using a Multilayer Perceptron (MLP), sometimes referred to as a Feed-Forward Network (FFN/FFNN), in computer vision is relatively simple. For this reason, it is considered one of the faster models to train and use, but it is often found to be somewhat inaccurate because of the way the model is trained.

The perceptrons within the MLP take an image, convert it into only its pixels, and analyze these to determine weights to place on certain areas, so that images can be differentiated using those weights. The upside to using an MLP is that it is quick and gives results significantly faster; however, it is often less accurate, and more preprocessing of the data is required. Therefore, while it is not ideal for use in computer vision, I created this model purely to compare and contrast with the other models I made.
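A minimal PyTorch sketch of this kind of model might look as follows (the layer sizes and the 26-class output are illustrative assumptions, not the exact ones from my models):

```python
import torch
from torch import nn

# Flatten each 28x28 image into 784 pixel values, then learn weights over them.
# Hidden size (128) and class count (26 letters) are illustrative choices.
mlp = nn.Sequential(
    nn.Flatten(),            # 1x28x28 image -> vector of 784 pixels
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 26),      # one score per letter of the alphabet
)

batch = torch.rand(4, 1, 28, 28)  # four fake grayscale images
scores = mlp(batch)
print(scores.shape)  # torch.Size([4, 26])
```

Notice the very first layer throws away the 2D arrangement of the pixels; that flattening is exactly why an MLP struggles with spatial patterns.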

Models Created

In total, I created three models to test the effectiveness of the two-layer convolutional model. Each of these models was trained using 20 epochs and a learning rate of 0.001. After training a model on the training set, it would then try to classify a given set of control data. With these models, I used accuracy as the metric to assess them. Since the validation set already had labels associated with each image, we would know the number of images that each model was able to classify correctly, and we could put that number over the total number of images to calculate the accuracy.
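The accuracy metric itself is simple enough to sketch in a few lines (the labels below are made up purely for illustration):

```python
import numpy as np

def accuracy(predicted, actual):
    """Number of correctly classified images over the total number of images."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float((predicted == actual).mean())

# Hypothetical model predictions vs. the validation set's known labels
predicted_labels = [0, 3, 3, 7, 1]
true_labels      = [0, 3, 2, 7, 1]
print(accuracy(predicted_labels, true_labels))  # 4 of 5 correct -> 0.8
```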

Conclusions

The real star of this project was the two-layer convolutional neural net (TLCNN), which was the bulk of the project and what I really wanted to test. I created the MLP and the one-layer convolutional neural net (OLCNN) in order to get metrics to compare against the TLCNN. The reasoning behind this is that a CNN takes longer to train than an MLP does.
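For intuition, a two-layer CNN along these lines could be sketched in PyTorch like this (the channel counts, kernel sizes, and 26-class output are illustrative guesses, not the exact architecture I used):

```python
import torch
from torch import nn

# Two convolutional layers, each followed by ReLU and 2x2 max pooling,
# then a linear layer to score the 26 letters.
tlcnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 16x14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 32x7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 26),
)

scores = tlcnn(torch.rand(1, 1, 28, 28))
print(scores.shape)  # torch.Size([1, 26])
```

Deleting the second Conv2d/ReLU/MaxPool2d trio (and adjusting the linear layer's input size) would give a one-layer counterpart like the OLCNN.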

So, naturally, you would want to collect evidence that the extended training time is worth it. When the models were all created and tested against the validation set from my own data, the accuracy of the TLCNN was consistently ~93%, the accuracy of the OLCNN was consistently ~83%, and the accuracy of the MLP fell in the range of ~50–63%. This data thus supports my hypothesis that a TLCNN would classify images more accurately than either an OLCNN or an MLP.

It is worth noting, as an aside, that I also tested my models on the vastly larger MNIST database: both the TLCNN and OLCNN were able to achieve an accuracy of 98%, while the MLP was around 95% accuracy. This suggests that the TLCNN could be most valuable when using a smaller dataset, which in itself would be an interesting hypothesis to experiment with further.

Moving Forward

In the future, I would like to compare the accuracy of a three-layer CNN against the two-layer version. Additionally, I would like to see how adjusting hyperparameters such as optimizers, activation functions, learning rates, and the number of epochs affects these models.

Ultimately, I would like to create a program that uses my TLCNN to classify images drawn by a person in real time. Imagine a window that you could draw a letter in and submit; the program would then use the model to accurately classify the letter that was drawn.

Additional Important Terms

ReLU

The activation function used in the convolutional layers. ReLU stands for Rectified Linear Unit, and it is responsible for transforming the outputs of the layer it is attached to. The function itself is piecewise, meaning it has two parts described by two different equations. This sounds a lot more complicated than it really is, since ReLU simply returns the input if it is positive, and returns 0 otherwise.
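In code, the whole function fits on one line:

```python
def relu(x: float) -> float:
    """The piecewise ReLU function: the input if positive, otherwise 0."""
    return x if x > 0 else 0.0

print(relu(2.5))   # 2.5
print(relu(-1.3))  # 0.0
```

Frameworks apply this element-wise to whole tensors (e.g. PyTorch's `nn.ReLU`), but the underlying rule is exactly this.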

PyTorch

PyTorch is the framework that I used to make my models. While I could have implemented these components on my own, this project was created on a limited timeline, and I decided it would be best to use a framework.

MNIST Database

A large database of handwritten digits intended for use with ML models. I used this lightly in my models to be certain that they would also work on a large dataset, but for most of this project I used my own database.

Adam Optimizer

This optimizer is based on stochastic gradient descent. I decided to use Adam because it is quick, but it also works well. CNNs can notoriously take a very long time to process data, but with Adam as the optimizer, the processing time was relatively short. If you are interested in learning more about this optimizer, see the link below to the article I used to research it.
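A toy sketch of Adam's update loop in PyTorch (minimizing a simple quadratic rather than training a CNN; the learning rate here is larger than the 0.001 I used, just so the toy converges quickly):

```python
import torch

# Toy problem: find the w that minimizes (w - 3)^2,
# purely to show the optimizer's update loop.
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.Adam([w], lr=0.1)

for _ in range(300):
    optimizer.zero_grad()    # clear gradients from the previous step
    loss = (w - 3.0) ** 2
    loss.backward()          # compute d(loss)/dw
    optimizer.step()         # Adam's adaptive update

print(w.item())  # should land near 3.0
```

Training a model follows the same zero_grad/backward/step pattern, just with the loss computed from the model's predictions on each batch.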

References

https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://medium.com/@danqing/a-practical-guide-to-relu-b83ca804f1f7
https://machinelearningmastery.com/neural-networks-crash-course/
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/

ShelterCare Admin Portal

Introduction

This project started from a hackathon that a few of my friends and I attended in April of 2019. We were tasked with making an application that could be used cross-platform on both computers and Android phones. Given that Android phones and computers would always be connected to the internet, my team and I decided that it would be best to create a web application. Our MVP was little more than a web page that could take in data and then place it into a Google spreadsheet, so that entries could be viewed in real time by multiple ShelterCare admins. We used a combination of HTML, CSS, and JavaScript to drive it, and upon adding an entry, the spreadsheet would generate and send an email alerting the relevant people that an entry had been added. Once we had shown ShelterCare the MVP we had designed, they were eager for us to create a full-fledged product that they could use in their day-to-day tasks. We agreed to come back and finish the product over the summer, as we were all preoccupied with college and would not be able to take on the responsibility until after spring term was over.

Responsibilities

Admin Portal

The web page evolved into an admin portal that could change categories and selectables inside the Android application, backed by items stored in the Firebase database. I headed and designed the web page and all of its functional parts, using React to achieve dynamic tables that could be edited, queried, and viewed in real time. We ended up with multiple tables of queryable items, and these items could be added or removed with only one click. Once any of these changes were enacted, they were immediately propagated up to the Firebase non-relational database, and since these changes were instant, they would also appear instantly in the Android application. Once the web page itself was formatted to the liking of the ShelterCare admins, we decided that we should also work in authentication. The obvious choice was Google auth, as all of the employees of ShelterCare were given a company Gmail account. So I designed the login page, along with the drivers to go from the login page, check whether the email belonged to an admin, and then load the actual admin page. The last step was to make an admin guide. Even though the admin portal was easy enough to use, we decided it would be best to have a short user guide, as we wanted something formal that new admins could read in order to get acquainted with the admin portal website.

Spreadsheet

The automation of the spreadsheet started out very basic, merely using Google Apps Script to resize columns when a new row was added. I found that I was also able to timestamp entries when they were added or moved. Next, I worked on automating the process of archiving entries once certain tasks had been marked as complete. I could then check whether the issues themselves were resolved and move them to an archive, so that the main spreadsheet didn't get overcrowded with entries that were no longer relevant. Using timers and triggers, I was able to get the spreadsheet to move entries to an archive once a day, at the end of the work day, and to have the spreadsheet reformat itself whenever an entry was added, to ensure maximum visibility of all the items within the spreadsheet.

Technologies Used

  • React.js
    • This is the backbone of the admin portal; all of the working parts of this website are a combination of React, its many libraries, and JSX.
  • CSS/Bootstrap
    • We didn’t want to spend a lot of time on the styling of the web pages, as the ShelterCare people assured us that they were looking for a functional product, not necessarily a beautiful one. With Bootstrap, we were able to give them something that was functional first, but also had a modern look to it.
  • HTML5
    • This is the backbone of any website, and it was used as such. While we mainly used JSX within this project, JSX is essentially just HTML-like syntax that React uses to create elements.
  • Firebase
    • Firebase is what we decided to use for the domain hosting of the website, and the storage of data for use inside the web pages.