MNIST image classification models from scratch

Ajay krishnan
7 min read · Jan 16, 2022


GitHub repository: https://github.com/Ajay-user/ML-DL-RL-repo/blob/master/Image%20Classification/Mnist_Model_from_scratch.ipynb

The MNIST handwritten digit classification problem is based on a standard dataset used in computer vision. It can be used as the basis for learning and practicing how to develop, evaluate, and use machine learning models for image classification from scratch.

The MNIST dataset is a very popular machine learning dataset consisting of 70,000 grayscale images of handwritten digits, each of dimensions 28x28.

Downloading the images:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print('training image set shape', x_train.shape)
print('training label set shape', y_train.shape)
print('testing image set shape', x_test.shape)
print('testing label set shape', y_test.shape)
MNIST dataset

Let’s pre-process the images by scaling them and adding a channel dimension.

# preprocessing
x_train_scaled, x_test_scaled = x_train / 255.0, x_test / 255.0
# Add a channels dimension
x_train_scaled = x_train_scaled[..., tf.newaxis].astype("float32")
x_test_scaled = x_test_scaled[..., tf.newaxis].astype("float32")

Creating a dataset:

# create training and testing datasets
# shuffling and batching
train_ds = tf.data.Dataset.from_tensor_slices((x_train_scaled, y_train)).shuffle(60000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test_scaled, y_test)).batch(32)

Visualize the dataset:

import numpy as np
import matplotlib.pyplot as plt

# plot one batch of 32 images on a 4 x 8 grid
images, labels = next(iter(train_ds))
for index, image in enumerate(images, start=1):
    plt.subplot(4, 8, index)
    plt.imshow(np.reshape(image.numpy(), (28, 28)), cmap=plt.cm.binary)
    plt.axis('off')
MNIST images visualized

Building MNIST models

Create a Simple model

Let’s start by building a very simple model that has a single hidden layer with 32 units.
For the optimizer we use the popular Adam optimizer, and for the loss function we use SparseCategoricalCrossentropy.

class SimpleModel(tf.keras.Model):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x

Model architecture:
The first layer is a Flatten layer; its purpose is to flatten the image we feed into the model. Since our dataset is MNIST, the dimension of each image is [28 x 28 x 1] (the channel dimension was added by us).
The Flatten layer takes the image and flattens it into a vector of shape (784,).

The second layer is a Dense layer with 32 units. Each of those 32 units receives the flattened image vector, and the layer uses ReLU as its activation.
The number of parameters in this layer is (784 * 32) + 32 = 25,120: one weight per input per unit, plus one bias per unit.

The output layer has 10 units, corresponding to the number of classes, and it has no activation; it outputs raw logits, which is why the loss function below is configured with from_logits=True.
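As a quick sanity check (my addition, not in the original notebook), we can build the model on a dummy batch and let Keras report the parameter counts:

# build the model's weights with one dummy 28x28x1 image, then print a summary
sanity_model = SimpleModel()
_ = sanity_model(tf.zeros((1, 28, 28, 1)))
sanity_model.summary()  # the Dense(32) layer shows (784 * 32) + 32 = 25,120 parameters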

# let's use sparse categorical cross entropy as the loss function and Adam as the optimizer
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

Define metrics

# Define metrics to measure the loss and the accuracy of the model
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

Model training: instantiate an instance of the model class and train it.

# instantiate simple model instance
my_simple_model = SimpleModel()

# Use tf.GradientTape to train the model:
def model_training(images, labels):
    with tf.GradientTape() as tape:
        # make predictions using the model
        predictions = my_simple_model(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagate to compute the gradients
    gradients = tape.gradient(loss, my_simple_model.trainable_variables)
    # let the optimizer apply the gradients
    optimizer.apply_gradients(zip(gradients, my_simple_model.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

def model_testing(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = my_simple_model(images)
    # calculate loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)

tf.GradientTape records operations for automatic differentiation: TensorFlow can compute the derivative of a function with a gradient tape, as long as the function is expressed using TensorFlow ops.

Weights are updated using the partial derivative of the loss with respect to each individual weight. In order to perform automatic differentiation, TensorFlow needs to remember what operations happened in what order during the forward pass; during the backward pass it then traverses this list of operations in reverse order to compute the gradients.

GradientTape is the context manager within which these partial derivatives are calculated.
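As a minimal illustration of the tape (my addition, separate from the training code), let's differentiate y = x * x at x = 3, where dy/dx = 2x = 6:

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
# the tape replays the recorded ops in reverse to compute dy/dx
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)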

# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_ds:
        model_training(image_batch, label_batch)

    for image_batch, label_batch in test_ds:
        model_testing(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
Training logs of simple model

After just 5 epochs, both the training accuracy and the test accuracy reach around 96%; that's a strong baseline to start from.
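As a quick usage sketch (my addition, not in the original post), we can run the trained model on one batch of test images and compare its predictions with the true labels:

# the argmax over the 10 logits is the predicted digit
images, labels = next(iter(test_ds))
logits = my_simple_model(images)
predicted = tf.argmax(logits, axis=1)
print('predicted:', predicted[:10].numpy())
print('actual:   ', labels[:10].numpy())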

Create a CNN model

Convolutional Neural Networks, or CNNs, were designed to map image data to an output variable.

They have proven so effective that they are the go-to method for any type of prediction problem involving image data as an input.

CNNs are efficient feature extractors, as they can learn location-independent features from an image.

A simple CNN model architecture:

  • Convolution layer with 32 filters and kernel size 3x3
  • Max pooling layer with pool-size (2, 2) and stride 2
  • Flattening layer
  • Dense layer with 32 units with ReLU activation
  • Output Dense layer with 10 units — no activation
class CNNModel(tf.keras.Model):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.cnn = tf.keras.layers.Conv2D(32, 3, input_shape=(28, 28, 1))
        self.maxpool = tf.keras.layers.MaxPool2D()
        self.flatten = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.cnn(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x
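To make the architecture concrete, here is a small shape trace (my addition; the layer attributes come from the class above). A 3x3 convolution without padding maps 28x28 to 26x26, and 2x2 max pooling with stride 2 halves that to 13x13:

demo_cnn = CNNModel()
x = tf.zeros((1, 28, 28, 1))
print(demo_cnn.cnn(x).shape)                    # (1, 26, 26, 32)
print(demo_cnn.maxpool(demo_cnn.cnn(x)).shape)  # (1, 13, 13, 32)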

For the optimizer we again use the popular Adam optimizer, and for the loss function SparseCategoricalCrossentropy, same as before.

# LOSS FUNCTION AND OPTIMIZER
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define metrics to measure the loss and the accuracy of the model

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

The training and testing utilities are also very much the same.

# Instantiate the cnn model
cnn = CNNModel()

# training utility
def train_cnn(images, labels):
    with tf.GradientTape() as tape:
        # make predictions
        predictions = cnn(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagation
    gradients = tape.gradient(loss, cnn.trainable_variables)
    # apply the gradients using the optimizer
    optimizer.apply_gradients(zip(gradients, cnn.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

# testing utility
def test_cnn(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = cnn(images)
    # compute loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)

Training a CNN model

# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_ds:
        train_cnn(image_batch, label_batch)

    for image_batch, label_batch in test_ds:
        test_cnn(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
CNN model training logs

That’s awesome: we were able to hit 98% accuracy.

Embedding layers

An embedding is a dense vector of floating point values (the length of the vector is a parameter you specify). Instead of specifying the values for the embedding manually, they are trainable parameters (weights learned by the model during training, in the same way a model learns weights for a dense layer).
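As a minimal illustration (my addition), here is an Embedding layer that maps each of 256 possible integer values to a learned 3-dimensional vector; input_dim=256 is my choice here, to cover all grayscale pixel values 0-255:

embed = tf.keras.layers.Embedding(input_dim=256, output_dim=3)
vectors = embed(tf.constant([[0, 127, 255]]))  # look up three pixel values
print(vectors.shape)  # (1, 3, 3): each index becomes a trainable 3-d vector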

Can we use embedding layers to learn MNIST digits?

Yes!

Let’s create a model with an Embedding layer that learns from the MNIST images pixel by pixel.

First of all, we don’t need to scale the images this time: the Embedding layer treats each raw integer pixel value as an index to look up, so the inputs must stay as integers.

# Dataset for training and testing
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1000).batch(32)
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

Simple model with embedding layer — architecture

  • Flatten layer
  • Embedding layer with input dimension 784 and output dimension 3 (see the shape walkthrough after this list)
  • Flatten layer
  • Dense layer with 32 units with ReLU activation
  • Output layer with 10 units
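Here is the shape walkthrough promised above (my addition, assuming a batch of 32 raw 28x28 integer images):

flat = tf.keras.layers.Flatten()
# input_dim=784 works because every pixel value (0-255) is a valid index below 784
emb = tf.keras.layers.Embedding(input_dim=784, output_dim=3)
batch = tf.zeros((32, 28, 28), dtype=tf.int32)  # dummy batch of raw images
x = flat(batch)  # (32, 784): one integer index per pixel
x = emb(x)       # (32, 784, 3): each pixel index becomes a 3-d vector
x = flat(x)      # (32, 2352): this is what the Dense layers receive
print(x.shape)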

We can use the subclassing API or the Keras Sequential API; both are good choices, but the Sequential API is easier to set up for simple models.

Let’s create models using both and wrap up this tutorial.

# Simple model that uses an Embedding layer
class EmbeddingModel(tf.keras.Model):
    def __init__(self):
        super(EmbeddingModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.embedding = tf.keras.layers.Embedding(input_dim=784, output_dim=3)
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        x = self.embedding(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x
# LOSS FUNCTION AND OPTIMIZER
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define metrics to measure the loss and the accuracy of the model

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

# Instantiate the embedding model
embed_model = EmbeddingModel()

# training utility
def train_embeddings(images, labels):
    with tf.GradientTape() as tape:
        # make predictions
        predictions = embed_model(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagation
    gradients = tape.gradient(loss, embed_model.trainable_variables)
    # apply the gradients using the optimizer
    optimizer.apply_gradients(zip(gradients, embed_model.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

# testing utility
def test_embeddings(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = embed_model(images)
    # compute loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)
# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_data:
        train_embeddings(image_batch, label_batch)

    for image_batch, label_batch in test_data:
        test_embeddings(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
Model with Embedding layer training logs

With the Sequential API

# Create a model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Embedding(input_dim=784, output_dim=3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
model.fit(train_data, epochs=3)
Model with Embedding layer Training Logs
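As a quick follow-up (my addition), the compiled model can also be evaluated on the held-out test set:

# evaluate returns the loss plus the compiled metrics
loss_value, accuracy_value = model.evaluate(test_data)
print(f'test accuracy: {accuracy_value:.4f}')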
GitHub repository: https://github.com/Ajay-user/ML-DL-RL-repo/blob/master/Image%20Classification/Mnist_Model_from_scratch.ipynb
What’s next?
Logistic regression in TF
Logistic regression in PyTorch
