MNIST image classification models from scratch

Ajay krishnan
7 min read · Jan 16, 2022


GitHub repository: https://github.com/Ajay-user/ML-DL-RL-repo/blob/master/Image%20Classification/Mnist_Model_from_scratch.ipynb

The MNIST handwritten digit classification problem is based on a standard dataset used in computer vision. It can be used as the basis for learning and practicing how to develop, evaluate, and use machine learning models for image classification from scratch.

The MNIST dataset is a very popular machine learning dataset consisting of 70,000 grayscale images of handwritten digits, each of dimensions 28x28.

Downloading the images:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print('training image set shape', x_train.shape)
print('training label set shape', y_train.shape)
print('testing image set shape', x_test.shape)
print('testing label set shape', y_test.shape)
MNIST dataset

Let’s pre-process the images by scaling them and adding a channel dimension.

# preprocessing
x_train_scaled, x_test_scaled = x_train / 255.0, x_test / 255.0
# Add a channels dimension
x_train_scaled = x_train_scaled[..., tf.newaxis].astype("float32")
x_test_scaled = x_test_scaled[..., tf.newaxis].astype("float32")

Creating a dataset:

# create training and testing datasets
# shuffling and batching
train_ds = tf.data.Dataset.from_tensor_slices((x_train_scaled, y_train)).shuffle(60000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test_scaled, y_test)).batch(32)

Visualize the dataset:

import numpy as np
import matplotlib.pyplot as plt

# plot one batch of 32 images on a 4 x 8 grid
images, labels = next(iter(train_ds))
for index, image in enumerate(images, start=1):
    plt.subplot(4, 8, index)
    plt.imshow(np.reshape(image.numpy(), (28, 28)), cmap=plt.cm.binary)
    plt.axis('off')
MNIST images visualized

Building MNIST models

Create a Simple model

Let’s start by building a very simple model that has a single hidden layer with 32 units.
For the optimizer we use the popular Adam optimizer, and for the loss function we use SparseCategoricalCrossentropy.

class SimpleModel(tf.keras.Model):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x

Model architecture:
The first layer is a Flatten layer; its purpose is to flatten the image we feed into the model. Since our dataset is MNIST, the dimension of each image is [28 x 28 x 1] (the channel dimension was added by us).
The Flatten layer takes the image and flattens it into a vector of shape (784,).

The second layer is a Dense layer with 32 units. Each of those 32 units receives the flattened image vector, and the layer uses ReLU as its activation.
The number of parameters in this layer is (784 * 32) + 32 = 25,120: one weight per input per unit, plus one bias per unit.

The output layer has 10 units, corresponding to the number of classes, and it has no activation; it outputs raw logits, which is why the loss function below is configured with from_logits=True.
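As a quick sanity check (my addition, not in the original notebook), we can build the model on a dummy batch and let Keras report the parameter counts:

# build the model's weights with one dummy 28x28x1 image, then print a summary
sanity_model = SimpleModel()
_ = sanity_model(tf.zeros((1, 28, 28, 1)))
sanity_model.summary()  # the Dense(32) layer shows (784 * 32) + 32 = 25,120 parameters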

# let's use sparse categorical cross entropy as the loss function and Adam as the optimizer
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

Define metrics

# Define metrics to measure the loss and the accuracy of the model
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

Model training: instantiate an instance of the model class and train it.

# instantiate simple model instance
my_simple_model = SimpleModel()

# Use tf.GradientTape to train the model:
def model_training(images, labels):
    with tf.GradientTape() as tape:
        # make predictions using the model
        predictions = my_simple_model(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagate to compute the gradients
    gradients = tape.gradient(loss, my_simple_model.trainable_variables)
    # let the optimizer apply the gradients
    optimizer.apply_gradients(zip(gradients, my_simple_model.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

def model_testing(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = my_simple_model(images)
    # calculate loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)

tf.GradientTape records operations for automatic differentiation: TensorFlow can compute the derivative of a function with a gradient tape, as long as the function is expressed using TensorFlow ops.

Weights are updated using the partial derivative of the loss with respect to each individual weight. In order to perform automatic differentiation, TensorFlow needs to remember what operations happened in what order during the forward pass; during the backward pass it then traverses this list of operations in reverse order to compute the gradients.

GradientTape is the context manager within which these partial derivatives are calculated.
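As a minimal illustration of the tape (my addition, separate from the training code), let's differentiate y = x * x at x = 3, where dy/dx = 2x = 6:

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
# the tape replays the recorded ops in reverse to compute dy/dx
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)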

# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_ds:
        model_training(image_batch, label_batch)

    for image_batch, label_batch in test_ds:
        model_testing(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
Training logs of simple model

After just 5 epochs, both the training accuracy and the test accuracy reach around 96%; that's a strong baseline to start from.
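As a quick usage sketch (my addition, not in the original post), we can run the trained model on one batch of test images and compare its predictions with the true labels:

# the argmax over the 10 logits is the predicted digit
images, labels = next(iter(test_ds))
logits = my_simple_model(images)
predicted = tf.argmax(logits, axis=1)
print('predicted:', predicted[:10].numpy())
print('actual:   ', labels[:10].numpy())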

Create a CNN model

Convolutional Neural Networks, or CNNs, were designed to map image data to an output variable.

They have proven so effective that they are the go-to method for any type of prediction problem involving image data as an input.

CNNs are efficient feature extractors, as they can learn location-independent features from an image.

A simple CNN model architecture:

  • Convolution layer with 32 filters and kernel size 3x3
  • Max pooling layer with pool-size (2, 2) and stride 2
  • Flattening layer
  • Dense layer with 32 units with ReLU activation
  • Output Dense layer with 10 units — no activation
class CNNModel(tf.keras.Model):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.cnn = tf.keras.layers.Conv2D(32, 3, input_shape=(28, 28, 1))
        self.maxpool = tf.keras.layers.MaxPool2D()
        self.flatten = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.cnn(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x
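To make the architecture concrete, here is a small shape trace (my addition; the layer attributes come from the class above). A 3x3 convolution without padding maps 28x28 to 26x26, and 2x2 max pooling with stride 2 halves that to 13x13:

demo_cnn = CNNModel()
x = tf.zeros((1, 28, 28, 1))
print(demo_cnn.cnn(x).shape)                    # (1, 26, 26, 32)
print(demo_cnn.maxpool(demo_cnn.cnn(x)).shape)  # (1, 13, 13, 32)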

For the optimizer we again use the popular Adam optimizer, and for the loss function SparseCategoricalCrossentropy, same as before.

# LOSS FUNCTION AND OPTIMIZER
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define metrics to measure the loss and the accuracy of the model

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

The training and testing utilities are also very much the same.

# Instantiate the cnn model
cnn = CNNModel()

# training utility
def train_cnn(images, labels):
    with tf.GradientTape() as tape:
        # make predictions
        predictions = cnn(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagation
    gradients = tape.gradient(loss, cnn.trainable_variables)
    # apply the gradients using the optimizer
    optimizer.apply_gradients(zip(gradients, cnn.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

# testing utility
def test_cnn(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = cnn(images)
    # compute loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)

Training a CNN model

# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_ds:
        train_cnn(image_batch, label_batch)

    for image_batch, label_batch in test_ds:
        test_cnn(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
CNN model training logs

That’s awesome: we were able to hit 98% accuracy.

Embedding layers

An embedding is a dense vector of floating point values (the length of the vector is a parameter you specify). Instead of specifying the values for the embedding manually, they are trainable parameters (weights learned by the model during training, in the same way a model learns weights for a dense layer).
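As a minimal illustration (my addition), here is an Embedding layer that maps each of 256 possible integer values to a learned 3-dimensional vector; input_dim=256 is my choice here, to cover all grayscale pixel values 0-255:

embed = tf.keras.layers.Embedding(input_dim=256, output_dim=3)
vectors = embed(tf.constant([[0, 127, 255]]))  # look up three pixel values
print(vectors.shape)  # (1, 3, 3): each index becomes a trainable 3-d vector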

Can we use embedding layers to learn MNIST digits?

Yes!

Let’s create a model with an Embedding layer that learns from the MNIST images pixel by pixel.

First of all, we don’t need to scale the images this time: the Embedding layer treats each raw integer pixel value as an index to look up, so the inputs must stay as integers.

# Dataset for training and testing
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1000).batch(32)
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

Simple model with embedding layer — architecture

  • Flatten layer
  • Embedding layer with input dimension 784 and output dimension 3 (see the shape walkthrough after this list)
  • Flatten layer
  • Dense layer with 32 units with ReLU activation
  • Output layer with 10 units
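Here is the shape walkthrough promised above (my addition, assuming a batch of 32 raw 28x28 integer images):

flat = tf.keras.layers.Flatten()
# input_dim=784 works because every pixel value (0-255) is a valid index below 784
emb = tf.keras.layers.Embedding(input_dim=784, output_dim=3)
batch = tf.zeros((32, 28, 28), dtype=tf.int32)  # dummy batch of raw images
x = flat(batch)  # (32, 784): one integer index per pixel
x = emb(x)       # (32, 784, 3): each pixel index becomes a 3-d vector
x = flat(x)      # (32, 2352): this is what the Dense layers receive
print(x.shape)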

We can use the subclassing API or the Keras Sequential API; both are good choices, but the Sequential API is easier to set up for simple models.

Let’s create models using both and wrap up this tutorial.

# Simple model that uses an Embedding layer
class EmbeddingModel(tf.keras.Model):
    def __init__(self):
        super(EmbeddingModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.embedding = tf.keras.layers.Embedding(input_dim=784, output_dim=3)
        self.d1 = tf.keras.layers.Dense(32, activation='relu')
        self.output_layer = tf.keras.layers.Dense(10)

    def call(self, x):
        x = self.flatten(x)
        x = self.embedding(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.output_layer(x)
        return x
# LOSS FUNCTION AND OPTIMIZER
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define metrics to measure the loss and the accuracy of the model

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

# Instantiate the embedding model
embed_model = EmbeddingModel()

# training utility
def train_embeddings(images, labels):
    with tf.GradientTape() as tape:
        # make predictions
        predictions = embed_model(images)
        # compute loss
        loss = loss_object(labels, predictions)
    # backpropagation
    gradients = tape.gradient(loss, embed_model.trainable_variables)
    # apply the gradients using the optimizer
    optimizer.apply_gradients(zip(gradients, embed_model.trainable_variables))
    # metrics
    train_loss(loss)
    train_accuracy(labels, predictions)

# testing utility
def test_embeddings(images, labels):
    # forward pass only; no gradient tape is needed for evaluation
    predictions = embed_model(images)
    # compute loss
    loss = loss_object(labels, predictions)
    # metrics
    test_loss(loss)
    test_accuracy(labels, predictions)
# TRAIN THE MODEL

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for image_batch, label_batch in train_data:
        train_embeddings(image_batch, label_batch)

    for image_batch, label_batch in test_data:
        test_embeddings(image_batch, label_batch)

    print(
        f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')
Model with Embedding layer training logs

With the Sequential API

# Create a model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Embedding(input_dim=784, output_dim=3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
model.fit(train_data, epochs=3)
Model with Embedding layer Training Logs
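As a quick follow-up (my addition), the compiled model can also be evaluated on the held-out test set:

# evaluate returns the loss plus the compiled metrics
loss_value, accuracy_value = model.evaluate(test_data)
print(f'test accuracy: {accuracy_value:.4f}')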
GitHub repository: https://github.com/Ajay-user/ML-DL-RL-repo/blob/master/Image%20Classification/Mnist_Model_from_scratch.ipynb
What’s next?
Logistic regression in TF
Logistic regression in PyTorch
