Convolutional Neural Network (CNN)

Mar 23, 2022

For learning location-independent patterns in images

A CNN is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data. Convolutional neural networks use multiple filters, also called kernels, to extract the image features that make pattern recognition possible. Today, let’s study the basics.

We are all familiar with the basic block: a convolutional layer with ReLU activation, followed by a maximum pooling layer. What are these?
Why are we using these convolutional layers?
Can’t MLPs do the job for us?
What are the challenges in the image domain?

The traditional neural network approach consists of fully connected (dense) layers: we flatten the inputs and feed them to the fully connected network to get an output. The Multi-Layer Perceptron (MLP) is an example.

Traditional machine learning methods do not handle spatial transformations well. Comparing feature vectors with straight-line (Euclidean) distance works for structured data, but for unstructured data such as images it does not work well.

Suppose someone shows us an image of a cat and a rotated version of the same image. As humans, we recognize both as the same cat.

For computers, the traditional method of comparing pixel to pixel is not robust to these transformations. When we give a computer an image and a flipped version of it, it sees different pixels in the corresponding locations, so it treats them as two different images.
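
To make the pixel-to-pixel problem concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-made 4x4 "image" and uses a one-pixel shift as the transformation; the helper names `euclidean_distance` and `shift_right` are made up for this illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equally sized 2-D images."""
    return math.sqrt(sum((pa - pb) ** 2
                         for row_a, row_b in zip(a, b)
                         for pa, pb in zip(row_a, row_b)))

def shift_right(image, pixels=1):
    """Shift every row to the right, filling the vacated columns with zeros."""
    return [[0] * pixels + row[:-pixels] for row in image]

# A tiny 4x4 "image" containing one bright vertical bar (hypothetical data).
image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
shifted = shift_right(image, pixels=1)

same_distance = euclidean_distance(image, image)       # identical images
shifted_distance = euclidean_distance(image, shifted)  # same bar, moved 1 pixel
print(same_distance, shifted_distance)
```

The bar is the same object in both images, yet the pixel-wise distance jumps from zero to a clearly non-zero value, which is exactly why flattened pixel comparisons struggle here.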

IMAGE → FLATTEN → MLP → output class

Now the idea is to make our model robust to spatial transformations such as shifting, rotation, and zooming.
The object in an image usually stays the same under these transformations.

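The CNN was the solution: a powerful model that works very well in the image domain. The core idea is to decouple filters from specific locations in the image, so the same filter can detect a pattern wherever it appears. Convolution by itself mainly provides robustness to shifts; robustness to rotation and zooming usually has to come from the training data, for example through augmentation.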

A CNN applies each filter by sliding it across the image; wherever the filter aligns with a pattern in the image, it gives a strong response. In the early layers the filters learn very basic patterns, and as we go deeper they learn more complex ones. Essentially, a CNN learns a hierarchy of features.

Feature extraction

1. Filter an image for a particular feature (convolution)
2. Detect that feature within the filtered image (ReLU)
3. Condense the image to enhance the features (maximum pooling)
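
The three steps above can be sketched in plain Python on a toy example. This is only an illustration under stated assumptions: a hand-made 6x6 image with one vertical edge, no padding, stride 1, and non-overlapping 2x2 pooling. The helpers `convolve2d`, `relu` and `max_pool` are hypothetical names, and the "convolution" is the sliding weighted sum that deep learning libraries use (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the weighted sum at each position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Set negative values to zero, keep positive values unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a
    complete window are dropped."""
    h = len(feature_map) - len(feature_map) % size
    w = len(feature_map[0]) - len(feature_map[0]) % size
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, w, size)]
            for i in range(0, h, size)]

# Toy image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edge_kernel = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

filtered = convolve2d(image, edge_kernel)   # step 1: filter
detected = relu(filtered)                   # step 2: detect
condensed = max_pool(detected, size=2)      # step 3: condense
print(filtered[0], detected[0], condensed)
```

Each row of `filtered` is `[0, -3, 3, 0]`: the response is zero in flat regions and non-zero only around the edge. ReLU keeps the positive side of the edge, and pooling shrinks the map while keeping the strongest response.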

A convolutional layer carries out the filtering step.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3), # activation is None
    # More layers follow
])

The weights of a convolutional layer are learnt during model training; we call these weights kernels. A kernel operates by scanning over an image and producing a weighted sum of pixel values.

https://cs231n.github.io/convolutional-networks/

The kernels in a convolutional layer determine what patterns to extract from the image. During training, a convolutional layer tries to learn what features it needs to extract in order to solve the problem at hand. This means finding the best values for its kernels.

After filtering, the feature maps pass through the activation function.

Activation functions are essential because without them the whole network collapses back into a single layer.

Neural networks stack layers of perceptrons to become more powerful. However, without non-linear activation functions, all the additional layers can be compressed back down to a single linear layer, and there is no additional benefit.
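
Here is a small sketch of that collapse in plain Python. The weight matrices `W1` and `W2` and the input `x` are made-up numbers, and biases are left out to keep the example short; `matmul` is a hypothetical helper for nested lists.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Element-wise ReLU on a nested-list matrix."""
    return [[max(0, v) for v in row] for row in m]

W1 = [[1, 2], [3, 4]]   # first linear layer (hypothetical weights)
W2 = [[0, 1], [1, 1]]   # second linear layer (hypothetical weights)
x = [[1], [-1]]         # input as a column vector

# Two linear layers applied one after the other...
two_layers = matmul(W2, matmul(W1, x))
# ...give exactly the same result as ONE layer with weights W2 @ W1.
W_combined = matmul(W2, W1)
one_layer = matmul(W_combined, x)

# With a ReLU in between, the result can no longer be reproduced by W_combined.
with_relu = matmul(W2, relu(matmul(W1, x)))
print(two_layers, one_layer, with_relu)
```

Without the activation, stacking layers adds nothing that a single weight matrix could not do; the non-linearity is what makes depth useful.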

There are different activation functions to learn about, but let’s focus on ReLU for now. For more on activation functions, please refer to the activation functions tutorial.

The ReLU activation says that negative values are not important, so it sets them to zero and leaves positive values as they are.

When we apply ReLU (a non-linear activation function) to our extracted features, we can see the patterns getting isolated.

Let’s see this in action by detecting edges in an image, to better understand the concepts we talked about.

Edge detection from scratch

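import tensorflow as tf
import matplotlib.pyplot as plt

url = ' ----- paste your image url here ---- '
# download an image
file = tf.keras.utils.get_file('car', origin=url)
# read the image
raw = tf.io.read_file(file)
# decode
img = tf.image.decode_jpeg(raw)
# view the image
plt.imshow(img);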

Let’s define an edge detection kernel

kernel = tf.constant([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])
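
To see why this kernel responds to edges, here is a minimal sketch in plain Python that computes the weighted sum for a single 3x3 patch. The two patches are made-up examples and `weighted_sum` is a hypothetical helper, not part of TensorFlow.

```python
edge_kernel = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

def weighted_sum(patch, kernel):
    """Multiply the patch and the kernel element by element and add everything up."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))

# A flat region: every pixel has the same value.
flat_patch = [[5, 5, 5],
              [5, 5, 5],
              [5, 5, 5]]

# A region containing a vertical edge: dark on the left, bright on the right.
edge_patch = [[0, 5, 5],
              [0, 5, 5],
              [0, 5, 5]]

kernel_total = sum(sum(row) for row in edge_kernel)
flat_response = weighted_sum(flat_patch, edge_kernel)
edge_response = weighted_sum(edge_patch, edge_kernel)
print(kernel_total, flat_response, edge_response)
```

Because the kernel weights add up to zero, any uniform region cancels out, while a patch whose center differs from its neighbours produces a non-zero response.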

What do we do if the image is in color, i.e. has three color channels? We can stack three kernels along the depth dimension.

kernel3d = tf.stack([kernel,kernel,kernel],axis=2)

tf.nn.conv2d

Computes a 2-D convolution given input and 4-D filter/kernel tensors.

Wait, what? A 4-D kernel!

The input tensor may have rank 4 or higher, for example an input tensor of shape batch_shape + [in_height, in_width, in_channels].

The kernel (filter) that slides over the image computing a weighted sum is a tensor of shape [filter_height, filter_width, in_channels, out_channels].
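
The 4-D layout is easier to digest with a small sketch. The plain-Python example below mimics the [filter_height, filter_width, in_channels, out_channels] layout with nested lists and evaluates the filter at a single position of a made-up RGB patch. The helpers `to_4d_kernel`, `shape_of` and `response_at` are hypothetical names for this illustration only.

```python
def to_4d_kernel(kernel2d, in_channels):
    """Repeat a 2-D kernel across `in_channels` and add a single output
    channel; the result is indexed as [h][w][in_channel][out_channel]."""
    return [[[[value] for _ in range(in_channels)]
             for value in row]
            for row in kernel2d]

def shape_of(nested):
    """Shape of a regular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape

def response_at(patch, kernel4d):
    """Weighted sum over height, width AND input channels, giving one
    value per output channel. `patch` is indexed as [h][w][channel]."""
    fh, fw = len(kernel4d), len(kernel4d[0])
    in_c, out_c = len(kernel4d[0][0]), len(kernel4d[0][0][0])
    return [sum(patch[h][w][c] * kernel4d[h][w][c][o]
                for h in range(fh) for w in range(fw) for c in range(in_c))
            for o in range(out_c)]

edge_kernel = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]
kernel4d = to_4d_kernel(edge_kernel, in_channels=3)

# A 3x3 RGB patch: black everywhere except the center pixel (hypothetical data).
black = [0, 0, 0]
patch = [[black, black, black],
         [black, [1, 2, 3], black],
         [black, black, black]]

kernel_shape = shape_of(kernel4d)
response = response_at(patch, kernel4d)
print(kernel_shape, response)
```

Note that with out_channels equal to 1, the responses of the three color channels are added together into a single output value (here 8·1 + 8·2 + 8·3).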

# input tensor of shape batch_shape + [in_height, in_width, in_channels]
# convert to float dtype
input_img = tf.image.convert_image_dtype(img, tf.float32)
# add batch dimension to the image
input_img = tf.expand_dims(input_img, axis=0)

# filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]
input_kernel = tf.reshape(kernel3d, shape=[*kernel.shape,3,1])
input_kernel = tf.cast(input_kernel, dtype=tf.float32)

Convolution

filtered_img = tf.nn.conv2d(input=input_img,
                            filters=input_kernel,
                            strides=1,
                            padding='SAME')

plt.figure(figsize=(20,30))
plt.imshow(tf.squeeze(filtered_img))
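
A quick note on padding='SAME': with a stride of 1 it keeps the output the same height and width as the input, whereas 'VALID' (no padding) shrinks it. The sketch below computes the output size along one dimension using the formulas from the TensorFlow documentation; `conv_output_size` is a hypothetical helper and the sizes are only example values.

```python
import math

def conv_output_size(in_size, kernel_size, stride, padding):
    """Output length along one spatial dimension for 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        # padded so that only the stride decides the output size
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        # no padding: the kernel must fit entirely inside the input
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

same_stride1 = conv_output_size(224, kernel_size=3, stride=1, padding='SAME')
valid_stride1 = conv_output_size(224, kernel_size=3, stride=1, padding='VALID')
same_stride2 = conv_output_size(224, kernel_size=3, stride=2, padding='SAME')
valid_stride2 = conv_output_size(224, kernel_size=3, stride=2, padding='VALID')
print(same_stride1, valid_stride1, same_stride2, valid_stride2)
```

That is why the filtered image above can be displayed at the same resolution as the input.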

Now let’s apply the activation function to isolate the extracted patterns.

# let's apply ReLU
image_detect = tf.nn.relu(filtered_img)

plt.figure(figsize=(20,30))
plt.imshow(tf.squeeze(image_detect));

Convolution operation and ReLU activation

That’s it, folks. Today was all about convolution 101. The idea is to learn location-independent filters that help us extract patterns in the image.
We learned the big picture behind powerful image models, and we worked out an edge detector from scratch.
Stay tuned for more!

Before wrapping up, let’s build a utility that will help you detect edges in an image. Here is a link to the notebooks : [ Githublink ]

def edge_detection(image_url,
                   kernel,
                   is_color_image=True,
                   figsize=(10,10)):

  # download the image
  file = tf.keras.utils.get_file(origin=image_url)
  # read the image
  raw = tf.io.read_file(file)
  # decode to 3 channels (color) or 1 channel (grayscale)
  channels = 3 if is_color_image else 1
  img = tf.image.decode_jpeg(raw, channels=channels)

  # image => input tensor of shape
  # batch_shape + [in_height, in_width, in_channels]
  input_img = tf.expand_dims(img, axis=0)
  # convert to float dtype
  input_img = tf.image.convert_image_dtype(input_img, tf.float32)

  # kernel => filter tensor of shape
  # [filter_height, filter_width, in_channels, out_channels]
  # repeat the 2-D kernel once per input channel ...
  input_kernel = tf.stack([kernel]*channels, axis=2)
  # ... and add a single output channel
  input_kernel = tf.expand_dims(input_kernel, axis=-1)
  input_kernel = tf.cast(input_kernel, dtype=tf.float32)

  # convolution operation
  filtered_img = tf.nn.conv2d(input=input_img,
                              filters=input_kernel,
                              strides=1,
                              padding='SAME')

  # applying relu activation
  image_detect = tf.nn.relu(filtered_img)

  # visualization: all three plots share one figure
  plt.figure(figsize=figsize)
  plt.subplot(3,1,1)
  plt.imshow(tf.squeeze(img))
  plt.title('Input image')
  plt.axis('off')
  plt.subplot(3,1,2)
  plt.imshow(tf.squeeze(filtered_img))
  plt.title('Convolution output before applying activation')
  plt.axis('off')
  plt.subplot(3,1,3)
  plt.imshow(tf.squeeze(image_detect))
  plt.title('Convolution output after applying activation')
  plt.axis('off')

  return filtered_img, image_detect

Here are some cool results; check out the notebook for more!

# author = https://unsplash.com/@sashafreemind
url = ' ----- paste your image url here ---- '
edge_detection(url, kernel, is_color_image=False, figsize=(20,30))