
Homework 4 - Mini-batch Gradient Descent and Neural Networks

This homework reviews mini-batch gradient descent and epochs, and explores feature detection in neural networks trained on MNIST digit images.

Problem 1: Mini-batch Gradient Descent and Training

In this exercise, we will review the final portion of our neural networks unit.

(a) Explain what mini-batch gradient descent is and how it differs from (regular/batch) gradient descent.

Solution

Mini-batch gradient descent is a technique that splits the training data into smaller batches and updates the weights and biases after processing each batch. In regular/batch gradient descent, an update happens only after the entire training dataset has been processed. Mini-batch gradient descent instead makes one update per batch, so the parameters are updated many times in a single pass through the training data.
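To make the contrast concrete, here is a minimal sketch in plain Python. It is not the course's code: the toy data on the line y = 3x + 1, the squared-error cost, the learning rate, and the function name `minibatch_gradient_descent` are all assumptions chosen for illustration. Setting the batch size equal to the dataset size turns the same loop into regular/batch gradient descent.

```python
import random

def minibatch_gradient_descent(xs, ys, batch_size, epochs, lr=0.1, seed=0):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    One parameter update is made per batch. Illustrative sketch only.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n_updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw a fresh split into batches each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of (w*x + b - y)^2 over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
            n_updates += 1  # one update per batch
    return w, b, n_updates

# Hypothetical toy data: 100 points on the line y = 3x + 1.
xs = [i / 100 for i in range(100)]
ys = [3 * x + 1 for x in xs]

# Mini-batches of 20: five updates per epoch.
w_mb, b_mb, updates_mb = minibatch_gradient_descent(xs, ys, batch_size=20, epochs=200)

# Batch size = whole dataset: regular/batch gradient descent, one update per epoch.
w_full, b_full, updates_full = minibatch_gradient_descent(xs, ys, batch_size=len(xs), epochs=200)

print("mini-batch:", updates_mb, "updates, w =", round(w_mb, 3), "b =", round(b_mb, 3))
print("full batch:", updates_full, "updates, w =", round(w_full, 3), "b =", round(b_full, 3))
```

With the same number of epochs, the mini-batch run performs five times as many parameter updates as the full-batch run, which is the practical difference the solution describes.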

(b) What is an epoch in machine learning?

Solution

An epoch is a single complete pass through the entire training dataset. With mini-batch gradient descent, one epoch consists of many batch updates, and training typically runs for multiple epochs so that the weights and biases keep improving.

(c) Why do we train over multiple epochs?

Solution

We train over multiple epochs because a single pass through the dataset is usually insufficient. Each additional pass lets us refine the parameter values further and reach a lower value of the cost function.

(d) Let’s suppose our mini-batches are of size 20. If our training data has 50,000 images, then how many updates occur per epoch?

Solution

To get the number of updates per epoch, we divide the total number of images by the batch size. With 50,000 images and a batch size of 20, there are $\frac{50{,}000}{20} = 2500$ batch updates per epoch.
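The division above can be checked with a short Python sketch. The helper name `updates_per_epoch` is hypothetical, and rounding up is an assumption covering the case where the batch size does not divide the dataset evenly and the final, smaller batch still triggers an update; the batch size of 32 is an extra illustrative case not taken from the problem.

```python
import math

def updates_per_epoch(num_examples, batch_size):
    # Round up so a final partial batch still counts as one update.
    return math.ceil(num_examples / batch_size)

print(updates_per_epoch(50_000, 20))  # 2500, the case in part (d)
print(updates_per_epoch(50_000, 32))  # 1563: 1562 full batches plus one partial batch
```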

Problem 2: Feature Detection in Neural Networks

Suppose we have a neural network with 784 neurons in the input layer, 30 neurons in the hidden layer, and 10 neurons in the output layer. The neural network takes the grayscale values of a $28 \times 28$ image of a digit and attempts to recognize the digit. Let us focus on one of the neurons in the hidden layer, which detects whether the following feature is present:

[Figure: MNIST feature showing a curved line]

(a) For this neuron, which of the three images below will yield the highest activation?

[Figure: digits 0, 2, and 4 from the MNIST dataset]

Solution

The image of the number 2 will yield the highest activation.

(b) For this neuron, which of the three images below will yield the second highest activation?

[Figure: digits 0, 2, and 4 from the MNIST dataset]

Solution

The image of the number 0 will yield the second highest activation.

(c) For this neuron, which of the three images below will yield the lowest activation?

[Figure: digits 0, 2, and 4 from the MNIST dataset]

Solution

The image of the number 4 will yield the lowest activation.
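The ranking in parts (a) through (c) follows from how much of the feature each image contains. A minimal sketch of that idea is below, under loud assumptions: the neuron is taken to compute a sigmoid of a weighted sum of the 784 pixels (the problem does not state the activation function), and the weights, bias, and three "images" are synthetic stand-ins for a strong, partial, and absent match, not the actual MNIST digits or the trained network's parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_activation(weights, bias, pixels):
    """Activation of one hidden neuron, assumed here to be sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)

NUM_PIXELS = 28 * 28  # 784 inputs, one per grayscale pixel

# Hypothetical feature: 50 arbitrary pixel positions the neuron responds to.
feature_pixels = set(range(100, 150))

# Assumed weights: positive on the feature pixels, zero elsewhere.
weights = [0.2 if i in feature_pixels else 0.0 for i in range(NUM_PIXELS)]
bias = -5.0  # assumed bias, so an image with no feature gives a low activation

def make_image(on_pixels):
    """Build a flattened 28x28 image with the given pixels set to 1.0."""
    return [1.0 if i in on_pixels else 0.0 for i in range(NUM_PIXELS)]

strong_match = make_image(feature_pixels)            # whole feature present
partial_match = make_image(set(range(100, 125)))     # half of the feature present
no_match = make_image(set(range(400, 450)))          # ink only outside the feature

for name, image in [("strong", strong_match), ("partial", partial_match), ("none", no_match)]:
    print(name, round(neuron_activation(weights, bias, image), 4))
```

The more of the feature's pixels are lit in the input, the larger the weighted sum and therefore the activation, which is the reasoning behind ordering the three digit images.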