
Backward Propagation

We already learned about the gradient descent algorithm in lesson 4. Here we will apply it to neural networks through the backpropagation method. We already have the NeuralNet class along with the layer and activation classes. All we need to do is add a new method called backward to each of them so that gradients can be propagated backward through the network.

What will the backward method do in each case? Given the gradient of the loss with respect to the layer's output, it computes the gradient with respect to the layer's parameters and returns the gradient with respect to the layer's input. Note that the input to backward is not X; it is the gradient handed back by the layer after it.

Remember, backpropagation and gradient descent are different algorithms. The first is used only to compute the gradient of the loss with respect to W; the second is responsible for updating the weights using the gradient obtained from the first.
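
A toy example with a single weight makes this division of labor clear. The numbers below are made up, and the snippet is independent of our NeuralNet code:

PYTHON
# Toy example: one weight w and loss L(w) = (w*x - y)**2 for a single sample
w, x, y, lr = 0.5, 2.0, 3.0, 0.1

# Backpropagation: compute the gradient dL/dw with the chain rule
z = w * x                # forward pass: z = 1.0
dL_dz = 2 * (z - y)      # dL/dz = 2 * (1.0 - 3.0) = -4.0
dL_dw = dL_dz * x        # dL/dw = -4.0 * 2.0 = -8.0

# Gradient descent: use that gradient to update the weight
w = w - lr * dL_dw       # 0.5 - 0.1 * (-8.0) = 1.3
print(w)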

You will understand this better if you take a look at the following equations. Nerd Alert! Calculus Ahead.

$dW^{[l]} = \frac{\partial L}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$

$db^{[l]} = \frac{\partial L}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$

$dA^{[l-1]} = \frac{\partial L}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]}$

$dZ^{[l]} = dA^{[l]} \odot g'(Z^{[l]})$

After calculating the gradients of all the layers and activation functions using the above equations, we can update the weights and biases (with learning rate $\alpha$) via:

$W^{[l]} = W^{[l]} - \alpha \, dW^{[l]}$

$b^{[l]} = b^{[l]} - \alpha \, db^{[l]}$
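
To make the shapes concrete, here is a small NumPy sketch of these equations for one layer. The sizes (3 inputs, 2 units, 4 samples) are arbitrary, the ReLU derivative stands in for $g'$, and none of these variables belong to our NeuralNet classes:

PYTHON
import numpy as np

m = 4                                  # number of samples
A_prev = np.random.randn(3, m)         # A[l-1]: activations of the previous layer
W = np.random.randn(2, 3)              # W[l]
b = np.random.randn(2, 1)              # b[l]
Z = np.dot(W, A_prev) + b              # Z[l], shape (2, m)
dA = np.random.randn(2, m)             # dA[l]: gradient coming from layer l+1

dZ = dA * (Z > 0)                      # dZ[l] = dA[l] * g'(Z[l])  (ReLU derivative)
dW = np.dot(dZ, A_prev.T) / m          # same shape as W: (2, 3)
db = np.sum(dZ, axis=1, keepdims=True) / m   # same shape as b: (2, 1)
dA_prev = np.dot(W.T, dZ)              # same shape as A_prev: (3, m)

alpha = 0.01                           # learning rate
W = W - alpha * dW
b = b - alpha * db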

Let us implement this in Python. We will only implement it for the LinearLayer class and the sigmoid and ReLU activation functions. You can look at the derivatives of other activation functions here.

PYTHON
class SigmoidActivation():
    # ...
    def backward(self, dA, Z):
        # dZ = dA * sigmoid'(Z), where sigmoid'(Z) = sigmoid(Z) * (1 - sigmoid(Z))
        sig = 1 / (1 + np.exp(-Z))
        return dA * sig * (1 - sig)

class ReLUActivation():
    # ...
    def backward(self, dA, Z):
        # dZ = dA * relu'(Z): the gradient passes only where Z > 0,
        # scaled by the same factor self.a used in the forward pass
        dZ = dA.copy()
        dZ[Z <= 0] = 0
        return dZ * self.a

class LinearLayer():
    # ...
    def backward(self, dA_curr, lr=0.001):
        m = self.A_prev.shape[1]

        # Gradients from the equations above
        dZ_curr = self.activation.backward(dA_curr, self.Z_curr)
        dW_curr = np.dot(dZ_curr, self.A_prev.T) / m
        db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m
        dA_prev = np.dot(self.W.T, dZ_curr)

        # Gradient descent step: update this layer's parameters
        self.W -= lr * dW_curr
        self.b -= lr * db_curr

        return dA_prev
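
Before wiring these into the network, it is worth sanity-checking a backward method against a numerical gradient. The snippet below is a standalone sketch for the sigmoid case; the helper functions are local to the example, not methods of the classes above:

PYTHON
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def sigmoid_backward(dA, Z):
    # Analytic derivative used by the backward pass
    s = sigmoid(Z)
    return dA * s * (1 - s)

Z = np.random.randn(5)
dA = np.ones_like(Z)     # pretend the upstream gradient is all ones
eps = 1e-6

analytic = sigmoid_backward(dA, Z)
numeric = (sigmoid(Z + eps) - sigmoid(Z - eps)) / (2 * eps)  # central difference

# The two estimates should agree to many decimal places
print(np.max(np.abs(analytic - numeric)))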

Now that we have the method backward for all classes, we can implement the fit method in our NeuralNet class.

PYTHON
class NeuralNet():
    # ...
    def fit(self, X, y, lr=0.001, epoch=1):
        for i in range(epoch):
            # Forward pass through every layer
            A = X
            for layer in self.layers:
                A = layer.forward(A)

            # Gradient of the loss with respect to the network's output
            dA_curr = categorical_crossentropy(y, A)

            # Backward pass: each layer returns the gradient for the layer before it
            for layer in reversed(self.layers):
                dA_curr = layer.backward(dA_curr, lr=lr)
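
One assumption baked into fit is that categorical_crossentropy(y, X) returns the gradient of the loss with respect to the network's output, not the scalar loss value. That function is not shown in this lesson; a minimal sketch of such a gradient, assuming y and the predictions are both arrays of shape (classes, samples) with y one-hot encoded, could look like this:

PYTHON
import numpy as np

def categorical_crossentropy(y, y_hat, eps=1e-12):
    # Gradient of L = -(1/m) * sum(y * log(y_hat)) with respect to y_hat
    m = y.shape[1]
    return -y / (y_hat + eps) / m

The SoftMaxActivation.backward of the last layer is then responsible for turning this dA into dZ, following the same pattern as the other activations.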

Now the only work left is to instantiate and train the model. We will train it on the iris dataset we discussed in the Scikit-learn chapter.

PYTHON
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score  # To calculate accuracy

iris = load_iris()

net = NeuralNet([
    LinearLayer(4, 10, activation=ReLUActivation()),
    LinearLayer(10, 5, activation=ReLUActivation()),
    LinearLayer(5, 3, activation=SoftMaxActivation()),
])

# We do preprocessing like the last lesson
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=42)

net.fit(X_train, y_train)
pred = net.predict(X_test)
print(accuracy_score(y_test, pred))
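
Note that the backward math above treats samples as columns (m = A_prev.shape[1]) and the final layer has three softmax units, so the "preprocessing like the last lesson" comment typically covers transposing the data and one-hot encoding the labels. A possible sketch of that step, assuming those conventions (adjust it to match how your forward method expects its input):

PYTHON
import numpy as np

# Columns as samples, to match A_prev.shape[1] == m in LinearLayer.backward
X_train_T = X_train.T                  # shape (4, m)
X_test_T = X_test.T

# One-hot encode the labels to match the 3-unit softmax output
y_train_onehot = np.eye(3)[y_train].T  # shape (3, m)

net.fit(X_train_T, y_train_onehot, lr=0.01, epoch=100)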