Backward Propagation
We have already learned about the gradient descent algorithm in lesson 4. Here we will apply it to neural networks through the backpropagation method. We already have the NeuralNet class along with the layer and activation classes. All we need to do is add a new method called backward to each of them so that gradients can flow backward through the network.
What will the backward method do in each case? The backward method takes the gradient flowing in from the layer after it and returns the gradient of the loss with respect to that layer's input, applying the chain rule one step at a time. Along the way, a linear layer also computes the gradient of the loss with respect to its own weights W and bias b. Note that the argument passed to backward is not X; it is the gradient received from the next layer.
Remember, backpropagation and gradient descent are different algorithms. The first is used only to compute the gradient of the loss with respect to W; the second is responsible for updating the weights using the gradient produced by the first.
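To make the split concrete, here is a tiny sketch with a single weight (the numbers and the squared-error loss are purely illustrative, not part of our NeuralNet code). The line computing dW is the backpropagation step; the line subtracting lr * dW is the gradient descent step:

w, x, y, lr = 0.5, 2.0, 3.0, 0.1   # one weight, one sample, squared-error loss

for step in range(3):
    pred = w * x                    # forward pass
    dW = 2 * (pred - y) * x         # backpropagation: chain rule gives dLoss/dw
    w -= lr * dW                    # gradient descent: apply the gradient
    print(step, round(w, 4), round((w * x - y) ** 2, 4))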
You will understand this better if you take a look at the following equations. Nerd Alert! Calculus Ahead.
$$dZ^{[l]} = dA^{[l]} \odot g'\!\left(Z^{[l]}\right)$$
$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \left(A^{[l-1]}\right)^{T}$$
$$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = \left(W^{[l]}\right)^{T} dZ^{[l]}$$

Here $g'$ is the derivative of the activation function of layer $l$, $m$ is the number of samples, and $\odot$ denotes element-wise multiplication.
After calculating the gradients of all the layers and activation functions using the above equations, we can update the weights via gradient descent:

$$W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha \, db^{[l]}$$

where $\alpha$ is the learning rate.
Let us implement this in Python. We will implement it only for the LinearLayer class and the sigmoid and ReLU activation functions; you can look up the derivatives of other activation functions here.
class SigmoidActivation():
    # ...
    def backward(self, dA, Z):
        # Derivative of the sigmoid: s'(Z) = s(Z) * (1 - s(Z)),
        # multiplied element-wise by the incoming gradient dA.
        s = 1 / (1 + np.exp(-Z))
        return dA * s * (1 - s)

class ReLUActivation():
    # ...
    def backward(self, dA, Z):
        # Derivative of ReLU: 1 where Z > 0, 0 elsewhere.
        dZ = dA.copy()
        dZ[Z <= 0] = 0
        return dZ

class LinearLayer():
    # ...
    def backward(self, dA_curr, lr=0.001):
        # self.A_prev and self.Z_curr are cached by the forward pass.
        m = self.A_prev.shape[1]

        dZ_curr = self.activation.backward(dA_curr, self.Z_curr)
        dW_curr = np.dot(dZ_curr, self.A_prev.T) / m
        db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m
        dA_prev = np.dot(self.W.T, dZ_curr)

        # Gradient descent step: update the parameters in place.
        self.W -= lr * dW_curr
        self.b -= lr * db_curr

        return dA_prev
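A quick way to gain confidence in a backward method is a finite-difference check: nudge the input slightly, measure how the forward output changes, and compare with the analytical derivative. The sketch below checks SigmoidActivation; since the forward code is elided above, we assume it has a forward(Z) method that returns 1 / (1 + np.exp(-Z)):

import numpy as np

sigmoid = SigmoidActivation()
Z = np.array([[0.5, -1.2, 2.0]])
eps = 1e-6

# Analytical derivative: pass dA = 1 so backward returns g'(Z) itself.
analytical = sigmoid.backward(np.ones_like(Z), Z)

# Numerical derivative via central differences.
numerical = (sigmoid.forward(Z + eps) - sigmoid.forward(Z - eps)) / (2 * eps)

print(np.max(np.abs(analytical - numerical)))  # should be tiny, around 1e-10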
Now that we have the backward method for all classes, we can implement the fit method in our NeuralNet class.
class NeuralNet():
    # ...
    def fit(self, X, y, lr=0.001, epochs=1):
        for i in range(epochs):
            # Forward pass through every layer; keep X intact across epochs.
            A = X
            for layer in self.layers:
                A = layer.forward(A)

            # Gradient of the loss with respect to the network output.
            dA_curr = categorical_crossentropy(y, A)

            # Backward pass: each layer consumes the gradient from the layer
            # after it and returns the gradient for the layer before it.
            for layer in reversed(self.layers):
                dA_curr = layer.backward(dA_curr, lr=lr)
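The fit method above calls a categorical_crossentropy function, and the final example uses a SoftMaxActivation class; neither is defined in this lesson. Below is a minimal sketch of both, under the common assumption that the softmax output layer is paired with categorical cross-entropy: the combined gradient with respect to the pre-activation scores simplifies to probabilities minus one-hot targets, so the softmax backward can pass the gradient through unchanged. Note that despite its name, this categorical_crossentropy helper returns the loss gradient, which is all fit needs:

import numpy as np

class SoftMaxActivation():
    def forward(self, Z):
        # Column-wise softmax, shifted by the max for numerical stability.
        e = np.exp(Z - Z.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)

    def backward(self, dA, Z):
        # Paired with cross-entropy, the incoming gradient is already dZ.
        return dA

def categorical_crossentropy(y, probs):
    # Gradient of cross-entropy w.r.t. the pre-softmax scores: probs - one_hot(y).
    # Assumes y holds integer class labels and probs is (n_classes, n_samples).
    one_hot = np.zeros_like(probs)
    one_hot[y, np.arange(probs.shape[1])] = 1
    return probs - one_hot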
Now the only work left is to instantiate and train the model. We will train it on the iris dataset we discussed in the Scikit-learn chapter.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score  # To calculate accuracy

iris = load_iris()

net = NeuralNet([
    LinearLayer(4, 10, activation=ReLUActivation()),
    LinearLayer(10, 5, activation=ReLUActivation()),
    LinearLayer(5, 3, activation=SoftMaxActivation()),
])

# We do preprocessing like the last lesson
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

# The layer code above treats each column as one sample, so we transpose
# the (samples, features) arrays that scikit-learn returns.
net.fit(X_train.T, y_train)
pred = net.predict(X_test.T)
print(accuracy_score(y_test, pred))