Linear Layer
You might be thinking that we have to loop through all the features and multiply each one by its corresponding weight to get the output. But doing this in an explicit loop would take forever, even for a very simple task. Remember, a single input can have thousands of features (think of an image of size 4000*2000 — that is 8,000,000 pixels). And that is just for one neuron; a modern ANN contains hundreds of such neurons.
The power of parallel processing with GPUs and TPUs makes this computation fast and efficient. Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) can perform matrix operations almost as quickly as a CPU performs simple arithmetic between two numbers. So we need to convert the above equations into matrix algebra.
Think of the input as a matrix where each column represents a feature and each row represents a sample.

Now, for every sample, each feature has to be multiplied by the same weight — in other words, each column of the matrix is multiplied by a single value. If we stack the weights into a vector with one entry per feature, a single matrix multiplication gives us both the products and their summation in one step, as the short sketch below shows.
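To make this concrete, here is a minimal NumPy sketch with made-up numbers: two samples (rows) with three features (columns) each, multiplied by a weight vector that has one entry per feature. One matrix multiplication produces the weighted sum for every sample at once.

import numpy as np

# Two samples (rows), three features (columns) -- illustrative values only
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# One weight per feature
w = np.array([[0.1],
              [0.2],
              [0.3]])

# One matrix multiplication gives the weighted sum for every sample
print(np.dot(X, w))   # weighted sums: ~1.4 for the first sample, ~3.2 for the second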

But what about the bias we talked about earlier?
The bias fits into this matrix multiplication very easily: we prepend the bias to the weight vector, and prepend a column of 1s (acting like an extra feature) to the input matrix. The multiplication then adds the bias automatically, as the small example below shows.
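As a quick check (again with made-up numbers), prepending the bias to the weight vector and a column of 1s to the input matrix gives exactly the same result as computing the weighted sum and then adding the bias separately.

import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # two samples, three features
w = np.array([[0.1], [0.2], [0.3]])
b = 0.5

# The plain way: weighted sum, then add the bias
out_plain = np.dot(X, w) + b

# The trick: prepend b to the weights and a column of 1s to the input
w_aug = np.vstack([[b], w])                  # shape (4, 1)
X_aug = np.hstack([np.ones((2, 1)), X])      # shape (2, 4)
out_trick = np.dot(X_aug, w_aug)

print(np.allclose(out_plain, out_trick))     # True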

We have now understood the most basic building block of deep learning: a layer made of many such units, known as a linear layer. A deep network is simply many of these layers joined together. Let's see how we can implement one in Python.
import numpy as np

# Minimal placeholder activation so this snippet runs on its own;
# proper activation functions are covered in the next section.
class IdentityActivation():
    def forward(self, Z):
        return Z

class LinearLayer():
    # We initialize W and b with small random values, hence the scaling by 0.1
    def __init__(self, input_size, output_size, activation=None, name=None):
        self.W = np.random.randn(output_size, input_size) * 0.1
        self.b = np.random.randn(output_size, 1) * 0.1

        # Activation functions are in the next section;
        # until then, fall back to the identity (no-op) activation
        self.activation = activation
        if not activation:
            self.activation = IdentityActivation()

    # The forward pass: the prediction of the layer.
    # X has shape (input_size, number_of_samples): each column is one sample.
    def forward(self, X):
        self.A_prev = X.copy()
        self.Z_curr = np.dot(self.W, X) + self.b
        self.A_curr = self.activation.forward(self.Z_curr)
        return self.A_curr.copy()


# For our test purposes: one sample with 4 features, passed as a column vector
if __name__ == '__main__':
    ll1 = LinearLayer(4, 1)
    print(ll1.forward(np.array([[1],
                                [2],
                                [3],
                                [4]])))
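Note that the class above stacks samples as columns (the transpose of the row-per-sample layout used in the explanation earlier), because W has shape (output_size, input_size) and multiplies the input from the left. As a quick sanity check of the shapes, here is a small sketch with made-up numbers: a layer with 4 inputs and 1 output applied to a batch of 3 samples returns one value per sample.

import numpy as np

np.random.seed(0)                       # fixed seed so the run is reproducible
W = np.random.randn(1, 4) * 0.1         # (output_size, input_size) = (1, 4)
b = np.random.randn(1, 1) * 0.1         # (output_size, 1)

# Three samples with 4 features each, stacked as columns -> shape (4, 3)
X = np.array([[1.0, 5.0, 9.0],
              [2.0, 6.0, 10.0],
              [3.0, 7.0, 11.0],
              [4.0, 8.0, 12.0]])

Z = np.dot(W, X) + b                    # shape (1, 3): one output per sample
print(Z.shape)                          # (1, 3)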