Hands-On Code: One Encoder Block Forward Pass (Python, stdlib only)
Below is a toy “encoder layer” forward pass. We’ll fake:
- multi-head attention with just one head,
- a feed-forward network,
- residual + layer norm.
This is not a full Transformer, but it mirrors how data flows in a single encoder layer.
```python
# file: tiny_encoder_block.py
# A super-simplified encoder layer forward pass with:
# attention -> add&norm -> feedforward -> add&norm
# No external libraries. Run with `python tiny_encoder_block.py`.
import math
import random

def layer_norm(vec):
    # simple per-vector layer norm
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    eps = 1e-6
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def linear(x, W, b):
    # x: [d_in], W: [d_out x d_in], b: [d_out] -> [d_out]
    out = []
    for row, bias in zip(W, b):
        s = 0.0
        for xi, wi in zip(x, row):
            s += xi * wi
        out.append(s + bias)
    return out

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(xs):
    # subtract the max for numerical stability, then normalize exponentials
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```
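The embedded snippet cuts off after the helper functions, so here is a minimal sketch of how the rest of the file could wire them together: single-head attention, the two add & norm steps, and a small `main()`. The function names (`single_head_attention`, `encoder_block`, `rand_matrix`) and the toy dimensions are stand-ins, not necessarily what the original file uses; the code below is meant to be appended after the helpers in `tiny_encoder_block.py` and reuses its `math` and `random` imports.

```python
def single_head_attention(xs, Wq, Wk, Wv, bq, bk, bv):
    # xs: list of token vectors. Project each token to q, k, v, then mix the
    # value vectors by softmax(q . k / sqrt(d_k)): scaled dot-product attention, one head.
    qs = [linear(x, Wq, bq) for x in xs]
    ks = [linear(x, Wk, bk) for x in xs]
    vs = [linear(x, Wv, bv) for x in xs]
    d_k = len(qs[0])
    out = []
    for q in qs:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in ks]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vs)) for i in range(len(vs[0]))]
        out.append(mixed)
    return out

def encoder_block(xs, params):
    # attention -> residual + layer norm -> feed-forward -> residual + layer norm
    attn = single_head_attention(xs, params["Wq"], params["Wk"], params["Wv"],
                                 params["bq"], params["bk"], params["bv"])
    xs = [layer_norm([a + b for a, b in zip(x, h)]) for x, h in zip(xs, attn)]
    ff = [linear(relu(linear(x, params["W1"], params["b1"])), params["W2"], params["b2"])
          for x in xs]
    return [layer_norm([a + b for a, b in zip(x, h)]) for x, h in zip(xs, ff)]

def rand_matrix(rows, cols):
    # small random weights; a real model would learn these
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def main():
    random.seed(0)
    d, d_ff, seq_len = 4, 8, 3  # toy sizes (assumed, not from the original file)
    params = {
        "Wq": rand_matrix(d, d), "bq": [0.0] * d,
        "Wk": rand_matrix(d, d), "bk": [0.0] * d,
        "Wv": rand_matrix(d, d), "bv": [0.0] * d,
        "W1": rand_matrix(d_ff, d), "b1": [0.0] * d_ff,
        "W2": rand_matrix(d, d_ff), "b2": [0.0] * d,
    }
    xs = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(seq_len)]
    out = encoder_block(xs, params)
    for i, vec in enumerate(out):
        print(f"token {i}: {[round(v, 3) for v in vec]}")

main()
```

Running the combined file prints one 4-dimensional vector per input token. The exact numbers depend on the random seed, but each output vector will have roughly zero mean and unit variance, because the last thing the block does is layer norm.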

