Hands-On Code: Scaled Dot-Product Attention
Below is a tiny runnable demo of scaled dot-product attention for one attention head, using only the standard library.
The script:
- Computes raw scores as the dot products of the queries Q with the keys K.
- Scales those scores by 1/sqrt(d_k) and applies a softmax to turn them into attention weights.
- Uses the weights to mix the values V.
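Put together, those steps are the standard scaled dot-product attention formula: Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V, where d_k is the dimension of the key vectors and the softmax is applied row-wise to the score matrix.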
```python
# file: scaled_dot_attention.py
# Minimal scaled dot-product attention for one attention head.
# Only standard library. Run with `python scaled_dot_attention.py`.

import math
import random

def softmax(xs):
    # subtract max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # a: [n x d], b: [d x m] => [n x m]
    n = len(a)
    d = len(a[0])
    m = len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for k in range(d):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out

def transpose(m):
    # (The original listing was cut off at this point; the rest of the file is a
    # straightforward reconstruction of the demo described above.)
    return [list(row) for row in zip(*m)]

def attention(Q, K, V):
    # scores = Q @ K^T, scaled by 1/sqrt(d_k), softmaxed row-wise, then applied to V
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))   # [n_q x n_k]
    scale = 1.0 / math.sqrt(d_k)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, V), weights  # output: [n_q x d_v]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def main():
    random.seed(0)
    Q = rand_matrix(2, 8)   # 2 queries, d_k = 8
    K = rand_matrix(4, 8)   # 4 keys, d_k = 8
    V = rand_matrix(4, 3)   # 4 values, d_v = 3
    out, weights = attention(Q, K, V)
    for i, (w, o) in enumerate(zip(weights, out)):
        print(f"query {i}: weights = {[round(x, 4) for x in w]}")
        print(f"          output  = {[round(x, 4) for x in o]}")

main()
```
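When run, the demo prints each query's attention weights over the keys and the resulting output vector. Two quick sanity checks: every weight row sums to 1 (softmax guarantees this), and each output row is a weighted average of the rows of V.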


