Devices & Kernels: How It Gets Fast
A kernel
is the device-specific implementation of an op
. TensorFlow schedules ops on devices:
- CPU: general-purpose; great for control-heavy tasks.
- GPU: massive parallel math (matmuls, convs).
- TPU: matrix math ASIC for large-scale training.
TensorFlow picks placements, manages memory copies, and runs kernels in parallel streams when possible.