Devices & Kernels: How It Gets Fast
A kernel is the device-specific implementation of an op. TensorFlow schedules ops on devices:
- CPU: general-purpose; great for control-heavy tasks.
 - GPU: massive parallel math (matmuls, convs).
 - TPU: matrix math ASIC for large-scale training.
 
TensorFlow picks placements, manages memory copies, and runs kernels in parallel streams when possible.


