Mark As Completed Discussion

Devices & Kernels: How It Gets Fast

A kernel is the device-specific implementation of an op. TensorFlow schedules ops on devices:

  • CPU: general-purpose; great for control-heavy tasks.
  • GPU: massive parallel math (matmuls, convs).
  • TPU: matrix math ASIC for large-scale training.

TensorFlow picks placements, manages memory copies, and runs kernels in parallel streams when possible.