
Measuring Latency and Throughput

Why this matters for HFT engineers (beginner-friendly)

  • In HFT the difference between 1,500 ns and 2,500 ns per tick can change whether your order wins a trade or not. Think of latency like a fast-break in basketball: a small delay can be the difference between an easy layup and a contested shot.
  • Throughput (ops/sec) is how many ticks your system can handle per second — like how many possessions a team can run in a game.

Quick ASCII diagram: where measurement fits in the pipeline

[Market feed NIC] --(packets)--> [Capture / Handler] --(parse)--> [Strategy inner-loop] --(orders)--> [Exchange gateway]
        ^                                                                                                     |
        |-------------------------------------- instrument (timestamps) --------------------------------------|

  • The critical path (where latency matters) is from packet arrival to order emission.
  • We measure: per-event latency (ns) and overall throughput (ops/sec).

Core approaches and tools (what to reach for)

  • Software timers: use std::chrono::steady_clock in C++, time.perf_counter() in Python, System.nanoTime() in Java, clock_gettime(CLOCK_MONOTONIC_RAW) in C, performance.now() in JS. These give you program-side timings.
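  • Kernel/hardware timestamps: the NIC or the kernel stamps packets as they arrive (on Linux, via the SO_TIMESTAMPING socket option), and PTP (IEEE 1588) keeps those clocks synchronized across machines. These timestamps are taken closer to the wire and exclude user-space scheduling jitter.
  • Packet capture: tcpdump -i eth0 -w out.pcap, then analyze the timestamps with Wireshark. Where the NIC supports hardware timestamping, tcpdump -J -i eth0 lists the available timestamp types, -j selects one (e.g. -j adapter_unsynced), and --time-stamp-precision=nano keeps nanosecond resolution.
  • Profilers and counters: perf stat for CPU counters (cycles, instructions, cache misses, branch misses), and perf record / perf report to find hotspots, i.e. the functions worth optimizing first.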

Commands (beginner-safe examples)

  • Capture packets (software timestamps):
    • sudo tcpdump -i eth0 -w feed.pcap
  • Profile CPU to find hotspots:
    • sudo perf record -F 99 -- ./your_binary
    • sudo perf report --stdio
  • Check NIC timestamping capability:
    • ethtool -T eth0

How to interpret measurements (simple rules)

  • Look at percentiles, not just the average: p95 and p99 show tail latency, which kills HFT performance.
  • Correlate throughput and latency: higher throughput often raises latency (queueing).
  • Watch for long tails caused by GC, page faults, IRQs, or CPU frequency scaling.

Analogy to basketball (keep it intuitive)

  • Average latency = team's average shot time.
  • p99 latency = roughly the worst possession out of every 100 (the play that cost you the game).
  • Throughput = possessions per minute.

The C++ example that accompanies this lesson shows a reproducible microbenchmark:

  • It builds deterministic ticks (vector<Tick>) so results are reproducible.
  • Measures per-tick latency (nanoseconds) and computes min, avg, p50, p95, p99, max and ops/sec.
  • Prints SLO breaches for a simple service-level check.

Beginner challenges (try these after running the code)

  • Change ITERATIONS to 10000 and 500000. How do ops/sec and p99 change?
  • Toggle the heavy boolean to true to simulate a slower inner loop (like an unoptimized Python hotspot migrated to C++). What happens to throughput?
  • Replace the synthetic price generator with a replay from CSV: read timestamps and prices into ticks and rerun the benchmark.
  • Implement the same microbenchmark in Python using time.perf_counter() and compare ops/sec. (Hint: Python will be much slower per-op; that’s why we migrate hotspots.)

Practical next steps and what to measure in the field

  • For network I/O benchmarks, use pcap with hardware timestamps when possible and compute wire-to-order latency (from a feed packet arriving at the NIC to the resulting order leaving it).
  • Use perf to see if allocations, syscalls, or branch mispredictions dominate the time.
  • Establish SLOs early (e.g., p99 < 5 µs), measure against them continuously, and alert when they are breached.

Try a small modification now (exercise):

  • Edit the C++ example and:
    • increase ITERATIONS by 10x,
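    • or add std::this_thread::sleep_for(std::chrono::nanoseconds(2000)); inside the loop to simulate NIC queueing jitter (this needs #include <thread>, and the OS will typically oversleep well past 2 µs, which adds jitter of its own),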
    • or switch to the heavy workload.

Observe how the numbers change (min, p95, p99 and ops/sec). Understanding how these metrics move when you change workload or environment is the key skill here.
