High-Level Architecture of an HFT System
Understand the big picture first, then we dive into code. This screen shows the main components you'll meet when building a tiny HFT service and explains the latency‑critical path you must shrink. You're a beginner in C++ and Python (with a background in Java, C, and JS), so I'll point out where those languages typically live in this stack.
ASCII diagram (simple, left-to-right data flow):
[Exchange multicast / TCP] --> [NIC / Hardware timestamp] --> [Kernel / Driver] --> [Market Data Feed Handler]
                                                                                                 |
                                                                                                 v
                                                                                    [Strategy Engine (decision)] --> [Order Gateway] --> [Exchange]
                                                                                                 |                          |
                                                                                                 v                          v
                                                                                              [Risk]              [Logging / Telemetry]
Key components (short, practical notes):
Market Data Feed Handler
- Role: receive, parse, and sequence-recover exchange messages (often UDP multicast / binary protocols like ITCH).
- Typical implementation: C++ for lowest latency (tight parsing, zero-copy), or Python for prototyping (slow path).
- Things to watch: copying, memory allocation, and parse branching.
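To make the "zero-copy" idea concrete, here is a minimal sketch of a view-style parser. The 13-byte message layout, the names `AddOrderView`, `parse_add_order`, and `has_gap`, and the host-byte-order assumption are all hypothetical and chosen for illustration; this is not the real ITCH format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 13-byte wire layout (NOT real ITCH):
//   offset 0: char     type ('A' = add order)
//   offset 1: uint32_t sequence number
//   offset 5: uint32_t price in ticks
//   offset 9: uint32_t quantity
// Fields are assumed to be in host byte order to keep the sketch short.
constexpr std::size_t kMsgSize = 13;

struct AddOrderView {
    const unsigned char* data;  // points into the receive buffer; the message is not copied

    // A fixed-size memcpy is safe for unaligned offsets and typically compiles to a plain load.
    std::uint32_t read_u32(std::size_t offset) const {
        assert(data != nullptr);
        std::uint32_t v;
        std::memcpy(&v, data + offset, sizeof v);
        return v;
    }
    char type() const { return static_cast<char>(data[0]); }
    std::uint32_t sequence() const { return read_u32(1); }
    std::uint32_t price() const { return read_u32(5); }
    std::uint32_t quantity() const { return read_u32(9); }
};

// Returns true and points `out` at the buffer only when a whole add-order message is present.
inline bool parse_add_order(const unsigned char* buf, std::size_t len, AddOrderView& out) {
    if (len < kMsgSize || buf[0] != 'A') return false;
    out.data = buf;
    return true;
}

// Sequence recovery starts with gap detection: the next message should carry expected_seq.
inline bool has_gap(std::uint32_t expected_seq, std::uint32_t received_seq) {
    return received_seq != expected_seq;
}
```

The view only stores a pointer, so parsing allocates nothing and fields are decoded lazily when the strategy asks for them. The view is valid only while the receive buffer is not reused.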
Strategy Engine
- Role: use parsed market data to decide which orders to send. The logic can be simple rules (for example, a simple moving average (SMA) crossover) or complex signals.
- Typical flow: prototype the algorithm quickly in Python (numpy, pandas), then move hot code paths to C++ (or bind with pybind11).
- Keep decision logic in memory and branch-light so it fits a microsecond-scale budget.
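As an illustration of the "simple rules" case, the sketch below implements an SMA crossover with fixed-size windows so the hot path never allocates. The class names `RollingSma` and `SmaCross`, the `Signal` enum, and the choice to stay silent until the slow window is full are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>

// Rolling simple moving average over the last N prices, O(1) per update, no heap use.
template <std::size_t N>
class RollingSma {
public:
    // Adds a price and returns the average over the values seen so far (at most N).
    double update(double price) {
        if (count_ == N) {
            sum_ -= window_[head_];  // drop the oldest value before overwriting it
        } else {
            ++count_;
        }
        window_[head_] = price;
        sum_ += price;
        head_ = (head_ + 1) % N;
        return sum_ / static_cast<double>(count_);
    }
    bool full() const { return count_ == N; }

private:
    std::array<double, N> window_{};
    double sum_ = 0.0;
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

enum class Signal { Hold, Buy, Sell };

// Emits Buy when the fast average moves above the slow one, Sell when it moves back to or below it.
template <std::size_t Fast, std::size_t Slow>
class SmaCross {
    static_assert(Fast < Slow, "the fast window must be shorter than the slow window");

public:
    Signal on_price(double price) {
        const double fast = fast_.update(price);
        const double slow = slow_.update(price);
        if (!slow_.full()) return Signal::Hold;  // not enough history yet

        const bool above = fast > slow;
        Signal out = Signal::Hold;
        if (has_prev_ && above != prev_above_) {
            out = above ? Signal::Buy : Signal::Sell;
        }
        prev_above_ = above;
        has_prev_ = true;
        return out;
    }

private:
    RollingSma<Fast> fast_;
    RollingSma<Slow> slow_;
    bool prev_above_ = false;
    bool has_prev_ = false;
};
```

A typical use is one `SmaCross<Fast, Slow>` per instrument, fed from the feed handler with each trade or mid price. A running sum in floating point can drift over very long sessions, so a production version might periodically recompute it.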
Order Gateway
- Role: serialize orders and send them to the exchange; track acknowledgements and handle resends.
- Typical implementation: low-level C++ for performance and strict socket handling.
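The two gateway duties, serialization and acknowledgement tracking, can be sketched without any socket code. Everything here is hypothetical: the 14-byte `NewOrder` wire layout, `serialize_new_order`, and the fixed-capacity `PendingAcks` table are illustrative names, not a real exchange protocol.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical order fields; a real exchange protocol defines its own.
struct NewOrder {
    std::uint32_t client_order_id;
    char side;  // 'B' = buy, 'S' = sell
    std::uint32_t price_ticks;
    std::uint32_t quantity;
};

// Hypothetical wire layout: 'N' | id (4) | side (1) | price (4) | quantity (4) = 14 bytes,
// written in host byte order to keep the sketch short.
constexpr std::size_t kNewOrderSize = 14;

// Writes the order into a caller-owned buffer. Returns bytes written, or 0 if it does not fit.
inline std::size_t serialize_new_order(const NewOrder& o, unsigned char* out, std::size_t cap) {
    if (cap < kNewOrderSize) return 0;
    out[0] = 'N';
    std::memcpy(out + 1, &o.client_order_id, 4);
    out[5] = static_cast<unsigned char>(o.side);
    std::memcpy(out + 6, &o.price_ticks, 4);
    std::memcpy(out + 10, &o.quantity, 4);
    return kNewOrderSize;
}

// Fixed-capacity table of orders still waiting for an exchange acknowledgement.
// A linear scan is acceptable for small capacities; there is no heap allocation.
template <std::size_t Capacity>
class PendingAcks {
public:
    bool add(std::uint32_t client_order_id) {
        for (auto& slot : slots_) {
            if (!slot.in_use) {
                slot.id = client_order_id;
                slot.in_use = true;
                ++outstanding_;
                return true;
            }
        }
        return false;  // table full: the caller should stop sending new orders
    }
    bool acknowledge(std::uint32_t client_order_id) {
        for (auto& slot : slots_) {
            if (slot.in_use && slot.id == client_order_id) {
                slot.in_use = false;
                --outstanding_;
                return true;
            }
        }
        return false;  // unknown or already acknowledged
    }
    std::size_t outstanding() const { return outstanding_; }

private:
    struct Slot {
        std::uint32_t id = 0;
        bool in_use = false;
    };
    std::array<Slot, Capacity> slots_{};
    std::size_t outstanding_ = 0;
};
```

Orders that remain in the table past a timeout are the candidates for resend logic, which is left out of this sketch.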
Risk and Logging
- Risk checks should be inline and extremely fast (pre-trade); heavy risk policies are off the hot-path.
- Logging must not block: use async/batched writers, ring buffers, or route logs off-thread.
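A rough sketch of both ideas follows: an inline pre-trade check that is just a few comparisons, and a single-producer/single-consumer ring buffer that lets the hot thread hand log records to a writer thread without blocking. `RiskLimits`, `pre_trade_ok`, `LogRecord`, and `SpscRing` are hypothetical names, and dropping records when the ring is full is one possible policy, not the only one.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical limits; real pre-trade risk policies are firm-specific.
struct RiskLimits {
    std::uint32_t max_quantity;
    std::uint32_t max_price_ticks;
};

// Inline pre-trade check: a few comparisons, no locks, no allocation.
inline bool pre_trade_ok(const RiskLimits& limits, std::uint32_t quantity, std::uint32_t price_ticks) {
    return quantity != 0 && quantity <= limits.max_quantity && price_ticks <= limits.max_price_ticks;
}

struct LogRecord {
    std::uint64_t timestamp_ns;
    std::uint32_t event_id;
    std::uint32_t value;
};

// Single-producer / single-consumer ring: the hot thread pushes, a logger thread pops.
// Exactly one thread may call try_push and exactly one may call try_pop.
template <std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Never blocks: returns false when the ring is full so the caller can drop or count the loss.
    bool try_push(const LogRecord& record) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = record;
        head_.store(head + 1, std::memory_order_release);  // publish the record to the consumer
        return true;
    }

    bool try_pop(LogRecord& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return false;  // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot for the producer
        return true;
    }

private:
    std::array<LogRecord, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The logger thread can drain the ring in batches and do the slow work (formatting, file I/O) entirely off the hot path.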
Latency-critical path (what to optimize first):
- From the NIC timestamp to the bytes on the wire back to the exchange: NIC -> Kernel -> Feed handler -> Strategy -> Order Gateway -> NIC.
- Focus on: zero-copy, allocation-free parsing; cache-friendly data layout; avoiding syscalls in the hot path; and hardware timestamping.
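One way to picture "cache-friendly data layout" is a book side whose top levels sit in a single cache line. The sketch below assumes 64-byte cache lines (common on current x86-64 CPUs, but an assumption) and uses hypothetical names `Level`, `BookSide`, and `visible_quantity`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One price level: 8 bytes, no pointers, no padding.
struct Level {
    std::uint32_t price_ticks;
    std::uint32_t quantity;  // 0 means the level is empty
};

// Eight contiguous levels fill exactly one 64-byte line, so reading the top of
// the book touches a single cache line. The 64-byte line size is an assumption.
struct alignas(64) BookSide {
    std::array<Level, 8> levels{};  // levels[0] is the best price
};

static_assert(sizeof(BookSide) == 64, "BookSide should occupy exactly one assumed cache line");

// Sums the displayed quantity with a simple loop over contiguous memory:
// no allocation, no pointer chasing, no data-dependent branches.
inline std::uint64_t visible_quantity(const BookSide& side) {
    std::uint64_t total = 0;
    for (const Level& level : side.levels) {
        total += level.quantity;
    }
    return total;
}
```

Compare this with a node-based container such as a linked list or tree of levels, where each lookup may touch a different cache line and updates can trigger allocation.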
Language mapping and analogies for your background:
- If you come from Java: think of C++ here as Java without the GC. You must manage memory yourself, but you avoid GC pauses and get more predictable latency.
- If you come from C: same low-level control, plus modern tools (std::vector, RAII) to avoid bugs.
- If you come from JS: imagine the market feed as events on an event loop, but instead of a single-threaded loop, we design threads and lockless queues for microsecond latencies.
- Python is your rapid-prototyping notebook: don't ship it on the hot path without moving bottlenecks to C++.
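For readers coming from C or Java, RAII is easiest to see in a small example: work that must happen at the end of a scope goes in a destructor, so it runs on every exit path. The `ScopedTimer` class below is a hypothetical helper written for this illustration; it records how long a scope took, which is also a handy way to measure a pipeline stage.

```cpp
#include <chrono>
#include <cstdint>

// RAII timer: starts on construction, writes the elapsed microseconds on destruction.
// There is no explicit stop() call to forget, and early returns are still measured.
class ScopedTimer {
public:
    explicit ScopedTimer(std::int64_t& out_us)
        : out_us_(out_us), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        out_us_ = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    }

    // Copying would record the same scope twice, so it is disabled.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::int64_t& out_us_;  // caller-owned result; must outlive the timer
    std::chrono::steady_clock::time_point start_;
};
```

Usage is a block scope: declare `std::int64_t parse_us = 0;`, then inside braces create `ScopedTimer t(parse_us);` followed by the work to measure. When the closing brace is reached, `parse_us` holds the duration. `steady_clock` is used because it never jumps backwards, unlike wall-clock time.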
Quick checklist (visual):
[ ] NIC hardware timestamping enabled
[ ] Feed handler: zero-copy parsing
[ ] Strategy: branch-light, cache-friendly data
[ ] Order gateway: async socket send, minimal syscalls
[ ] Risk: pre-trade checks inline
[ ] Logging: non-blocking, batched
Hands-on challenge (run the C++ program below):
- The C++ snippet simulates the component chain and prints per-stage and total microsecond latencies. It's a model — not a real network stack — but it helps you reason about which stages dominate.
- Try these experiments:
  - Change stage latencies to see which component pushes you past the critical threshold.
  - Replace the Strategy stage with a smaller value to simulate migrating Python logic to C++.
  - Edit favorite_player to your favorite athlete (or coder), a tiny personalization tie-in to keep learning playful.
Below is an executable C++ snippet that models this pipeline. Modify the stage times and rerun to explore the latency profile.
#include <chrono>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
using Clock = chrono::high_resolution_clock;
using us = chrono::microseconds;
int main() {
    // Personalize this (change to your favorite player or coder):
    string favorite_player = "Kobe Bryant"; // change for fun

    // Each pair is: (stage name, simulated latency in microseconds)
    // These numbers are coarse simulations to help you reason about hotspots.
    vector<pair<string,int>> stages = {
        {"NIC/hardware rx (hw ts)", 30},
        {"Kernel / driver copy", 20},
        {"Feed handler parse (zero-copy)", 60},
        {"Strategy (in-memory decision)", 120},
        {"Risk check (inline)", 40},
        {"Order serialization", 30},
        {"Socket send / NIC tx", 50},
        {"Exchange ack RTT (mock)", 300}
    };

    us critical_threshold(500); // microseconds: quick example threshold