High-Level Architecture of an HFT System
Understand the big picture first — then we dive into code. This screen shows the main components you'll meet when building a tiny HFT service and explains the latency‑critical path you must shrink. You're a beginner in C++ & Python (and have background in Java, C, JS) — so I'll point out where those languages typically live in this stack.
ASCII diagram (simple, left-to-right data flow):
[Exchange multicast / TCP] --> [NIC / Hardware timestamp] --> [Kernel / Driver] --> [Market Data Feed Handler]
                                                                                                |
                                                                                                v
                                                                                    [Strategy Engine (decision)] --> [Order Gateway] --> [Exchange]
                                                                                                |                           |
                                                                                                v                           v
                                                                                             [Risk]               [Logging / Telemetry]
Key components (short, practical notes):
Market Data Feed Handler
- Role: receive, parse, and sequence-recover exchange messages (often UDP multicast / binary protocols like ITCH).
- Typical implementation: C++ for lowest latency (tight parsing, zero-copy), or Python for prototyping (slow path).
- Things to watch: copying, memory allocation, and parse branching.
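To make "zero-copy parsing" concrete, here is a minimal sketch. It assumes a hypothetical 18-byte big-endian "add order" message; the layout, the `AddOrderView` type, and the field offsets are invented for illustration and are not the real ITCH format. The view reads fields straight out of the receive buffer, so nothing is copied or heap-allocated.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wire layout (NOT real ITCH), all integers big-endian:
//   offset  0: 1 byte  message type ('A' = add order)
//   offset  1: 8 bytes order id
//   offset  9: 4 bytes price in ticks
//   offset 13: 4 bytes quantity
//   offset 17: 1 byte  side ('B' or 'S')
constexpr std::size_t kAddOrderSize = 18;

// Read an n-byte big-endian unsigned integer starting at p.
inline std::uint64_t read_be(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

// Non-owning view over the receive buffer: no copy, no heap allocation.
// The buffer must outlive the view.
struct AddOrderView {
    const unsigned char* data;

    char type() const { return static_cast<char>(data[0]); }
    std::uint64_t order_id() const { return read_be(data + 1, 8); }
    std::uint32_t price_ticks() const {
        return static_cast<std::uint32_t>(read_be(data + 9, 4));
    }
    std::uint32_t quantity() const {
        return static_cast<std::uint32_t>(read_be(data + 13, 4));
    }
    char side() const { return static_cast<char>(data[17]); }
};

// Returns true and points `out` at buf when buf holds a complete add-order
// message; returns false for short buffers or other message types.
inline bool parse_add_order(const unsigned char* buf, std::size_t len,
                            AddOrderView& out) {
    if (len < kAddOrderSize || buf[0] != 'A') return false;
    out.data = buf;
    return true;
}
```

Fields are decoded lazily, only when the strategy asks for them, which keeps the parse itself down to one length check and one type check.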
Strategy Engine
- Role: use parsed market data to decide orders. Could be simple rules (an SMA crossover) or complex signals.
- Typical flow: prototype the algorithm quickly in Python (numpy, pandas), then move hot code paths to C++ (or bind with pybind11).
- Keep decision logic in memory and branch-light so each decision stays in the microsecond range.
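As an illustration of a simple rule kept in memory, the sketch below implements an SMA crossover with fixed-size storage and a running sum, so each price update is O(1) and never touches the heap. The class names (`RollingSma`, `SmaCrossover`) and the exact crossover rule are assumptions made for this example, not part of any library.

```cpp
#include <array>
#include <cstddef>

// Rolling simple moving average over the last N prices.
// Fixed-size storage plus a running sum: O(1) per update, no heap allocation.
template <std::size_t N>
class RollingSma {
public:
    void push(double price) {
        if (count_ == N) {
            sum_ -= window_[head_];  // evict the oldest price
        } else {
            ++count_;
        }
        window_[head_] = price;
        sum_ += price;
        head_ = (head_ + 1) % N;
    }
    bool full() const { return count_ == N; }
    double value() const {
        return count_ ? sum_ / static_cast<double>(count_) : 0.0;
    }

private:
    std::array<double, N> window_{};
    double sum_ = 0.0;
    std::size_t head_ = 0;   // next slot to overwrite
    std::size_t count_ = 0;  // number of valid samples
};

enum class Signal { Hold, Buy, Sell };

// Hypothetical rule: Buy when the fast SMA moves above the slow SMA,
// Sell when it stops being above, Hold otherwise.
template <std::size_t Fast, std::size_t Slow>
class SmaCrossover {
    static_assert(Fast < Slow, "fast window must be shorter than slow window");

public:
    Signal on_price(double price) {
        fast_.push(price);
        slow_.push(price);
        if (!slow_.full()) return Signal::Hold;  // not enough history yet

        const bool above = fast_.value() > slow_.value();
        Signal s = Signal::Hold;
        if (have_prev_ && above != prev_above_) {
            s = above ? Signal::Buy : Signal::Sell;
        }
        prev_above_ = above;
        have_prev_ = true;
        return s;
    }

private:
    RollingSma<Fast> fast_;
    RollingSma<Slow> slow_;
    bool prev_above_ = false;
    bool have_prev_ = false;
};
```

For example, `SmaCrossover<2, 4>` fed the prices 10, 10, 10, 10, 12, 8 holds for the first four updates, signals Buy on 12, and signals Sell on 8. A running sum can accumulate floating-point drift over very long runs, so a production version might recompute the sum periodically or work in integer ticks.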
Order Gateway
- Role: serialize orders and send them to the exchange; track acknowledgements and handle resends.
- Typical implementation: low-level C++ for performance and strict socket handling.
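The serialization half of the gateway can be sketched as writing an order into a caller-provided buffer. The message layout, the `NewOrder` struct, and the `'O'` type byte below are hypothetical; a real gateway would follow the exchange's own order-entry specification.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory order (fields chosen for illustration only).
struct NewOrder {
    std::uint64_t client_order_id;
    std::uint32_t price_ticks;
    std::uint32_t quantity;
    char side;  // 'B' or 'S'
};

// Hypothetical wire format: 1 type byte + 8 + 4 + 4 + 1 side byte, big-endian.
constexpr std::size_t kNewOrderWireSize = 18;

// Write the low n bytes of v at p in big-endian order.
inline void write_be(unsigned char* p, std::uint64_t v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<unsigned char>(v >> (8 * (n - 1 - i)));
    }
}

// Serializes into a preallocated buffer; returns the number of bytes written,
// or 0 if the buffer is too small. No heap allocation on this path.
inline std::size_t serialize_new_order(const NewOrder& o, unsigned char* buf,
                                       std::size_t cap) {
    if (cap < kNewOrderWireSize) return 0;
    buf[0] = 'O';
    write_be(buf + 1, o.client_order_id, 8);
    write_be(buf + 9, o.price_ticks, 4);
    write_be(buf + 13, o.quantity, 4);
    buf[17] = static_cast<unsigned char>(o.side);
    return kNewOrderWireSize;
}
```

Reusing one stack or pooled buffer per connection keeps the send path free of allocations; the returned length is what you would hand to the socket send call.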
Risk and Logging
- Risk checks should be inline and extremely fast (pre-trade); heavy risk policies are off the hot path.
- Logging must not block: use async/batched writers, ring buffers, or route logs off-thread.
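One way to keep logging off the hot path is a single-producer/single-consumer ring buffer: the trading thread pushes fixed-size records and a separate logger thread drains them. The sketch below is a minimal version under stated assumptions: exactly one producer thread and one consumer thread, a power-of-two capacity, and a hypothetical `LogRecord` layout. When the ring is full it drops the record instead of waiting.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Fixed-size log record: plain data only, so pushing never allocates.
struct LogRecord {
    std::uint64_t timestamp_ns;
    std::uint32_t event_code;  // hypothetical numeric event id
    std::uint32_t value;
};

// Single-producer / single-consumer ring buffer.
// Safe only with exactly one pushing thread and one popping thread.
template <std::size_t Capacity>
class SpscLogRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Hot path (producer): never blocks; returns false if the ring is full.
    bool try_push(const LogRecord& r) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // full: drop, don't wait
        slots_[tail & (Capacity - 1)] = r;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Logger thread (consumer): returns false if there is nothing to read.
    bool try_pop(LogRecord& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<LogRecord, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next index to read
    std::atomic<std::size_t> tail_{0};  // next index to write
};
```

The logger thread can format and write records in batches, so string formatting and file I/O stay entirely off the trading thread. Dropping on a full ring is a design choice for this sketch; counting the drops is a common addition.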
Latency-critical path (what to optimize first):
- From the NIC timestamp to the bytes on the wire back to the exchange: NIC -> Kernel -> Feed handler -> Strategy -> Order Gateway -> NIC.
- Focus on: zero-copy, allocation-free parsing, cache-friendly data layout, avoiding syscalls in the hot path, and hardware timestamping.
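Before optimizing a stage it helps to measure it. The sketch below times a callable with a monotonic software clock and keeps a small allocation-free summary. The names `time_ns` and `LatencyStats` are made up for this example, and this is software timing only: it does not use the NIC hardware timestamps mentioned above, which need driver and kernel support.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>
#include <utility>

// Time one call of f() in nanoseconds using a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}

// Allocation-free summary of latency samples (min, max, mean).
struct LatencyStats {
    std::int64_t min_ns = std::numeric_limits<std::int64_t>::max();
    std::int64_t max_ns = 0;
    std::int64_t sum_ns = 0;
    std::int64_t count = 0;

    void record(std::int64_t ns) {
        if (ns < min_ns) min_ns = ns;
        if (ns > max_ns) max_ns = ns;
        sum_ns += ns;
        ++count;
    }
    double mean_ns() const {
        return count ? static_cast<double>(sum_ns) / static_cast<double>(count)
                     : 0.0;
    }
};
```

A typical use is `stats.record(time_ns([&] { handle_message(buf); }));` inside a benchmark loop, where `handle_message` stands in for whichever stage you are measuring. Tail values (the maximum, or percentiles if you add a histogram) usually matter more than the mean for this kind of system.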
Language mapping and analogies for your background:
- If you come from Java: think of C++ here as Java without the GC. You manage memory yourself, and in return there are no garbage-collection pauses, so latency is more predictable.
- If you come from C: same low-level control, plus modern tools (std::vector, RAII) to avoid bugs.
- If you come from JS: imagine the market feed as events on an event loop, but instead of a single-threaded loop we design threads and lock-free queues for microsecond latencies.
- Python is your rapid-prototyping notebook. Don't ship it on the hot path without moving bottlenecks to C++.
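If RAII is new to you, the following small sketch shows the idea behind the Java and C comparisons: a resource is acquired in a constructor and released in the destructor, at a point you can see in the code rather than whenever a collector runs. The `Buffer` class and the `live_buffers` counter are illustrative names only; the counter exists just to make destruction observable.

```cpp
#include <cstddef>
#include <vector>

// Counts how many Buffer objects are currently alive.
inline int& live_buffers() {
    static int n = 0;
    return n;
}

// Owns a block of bytes. std::vector frees the memory automatically,
// so there is no manual free() as in C and no GC as in Java.
class Buffer {
public:
    explicit Buffer(std::size_t size) : bytes_(size) { ++live_buffers(); }
    ~Buffer() { --live_buffers(); }

    // Non-copyable: exactly one owner per buffer.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};

inline std::size_t use_buffer_in_scope() {
    Buffer b(4096);   // acquired here
    return b.size();  // released when b goes out of scope, on every return path
}
```

In a hot path you would usually create such buffers once at startup and reuse them, since the allocation itself is what you want to avoid per message.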
Quick checklist (visual):
[ ] NIC hardware timestamping enabled
[ ] Feed handler: zero-copy parsing
[ ] Strategy: branch-light, cache-friendly data
[ ] Order gateway: async socket send, minimal syscalls
[ ] Risk: pre-trade checks inline
[ ] Logging: non-blocking, batched

Hands-on challenge (run the C++ program below):
- The C++ snippet simulates the component chain and prints per-stage and total microsecond latencies. It's a model — not a real network stack — but it helps you reason about which stages dominate.
- Try these experiments:
  - Change stage latencies to see which component pushes you past the critical threshold.
  - Replace the Strategy stage with a smaller value to simulate migrating Python logic to C++.
  - Edit favorite_player to your favorite athlete (or coder), a tiny personalization tie-in to keep learning playful.
Below is an executable C++ snippet that models this pipeline. Modify the stage times and rerun to explore the latency profile.
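#include <chrono>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

using namespace std;
using Clock = chrono::high_resolution_clock;  // kept from the original listing; unused below
using us = chrono::microseconds;

int main() {
    // Personalize this (change to your favorite player or coder):
    string favorite_player = "Kobe Bryant"; // change for fun

    // Each pair is: (stage name, simulated latency in microseconds)
    // These numbers are coarse simulations to help you reason about hotspots.
    vector<pair<string, int>> stages = {
        {"NIC/hardware rx (hw ts)", 30},
        {"Kernel / driver copy", 20},
        {"Feed handler parse (zero-copy)", 60},
        {"Strategy (in-memory decision)", 120},
        {"Risk check (inline)", 40},
        {"Order serialization", 30},
        {"Socket send / NIC tx", 50},
        {"Exchange ack RTT (mock)", 300}
    };

    us critical_threshold(500); // microseconds: quick example threshold

    // The source listing is cut off at this point. The rest is a minimal
    // completion (an assumption): it sums the simulated stage latencies,
    // prints per-stage and cumulative values, and compares the total
    // against the threshold.
    us total(0);
    for (const auto& stage : stages) {
        total += us(stage.second);
        cout << stage.first << ": " << stage.second << " us (cumulative "
             << total.count() << " us)\n";
    }

    cout << "Total: " << total.count() << " us, threshold: "
         << critical_threshold.count() << " us -> "
         << (total > critical_threshold ? "OVER budget" : "within budget") << "\n";
    cout << "Favorite player: " << favorite_player << "\n";
    return 0;
}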


