Setting Up the Python Environment
Welcome! This screen gets your Python workspace ready for prototyping HFT strategies and for migrating hotspots to C++. You're a beginner in several languages (C++, Python, Java, C, JS): think of Python as your fast sketchpad (an interactive REPL plus quick scripts, with no javac-style compile step) and C++ as the production engine you call when speed matters.
Why a dedicated Python env?
- Isolation: a venv/conda environment prevents library-version clashes (like keeping node_modules separate for different JS projects).
- Reproducibility: pin numpy/pandas/numba/cython/pybind11 versions so your backtests don't silently change behavior across machines.
- Iterate fast: prototype a strategy in Python, profile it, then move the hot loop to C++ (via pybind11) if needed.
Quick visual: Prototype -> Profile -> Push to C++
Prototype (Python) ---> Profile (cProfile / line_profiler) ---> Optimize (numba first, then C++ via pybind11 if needed) ---> Deploy
ASCII flow:
    [Python REPL / Jupyter]
              |
              v
    [Prototype: pandas + numpy]
              |
              v
    [Profile: find hot loop]
              |
              v
    [C++ function exposed with pybind11]
              |
              v
    [Import extension in Python]
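To make the "Profile" step concrete, here is a minimal sketch that uses only the standard library's cProfile and pstats modules. The names `sma_naive` and `profile_call` are made up for this illustration, and the synthetic price series is an assumption, not course data.

```python
import cProfile
import io
import pstats


def sma_naive(prices, window):
    """Simple moving average with per-element Python loops (deliberately slow)."""
    out = []
    for start in range(len(prices) - window + 1):
        total = 0.0
        for i in range(start, start + window):
            total += prices[i]
        out.append(total / window)
    return out


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Synthetic prices, assumed here purely for illustration.
    prices = [100.0 + (i % 7) * 0.1 for i in range(50_000)]
    _, report = profile_call(sma_naive, prices, 50)
    print(report)
```

In the report, the function with the largest cumulative time is the first candidate for numba or a C++ port.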
Create an environment (venv)
venv (lightweight, stdlib):
    python3 -m venv .venv
    source .venv/bin/activate        # macOS / Linux
    .\.venv\Scripts\activate         # Windows (PowerShell)
    python -m pip install --upgrade pip
    pip install numpy pandas numba cython pybind11

conda (easier binary packages on some systems):

    conda create -n hft_py python=3.10 -y
    conda activate hft_py
    conda install -c conda-forge numpy pandas numba cython pybind11 -y
Tip: For HFT work, prefer conda packages or prebuilt pip wheels that match your platform, so packages like numba and cython install as binaries instead of compiling from source.
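After activating, it can help to confirm that Python is really running from the new environment. The sketch below is one way to check, using only the standard library; the function names are hypothetical. It relies on `sys.prefix` differing from `sys.base_prefix` inside a venv, and on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```python
import os
import sys


def in_virtual_env():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def environment_summary():
    """Collect a few facts about the interpreter that is currently running."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "in_venv": in_virtual_env(),
        # None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

If `in_venv` is False and `conda_env` is None, the packages you install are probably going into the system Python.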
Install list (minimum for this course)
- numpy: numeric arrays (like std::vector<double> but with fast vectorized ops)
- pandas: dataframes for tick/bar data processing
- numba: JIT speedups for numerical loops (great before deciding to rewrite in C++)
- cython: compile Python-like code to C for intermediate speed gains
- pybind11: clean bridge to call C++ from Python
Pin them in requirements.txt or a conda YAML for reproducible setups.
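As a rough sketch of what pinning means in practice, the script below reads the installed versions with the standard library's `importlib.metadata` and prints `name==version` lines you could paste into requirements.txt. The helper name `pinned_lines` is made up for this example; `pip freeze` gives a similar (longer) list.

```python
from importlib import metadata

COURSE_PACKAGES = ["numpy", "pandas", "numba", "cython", "pybind11"]


def pinned_lines(packages):
    """Return ('name==version' lines, names that are not installed)."""
    lines, missing = [], []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return lines, missing


if __name__ == "__main__":
    lines, missing = pinned_lines(COURSE_PACKAGES)
    print("\n".join(lines))
    if missing:
        print("# not installed:", ", ".join(missing))
```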
Pybind11 workflow (short)
- Prototype in Python with numpy.
- Profile to find the hot loop (e.g., computing a moving average over millions of ticks).
- Reimplement the hot function in C++ and expose it with pybind11.
- Build the extension, import it from Python, and compare results and timings.
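The last step, comparing results and timings, can be sketched in plain Python before any C++ exists. In the sketch below, `sma_running` is only a stand-in for "the faster implementation", and all function names are hypothetical; the idea is to check agreement within a tolerance first and only then look at timings.

```python
import math
import timeit


def sma_reference(prices, window):
    """Straightforward SMA: re-sum every window (slow but easy to trust)."""
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]


def sma_running(prices, window):
    """Candidate: running sum in O(n). Stand-in for a faster implementation."""
    out = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def compare(reference, candidate, prices, window, repeats=3):
    """Return (results agree?, best reference time, best candidate time)."""
    expected = reference(prices, window)
    actual = candidate(prices, window)
    same = len(expected) == len(actual) and all(
        math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
        for a, b in zip(expected, actual)
    )
    t_ref = min(timeit.repeat(lambda: reference(prices, window),
                              number=1, repeat=repeats))
    t_new = min(timeit.repeat(lambda: candidate(prices, window),
                              number=1, repeat=repeats))
    return same, t_ref, t_new


if __name__ == "__main__":
    prices = [100.0 + (i % 11) * 0.05 for i in range(100_000)]
    same, t_ref, t_new = compare(sma_reference, sma_running, prices, 50)
    print(f"agree={same}  reference={t_ref:.4f}s  candidate={t_new:.4f}s")
```

Once an extension is built, the same `compare` call should work with the extension function passed as `candidate`, assuming it takes the same arguments and returns a sequence of floats.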
A tiny conceptual pybind11 binding looks like:
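    // (concept only) expose `fast_sma(prices, window)` to Python; it returns a NumPy array
    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>
    #include <stdexcept>

    namespace py = pybind11;

    py::array_t<double> fast_sma(py::array_t<double> prices, int window) {
        auto in = prices.unchecked<1>();  // 1-D view without per-access bounds checks
        const py::ssize_t n = in.shape(0);
        if (window <= 0 || window > n) throw std::invalid_argument("window must be in [1, len(prices)]");
        py::array_t<double> result(n - window + 1);
        auto out = result.mutable_unchecked<1>();
        double sum = 0.0;  // running sum of the current window
        for (py::ssize_t i = 0; i < n; ++i) {
            sum += in(i);
            if (i >= window) sum -= in(i - window);
            if (i >= window - 1) out(i - window + 1) = sum / window;
        }
        return result;
    }

    PYBIND11_MODULE(myhft, m) {
        m.def("fast_sma", &fast_sma);
    }

(You will later compile this into a Python extension; for now, focus on environment and prototyping.)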
Rapid prototyping vs production
- Rapid: use pandas + numpy or numba in a venv; iterate in Jupyter.
- Production: compile C++ components with pinned compiler flags, link via pybind11 or run them as a separate microservice (RPC). Use CI to build wheels or containers.
Challenge (try this now)
- Create a venv and install the packages above.
- Run the C++ example in the code pane (compile + run). It computes a simple moving average (SMA) on a small price array, the same logic you'd first write in Python.
- Then implement the same SMA in Python using numpy.convolve and compare outputs and readability.
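If you get stuck on the last item, here is one possible sketch, not the only way to do it. It convolves the prices with a uniform kernel of length `window`, and `mode="valid"` keeps only positions where the full window fits. The function name `sma_convolve` is made up here; the prices are the ones from the C++ example.

```python
import numpy as np


def sma_convolve(prices, window):
    """SMA via convolution with a uniform kernel; 'valid' keeps full windows only."""
    prices = np.asarray(prices, dtype=float)
    kernel = np.ones(window) / window  # each price gets weight 1/window
    return np.convolve(prices, kernel, mode="valid")


if __name__ == "__main__":
    prices = [100.5, 100.7, 100.2, 100.9, 101.1, 100.8, 101.3]
    print(np.round(sma_convolve(prices, 3), 4))
```

With 7 prices and a window of 3 you should get 5 values, matching the number of lines the C++ loop prints.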
Questions to reflect on:
- Where does Python make iteration easy but slow? (Answer: per-element Python loops.)
- When does numba make sense vs jumping straight to C++ with pybind11? (Answer: if JIT gives enough speed-up and you want faster iteration without C++ build complexity.)
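To get a feel for the numba route, the sketch below decorates an explicit loop with numba's `njit`. The function name `sma_loop` is hypothetical, the code assumes 1 <= window <= len(prices), and the fallback decorator exists only so the example still runs (uncompiled) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function on first call
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs (uncompiled) without numba."""
        return func


@njit
def sma_loop(prices, window):
    """Running-sum SMA written as the kind of plain loop numba handles well."""
    n = prices.shape[0]
    out = np.empty(n - window + 1)
    running = 0.0
    for i in range(n):
        running += prices[i]
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out[i - window + 1] = running / window
    return out


if __name__ == "__main__":
    prices = np.array([100.5, 100.7, 100.2, 100.9, 101.1, 100.8, 101.3])
    print(np.round(sma_loop(prices, 3), 4))
```

The first call includes compilation time, so time a second call when comparing against other implementations.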
Next step: after running the C++ example, we'll show a short pybind11 binding and the setup.py/CMake recipe to build it so you can import it directly into Python.
C++ example (code pane):

    #include <iomanip>
    #include <iostream>
    #include <vector>

    using namespace std;

    // Simple moving average (SMA) over a window. This mirrors what you'd first
    // prototype in Python with numpy, then port when it's a hotspot.
    double compute_sma_window(const vector<double>& prices, int start, int window) {
        double sum = 0.0;
        for (int i = start; i < start + window; ++i) {
            sum += prices[i];
        }
        return sum / window;
    }

    int main() {
        // Example tick prices (think: small simulated price stream)
        vector<double> prices = {100.5, 100.7, 100.2, 100.9, 101.1, 100.8, 101.3};
        int window = 3;

        cout << fixed << setprecision(4);
        cout << "Prices: ";
        for (double p : prices) cout << p << " ";
        cout << "\nWindow: " << window << "\n";

        cout << "SMA results:\n";
        for (size_t i = 0; i + window <= prices.size(); ++i) {
            // The source listing is cut off here; printing each window's SMA is the assumed loop body.
            cout << compute_sma_window(prices, static_cast<int>(i), window) << "\n";
        }
        return 0;
    }

