Setting Up the Python Environment
Welcome. This screen gets your Python workspace ready for prototyping HFT strategies and for migrating hotspots to C++. As a multi-language beginner (C++, Python, Java, C, JS), think of Python as your fast sketchpad (like a REPL version of javac plus quick scripts) and C++ as the production engine you call when speed matters.
Why a dedicated Python env?
- Isolation: a venv/conda environment prevents library-version clashes (like keeping node_modules separate for different JS projects).
- Reproducibility: pin numpy/pandas/numba/cython/pybind11 versions so your backtests don't silently change behavior across machines.
- Iterate fast: prototype a strategy in Python, profile it, then move the hot loop to C++ (via pybind11) if needed.
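A quick way to confirm that isolation is actually in effect is to ask the interpreter where it lives. The sketch below uses only the standard library. The helper names `in_virtualenv` and `active_conda_env` are made up for illustration, and the conda check assumes your shell exports `CONDA_DEFAULT_ENV` when an environment is activated.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


def active_conda_env():
    """Return the active conda environment name, or None if none is set."""
    # Assumption: the variable is exported by `conda activate` in this shell.
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("interpreter :", sys.executable)
    print("venv active :", in_virtualenv())
    print("conda env   :", active_conda_env())
```

If both checks come back negative, packages you install will most likely land in the system or user site-packages rather than in a project-specific environment.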
Quick visual: Prototype -> Profile -> Push to C++
Prototype (Python) ---> Profile (cProfile / line_profiler) ---> Speed up (numba, then C++ via pybind11) ---> Deploy
ASCII flow:
[Python REPL / Jupyter]
          |
          v
[Prototype: pandas + numpy]
          |
          v
[Profile: find hot loop]
          |
          v
[C++ function exposed with pybind11]
          |
          v
[Import extension in Python]
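The "Profile: find hot loop" step can be sketched with the standard library's `cProfile` and `pstats`. The function names (`window_mean`, `naive_sma`, `profile_report`) and the synthetic prices are placeholders chosen for this illustration, not part of the course code.

```python
import cProfile
import io
import pstats


def window_mean(prices, start, window):
    """Average of prices[start:start + window], written as a plain Python loop."""
    total = 0.0
    for i in range(start, start + window):
        total += prices[i]
    return total / window


def naive_sma(prices, window):
    """Simple moving average that recomputes every window from scratch."""
    return [window_mean(prices, i, window)
            for i in range(len(prices) - window + 1)]


def profile_report(prices, window, top=5):
    """Run naive_sma under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = naive_sma(prices, window)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Synthetic prices, for illustration only.
    prices = [100.0 + (i % 50) * 0.01 for i in range(20_000)]
    _, report = profile_report(prices, window=20)
    print(report)
```

In the report, the function with a high call count and a large share of the cumulative time (here `window_mean`) is the candidate to speed up first.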
Create an environment (venv)
venv (lightweight, stdlib):
    python3 -m venv .venv
    source .venv/bin/activate          # macOS / Linux
    .\.venv\Scripts\activate           # Windows (PowerShell), use instead of the line above
    python -m pip install --upgrade pip
    pip install numpy pandas numba cython pybind11
conda (easier binary packages on some systems):
    conda create -n hft_py python=3.10 -y
    conda activate hft_py
    conda install -c conda-forge numpy pandas numba cython pybind11 -y
Tip: For HFT work, prefer conda packages or prebuilt pip wheels for your platform so that packages like numba/cython install without long compile times.
Install list (minimum for this course)
- numpy: numeric arrays (like std::vector<double> but with fast vectorized ops)
- pandas: dataframes for tick/bar data processing
- numba: JIT speedups for numerical loops (great before deciding to rewrite in C++)
- cython: compile Python-like code to C for intermediate speed gains
- pybind11: clean bridge to call C++ from Python
Pin them in requirements.txt or a conda environment YAML for reproducible setups.
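As a rough sketch of what pinning can look like, the snippet below reads the installed versions with the standard library's `importlib.metadata` and prints `name==version` lines suitable for a requirements.txt. The names `COURSE_PACKAGES` and `pinned_lines` are hypothetical, and the output depends entirely on what is installed in your environment.

```python
from importlib import metadata

# Packages named in the install list above.
COURSE_PACKAGES = ["numpy", "pandas", "numba", "cython", "pybind11"]


def pinned_lines(packages):
    """Return 'name==version' lines for installed packages, comments for missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name} is not installed in this environment")
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_lines(COURSE_PACKAGES)))
```

Running `pip freeze` gives the full list of installed packages instead. This sketch only narrows the output to the five packages the course depends on.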
Pybind11 workflow (short)
- Prototype in Python with numpy.
- Profile to find the hot loop (e.g., computing a moving average over millions of ticks).
- Reimplement the hot function in C++ and expose it with pybind11.
- Build the extension, import it from Python, and compare results and timings.
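The last step, comparing results and timings, might look like the sketch below. Since no compiled extension exists yet, `sma_candidate` is a pure-Python stand-in (a running-sum version) for whatever optimized function you eventually import. All names and the synthetic data are assumptions made for illustration.

```python
import math
import timeit


def sma_reference(prices, window):
    """Readable version: recompute each window's mean from scratch."""
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]


def sma_candidate(prices, window):
    """Stand-in for the optimized version: one pass with a running sum."""
    out = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def results_match(a, b, rel_tol=1e-9):
    """True when both outputs have equal length and element-wise close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


if __name__ == "__main__":
    prices = [100.0 + (i % 97) * 0.01 for i in range(50_000)]  # synthetic data
    window = 50
    assert results_match(sma_reference(prices, window),
                         sma_candidate(prices, window))
    for fn in (sma_reference, sma_candidate):
        seconds = min(timeit.repeat(lambda: fn(prices, window), number=1, repeat=3))
        print(f"{fn.__name__:<14} {seconds:.4f} s")
```

Checking equality with a tolerance rather than `==` matters here because a different summation order changes the floating-point rounding slightly, which is also what you should expect when comparing a C++ port against the Python prototype.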
A tiny conceptual pybind11 binding looks like:
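    // (concept only) expose `fast_sma(prices, window)` to Python; returns a NumPy array
    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>
    #include <stdexcept>

    namespace py = pybind11;

    // Assumes a 1-D array; forcecast converts the input to contiguous doubles if needed.
    py::array_t<double> fast_sma(
        py::array_t<double, py::array::c_style | py::array::forcecast> prices, int window) {
        const py::ssize_t n = prices.size();
        if (window <= 0 || window > n) throw std::invalid_argument("window must be in [1, len(prices)]");
        const double* in = prices.data();          // raw pointers for speed
        py::array_t<double> out(n - window + 1);   // one value per full window
        double* dst = out.mutable_data();
        double sum = 0.0;
        for (py::ssize_t i = 0; i < n; ++i) {
            sum += in[i];                          // add the newest price
            if (i >= window) sum -= in[i - window];             // drop the oldest
            if (i >= window - 1) dst[i - window + 1] = sum / window;
        }
        return out;
    }

    PYBIND11_MODULE(myhft, m) {
        m.def("fast_sma", &fast_sma);
    }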
(You will later compile this into a Python extension; for now, focus on environment and prototyping.)
Rapid prototyping vs production
- Rapid: use pandas + numpy or numba in a venv; iterate in Jupyter.
- Production: compile C++ components with pinned compiler flags, link via pybind11 or run them as a separate microservice (RPC). Use CI to build wheels or containers.
Challenge (try this now)
- Create a venv and install the packages above.
- Run the C++ example in the code pane (compile + run). It computes a simple moving average (SMA) on a small price array, the same logic you'd first write in Python.
- Then implement the same SMA in Python using numpy.convolve and compare outputs and readability.
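To have something to check your outputs against, here is a dependency-free sketch that uses the same sample prices and window as the C++ example. The function name `sma` is a placeholder, and this is only one straightforward way to write it, not the numpy.convolve solution the challenge asks for.

```python
def sma(prices, window):
    """Simple moving average: one mean per full window, oldest window first."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i:i + window]) / window
            for i in range(len(prices) - window + 1)]


if __name__ == "__main__":
    # Same sample prices and window as the C++ example in the code pane.
    prices = [100.5, 100.7, 100.2, 100.9, 101.1, 100.8, 101.3]
    for value in sma(prices, 3):
        print(f"{value:.4f}")
    # prints 100.4667, 100.6000, 100.7333, 100.9333, 101.0667 (one per line)
```

A numpy.convolve version with a kernel of `window` equal weights and `mode="valid"` should produce the same five values, up to floating-point rounding.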
Questions to reflect on:
- Where does Python make iteration easy but slow? (Answer: per-element Python loops.)
- When does numba make sense vs jumping straight to C++ with pybind11? (Answer: if JIT gives enough speed-up and you want faster iteration without C++ build complexity.)
Next step: after running the C++ example, we'll show a short pybind11 binding and the setup.py/CMake recipe to build it so you can import it directly into Python.
#include <iomanip>
#include <iostream>
#include <vector>

using namespace std;

// Simple moving average (SMA) over a window. This mirrors what you'd first
// prototype in Python with numpy, then port when it's a hotspot.
double compute_sma_window(const vector<double>& prices, int start, int window) {
    double sum = 0.0;
    for (int i = start; i < start + window; ++i) {
        sum += prices[i];
    }
    return sum / window;
}

int main() {
    // Example tick prices (think: small simulated price stream)
    vector<double> prices = {100.5, 100.7, 100.2, 100.9, 101.1, 100.8, 101.3};
    int window = 3;
    cout << fixed << setprecision(4);
    cout << "Prices: ";
    for (double p : prices) cout << p << " ";
    cout << "\nWindow: " << window << "\n";
    cout << "SMA results:\n";
    for (size_t i = 0; i + window <= prices.size(); ++i) {