Building a Real-Time F1 Analytics Dashboard with ML
Formula 1 strategy is one of the most complex real-time optimization problems in sports. Teams spend millions on proprietary tools to predict when to pit, which tire compound to use, and how to respond to safety cars. I wanted to build something accessible for fans that actually predicts rather than just reports.
Why F1?
I've been obsessed with F1 data since Max Verstappen's championship run in 2021. The sport generates terabytes of telemetry per race: throttle position at 50Hz, GPS at 10Hz, tire temperatures, fuel loads. A useful subset of this data (lap timing, car telemetry, position, and weather) is publicly available via the FastF1 Python library. It was too interesting to ignore.
The Stack
```python
# Core pipeline
import fastf1
import xgboost as xgb
import fastapi
import streamlit as st
import pandas as pd
import numpy as np
```
Feature Engineering
The XGBoost model takes 40+ features per lap snapshot:
```python
features = [
    'lap_number',
    'tire_compound',       # SOFT, MEDIUM, HARD
    'tire_age_laps',       # How old the current tires are
    'lap_time_delta_p1',   # Delta to race leader
    'sector_1_time',
    'sector_2_time',
    'sector_3_time',
    'fuel_load_estimate',  # Calculated from weight loss model
    'track_temp',
    'air_temp',
    'rain_probability',    # From weather API
    # ... 30+ more
]
```
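As a rough illustration of how one of these features could be derived, here is a minimal sketch that computes `tire_age_laps` from a per-lap pit flag. The `pitted_at_end` input and the `starting_age` parameter are hypothetical names for this example, not the pipeline's actual schema.

```python
def tire_age_per_lap(pitted_at_end, starting_age=0):
    """Return the tire age (in laps) at the start of each lap.

    pitted_at_end[i] is True when the driver pits at the end of lap i
    (hypothetical flag). starting_age allows for a used tire set.
    """
    ages = []
    age = starting_age
    for pitted in pitted_at_end:
        ages.append(age)
        # A pit stop resets the counter; otherwise the tires age by one lap.
        age = 0 if pitted else age + 1
    return ages


pit_flags = [False, False, True, False, False]
print(tire_age_per_lap(pit_flags))  # [0, 1, 2, 0, 1]
```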
The fuel load estimate was the hardest feature to engineer — F1 teams don't publish fuel loads. I approximated it using the known fuel consumption rate (~110kg over 305km) and fit a degradation curve per car.
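The starting point for that approximation can be sketched as a constant burn rate per kilometre. This is only the linear baseline under the figures quoted above; the per-car curve fit is not shown, and the function name, the full-tank starting assumption, and the 5.0 km lap length are made up for the example.

```python
RACE_FUEL_KG = 110.0      # approximate race fuel figure quoted above
RACE_DISTANCE_KM = 305.0  # approximate race distance quoted above


def estimate_fuel_load(laps_completed, lap_length_km, start_fuel_kg=RACE_FUEL_KG):
    """Estimate remaining fuel in kg, assuming a constant burn per km."""
    burn_per_km = RACE_FUEL_KG / RACE_DISTANCE_KM
    remaining = start_fuel_kg - burn_per_km * lap_length_km * laps_completed
    # Clamp at zero so long stints never report negative fuel.
    return max(remaining, 0.0)


# Hypothetical 5.0 km circuit, 20 laps completed
print(round(estimate_fuel_load(20, 5.0), 1))  # 73.9
```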
The Model
XGBoost worked better than neural networks here because:
```python
model = xgb.XGBClassifier(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss',
)
```
After Optuna hyperparameter tuning: 92% test accuracy on held-out 2024 season data.
The API Layer
FastAPI serves predictions with async refresh:
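```python
from fastapi import FastAPI

app = FastAPI()

# `model`, `extract_live_features`, and `calculate_window` are defined elsewhere in the project.
@app.get("/predict/{session_key}/{driver_number}")
async def predict_pit_window(session_key: int, driver_number: int):
    features = await extract_live_features(session_key, driver_number)
    # Probability of the positive class (pit stop) for the single feature row
    prediction = model.predict_proba(features)[0][1]
    return {
        "driver": driver_number,
        "pit_probability": float(prediction),
        "recommended_window": calculate_window(features, prediction),
    }
```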
Response time: <50ms P99 in production (Streamlit Cloud region: us-east-1).
Key Challenges
1. Missing Telemetry Data
FastF1 sometimes returns incomplete laps around crashes, red flags, and virtual safety car (VSC) periods. I handled these with cubic spline interpolation over the gap, following the same approach F1 teams use internally.
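A minimal sketch of that gap filling, assuming missing samples are marked as NaN and using SciPy's `CubicSpline`. The function name and the sample speed trace are hypothetical, and the real pipeline may differ.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def fill_gaps_cubic(t, values):
    """Fill NaN samples in `values` with a cubic spline fitted to the valid samples.

    t must be strictly increasing; at least 4 valid samples are assumed here.
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    spline = CubicSpline(t[valid], values[valid])
    filled = values.copy()
    # Only the missing samples are replaced; observed values stay untouched.
    filled[~valid] = spline(t[~valid])
    return filled


# Hypothetical speed trace (km/h), one sample per second, with a two-sample dropout
t = np.arange(8)
speed = np.array([280.0, 285.0, 289.0, np.nan, np.nan, 296.0, 297.0, 297.5])
print(fill_gaps_cubic(t, speed))
```

Splines can overshoot across long gaps, so it may be worth capping the gap length that gets interpolated.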
2. Rate Limiting
The unofficial F1 API has limits. I built a caching middleware that stores session data in SQLite and only fetches fresh data every 30 seconds during live sessions.
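The caching idea can be sketched with the standard library alone. This shows only the SQLite store with a 30-second time-to-live, not the middleware wiring; the `SessionCache` class, the table layout, and the `fake_fetch` stand-in for the upstream call are all assumptions for illustration.

```python
import json
import sqlite3
import time


class SessionCache:
    """Tiny SQLite-backed cache with a time-to-live (TTL) per entry."""

    def __init__(self, path=":memory:", ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def get_or_fetch(self, key, fetch, now=None):
        """Return cached data for `key`, calling `fetch()` only when stale or missing."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT payload, fetched_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and now - row[1] < self.ttl:
            return json.loads(row[0])
        data = fetch()
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, payload, fetched_at) VALUES (?, ?, ?)",
            (key, json.dumps(data), now),
        )
        self.db.commit()
        return data


cache = SessionCache()
calls = []


def fake_fetch():
    # Stand-in for the real upstream request.
    calls.append(1)
    return {"lap": len(calls)}


cache.get_or_fetch("session-1", fake_fetch, now=0.0)   # missing, fetches
cache.get_or_fetch("session-1", fake_fetch, now=10.0)  # fresh, served from SQLite
cache.get_or_fetch("session-1", fake_fetch, now=31.0)  # stale, fetches again
print(len(calls))  # 2
```

Passing `now` explicitly keeps the expiry logic easy to test without sleeping.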
3. Class Imbalance
Only ~3% of laps end in a pit stop. Class weight balancing (`scale_pos_weight=33`) in XGBoost handled this without needing SMOTE.
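For reference, the usual heuristic for `scale_pos_weight` is the ratio of negative to positive samples. Here is a small sketch of that arithmetic on a hypothetical label vector with 3% positives; the helper name is made up for the example.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples, a common starting value for XGBoost."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples in labels")
    return negatives / positives


# Hypothetical labels: 30 pit-stop laps out of 1000 (3%)
labels = [1] * 30 + [0] * 970
print(round(scale_pos_weight(labels), 1))  # 32.3
```

A 3% positive rate gives roughly 32, in line with the rounded value of 33 used above.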
What I Learned
What's Next
- Multi-driver simultaneous strategy comparison
- Live undercut/overcut simulation
- MongoDB for persistent session cache across deployments