Bench Coach — Transparency

Data Sources

Every input to the Bench Coach model — what it is, where it comes from, how often it refreshes, and what it costs. No hidden feeds.

Document revised: 2026-05-04 · Sources: 6

Live Game Data

Source: MLB Stats API (statsapi.mlb.com)
Feed: GUMBO live feed v1.1
What we use it for: In-game state: current outs, base runners, score, inning, at-bat, and play-by-play events. Drives the 75-state lead-aware Markov chain input on every pitch.
Update cadence: Real-time (polled every 15 seconds)
Also used for: Daily schedule (game slate), rosters, team standings, starting pitcher confirmation, and lineup cards
License / cost: Public / free (MLB official API)
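
As a rough illustration of how the in-game state above could map onto a 75-state chain, the sketch below pulls outs, runners, and score from a GUMBO payload and encodes them as a single index. The 25 × 3 factorization (24 base-out states plus one end-of-inning state, times three lead buckets) is an assumption made for illustration, not a description of the production encoding. `extract_state` and `state_index` are hypothetical names, and the JSON field paths should be verified against a live payload.

```python
from typing import Any, Dict

# URL pattern for the v1.1 live feed; game_pk is the MLB game identifier.
FEED_URL = "https://statsapi.mlb.com/api/v1.1/game/{game_pk}/feed/live"


def extract_state(feed: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the fields needed for the chain from a parsed live-feed payload.

    Field paths reflect the linescore block as commonly observed; treat
    them as assumptions and check them against a real response.
    """
    linescore = feed["liveData"]["linescore"]
    offense = linescore.get("offense", {})
    home = linescore["teams"]["home"].get("runs", 0)
    away = linescore["teams"]["away"].get("runs", 0)
    top = bool(linescore.get("isTopInning", True))
    return {
        "outs": linescore.get("outs", 0),
        "on_first": "first" in offense,
        "on_second": "second" in offense,
        "on_third": "third" in offense,
        # Lead from the batting team's point of view.
        "batting_lead": (away - home) if top else (home - away),
    }


def state_index(outs: int, on_first: bool, on_second: bool,
                on_third: bool, batting_lead: int) -> int:
    """Encode a game state as an index in 0..74 (assumed 25 x 3 layout)."""
    if outs >= 3:
        base_out = 24  # end-of-inning state
    else:
        runners = int(on_first) + 2 * int(on_second) + 4 * int(on_third)
        base_out = outs * 8 + runners  # 0..23
    # Lead bucket: 0 = trailing, 1 = tied, 2 = leading (assumed).
    bucket = 0 if batting_lead < 0 else (1 if batting_lead == 0 else 2)
    return bucket * 25 + base_out
```

A poller would fetch `FEED_URL.format(game_pk=...)` on each 15-second tick, parse the JSON, and pass the result through these two functions.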

Historical Data — Model Training

Source: Retrosheet
What we use it for: Historical play-by-play data (2010–2024), used to estimate the underlying 25×25 baseline transition matrix; production uses the 75-state lead-aware chain
Volume: 2.6 million plate appearances across 15 seasons
Update cadence: Annual (the model retrains when new season data is released)
License: Public, non-commercial license (retrosheet.org/notice.htm)
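
One common way to estimate a baseline transition matrix from play-by-play data is to count observed state-to-state transitions and normalize each row. The sketch below assumes each plate appearance has already been reduced to a `(from_state, to_state)` pair with states numbered 0 to 24; that numbering and the function name are illustrative assumptions, not the production training code.

```python
from typing import Iterable, List, Tuple

N_STATES = 25  # assumed: 24 base-out states plus one end-of-inning state


def estimate_transition_matrix(
    transitions: Iterable[Tuple[int, int]]
) -> List[List[float]]:
    """Row-normalize transition counts into a 25x25 probability matrix."""
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for src, dst in transitions:
        counts[src][dst] += 1

    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No observed exits (e.g. the end-of-inning state):
            # treat the state as absorbing with a self-loop.
            probs = [0.0] * N_STATES
            probs[i] = 1.0
        else:
            probs = [c / total for c in row]
        matrix.append(probs)
    return matrix
```

Each row sums to 1, so entry `[i][j]` reads as the estimated probability of moving from state `i` to state `j` in one plate appearance.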

Retrosheet data is used only to train the underlying 25×25 baseline transition matrix; production runs the 75-state lead-aware Markov chain. The deployed model trains on the full 2010–2024 corpus. Calibration metrics come from the post-audit 2017–2025 out-of-sample suite: 361,519 predictions across 20,325 games, with Brier scores of 0.1598–0.1677. See the methodology page for the full Brier and accuracy figures. No Retrosheet data is queried at runtime.
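
For readers unfamiliar with the metric, the Brier score is the mean squared difference between predicted probabilities and binary outcomes; lower is better, and always predicting 0.5 scores exactly 0.25. The helper below is a minimal sketch of that standard definition with a hypothetical function name, not the evaluation code behind the figures above.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between win probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```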

Odds & Market Data

Source: The Odds API
Tier: Starter 20K (20,000 credits/month)
What we use it for: Live sportsbook odds for MLB moneylines and totals (DraftKings, FanDuel, BetMGM, and others). Used for de-vig, expected value (EV) calculation, and Kelly criterion bet sizing.
Update cadence: Cached with 5-minute TTL; refreshed on live poller tick
License / cost: Commercial API (Starter 20K tier)
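
The three calculations named above can be sketched for a two-way moneyline as follows. This is an illustration under stated assumptions: it uses multiplicative (proportional) de-vig, which is one of several methods and may not be the one used in production, and all function names are hypothetical.

```python
from typing import Tuple


def american_to_decimal(odds: int) -> float:
    """Convert American odds (e.g. -110, +150) to decimal odds."""
    if odds == 0:
        raise ValueError("American odds cannot be zero")
    return 1 + odds / 100 if odds > 0 else 1 + 100 / abs(odds)


def devig(home_odds: int, away_odds: int) -> Tuple[float, float]:
    """Strip the bookmaker margin by rescaling implied probabilities to sum to 1."""
    p_home = 1 / american_to_decimal(home_odds)
    p_away = 1 / american_to_decimal(away_odds)
    total = p_home + p_away  # exceeds 1 by the bookmaker's margin
    return p_home / total, p_away / total


def expected_value(model_prob: float, odds: int) -> float:
    """Expected profit per unit staked, given the model's win probability."""
    net = american_to_decimal(odds) - 1  # profit per unit on a win
    return model_prob * net - (1 - model_prob)


def kelly_fraction(model_prob: float, odds: int) -> float:
    """Full-Kelly stake as a fraction of bankroll; 0 when there is no edge."""
    net = american_to_decimal(odds) - 1
    fraction = (net * model_prob - (1 - model_prob)) / net
    return max(0.0, fraction)
```

For example, a model probability of 0.55 against +100 odds gives an EV of about 0.10 per unit and a full-Kelly stake of about 10% of bankroll; many bettors scale that down to a fractional Kelly to reduce variance.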

Environmental Enrichment

Source: Open-Meteo
What we use it for: Game-time weather at 30 MLB stadiums: temperature, wind speed, wind direction (bearing), humidity, and precipitation probability. Wind vector decomposed into blowing-out / blowing-in component relative to each park's home plate bearing.
Update cadence: Hourly; pulled before first pitch
License / cost: Free tier (non-commercial)
Second source: ESPN Injury Feed
What we use it for: IL status and injury reports for starting pitchers and key position players. Supplemented by MLB Transactions API for official roster moves.
Update cadence: Starting pitcher slot polled every 60 seconds during active game window; injury list polled every 15 minutes
License / cost: Public endpoints; terms respected
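
The wind decomposition described for the Open-Meteo data can be sketched as a projection onto the park's axis. The example assumes the reported wind direction is the meteorological "coming from" bearing and that each park's bearing points from home plate toward center field; both conventions, and the function name, are assumptions for illustration.

```python
import math


def wind_out_component(speed: float, wind_from_deg: float,
                       park_bearing_deg: float) -> float:
    """Signed wind speed along the home-plate-to-center-field axis.

    Positive means blowing out, negative means blowing in.
    """
    # Convert "coming from" to the direction the air is travelling toward.
    wind_to_deg = (wind_from_deg + 180.0) % 360.0
    angle = math.radians(wind_to_deg - park_bearing_deg)
    return speed * math.cos(angle)
```

With center field due north (bearing 0), a 10-unit wind from the south (180) yields +10 (blowing out), a wind from the north yields -10 (blowing in), and a pure crosswind yields approximately 0.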

Player Profiles — Statcast

Source: Baseball Savant / Statcast (via pybaseball)
What we use it for: Per-batter and per-pitcher outcome distributions (singles, doubles, home runs, strikeouts, walks, groundouts) used to personalize the Markov transition matrix per at-bat via Bayesian blending
Volume: 358 batter profiles, 354 pitcher profiles
Update cadence: Local-only; refreshed manually and shipped as static cache artifacts
Production boundary: Baseball Savant blocks Railway's production IP. Statcast is queried locally only; results are serialized to .json cache files and deployed with the application.
License / cost: Public / free (non-commercial)
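
One simple form of Bayesian blending is shrinkage toward a league-average prior: the player's observed outcome counts are combined with pseudo-counts drawn from league rates, so small samples stay near the baseline. The sketch below shows that idea (the posterior mean of a Dirichlet-multinomial model). The `prior_strength` value and the function name are hypothetical, and the production blend may differ.

```python
from typing import Dict


def blend_outcome_rates(player_counts: Dict[str, int],
                        league_rates: Dict[str, float],
                        prior_strength: float = 100.0) -> Dict[str, float]:
    """Shrink a player's observed outcome rates toward league rates.

    prior_strength is the number of pseudo plate appearances given to the
    league baseline (an illustrative default, not a tuned value).
    """
    n = sum(player_counts.get(outcome, 0) for outcome in league_rates)
    return {
        outcome: (player_counts.get(outcome, 0) + prior_strength * rate)
        / (n + prior_strength)
        for outcome, rate in league_rates.items()
    }
```

A player with no recorded plate appearances gets exactly the league rates, and the weight on the player's own data grows as the sample size increases.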

Because Baseball Savant (Statcast) blocks Railway's production IP, Statcast calls never run from the production server. Batter and pitcher matrices are computed locally via pybaseball and shipped as static cache artifacts at deploy time; production reads only the cache and never calls Statcast directly. The cache is refreshed locally and redeployed as needed, so player profile data in production is current as of the last local refresh, not real-time.
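
The cache-only read path described above might look like the following sketch. The file layout (a JSON object keyed by player id), the function names, and the fallback-to-baseline behavior are all assumptions for illustration; the point is only that nothing on this path makes a network call.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_profiles(cache_path: str) -> Dict[str, Any]:
    """Read a static profile cache shipped with the deploy (no network)."""
    path = Path(cache_path)
    if not path.exists():
        raise FileNotFoundError(f"profile cache not found: {path}")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def get_profile(profiles: Dict[str, Any], player_id: str,
                default: Dict[str, float]) -> Dict[str, float]:
    """Return a player's cached profile, or the supplied baseline if absent."""
    return profiles.get(player_id, default)
```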

Quick Reference

All sources by cadence.

Source | Cadence | License / Tier
MLB Stats API | Real-time (15s) | Public / free
Retrosheet | Annual (training only) | Non-commercial
The Odds API | 5-min cache | Commercial API
Open-Meteo | Hourly | Free tier
ESPN / MLB Transactions | 60s / 15m | Public / free
Baseball Savant | Local cache (seasonal) | Non-commercial