THE EVOLUTION OF AI IN BASEBALL BETTING

From Elo Ratings to Deep Learning
January 27, 2026

The idea of using mathematics to predict baseball games is as old as the sport itself. But the tools have changed beyond recognition. What started as a guy in his basement calculating on-base percentages with a calculator has become a global industry. Today each MLB game generates an estimated seven terabytes of data, and an AI system has already managed a professional baseball game in real time.

This is the story of how we got here, told through five distinct eras that each fundamentally changed how we think about predicting baseball. If you're betting on MLB in 2026, understanding this evolution isn't just interesting trivia. It tells you exactly where the edge is today and where it's headed tomorrow.

Era 1: The Sabermetrics Revolution (1977-2002)

The Foundation

Bill James self-published his first Baseball Abstract in 1977, and nothing was ever the same. James did much of his early work during night shifts as a watchman at a pork and beans cannery in Lawrence, Kansas. He asked a question that now sounds obvious but was radical at the time: are the stats we've been using for a century actually terrible at measuring player value?

The answer turned out to be yes. Batting average ignores walks. RBI depends on who bats in front of you. Pitcher wins depend heavily on run support rather than pitching quality alone. James and his growing community of "sabermetricians" championed or built replacements. These included on-base percentage, runs created, range factor, and dozens of others that more accurately captured what actually produces wins.
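Both of these headline replacements can be computed straight from a box score. The sketch below uses two formulas: the standard on-base percentage formula, and James's basic Runs Created formula, (H + BB) * TB / (AB + BB). The two season lines are invented for illustration. They are chosen so both hitters bat exactly .280.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int   # at-bats
    h: int    # hits
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases


def batting_average(s: BattingLine) -> float:
    return s.h / s.ab


def on_base_percentage(s: BattingLine) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF): walks count, unlike batting average
    return (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)


def runs_created_basic(s: BattingLine) -> float:
    # Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)
    return (s.h + s.bb) * s.tb / (s.ab + s.bb)


# Hypothetical hitters: identical .280 averages, very different walk rates
free_swinger = BattingLine(ab=550, h=154, bb=20, hbp=2, sf=5, tb=231)
patient_hitter = BattingLine(ab=500, h=140, bb=90, hbp=5, sf=5, tb=230)

for name, line in [("free swinger", free_swinger), ("patient hitter", patient_hitter)]:
    print(f"{name}: AVG {batting_average(line):.3f}  "
          f"OBP {on_base_percentage(line):.3f}  RC {runs_created_basic(line):.1f}")
```

Batting average rates the two hitters as equal. OBP and Runs Created do not: OBP is .305 against .392, and Runs Created is about 71 against 90.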

For bettors, the sabermetric revolution created the first real information edge. If the betting market priced games based on traditional stats (batting average, ERA, win-loss record), and you knew those stats were misleading, you could find value by evaluating teams through the more accurate sabermetric lens.
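In practice, "finding value" means finding a gap between two probabilities: the one a price implies, and the one your own evaluation assigns. Here is a minimal sketch using the standard conversion from American odds. The +130 price and the 47% model estimate are hypothetical numbers chosen for illustration.

```python
def implied_probability(american_odds: int) -> float:
    """Win probability at which a bet at these American odds breaks even (vig included)."""
    if -100 < american_odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def profit_per_unit(american_odds: int) -> float:
    """Profit returned by a winning 1-unit stake."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def expected_value(model_prob: float, american_odds: int) -> float:
    """Expected profit per 1-unit stake if model_prob is the true win probability."""
    return model_prob * profit_per_unit(american_odds) - (1 - model_prob)


# Hypothetical: the market has a team at +130, but an OBP-based evaluation rates it at 47%
odds, model_prob = 130, 0.47
print(f"market implies {implied_probability(odds):.3f}, model says {model_prob:.3f}")
print(f"expected value per unit staked: {expected_value(model_prob, odds):+.3f}")
```

The market implies about 43.5%, so a 47% estimate is worth roughly +0.08 units per unit staked. That holds only if the model's probability is actually right.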

But the tools were primitive. You were still doing the analysis by hand or with basic spreadsheets. The data was limited to what appeared in box scores. And the market inefficiencies, while real, were exploited slowly because there was no way to systematically process information across an entire day's slate.

Era 2: Moneyball and the Data Analytics Boom (2002-2014)

Going Mainstream

When Michael Lewis published Moneyball in 2003, the secret was out. Billy Beane and the Oakland Athletics had used sabermetric analysis to build a playoff contender on a shoestring budget, and suddenly every front office wanted in on the data game. More importantly for our story, the betting market started to catch up.

The A's famously targeted on-base percentage when the market undervalued it. But once Moneyball made that strategy public knowledge, the inefficiency shrank quickly as other front offices copied the approach. OBP became properly valued in contracts and, consequently, in betting lines. The easy edge was gone.

This era also saw the first serious computer models for baseball prediction. Elo ratings, originally developed for chess by physicist Arpad Elo, were adapted for baseball. These systems assigned each team a numerical rating that adjusted after every game. Beat a strong team, and your rating goes up a lot. Lose to a weak team, and it drops a lot. Results that match expectations barely move it. Simple, transparent, and surprisingly effective.
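The update rule fits in a few lines. This sketch uses the standard chess Elo formulas. The expected score is a logistic function of the rating gap on a 400-point scale, and each rating moves by K times the surprise. The K-factor of 4 is an illustrative assumption; a small K is a common choice for baseball because any single game is noisy. Home-field advantage is left out for brevity.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 4.0):
    """Return both teams' new ratings after one game; points are transferred, not created."""
    surprise = (1.0 if a_won else 0.0) - expected_score(rating_a, rating_b)
    delta = k * surprise
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: a 1450 underdog against a 1550 favorite
upset = elo_update(1450, 1550, a_won=True)   # underdog wins
chalk = elo_update(1450, 1550, a_won=False)  # favorite wins
print("after upset:", [round(r, 2) for r in upset])
print("after expected result:", [round(r, 2) for r in chalk])
```

The upset moves each rating by about 2.56 points, while the expected result moves them only about 1.44.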

FiveThirtyEight's Elo-based MLB predictions would later become a public face of quantitative baseball forecasting. But basic Elo models had clear limitations. They couldn't account for pitching matchups in granular detail, and they were slow to react to in-season roster changes. A pure win/loss version also treated every win and loss the same, regardless of the quality of the underlying performance.

Meanwhile, projection systems like PECOTA (developed by Nate Silver at Baseball Prospectus) and ZiPS (by Dan Szymborski) were taking a different approach: projecting individual player performance and aggregating those projections into team-level forecasts. This was more sophisticated than Elo, but still fundamentally a statistical regression exercise, not machine learning in the modern sense.
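PECOTA and ZiPS are far more elaborate; PECOTA, for instance, leans on comparable players. The regression-based core of the approach, though, can be sketched with a simpler, Marcel-style projection. Weight recent seasons more heavily, then blend in a fixed amount of league-average performance so small samples get pulled toward the mean. The 5/4/3 weights, the 1,200 plate appearances of regression, and the .320 league OBP are illustrative assumptions. They are loosely modeled on Tom Tango's Marcel system, not on PECOTA's or ZiPS's actual parameters. A final helper aggregates player projections into a team figure.

```python
LEAGUE_OBP = 0.320          # assumed league-average on-base percentage
SEASON_WEIGHTS = (5, 4, 3)  # most recent season first (assumption, Marcel-style)
REGRESSION_PA = 1200        # plate appearances of league-average performance blended in


def project_obp(seasons):
    """seasons: list of (times_on_base, plate_appearances), most recent first."""
    weighted_tob = sum(w * tob for w, (tob, _) in zip(SEASON_WEIGHTS, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(SEASON_WEIGHTS, seasons))
    # Regression to the mean: add REGRESSION_PA of league-average results
    return (weighted_tob + LEAGUE_OBP * REGRESSION_PA) / (weighted_pa + REGRESSION_PA)


def team_obp(lineup):
    """lineup: list of (projected_obp, projected_pa); returns the PA-weighted team OBP."""
    total_pa = sum(pa for _, pa in lineup)
    return sum(obp * pa for obp, pa in lineup) / total_pa


# Hypothetical players
rookie_hot_streak = [(50, 100)]                        # .500 OBP in only 100 PA
steady_veteran = [(228, 600), (222, 600), (216, 600)]  # .380, .370, .360
print(round(project_obp(rookie_hot_streak), 3))        # pulled hard toward .320
print(round(project_obp(steady_veteran), 3))
print(round(team_obp([(project_obp(rookie_hot_streak), 450),
                      (project_obp(steady_veteran), 650)]), 3))
```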

Era 3: Statcast and the Tracking Data Explosion (2015-2020)

The Data Deluge

MLB installed Statcast in all 30 stadiums starting in 2015. Suddenly, every pitch had a measured velocity, spin rate, and three-dimensional movement profile. Every batted ball had an exit velocity and launch angle. Every baserunner had a sprint speed. Every fielder had positioning data and route efficiency.

The volume of available data exploded by orders of magnitude. Where a traditional box score might contain a few dozen data points per game, Statcast was generating millions. This wasn't an incremental improvement. It was a fundamentally different type of data that made previously invisible aspects of the game measurable for the first time.

This is where machine learning entered the baseball betting landscape in earnest. The data was now too complex and too voluminous for traditional statistical methods. In a traditional regression model, specifying by hand how hundreds of Statcast features interact was impractical. But you could train a Random Forest or gradient-boosted ensemble to find patterns in that high-dimensional space.

The early ML models in this era were relatively simple by today's standards: Random Forests and logistic regression with Statcast-derived features. But they represented a genuine leap forward. For the first time, models could systematically process pitch-level data, identify matchup advantages invisible to the naked eye, and quantify the gap between a player's expected performance (based on quality of contact) and their actual results.
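To make this concrete, here is the simpler of the two model families named above: logistic regression. It is fit by plain batch gradient descent in pure Python, so it runs without any libraries. The data are synthetic, not real Statcast output. The two standardized features are hypothetical; think of them as a lineup xwOBA gap and a starting-pitcher strikeout-rate gap. They are generated from a known logistic relationship, so we can check that the fit recovers sensible weights. In practice you would use a library implementation and real features.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimize mean log loss with batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic games (hypothetical features, NOT real Statcast data)
random.seed(7)
TRUE_W, TRUE_B = (0.8, 0.5), 0.1
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
y = [1 if random.random() < sigmoid(TRUE_W[0] * a + TRUE_W[1] * c + TRUE_B) else 0
     for a, c in X]

w, b = fit_logistic(X, y)
print("learned weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("P(win) for a +1 SD lineup edge:", round(predict_proba(w, b, [1.0, 0.0]), 3))
```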

Expected stats like xwOBA, xBA, and xSLG became available through Baseball Savant, giving both humans and algorithms a luck-adjusted lens on player performance. The implications for betting were profound: if you could identify which players were over- or underperforming relative to their expected stats, you could project regression before the market priced it in.
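Here is a minimal sketch of that screen: flag hitters whose actual wOBA sits well above or below their xwOBA. The 30-point (.030) threshold and the three player lines are hypothetical. In practice you would pull the numbers from Baseball Savant and choose a threshold that accounts for sample size.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    woba: float   # actual weighted on-base average
    xwoba: float  # expected wOBA based on quality of contact


def regression_candidates(hitters, threshold=0.030):
    """Return (likely decliners, likely improvers), largest gaps first."""
    decliners = sorted((h for h in hitters if h.woba - h.xwoba >= threshold),
                       key=lambda h: h.woba - h.xwoba, reverse=True)
    improvers = sorted((h for h in hitters if h.xwoba - h.woba >= threshold),
                       key=lambda h: h.xwoba - h.woba, reverse=True)
    return [h.name for h in decliners], [h.name for h in improvers]


# Hypothetical hitters
hitters = [
    Hitter("Hitter A", woba=0.390, xwoba=0.340),  # outperforming contact quality
    Hitter("Hitter B", woba=0.330, xwoba=0.325),  # roughly in line
    Hitter("Hitter C", woba=0.300, xwoba=0.345),  # underperforming contact quality
]
down, up = regression_candidates(hitters)
print("likely to cool off:", down)
print("likely to heat up:", up)
```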

Era 4: Deep Learning and Ensemble Methods (2020-2024)

Neural Networks Take Over

As compute power grew cheaper and frameworks like TensorFlow and PyTorch matured, deep learning models entered the baseball prediction space. These neural networks could model nonlinear interactions between hundreds of features simultaneously, capturing relationships that simpler models missed entirely.

The key innovation wasn't just deeper architectures. It was ensemble methods: combining multiple model types (Random Forests, gradient-boosted trees, neural networks) into a single prediction system. Each model sees the data differently and makes different types of errors. By aggregating their predictions, the ensemble can produce more accurate and more robust outputs than any individual model alone, as long as those errors are not too strongly correlated.
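The simplest form of that aggregation is an average of the models' win probabilities. Here it is scored with the Brier score, the mean squared error of probability forecasts. The three prediction lists and the outcomes are made up for illustration. One guarantee follows from the math: squared error is convex, so the averaged forecast can never score worse than the average of the individual models' Brier scores. Whether it also beats the best single model depends on the data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def ensemble(model_probs, weights=None):
    """Weighted average of several models' probabilities, game by game."""
    n_models = len(model_probs)
    weights = weights or [1.0 / n_models] * n_models
    return [sum(w * probs[g] for w, probs in zip(weights, model_probs))
            for g in range(len(model_probs[0]))]


# Hypothetical home-win probabilities from three model types over five games
outcomes = [1, 0, 1, 1, 0]
models = {
    "random forest": [0.70, 0.55, 0.40, 0.70, 0.30],
    "boosted trees": [0.50, 0.30, 0.75, 0.55, 0.45],
    "neural net":    [0.60, 0.40, 0.65, 0.40, 0.25],
}
for name, probs in models.items():
    print(f"{name:>13}: {brier_score(probs, outcomes):.4f}")
combined = ensemble(list(models.values()))
print(f"{'ensemble':>13}: {brier_score(combined, outcomes):.4f}")
```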

This era also saw the rise of temporal modeling, where recurrent neural networks and transformer architectures processed game sequences over time. Instead of treating each game as an independent event, these models aimed to capture momentum, fatigue effects, travel impact, and form streaks in ways that static models couldn't.
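The core idea of a recurrent model is a hidden state carried from one game to the next. The sketch below is a bare Elman RNN forward pass in pure Python, computing h_t = tanh(W_x x_t + W_h h_{t-1} + b). The weights are hand-picked rather than trained. The two per-game features, a scaled run differential and a scaled travel distance, are hypothetical. A real model would learn its weights by backpropagation through time, and would more likely use an LSTM, a GRU, or a transformer.

```python
import math


def rnn_final_state(sequence, W_x, W_h, b):
    """Run an Elman RNN over a sequence of feature vectors; return the last hidden state."""
    hidden = [0.0] * len(b)
    for x in sequence:
        # Each new state mixes this game's features with the state carried from earlier games
        hidden = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))
                + sum(W_h[i][k] * hidden[k] for k in range(len(hidden)))
                + b[i]
            )
            for i in range(len(b))
        ]
    return hidden


# Hand-picked illustrative weights (untrained): 2 input features -> 2 hidden units
W_x = [[0.6, -0.4], [0.2, 0.5]]
W_h = [[0.5, 0.1], [-0.3, 0.4]]
b = [0.0, 0.0]

# Hypothetical last three games: (scaled run differential, scaled travel distance)
games = [(1.0, 0.0), (-0.5, 1.0), (0.8, 0.2)]
print("in order:", [round(v, 3) for v in rnn_final_state(games, W_x, W_h, b)])
print("reversed:", [round(v, 3) for v in rnn_final_state(games[::-1], W_x, W_h, b)])
```

The same three games in a different order produce a different final state. A static model that simply averages games cannot express that kind of order sensitivity.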

Statcast migrated to Google Cloud in 2020, enabling real-time data processing at scale. MLB has since used Google Cloud's Vertex AI platform to build custom machine learning models for everything from catch probability to steal success rates. The infrastructure was now in place for truly sophisticated, real-time prediction systems.

For bettors, this era created a clear divide. Casual bettors using traditional analysis or basic models found it increasingly difficult to find edge. Sophisticated bettors with access to advanced ML pipelines, or the insight to understand what those pipelines were finding, maintained and even expanded their advantage.

Era 5: LLMs, Real-Time AI, and the Modern Frontier (2025-Present)

The Current Revolution

We're now in the most exciting period in the history of AI baseball analysis. Large language models like Claude, GPT, and Gemini have entered the prediction landscape, bringing something entirely new to the table: the ability to process and reason about unstructured information. Injury reports, press conference quotes, clubhouse chemistry, weather narratives, historical context: these are signals that traditional ML models can't easily ingest but that can affect game outcomes.

In September 2025, the Oakland Ballers of the independent Pioneer League made history. Every strategic decision in one of their games, from lineups to substitutions to defensive positioning, was made by an AI system. The AI-managed team won. It was a single result, but it suggests that artificial intelligence can handle the complexity of real-time baseball strategy under pressure.

The AI in sports analytics market was estimated at $10.8 billion in 2025 and is projected to surge past $60 billion by 2034. Part of that growth rests on reported results. Some industry analyses credit AI-guided betting tools with 15-20% improvements in prediction accuracy. In a market where a few percentage points separate winners from losers, that would be transformative.
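To see why a few percentage points matter, take the standard -110 price on both sides of a game. This sketch computes the break-even win rate and the return on investment at a few hypothetical win rates. It says nothing about the accuracy figures quoted above, only about how thin the margin is.

```python
def break_even_win_rate(american_odds: int) -> float:
    """Win rate needed to break even at the given American odds."""
    risk, win = (-american_odds, 100) if american_odds < 0 else (100, american_odds)
    return risk / (risk + win)


def roi(win_rate: float, american_odds: int) -> float:
    """Expected profit as a fraction of the amount risked per bet."""
    risk, win = (-american_odds, 100) if american_odds < 0 else (100, american_odds)
    return (win_rate * win - (1 - win_rate) * risk) / risk


print(f"break-even at -110: {break_even_win_rate(-110):.2%}")
for rate in (0.50, 0.52, 0.53, 0.55):
    print(f"win {rate:.0%} of -110 bets -> ROI {roi(rate, -110):+.1%}")
```

Break-even is about 52.4%. Winning 55% of bets instead of 52% is the difference between a roughly 5% return and a small loss.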

7 TB: data generated per MLB game
$10.8B: AI sports analytics market (2025)
12: Statcast cameras per stadium

MLB's Automated Ball-Strike Challenge System (ABS), rolling out for 2026, adds another dimension. Using 12 synchronized Hawk-Eye cameras with 5G connectivity, the system tracks every pitch with accuracy within one-fifth of an inch. Players can challenge questionable calls, with AI adjudicating the result in under 15 seconds. This standardization of the strike zone has direct implications for betting models. It should reduce the variance introduced by inconsistent umpiring, which would make pitcher matchup data more predictive.
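For modelers, the practical change is that a challenged call becomes a deterministic function of two things: where the pitch crosses the plate and how tall the batter is. Here is a simplified sketch. The zone is 17 inches wide (the width of home plate), its top and bottom are fixed fractions of batter height, and a pitch is a strike if any part of the ball touches the zone. The 0.535 and 0.27 height fractions mirror figures reported from MLB's ABS testing, and the 1.45-inch ball radius is an approximation. Treat all three as assumptions, not as the official specification.

```python
PLATE_HALF_WIDTH_IN = 17 / 2  # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45         # approximate baseball radius (assumption)
TOP_FRACTION = 0.535          # zone top as a fraction of batter height (assumption)
BOTTOM_FRACTION = 0.27        # zone bottom as a fraction of batter height (assumption)


def is_strike(x_in: float, z_in: float, batter_height_in: float) -> bool:
    """x_in: horizontal offset from the middle of the plate; z_in: height above the ground.

    Both are in inches, measured where the pitch crosses the plate. The ball counts as a
    strike if any part of it touches the zone, so the zone is padded by the ball's radius.
    """
    top = TOP_FRACTION * batter_height_in
    bottom = BOTTOM_FRACTION * batter_height_in
    within_width = abs(x_in) <= PLATE_HALF_WIDTH_IN + BALL_RADIUS_IN
    within_height = bottom - BALL_RADIUS_IN <= z_in <= top + BALL_RADIUS_IN
    return within_width and within_height


# A hypothetical 6-foot (72-inch) hitter: the call depends only on location and height
for x, z in [(0.0, 30.0), (9.5, 30.0), (10.5, 30.0), (0.0, 45.0)]:
    print(f"x={x:>4}, z={z}: {'strike' if is_strike(x, z, 72) else 'ball'}")
```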

Perhaps the most significant development is the convergence of traditional ML and LLM capabilities. The best modern systems use neural networks for quantitative prediction (processing Statcast data, calculating win probabilities) while using LLMs for qualitative analysis (interpreting injury reports, evaluating managerial tendencies, assessing situational factors). It's a hybrid approach that leverages the strengths of both paradigms.
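One way to wire the two halves together is to let the quantitative model produce a baseline probability. The LLM then returns structured flags that nudge that probability in log-odds space. Everything specific here is hypothetical: the JSON schema, the flag names, the adjustment sizes, and the choice to scale by the LLM's stated confidence. None of it reflects any provider's API or a documented production system. The LLM call itself is omitted, and the string stands in for its output.

```python
import json
import math

# Assumed log-odds shifts for qualitative flags (hypothetical values, not calibrated)
FLAG_SHIFTS = {"starter_scratched": -0.30, "key_hitter_out": -0.12, "bullpen_rested": 0.05}


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def blend(model_prob: float, llm_json: str) -> float:
    """Adjust a quantitative win probability using flags parsed from an LLM's JSON reply."""
    flags = json.loads(llm_json)
    confidence = float(flags.get("confidence", 1.0))  # hypothetical field, in [0, 1]
    z = logit(model_prob)
    for flag, shift in FLAG_SHIFTS.items():
        if flags.get(flag):
            z += confidence * shift
    return inv_logit(z)


# Stand-in for an LLM's reading of an injury report (hypothetical output)
llm_reply = '{"starter_scratched": false, "key_hitter_out": true, "confidence": 0.8}'
print(round(blend(0.58, llm_reply), 3))  # baseline 0.58 from the quantitative model
```

Working in log-odds keeps the adjusted value a valid probability no matter how many flags fire.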

The Timeline at a Glance

1977: Bill James publishes his first Baseball Abstract
The sabermetrics revolution begins, championing OBP, Runs Created, and statistically rigorous player evaluation.

2002: The Oakland A's Moneyball season
Analytics go mainstream. The market begins repricing players based on advanced stats.

2006: PITCHf/x debuts in MLB stadiums
The first league-wide pitch-tracking system. Velocity, movement, and location data become available for every pitch.

2015: Statcast launches across all 30 parks
Exit velocity, launch angle, sprint speed, and dozens of new metrics create an unprecedented data ecosystem.

2020: Statcast migrates to Google Cloud
Real-time processing at scale. ML models can now ingest Statcast data as it happens, enabling live prediction.

2023-24: LLMs enter sports analysis
Claude, GPT, and Gemini begin processing unstructured baseball data: injury reports, scouting notes, contextual factors.

2025: First AI-managed professional baseball game
The Oakland Ballers win a game in which every strategic decision was made by an AI system built by Distillery and powered by OpenAI.

2026: ABS Challenge System rolls out
Hawk-Eye cameras with 5G connectivity standardize the strike zone. AI adjudicates challenged calls in under 15 seconds.

What This Means for Bettors in 2026

If you're betting on baseball today, you're operating in the most data-rich, analytically sophisticated environment in the sport's history. The models available, whether you build your own or follow the outputs of services that do, are capable of finding edges that would have been invisible even five years ago.

But here's the critical nuance: the sportsbooks have access to the same data and many of the same tools. The closing lines at major books reflect significant analytical sophistication. The edge isn't in having data. It's in interpreting it better, processing it faster, or combining quantitative signals with qualitative insight in ways the market hasn't fully priced.
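One common way to check whether you are beating a sharp market is closing line value. You compare the price you bet with the closing price, after stripping out the bookmaker's margin. This sketch uses proportional (multiplicative) de-vigging, which is only one of several normalization methods. Every price in it is hypothetical.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even probability for American odds, bookmaker margin included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig_probabilities(odds_a: int, odds_b: int):
    """Scale both sides' implied probabilities so they sum to 1 (proportional method)."""
    p_a, p_b = implied_probability(odds_a), implied_probability(odds_b)
    overround = p_a + p_b  # exceeds 1 by the bookmaker's margin
    return p_a / overround, p_b / overround


def ev_vs_close(bet_odds: int, fair_close_prob: float) -> float:
    """Expected profit per unit staked, treating the no-vig closing probability as truth."""
    profit = 100 / -bet_odds if bet_odds < 0 else bet_odds / 100
    return fair_close_prob * profit - (1 - fair_close_prob)


# Hypothetical: you bet Team A at -120; the game closed Team A -140 / Team B +120
fair_a, fair_b = no_vig_probabilities(-140, 120)
print(f"fair closing probability for Team A: {fair_a:.3f}")
print(f"expected value of the -120 bet vs. the close: {ev_vs_close(-120, fair_a):+.3f}")
```

Bettors usually read consistent bets that beat the no-vig close as evidence that a model is finding something the market has not yet priced. It is a far less noisy signal than short-run win-loss results.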

That's why understanding the full evolution matters. Each era built on the last. Sabermetrics gave us better metrics. Statcast gave us better data. Machine learning gave us better processing. LLMs gave us better context. The bettors who thrive in 2026 understand how all these layers work together. They know not just the latest shiny tool, but the entire analytical stack that produces a genuine informational edge.

The question isn't whether AI will keep transforming baseball betting. It will. The question is whether you'll be using these tools or competing against people who do.