⚾ Daily MLB Picks

Technical Documentation

How Our Models Actually Work

No fluff, no marketing speak. This is the real technical breakdown of our MLB prediction systems—from data pipelines to model architectures to ensemble methods. Written for people who know the difference between overfitting and underfitting.

The Full Pipeline: Data to Decision

Building a profitable MLB betting model isn't just about training an algorithm. It's an end-to-end system that starts with raw data and ends with actionable predictions. Here's our complete pipeline:

Phase 1: Data Collection & Cleaning

We pull from multiple sources because no single API has everything we need.

Data cleaning is unglamorous but critical, and we've built custom scripts to handle it.

Data Quality Nightmare: In 2023, we discovered a bug where Statcast was incorrectly tagging some sliders as curveballs. This poisoned three months of training data before we caught it. Lesson: never trust a single source, always validate.
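
For illustration, cross-source validation of pitch classifications can be as simple as joining two feeds on a shared pitch identifier and flagging a batch when the labels disagree too often. The column names, sample data, and 2% threshold below are assumptions for the sketch, not our actual schema.

import pandas as pd

# Hypothetical example: cross-check pitch-type labels from two feeds.
statcast = pd.DataFrame({
    "pitch_id": [1, 2, 3, 4],
    "pitch_type": ["SL", "CU", "SL", "FF"],
})
secondary = pd.DataFrame({
    "pitch_id": [1, 2, 3, 4],
    "pitch_type": ["SL", "SL", "SL", "FF"],
})

merged = statcast.merge(secondary, on="pitch_id", suffixes=("_a", "_b"))
mismatch_rate = (merged["pitch_type_a"] != merged["pitch_type_b"]).mean()

# Hold the batch for manual review if the two feeds disagree too often.
if mismatch_rate > 0.02:
    print(f"Pitch-type mismatch rate {mismatch_rate:.1%} exceeds threshold; hold batch")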

Phase 2: Feature Engineering

This is where the real work happens. Raw stats are okay, but derived features are where predictive power comes from. We've engineered 180+ features across these categories:

Pitcher Performance Features (45 features)

Hitter/Lineup Features (52 features)

Matchup & Situational Features (38 features)

Bullpen & Relief Features (22 features)

Environmental & Park Features (23 features)

Feature Engineering Win: We created a "Pitcher Fatigue Index" combining days rest, recent pitch count, and velocity trends. This single feature improved model accuracy by 2.3% because it catches fatigued pitchers before they blow up. The betting markets often miss this.
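
The exact formula behind the Pitcher Fatigue Index isn't published here, but a toy sketch of the idea (blending days rest, recent workload, and velocity trend into one 0-100 number) looks something like this. The weights and scaling constants are illustrative assumptions, not the production values.

import numpy as np

def pitcher_fatigue_index(days_rest, recent_pitch_count, velo_trend_mph):
    """Toy fatigue index; the real weighting and scaling are not shown here.

    days_rest: days since last appearance
    recent_pitch_count: pitches thrown over roughly the last two weeks
    velo_trend_mph: change in average fastball velocity vs. season baseline
                    (negative means the pitcher is losing velocity)
    """
    rest_term = np.clip((5 - days_rest) / 5, 0, 1)            # less rest -> more fatigue
    workload_term = np.clip(recent_pitch_count / 220, 0, 1)   # heavier workload -> more fatigue
    velo_term = np.clip(-velo_trend_mph / 2.0, 0, 1)          # velocity decline -> more fatigue
    return 100 * (0.40 * rest_term + 0.35 * workload_term + 0.25 * velo_term)

# Example: 4 days rest, 205 recent pitches, fastball down 1.1 mph
print(pitcher_fatigue_index(4, 205, -1.1))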

Model Architectures: The Four-Model Ensemble

We run four different models and combine their predictions. Here's the technical breakdown:

Model 1: Random Forest (Primary Workhorse)

Architecture:

Why It Works:

Random Forests excel at capturing non-linear relationships. For example, temperature's effect on run scoring isn't linear—games at 55°F and 95°F both suppress scoring, but 75°F is optimal. A linear model can't capture this, but decision trees can split on multiple temperature ranges.

The ensemble nature (500 trees voting) reduces variance dramatically. Individual trees overfit, but averaging their predictions smooths out noise. It's like polling—one poll might be wrong, but averaging 500 polls gets closer to truth.
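
As a rough sketch, here's what a 500-tree forest looks like in scikit-learn. Only the 500 trees come from the description above; the depth and leaf-size settings, and the synthetic data standing in for the 180+ engineered features, are placeholders rather than our production configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 180))    # stand-in for the 180+ engineered features
y_train = rng.integers(0, 2, size=1000)   # stand-in for home-win labels

rf = RandomForestClassifier(
    n_estimators=500,      # 500 trees voting, as described above
    max_depth=12,          # assumed placeholder
    min_samples_leaf=20,   # assumed placeholder
    n_jobs=-1,
    random_state=42,
)
rf.fit(X_train, y_train)

# The predicted probability is the averaged vote across all 500 trees.
print(rf.predict_proba(X_train[:5])[:, 1])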

Performance Metrics (2025):

Classification Accuracy

Overall Accuracy: 58.7%
Precision: 59.2%
Recall: 61.4%
F1 Score: 0.603

Betting Performance

Win Rate (ML): 58.7%
ROI: 17.2%
Units Profit: +18.4U
Sample Size: 214 picks

Model 2: Neural Network (Pattern Finder)

Architecture:

Why It Works:

Neural networks discover complex, non-obvious interactions between features. For instance, the model learned that high wind speed at Coors Field (altitude) has a different effect than high wind at sea-level parks. It identified that certain pitcher-hitter matchups (fastball velocity + batter swing speed + launch angle tendency) predict outcomes better than analyzing each variable independently.

The dropout layers prevent overfitting by randomly "turning off" neurons during training, forcing the network to learn robust features that don't depend on any single pathway.
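
A minimal sketch of that pattern, written in PyTorch purely for illustration (the framework choice, layer sizes, and dropout rate are our assumptions; only the 180-feature input width comes from the feature count mentioned earlier):

import torch
from torch import nn

# Dense layers interleaved with dropout, taking the engineered feature vector as input.
model = nn.Sequential(
    nn.Linear(180, 128),
    nn.ReLU(),
    nn.Dropout(p=0.3),     # randomly zeroes activations during training
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(64, 1),      # single logit, e.g. P(over) after a sigmoid
)

model.train()                           # dropout active during training passes
_ = model(torch.randn(4, 180))          # a training forward pass on 4 fake games
model.eval()                            # dropout disabled at prediction time
probs = torch.sigmoid(model(torch.randn(4, 180)))
print(probs.squeeze())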

Performance Metrics (2025):

Classification Accuracy

Overall Accuracy: 56.9%
Precision: 57.3%
Training Loss: 0.623
Validation Loss: 0.641

Betting Performance

Win Rate (Totals): 63.2%
ROI: 22.1%
Units Profit: +9.7U
Sample Size: 87 picks

Neural Network Specialty: This model crushes totals predictions. Its ability to model complex scoring dynamics (how pitcher velocity + weather + bullpen fatigue interact) makes it our go-to for over/under bets. When the neural net has high confidence on a total, we listen.

Model 3: Gradient Boosting (XGBoost)

Architecture:

Why It Works:

XGBoost builds trees sequentially, where each new tree corrects mistakes from previous trees. It's like having 300 experts where each one focuses on fixing what the previous experts got wrong. This iterative error correction makes it deadly accurate for specific scenarios.

XGBoost particularly excels at run line predictions because it captures the nuance of "close game" vs "blowout" dynamics. It learned that certain pitcher-offense matchups tend toward one-run games, while others frequently produce blowouts.
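
A minimal sketch of that setup with the xgboost library might look like the following. The 300 trees come from the description above; the learning rate, depth, subsample ratio, and synthetic data are illustrative assumptions.

import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 180))
y_train = rng.integers(0, 2, size=1000)   # stand-in for run-line cover labels

xgb = XGBClassifier(
    n_estimators=300,      # 300 sequential trees, per the description above
    learning_rate=0.05,    # assumed; how much each new tree corrects the last
    max_depth=5,           # assumed
    subsample=0.8,         # assumed
    eval_metric="logloss",
)
xgb.fit(X_train, y_train)
print(xgb.predict_proba(X_train[:5])[:, 1])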

Performance Metrics (2025):

Classification Accuracy

Overall Accuracy: 57.4%
AUC-ROC Score: 0.614
Log Loss: 0.652

Betting Performance

Win Rate (RL): 57.1%
ROI: 14.8%
Units Profit: +7.2U
Sample Size: 96 picks

Model 4: Logistic Regression (Baseline & Sanity Check)

Architecture:

Why We Use It:

Logistic regression is intentionally simple. It can't capture complex interactions, but that's the point. If our fancy neural network predicts something wildly different from logistic regression, we investigate why. Often, the neural net is overfitting or picking up spurious correlations.

Plus, logistic regression gives interpretable coefficients. We can see exactly which features drive predictions: "A 1 mph increase in fastball velocity = 2.3% higher win probability." That helps us understand what the models are learning.
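
Here's what coefficient inspection looks like in scikit-learn, with made-up feature names and data. The point is that each coefficient is a log-odds change per unit increase, which near a 50% baseline translates to roughly coefficient/4 in win probability.

import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["fastball_velo", "days_rest", "bullpen_era"]   # illustrative subset
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = rng.integers(0, 2, size=500)

logit = LogisticRegression(max_iter=1000)
logit.fit(X, y)

for name, coef in zip(feature_names, logit.coef_[0]):
    print(f"{name}: {coef:+.3f} log-odds per unit "
          f"(~{coef / 4:+.1%} win probability near a 50% baseline)")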

Performance Metrics (2025):

Classification Accuracy

Overall Accuracy: 55.2%
Coefficient Stability: High

Betting Performance

Win Rate: 55.2%
ROI: 9.4%
Units Profit: +4.1U
Sample Size: 87 picks

Ensemble Method: Combining Predictions

Here's where it gets interesting. We don't just pick the "best" model—we combine all four using weighted averaging. Here's our approach:

Dynamic Weighting Strategy

Model weights aren't static. They adjust based on recent performance and bet type:

Model                  Moneyline Weight   Run Line Weight   Totals Weight
Random Forest          40%                30%               25%
Neural Network         25%                20%               45%
XGBoost                25%                40%               20%
Logistic Regression    10%                10%               10%

These weights were optimized using historical out-of-sample data from 2020-2024. We tested 10,000+ weight combinations to find the optimal balance.
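
As a concrete sketch, here's the weighted average for a moneyline pick using the moneyline column of the table above. The per-game model probabilities are hypothetical.

# Moneyline weights from the table above
WEIGHTS_ML = {
    "random_forest": 0.40,
    "neural_net": 0.25,
    "xgboost": 0.25,
    "logistic": 0.10,
}

def ensemble_prob(model_probs, weights):
    """Weighted average of each model's home-win probability."""
    return sum(weights[m] * p for m, p in model_probs.items())

# Hypothetical model outputs for one game
probs = {"random_forest": 0.58, "neural_net": 0.54, "xgboost": 0.61, "logistic": 0.55}
print(f"Ensemble home-win probability: {ensemble_prob(probs, WEIGHTS_ML):.3f}")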

Confidence Score Calculation

Our confidence score (0-100) considers:

confidence = (
    (model_agreement * 0.40)
    + (prediction_strength * 0.30)
    + (recent_performance * 0.20)
    + (data_quality * 0.10)
)

if confidence >= 85:
    bet_size = 3.0   # units
elif confidence >= 70:
    bet_size = 1.0
elif confidence >= 55:
    bet_size = 0.5
else:
    bet_size = 0     # no bet

Ensemble Advantage: In 2025, the ensemble outperformed any individual model by 3.7% in win rate and 6.2% in ROI. Diversification works in modeling just like it does in investing. Different models catch different edges.

Validation & Testing Protocols

Anyone can build a model that performs great on training data. The trick is building one that performs well on new data. Here's how we validate:

1. Train-Test Split (Temporal)

We split data chronologically, never randomly. Training on 2010-2023, testing on 2024, then deploying for 2025. This simulates real-world usage where you're predicting the future, not randomly sampling the past.

Common Mistake: Many "backtests" use random splits, which leaks future information into training. A model might see Game 2 of a series in training and Game 1 in testing—that's cheating. Always split temporally for time-series data.
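
A minimal sketch of a chronological split with pandas (train through 2023, hold out 2024); the column names and toy data are assumptions.

import pandas as pd

games = pd.DataFrame({
    "game_date": pd.date_range("2010-04-01", periods=30, freq="180D"),
    "feature": range(30),
})

train = games[games["game_date"] < "2024-01-01"]        # 2010-2023 for training
test = games[(games["game_date"] >= "2024-01-01")
             & (games["game_date"] < "2025-01-01")]     # 2024 held out
print(len(train), "training rows,", len(test), "test rows")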

2. Walk-Forward Validation

We retrain models monthly using only data available at that point in time. For example, our June 2025 model was trained on games through May 2025. This ensures we're not using future data to predict the past.
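
A sketch of that monthly retraining loop, with a simple stand-in model and made-up columns; the key property is that each month's model only ever sees games dated before that month.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
games = pd.DataFrame({
    "game_date": pd.date_range("2024-04-01", periods=300, freq="D"),
    "f1": rng.normal(size=300),
    "f2": rng.normal(size=300),
    "won": rng.integers(0, 2, size=300),
})

for month_start in ["2024-06-01", "2024-07-01", "2024-08-01"]:
    history = games[games["game_date"] < month_start]   # only data available at that point
    model = LogisticRegression(max_iter=1000).fit(history[["f1", "f2"]], history["won"])
    # ...score the coming month's games with `model` here
    print(month_start, "trained on", len(history), "games")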

3. Cross-Validation (K-Fold, Temporal)

We run 5-fold temporal cross-validation: split the data into 5 sequential chunks, then for each fold train on the earlier chunks and test on the next one, expanding the training window as we go. This gives confidence intervals around performance metrics.
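
One way to implement this is scikit-learn's TimeSeriesSplit, which always trains on earlier chunks and tests on the next one, so no future data leaks into training. The sketch below uses synthetic data and a simple stand-in classifier.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # chronologically ordered games
y = rng.integers(0, 2, size=1000)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(f"fold accuracies: {np.round(scores, 3)}, mean {np.mean(scores):.3f}")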

4. Out-of-Sample Testing (2024 Holdout)

The entire 2024 season was held out during model development. We used it once for final validation before deploying for 2025. Results: 56.8% accuracy, confirming the models generalize.

5. Real-Money Tracking (2025 Season)

The ultimate test. We've tracked every pick since Opening Day 2025 with real betting lines (not closing lines, but lines available when we post picks). Current record: 198-144-3 (57.9%), +24.8 units profit.

What We're Working On Next

Model development never stops. Here's what's in our pipeline:

Planned Improvements (Winter 2025-2026)

Machine learning for sports betting is an arms race. As markets get sharper, we need better models. The moment we stop improving is the moment our edge disappears.

Ready to See the Models in Action?

Daily picks with full transparency—see which models agree, confidence scores, and reasoning

View Today's Picks