CFB Methodology
How the 2026 college football preview gets built: where the data comes from, how the model is trained, how 50,000-season simulations turn weekly point spreads into win totals, conference championship probabilities, and national title odds.
Data sources
- Game results & scores: CollegeFootballData.com (cfbd) API, 2010–2025, FBS only, regular season + bowls + CFP. ~10,000 games.
- Team-season aggregates: success rate, explosive rate, finishing drives, havoc, line yards, and PPA, all derived from cfbd play-level data per team-season.
- Market lines: closing spreads + totals from cfbd bookmaker feeds. Used for backtest grading and live edge calc on futures.
- 2026 schedule + early lines: cfbd 2026 endpoint, 779 games loaded, ~10 with early lines posted. Reloaded weekly.
- Scouting overrides: 68 spring scouting deep-dives + 12 conference tier reports, parsed into structured signals (returning production, transfer portal grade, coaching changes, schedule difficulty). These adjust the team-strength prior before simulation.
The strength model
Two-model ensemble that predicts margin (home minus away points) for a single game, given team-season features for both sides plus venue, rest, and conference.
Ridge: linear baseline on the same feature matrix. Stable, resistant to overfitting, handles thin sub-samples (G5, FCS games) gracefully. Acts as the anchor when XGB extrapolates poorly.
XGB: captures non-linear interactions (e.g. an elite pass-rush vs a weak pass-blocking line). Tuned on 2010–2022, validated on 2023–2024.
Final pick = 0.55 × ridge + 0.45 × xgb. Backtest MAE on out-of-sample 2023–2024 games: ~13.2 points per game. (College football variance is structurally higher than NFL — expect that.)
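The blend above is a fixed weighted average, and the backtest metric is plain mean absolute error. A minimal sketch, assuming each model has already produced a margin prediction; the function names are illustrative and not taken from the pipeline:

```python
RIDGE_WEIGHT = 0.55  # weights as stated in the text
XGB_WEIGHT = 0.45


def blend_margin(ridge_pred: float, xgb_pred: float) -> float:
    """Weighted average of two margin predictions (home minus away points)."""
    return RIDGE_WEIGHT * ridge_pred + XGB_WEIGHT * xgb_pred


def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute miss, in points, across a set of games."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical game: ridge says home by 10, XGB says home by 6.
    print(blend_margin(10.0, 6.0))
```

Because the ridge weight is the larger of the two, a wild XGB prediction moves the final pick by less than half of its own error.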
Scouting overrides — exact math
Returning production and recruiting features lag what spring practice actually shows. So after the ensemble predicts a margin and before the simulation rolls dice, every team gets a per-game points adjustment from three independent sources:
unit_margin = Σ (tier - 3) × weight
weights: QB=2.5  OL=1.5  DL=1.5  back7=1.0  skill=1.0  ST=0.3
Tier 3 contributes 0 (the model already saw it via talent / returning_ppa / SP+). Only 1, 2, 4, 5 produce a delta — this prevents double-counting signal already encoded in features.
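As a worked instance of the unit formula, here is a short sketch. It assumes tiers are integers from 1 to 5 and that the input is a plain dict keyed by unit name; that input shape is an illustration, not the pipeline's actual format.

```python
# Weights as listed in the text.
UNIT_WEIGHTS = {
    "QB": 2.5,
    "OL": 1.5,
    "DL": 1.5,
    "back7": 1.0,
    "skill": 1.0,
    "ST": 0.3,
}


def unit_margin(tiers: dict[str, int]) -> float:
    """Sum of (tier - 3) * weight over all graded units.

    A tier of 3 contributes exactly 0, so only tiers 1, 2, 4, 5 move the total.
    """
    total = 0.0
    for unit, tier in tiers.items():
        if not 1 <= tier <= 5:
            raise ValueError(f"tier for {unit} must be 1..5, got {tier}")
        total += (tier - 3) * UNIT_WEIGHTS[unit]
    return total


if __name__ == "__main__":
    # Hypothetical team: tier-5 QB, tier-4 OL, tier-2 back seven, rest tier 3.
    example = {"QB": 5, "OL": 4, "DL": 3, "back7": 2, "skill": 3, "ST": 3}
    print(unit_margin(example))  # 2*2.5 + 1*1.5 - 1*1.0 = 5.5
```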
spring_margin = signal × 0.5    # signal ∈ {-2, -1, 0, +1, +2}
Captures stuff visible in April that won't hit a stat sheet until September: QB rehab updates, OL injury news, transfer integration reports.
big_game flags (fire only in CFP / conf championship / rivalry):
big_game_qb_concern -3.0 big_game_qb_boost +3.0
big_game_qb_minor_concern -1.5 big_game_qb_minor_boost +1.5
championship_hangover -1.5 new_oc / new_dc -0.75 each
regular-season flag margin = 0 (key_injury / portal_heavy / qb_uncertainty widen σ instead of moving the mean)
Coordinator changes and championship hangovers used to fire every game. That empirically over-penalized teams with double coordinator changes, so they were moved to bracket-only, where install quality actually matters.
team_margin = unit_margin + flag_margin + spring_margin
team_margin = max(team_margin, -2.0)    # negative cap only
# in big games (bracket / rivalry / top-12 vs top-12):
#   team_margin uses big_game flag set
#   unit boost is damped by 0.5 (model already knows bracket teams are good)
The negative cap exists because flag stacking compounded into implausible per-game penalties (3+ win UNDER edges). Teams can still be positively adjusted past +2.0; only the floor is fixed.
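The combination and floor can be sketched as follows. Two details are not spelled out in the text and are assumptions here: that the 0.5 damping applies to the whole unit margin (not only positive values), and that the -2.0 floor also applies in big games. The function signature is hypothetical.

```python
# Big-game flag values as listed in the text.
BIG_GAME_FLAGS = {
    "big_game_qb_concern": -3.0,
    "big_game_qb_boost": 3.0,
    "big_game_qb_minor_concern": -1.5,
    "big_game_qb_minor_boost": 1.5,
    "championship_hangover": -1.5,
    "new_oc": -0.75,
    "new_dc": -0.75,
}
NEGATIVE_CAP = -2.0
SPRING_SCALE = 0.5
BIG_GAME_UNIT_DAMPING = 0.5


def team_margin(unit_margin: float, spring_signal: int,
                flags: list[str], big_game: bool = False) -> float:
    """Per-game points adjustment for one team, floored at NEGATIVE_CAP."""
    spring_margin = spring_signal * SPRING_SCALE
    if big_game:
        # Assumption: damping is applied to the full unit margin.
        unit = unit_margin * BIG_GAME_UNIT_DAMPING
        flag_margin = sum(BIG_GAME_FLAGS.get(f, 0.0) for f in flags)
    else:
        # Regular season: flags do not move the mean.
        unit = unit_margin
        flag_margin = 0.0
    # Only the floor is fixed; positive totals are not capped.
    return max(unit + flag_margin + spring_margin, NEGATIVE_CAP)


if __name__ == "__main__":
    print(team_margin(5.5, 1, ["new_oc"]))                    # regular season
    print(team_margin(5.5, 1, ["new_oc"], big_game=True))     # big game
```

In the regular-season call the coordinator flag has no effect on the mean; in the big-game call it subtracts 0.75 and the unit margin is halved.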
σ_baseline = 17.0 points
widening flags (take max of any active):
  qb_uncertainty             22.0
  key_injury                 21.0
  portal_heavy               20.0
  big_game_qb_concern        22.0
  big_game_qb_minor_concern  20.5
Wider σ for teams with structural uncertainty — a team with a QB question mark has fatter tails on both sides, even if their mean prediction is unchanged.
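The "take max of any active" rule is short enough to show directly. A minimal sketch, assuming flags arrive as a list of strings; flags that are not widening flags are ignored:

```python
SIGMA_BASELINE = 17.0

# Widening values as listed in the text.
WIDENING_FLAGS = {
    "qb_uncertainty": 22.0,
    "key_injury": 21.0,
    "portal_heavy": 20.0,
    "big_game_qb_concern": 22.0,
    "big_game_qb_minor_concern": 20.5,
}


def team_sigma(flags: list[str]) -> float:
    """Baseline sigma, or the largest widening value among active flags."""
    active = [WIDENING_FLAGS[f] for f in flags if f in WIDENING_FLAGS]
    return max([SIGMA_BASELINE] + active)


if __name__ == "__main__":
    print(team_sigma([]))                              # 17.0
    print(team_sigma(["portal_heavy", "key_injury"]))  # 21.0
```

Taking the max rather than summing means stacked uncertainty flags never widen the distribution beyond the single worst flag.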
game_margin = base_prediction + (home_team_margin - away_team_margin)
game_σ = sqrt(home_σ² + away_σ²) / sqrt(2)
Both sides' adjustments stack into the head-to-head. A +5.0 team playing a -1.5 team gets a +6.5 ppg swing on top of the base model.
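The two game-level formulas can be sketched as below. The game σ is the root mean square of the two team values, so two baseline teams still produce 17.0. The win-probability helper is an added assumption: it treats the final margin as normally distributed, which the text does not state.

```python
import math


def game_margin(base_prediction: float, home_team_margin: float,
                away_team_margin: float) -> float:
    """Base model margin plus the difference of the two team adjustments."""
    return base_prediction + (home_team_margin - away_team_margin)


def game_sigma(home_sigma: float, away_sigma: float) -> float:
    """Root mean square of the two team sigmas."""
    return math.sqrt(home_sigma ** 2 + away_sigma ** 2) / math.sqrt(2)


def home_win_prob(margin: float, sigma: float) -> float:
    """P(home wins), ASSUMING margin ~ Normal(margin, sigma). Illustrative only."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2))))


if __name__ == "__main__":
    # The +5.0 vs -1.5 example from the text, on a hypothetical base line of 3.0.
    m = game_margin(3.0, 5.0, -1.5)   # 3.0 + 6.5 = 9.5
    s = game_sigma(17.0, 22.0)        # away team carries a QB question mark
    print(m, s, home_win_prob(m, s))
```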
Every team's intel JSON (units, flags, spring_signal, schedule tier, qb tier, confidence) is in version control and rendered on the team page — nothing is hidden.
Why a stronger team can have a lower championship %
The unit-margin number is head-to-head team strength. Conference championship probability depends on three things, only one of which is team strength:
- Win-total distribution: how often does this team finish 9-3 or better? Driven by their schedule, not just talent. Two teams in the same conference can have very different schedules.
- Tiebreakers and standings shape: if multiple teams reach 9-3, head-to-head and division records decide who plays in the title game.
- Title-game win probability: conditional on making the championship game, what's the team's chance of winning it (often vs the conference's top dog).
A team that's objectively stronger but plays the conference's three best opponents will often have a lower championship probability than a slightly weaker team that misses all three. Talent beats schedule head-to-head; schedule beats talent over a 12-game path. The simulation captures both. If a team-vs-team intuition disagrees with a conference-title number, both can be right.
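A small exact calculation illustrates the schedule effect without any simulation. The per-game win probabilities below are invented for illustration: the stronger team is favored more in the nine common games but draws the three toughest opponents, while the weaker team swaps those for three soft games. Games are assumed independent.

```python
def win_distribution(win_probs: list[float]) -> list[float]:
    """Exact distribution of total wins for independent games.

    Returns a list where index k is P(exactly k wins).
    """
    dist = [1.0]  # before any game: 0 wins with probability 1
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            nxt[wins] += mass * (1.0 - p)   # lose this game
            nxt[wins + 1] += mass * p       # win this game
        dist = nxt
    return dist


def prob_at_least(win_probs: list[float], threshold: int) -> float:
    """P(total wins >= threshold)."""
    return sum(win_distribution(win_probs)[threshold:])


if __name__ == "__main__":
    # Hypothetical schedules, 12 games each.
    stronger_hard_schedule = [0.85] * 9 + [0.45] * 3
    weaker_soft_schedule = [0.80] * 9 + [0.85] * 3
    print(prob_at_least(stronger_hard_schedule, 9))
    print(prob_at_least(weaker_soft_schedule, 9))
```

With these made-up inputs the weaker team reaches 9-3 more often than the stronger one, which is the pattern the paragraph above describes.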
Season simulation
For 2026: 50,000 full seasons simulated. Each game's outcome is drawn from the model's predicted margin distribution (point estimate + a residual sample sized to its position in the strength gap). Across all 50K seasons we count:
- Win totals: regular-season wins distribution per team.
- Conference championships: probability of appearing in & winning each conference's title game.
- CFP berths: 12-team field, conf-champ autobids + at-large bids by simulated CFP committee proxy (resume score = wins · SOS).
- National title: probability of winning the full bracket given seed.
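The win-total part of the loop above can be sketched for a single team. This is a simplification under stated assumptions: residuals are drawn from a normal distribution, games are independent, and the schedule is a hypothetical list of (margin, σ) pairs from the team's point of view. Conference tiebreakers, CFP selection, and the bracket are not modeled here.

```python
import random
from collections import Counter


def simulate_win_totals(schedule: list[tuple[float, float]],
                        n_seasons: int = 50_000,
                        seed: int = 0) -> dict[int, float]:
    """Monte Carlo distribution of regular-season wins for one team.

    schedule: one (predicted_margin, sigma) pair per game, from this team's
    perspective. A sampled margin above zero counts as a win.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    totals: Counter[int] = Counter()
    for _ in range(n_seasons):
        wins = sum(1 for margin, sigma in schedule
                   if rng.gauss(margin, sigma) > 0)
        totals[wins] += 1
    return {wins: count / n_seasons for wins, count in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical 12-game schedule: eight comfortable games, four coin flips.
    schedule = [(10.0, 17.0)] * 8 + [(0.0, 17.0)] * 4
    dist = simulate_win_totals(schedule, n_seasons=5_000)
    for wins, prob in dist.items():
        print(wins, round(prob, 3))
```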
Edge calculation
For futures markets where books have posted lines, the displayed edge is:
edge = model_implied_prob - market_implied_prob
market_implied_prob = 1 / american_to_decimal(odds)    # de-vigged via proportional method
Anything above ~3% is worth looking at. Anything above ~6% deserves a closer look at the team page; small edges from weak priors aren't bets.
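The edge arithmetic can be sketched as below. `american_to_decimal` follows the standard conversion; the proportional de-vig divides each raw implied probability by the sum across every outcome in the same market, so it needs the full list of posted odds. Function names other than `american_to_decimal` are illustrative.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds (+150, -200) to decimal odds (2.5, 1.5)."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds > 0:
        return 1.0 + odds / 100.0
    return 1.0 + 100.0 / abs(odds)


def devig_proportional(market_odds: list[int]) -> list[float]:
    """Implied probabilities for every outcome in one market, scaled to sum to 1."""
    raw = [1.0 / american_to_decimal(o) for o in market_odds]
    overround = sum(raw)  # exceeds 1.0 when the book holds a margin
    return [p / overround for p in raw]


def edge(model_implied_prob: float, market_implied_prob: float) -> float:
    """Model probability minus de-vigged market probability."""
    return model_implied_prob - market_implied_prob


if __name__ == "__main__":
    # Hypothetical two-way market priced -110 on each side.
    probs = devig_proportional([-110, -110])
    print(probs)                 # each side de-vigs to 0.5
    print(edge(0.56, probs[0]))  # a 6-point edge under these made-up numbers
```

For a futures market with many teams, the list passed to `devig_proportional` would hold one price per team, and a partial list would overstate each team's market probability.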
What this is not
- Not in-game: no live win probability, no quarter-by-quarter. The pipeline runs weekly.
- Not bet tracking: no login, no wallet, no log of user picks. We publish the model's view; what you do with it is on you.
- Not certainty: CFB has more variance than the model wants to admit. Treat the top edges as “the spots worth thinking harder about,” not as locks.
Want to see it applied? Open the 2026 preview or browse scouting reports.