Did Bayesian Projection (Stan/Ridge) Predict the 2021 NPB Last-to-First Upsets?

Published: (March 4, 2026 at 09:06 PM EST)
5 min read
Source: Dev.to

Background

In a previous article I analyzed why the Marcel projection system missed the 2021 NPB upsets, in which both pennant winners had finished last the year before:

  • Tokyo Yakult Swallows: Last in 2020 → Central League champions in 2021
  • Orix Buffaloes: Last in 2020 → Pacific League champions in 2021

That analysis identified three patterns Marcel could not see:

| Pattern | Examples |
| --- | --- |
| Breakout rookie starters | Okugawa Yasunobu, Miyagi Hiromu |
| Foreign hitter jackpot | Santana, Osuna |
| Breakout performance by established player | Sugimoto Yutaro, Takahashi Keiji |

This article applies a Bayesian model (Stan + Ridge correction) to the same question: would it have done better?
→ GitHub:


Bayesian Model Corrections

The Bayes model uses Marcel projections as a starting point, then applies three corrections:

  1. K% / BB% adjustment
    Batting average and ERA are heavily influenced by park, opponent quality, and luck. Strikeout rate (K%) and walk rate (BB%) are more environment‑stable indicators of underlying skill. K% is especially useful for pitchers: high‑strikeout pitchers tend to outperform their Marcel ERA projection.

  2. BABIP correction
    Batted‑ball luck (BABIP) fluctuates year to year. Players with unusually low BABIP may have been unlucky; high BABIP may have inflated their stats. The model adjusts Marcel projections accordingly.

  3. Better initialization for foreign players
    Marcel falls back to the league average for first‑year foreign players, who have no NPB history. The Bayes model instead converts prior‑league (MLB/KBO) stats to the NPB scale and uses the result as the initial value.
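The three corrections can be pictured as small adjustments layered on top of a Marcel number. The sketch below is only an illustration of that idea: the function names, the league baselines, and every weight are assumptions made up for the example (only the .318 league-average wOBA comes from the results tables), and the real model fits its coefficients with Stan and Ridge rather than hard-coding them.

```python
# Illustrative sketch only: every constant below is an assumed placeholder,
# not a coefficient from the article's fitted Stan/Ridge model.
LG_K_PCT = 0.20     # assumed league-average strikeout rate
LG_BB_PCT = 0.08    # assumed league-average walk rate
LG_BABIP = 0.300    # assumed league-average BABIP
LG_WOBA = 0.318     # league-average wOBA quoted in the results tables

W_K = -6.0          # assumed ERA change per unit of K% above league
W_BB = 4.0          # assumed ERA change per unit of BB% above league
W_BABIP = 0.20      # assumed share of a hitter's BABIP gap treated as luck
PRIOR_SCALE = 0.8   # assumed scale applied to a prior-league wOBA gap


def adjust_era(marcel_era: float, k_pct: float, bb_pct: float) -> float:
    """Correction 1: shift a Marcel ERA using strikeout and walk rates."""
    return marcel_era + W_K * (k_pct - LG_K_PCT) + W_BB * (bb_pct - LG_BB_PCT)


def adjust_woba(marcel_woba: float, babip: float) -> float:
    """Correction 2: pull a hitter's wOBA back by part of last year's BABIP gap."""
    return marcel_woba - W_BABIP * (babip - LG_BABIP)


def foreign_prior(prior_woba: float, prior_lg_woba: float) -> float:
    """Correction 3: map a prior-league wOBA onto the NPB scale."""
    return LG_WOBA + PRIOR_SCALE * (prior_woba - prior_lg_woba)


# A high-strikeout pitcher is projected below his Marcel ERA.
print(round(adjust_era(4.00, k_pct=0.25, bb_pct=0.08), 2))
# A hitter whose BABIP ran high last year is projected below his Marcel wOBA.
print(round(adjust_woba(0.310, babip=0.350), 3))
```

With these placeholder weights the direction of each adjustment matches the description above; the magnitudes carry no meaning.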


Results

Team Projections

Yakult Swallows

| Metric | Marcel | Stan | Actual |
| --- | --- | --- | --- |
| 2021 wins | 63.3 | 64.3 | 73 |
| Error (projected − actual) | -9.7 | -8.7 | |

Orix Buffaloes

| Metric | Marcel | Stan | Actual |
| --- | --- | --- | --- |
| 2021 wins | 71.7 | 73.5 | 70 |
| Error (projected − actual) | +1.7 | +3.5 | |

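Stan cut Yakult’s error by about one win but moved Orix slightly further from its actual total (+3.5 versus +1.7 for Marcel). Both models were close on Orix, yet both left Yakult roughly nine wins short. Neither model “predicted” the upsets.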
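The error rows are simply projected wins minus actual wins. A short check of that arithmetic, using only the numbers from the two tables (the dictionary layout and function name are my own choices for illustration):

```python
# Win totals copied from the Yakult and Orix tables above.
projections = {
    "Yakult": {"marcel": 63.3, "stan": 64.3, "actual": 73},
    "Orix": {"marcel": 71.7, "stan": 73.5, "actual": 70},
}


def win_errors(row: dict) -> dict:
    """Return projected-minus-actual wins for each model, rounded to 0.1."""
    return {model: round(row[model] - row["actual"], 1) for model in ("marcel", "stan")}


for team, row in projections.items():
    print(team, win_errors(row))
```

A negative value means the model projected too few wins, which is the case for Yakult under both models.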

Player Projections

Rookie starter breakout (Miyagi Hiromu)

| Player | Actual ERA | Marcel | Stan | Stan − Marcel |
| --- | --- | --- | --- | --- |
| Miyagi (2021) | 2.51 | 3.78 | 3.40 | -0.38 |

Stan is 0.38 ERA points closer, but still off. “Is this pitcher MLB‑ready?” cannot be read from box‑score data alone.

First‑year foreign hitters

| Player | Actual wOBA | Marcel | Stan |
| --- | --- | --- | --- |
| Santana (2021) | .392 | .318 (lg avg) | .316 (lg avg) |
| Osuna (2021) | .311 | .318 (lg avg) | .315 (lg avg) |

Even with MLB conversion factors, a league‑average conversion cannot predict individual player adaptation. Both models missed Santana’s .392 by roughly the same margin.

Established pitcher (Takahashi Keiji)

| Player | Actual ERA | Marcel | Stan | Stan − Marcel |
| --- | --- | --- | --- | --- |
| Takahashi (2021) | 2.87 | 4.55 | 4.22 | -0.33 |

Takahashi’s K% was already high in 2020. The “high K% → better ERA” correction moved the projection in the right direction.

Established hitter (Sugimoto Yutaro)

| Player | Actual wOBA | Marcel | Stan | Stan − Marcel |
| --- | --- | --- | --- | --- |
| Sugimoto (2021) | .413 | .310 | .299 | -0.011 (worse) |

Sugimoto posted a .695 OPS in only 141 PA in 2020. Was he unlucky or genuinely poor? The BABIP correction moved the projection further down, away from his actual 2021 result. Without Statcast‑type exit‑velocity data, the box score alone is ambiguous.

Yakult pitchers (2021, IP ≥ 40)

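| Pitcher | Actual ERA | Marcel | Stan | Stan − Marcel |
| --- | --- | --- | --- | --- |
| Takahashi Keiji | 2.87 | 4.55 | 4.22 | -0.33 |
| Konno Ryuta | 2.76 | 3.47 | 2.92 | -0.56 |
| McGough | 2.52 | 3.59 | 3.32 | -0.26 |
| Kanakubo Yuto | 2.74 | 3.98 | 3.69 | -0.29 |
| Shimizu Noboru | 2.39 | 4.21 | 4.11 | -0.10 |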

Orix pitchers (2021, IP ≥ 40)

| Pitcher | Actual ERA | Marcel | Stan | Stan − Marcel |
| --- | --- | --- | --- | --- |
| Yamamoto Yoshinobu | 1.39 | 2.46 | 2.23 | -0.23 |
| Miyagi Hiromu | 2.51 | 3.78 | 3.40 | -0.38 |
| Higgins | 2.53 | 3.19 | 2.86 | -0.34 |

Stan consistently outperformed Marcel for high‑K% pitchers, but the actual ERAs of Shimizu (2.39) and Yamamoto (1.39) were still well beyond what either model projected.
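One way to put a single number on "Stan outperformed Marcel" is the mean absolute ERA error across the eight pitchers in the two tables. This is a summary I am adding for illustration, not a metric the original analysis reports; the ERA values are copied from the tables and the helper name is hypothetical.

```python
# (pitcher, actual ERA, Marcel ERA, Stan ERA), copied from the two tables above.
pitchers = [
    ("Takahashi Keiji", 2.87, 4.55, 4.22),
    ("Konno Ryuta", 2.76, 3.47, 2.92),
    ("McGough", 2.52, 3.59, 3.32),
    ("Kanakubo Yuto", 2.74, 3.98, 3.69),
    ("Shimizu Noboru", 2.39, 4.21, 4.11),
    ("Yamamoto Yoshinobu", 1.39, 2.46, 2.23),
    ("Miyagi Hiromu", 2.51, 3.78, 3.40),
    ("Higgins", 2.53, 3.19, 2.86),
]


def mean_abs_error(rows, column: int) -> float:
    """Average |projection - actual| for the projection stored at `column`."""
    return sum(abs(row[column] - row[1]) for row in rows) / len(rows)


mae_marcel = mean_abs_error(pitchers, 2)
mae_stan = mean_abs_error(pitchers, 3)
print(f"Marcel MAE: {mae_marcel:.2f}  Stan MAE: {mae_stan:.2f}")
```

On these eight rows the mean absolute error comes out at about 1.19 runs for Marcel and about 0.88 for Stan. Every pitcher here beat both projections, so the gain comes entirely from Stan projecting lower ERAs.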

Model Limitations Summary

| Factor | Marcel limitation | Stan improvement |
| --- | --- | --- |
| Rookie starter breakout (Okugawa, Miyagi) | Out of scope (no data) | None (same limitation) |
| Foreign hitter jackpot (Santana) | League‑average substitution | None (initialization gap remains) |
| Established hitter breakout (Sugimoto) | Anchored to weak prior‑year stats | None (actually worse) |
| Pitcher improvement (Takahashi, etc.) | Anchored to prior‑year stats | Yes: K% correction helps |

The Bayesian K%/BB% correction is useful for pitchers. For batters, foreign‑player unknowns, and rookies, the bottleneck remains the same as Marcel: lack of relevant data.


Conclusions

Both models give the same bottom line for 2021: the upsets were caused by changes that past performance data can’t capture. The missing piece is batted‑ball quality data. In MLB, Statcast records exit velocity and launch angle for every batted ball; a hitter with a poor BABIP but strong exit velocity is a candidate for positive regression. NPB does not publish this level of data, so the ceiling for projection accuracy from box scores alone is structural, not algorithmic.
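To make the Statcast point concrete, here is a minimal sketch of the kind of screen that batted‑ball data would allow: flag hitters whose BABIP was low even though they hit the ball hard. The hitters, the field names, and both thresholds are hypothetical values chosen for the example; no such NPB data set is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    babip: float               # batting average on balls in play
    avg_exit_velo_mph: float   # average exit velocity (hypothetical data)


def regression_candidates(hitters, babip_max=0.270, exit_velo_min=90.0):
    """Names of hitters with a low BABIP despite strong contact.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    return [
        h.name
        for h in hitters
        if h.babip < babip_max and h.avg_exit_velo_mph >= exit_velo_min
    ]


# Made-up players, used only to exercise the filter.
sample = [
    Hitter("Hitter A", babip=0.250, avg_exit_velo_mph=92.1),  # low BABIP, hard contact
    Hitter("Hitter B", babip=0.255, avg_exit_velo_mph=86.0),  # low BABIP, weak contact
    Hitter("Hitter C", babip=0.330, avg_exit_velo_mph=93.0),  # BABIP already high
]
print(regression_candidates(sample))
```

Only the first hitter passes both conditions. A box‑score model sees Hitter A and Hitter B as nearly identical, which is the ambiguity described for Sugimoto's 2020 line.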


References

  • Previous article (Marcel analysis):
  • Bayes model development log:
  • Prediction app (Marcel + Stan projections):
  • GitHub (Marcel):
  • GitHub (Bayes model):
  • Data sources: baseball-data.com / npb.jp