Did Bayesian Projection (Stan/Ridge) Predict the 2021 NPB Last-to-First Upsets?

Published: March 4, 2026 at 09:06 PM EST

Source: Dev.to

Background

In a previous article, I analyzed why the Marcel projection missed the 2021 NPB upsets, in which both pennant winners had finished last the year before:

Tokyo Yakult Swallows – last in 2020 → Central League champions in 2021
Orix Buffaloes – last in 2020 → Pacific League champions in 2021

That analysis identified three patterns Marcel could not see:

Pattern	Examples
Breakout rookie starters	Okugawa Yasunobu, Miyagi Hiromu
Foreign-hitter jackpot	Santana, Osuna
Breakout by an established player	Sugimoto Yutaro, Takahashi Keiji

This article applies a Bayesian model (Stan + Ridge correction) to the same question: Would it have done better?

→ GitHub: https://github.com/your-repo (replace with the actual repository link)

Bayesian Model Corrections

The Bayesian model builds on Marcel projections and then applies three systematic adjustments:

#	Adjustment	Why it matters	How it’s applied
1	K% / BB% adjustment	Batting average and ERA are heavily influenced by park factors, opponent quality, and luck. Strikeout rate (K%) and walk rate (BB%), by contrast, are relatively stable across environments and better reflect a player's true skill. For pitchers, a high K% usually signals a better (lower) ERA than Marcel projects.	Replace or reweight the original Marcel output with a factor derived from the player's observed K% and BB% (e.g., regress both toward the league average while preserving the player-specific deviation).
2	BABIP correction	Batting average on balls in play (BABIP) fluctuates from season to season. Players with unusually low BABIP were likely unlucky, while those with unusually high BABIP may have benefited from good luck.	Adjust the Marcel projection up or down based on how far the player's BABIP deviated from the league norm, using a regression-to-the-mean factor to avoid overcorrecting.
3	Better initialization for foreign players	Marcel treats first-year foreign players (no NPB history) as league-average, which can misrepresent their true ability.	Convert the player's prior-league statistics (MLB, KBO, etc.) to the NPB scale using historical translation factors, then use the converted value as the Bayesian prior instead of a generic league average.

These three corrections together produce a more nuanced projection that accounts for skill‑stable metrics, random variation, and cross‑league translation.
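All three corrections lean on the same idea: regress an observed rate toward the league rate, trusting the player's own number more as the sample grows. The sketch below shows that shrinkage step in its simplest form. It is not the author's Stan/Ridge model; the stabilization constants (`K_STABILIZE_BF`, `BABIP_STABILIZE_BIP`) and the sample players are hypothetical placeholders.

```python
def regress_to_mean(observed: float, league: float, n: float, stabilize: float) -> float:
    """Shrink an observed rate toward the league rate.

    The player's own rate gets weight n / (n + stabilize): with few
    opportunities the estimate stays near the league rate, and with many
    it approaches the observed rate.
    """
    if n < 0 or stabilize <= 0:
        raise ValueError("n must be >= 0 and stabilize must be > 0")
    weight = n / (n + stabilize)
    return weight * observed + (1.0 - weight) * league


# Hypothetical stabilization constants, in batters faced (K%) and balls in
# play (BABIP). Real values would have to be estimated from NPB data.
K_STABILIZE_BF = 70.0
BABIP_STABILIZE_BIP = 800.0

if __name__ == "__main__":
    # Hypothetical pitcher: 28% K rate over 350 batters faced, league rate 19%.
    k_est = regress_to_mean(0.28, 0.19, n=350, stabilize=K_STABILIZE_BF)
    # Hypothetical hitter: .250 BABIP on 100 balls in play, league .300.
    babip_est = regress_to_mean(0.250, 0.300, n=100, stabilize=BABIP_STABILIZE_BIP)
    print(f"regressed K%:    {k_est:.3f}")      # 0.265
    print(f"regressed BABIP: {babip_est:.3f}")  # 0.294
```

With these placeholder constants, the pitcher keeps 7.5 of his 9 points of K% edge, while the hitter's 100-ball BABIP sample is pulled almost all the way back to league average. That asymmetry is why K% is treated mostly as skill and BABIP mostly as noise.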

Results

Team Projections

Yakult Swallows

Metric	Marcel	Stan	Actual
2021 wins	63.3	64.3	73
Error (projected − actual)	–9.7	–8.7	—

Orix Buffaloes

Metric	Marcel	Stan	Actual
2021 wins	71.7	73.5	70
Error (projected − actual)	+1.7	+3.5	—

Stan reduced Yakult's error by one win. Both models were close on Orix, although Stan's projection landed slightly further from the actual total than Marcel's. Neither model “predicted” the upsets.
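The error arithmetic in these two tables can be checked directly. The sketch below recomputes it from the published win totals (error = projected − actual) and adds one derived quantity, the change in absolute error, which makes the Orix shift explicit. Only the numbers come from the article; `projection_errors` is a hypothetical helper.

```python
# Win projections from the two team tables above: (Marcel, Stan, actual).
TEAMS = {
    "Yakult": (63.3, 64.3, 73),
    "Orix": (71.7, 73.5, 70),
}


def projection_errors(marcel: float, stan: float, actual: float) -> tuple[float, float, float]:
    """Return (Marcel error, Stan error, change in absolute error).

    Error is projected minus actual, so a negative error means the model
    was too pessimistic. A negative change in absolute error means Stan
    landed closer to the actual total than Marcel did.
    """
    marcel_err = marcel - actual
    stan_err = stan - actual
    return marcel_err, stan_err, abs(stan_err) - abs(marcel_err)


for team, (marcel, stan, actual) in TEAMS.items():
    m_err, s_err, delta = projection_errors(marcel, stan, actual)
    print(f"{team:7s} Marcel {m_err:+.1f}  Stan {s_err:+.1f}  |error| change {delta:+.1f}")
```

Running it shows Yakult's absolute error shrinking by 1.0 win under Stan and Orix's growing by 1.8.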

Player Projections

Rookie starter breakout (Miyagi Hiromu)

Player	Actual ERA	Marcel	Stan	Stan change
Miyagi (2021)	2.51	3.78	3.40	–0.38

Stan is 0.38 runs closer than Marcel, but still 0.89 above Miyagi's actual 2.51. Whether a young pitcher is ready to succeed at the top level cannot be read from box-score data alone.

First‑year foreign hitters

Player	Actual wOBA	Marcel	Stan
Santana (2021)	.392	.318 (lg avg)	.316 (lg avg)
Osuna (2021)	.311	.318 (lg avg)	.315 (lg avg)

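Even with MLB conversion factors, Stan's projections stayed at roughly league average, and a league-wide conversion cannot predict how an individual player adapts to NPB. Osuna happened to land near that average; Santana's .392 beat both projections by roughly the same margin, about .075.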
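The foreign-player initialization can be sketched as a two-step prior: translate the prior-league wOBA to the NPB scale, then shrink it toward the NPB league average according to how many prior-league PA support it. This is a minimal illustration, not the author's model; `MLB_LEAGUE_WOBA`, `QUALITY_FACTOR`, `PRIOR_STABILIZE_PA`, and both hitters are hypothetical. It also shows why the gap remains: a single league-wide factor gives identical priors to hitters with identical MLB lines.

```python
from dataclasses import dataclass

NPB_LEAGUE_WOBA = 0.318     # league average used in the table above
MLB_LEAGUE_WOBA = 0.320     # hypothetical MLB league average
QUALITY_FACTOR = 1.08       # hypothetical: how much easier NPB is than MLB for hitters
PRIOR_STABILIZE_PA = 600.0  # hypothetical: PA at which the translated line gets 50% weight


@dataclass
class ForeignHitter:
    name: str
    mlb_woba: float  # wOBA in MLB before joining NPB
    mlb_pa: int      # plate appearances behind that wOBA


def npb_prior_woba(hitter: ForeignHitter) -> float:
    """Translate an MLB wOBA to the NPB scale, then shrink it toward league average."""
    relative = hitter.mlb_woba / MLB_LEAGUE_WOBA              # performance relative to MLB
    translated = NPB_LEAGUE_WOBA * relative * QUALITY_FACTOR  # same relative level on the NPB scale
    weight = hitter.mlb_pa / (hitter.mlb_pa + PRIOR_STABILIZE_PA)
    return weight * translated + (1.0 - weight) * NPB_LEAGUE_WOBA


if __name__ == "__main__":
    # Two hypothetical hitters with identical MLB lines get identical priors,
    # whatever their actual ability to adapt to NPB pitching.
    a = ForeignHitter("Hitter A", mlb_woba=0.310, mlb_pa=400)
    b = ForeignHitter("Hitter B", mlb_woba=0.310, mlb_pa=400)
    print(f"{a.name}: {npb_prior_woba(a):.3f}  {b.name}: {npb_prior_woba(b):.3f}")
```

Nothing in this prior can distinguish the hitter who will break out from the one who will not; that information is simply not in the inputs.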

Established pitcher (Takahashi Keiji)

Player	Actual ERA	Marcel	Stan	Stan change
Takahashi (2021)	2.87	4.55	4.22	–0.33

Takahashi's high 2020 K% triggered the “high K% → better ERA” correction, which moved his projection 0.33 in the right direction; it still finished 1.35 above his actual 2.87.
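The “high K% → better ERA” correction can be sketched as a linear adjustment on top of the Marcel ERA: regress the pitcher's K% toward the league rate, then lower the projected ERA in proportion to how far the regressed rate sits above league average. This is not the author's Stan/Ridge implementation; the league K%, the stabilization constant, and the runs-per-K% slope are hypothetical placeholders, as is the example pitcher.

```python
LEAGUE_K_PCT = 0.19      # hypothetical league strikeout rate
K_STABILIZE_BF = 70.0    # hypothetical stabilization constant, in batters faced
ERA_PER_K = 6.0          # hypothetical: ERA runs per 1.00 of K% above league (0.06 per point)


def k_adjusted_era(marcel_era: float, k_pct: float, batters_faced: int) -> float:
    """Adjust a Marcel ERA projection using a regressed strikeout rate.

    The observed K% is first shrunk toward the league rate; each point of
    regressed K% above league then lowers the projected ERA linearly, and
    each point below league raises it.
    """
    weight = batters_faced / (batters_faced + K_STABILIZE_BF)
    k_regressed = weight * k_pct + (1.0 - weight) * LEAGUE_K_PCT
    return marcel_era - ERA_PER_K * (k_regressed - LEAGUE_K_PCT)


if __name__ == "__main__":
    # Hypothetical pitcher: Marcel ERA 4.00, 25% K rate over 300 batters faced.
    print(f"{k_adjusted_era(4.00, 0.25, 300):.2f}")  # 3.71
```

A linear slope keeps the correction modest and symmetric; it cannot produce a drop as large as Takahashi's actual one, which fits the table above.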

Established hitter (Sugimoto Yutaro)

Player	Actual wOBA	Marcel	Stan	Stan change
Sugimoto (2021)	.413	.310	.299	–0.011 (worse)

Sugimoto posted a .695 OPS in only 141 PA in 2020. Was he unlucky, or genuinely poor? The BABIP correction moved the projection further down (.310 → .299), away from his actual .413. Without Statcast-type exit-velocity data, the box score alone cannot answer the question.
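One way to see how a BABIP correction can push a small-sample hitter the wrong way: in a sketch like the one below, the share of a BABIP deviation treated as luck is 1 − n/(n + k), so with roughly 100 balls in play nearly the whole deviation is backed out. The direction of the correction then depends only on whether BABIP sat above or below the league norm, not on how hard the ball was hit. This is a hypothetical illustration, not the author's Stan code; the league BABIP, the stabilization constant, the wOBA-per-BABIP slope, and the example hitter are placeholders.

```python
LEAGUE_BABIP = 0.300           # hypothetical league BABIP
BABIP_STABILIZE_BIP = 800.0    # hypothetical: balls in play for a 50% skill share
WOBA_PER_BABIP = 0.6           # hypothetical: wOBA change per 1.000 of BABIP


def babip_corrected_woba(marcel_woba: float, babip: float, balls_in_play: int) -> float:
    """Back the luck share of a BABIP deviation out of a wOBA projection.

    skill_share is the fraction of the deviation treated as real ability;
    the rest is treated as luck. A BABIP above league average therefore
    pushes the projection down, and a BABIP below it pushes it up.
    """
    skill_share = balls_in_play / (balls_in_play + BABIP_STABILIZE_BIP)
    luck = (babip - LEAGUE_BABIP) * (1.0 - skill_share)
    return marcel_woba - WOBA_PER_BABIP * luck


if __name__ == "__main__":
    # Hypothetical hitter with ~100 balls in play and a Marcel wOBA of .310.
    for babip in (0.260, 0.340):
        print(f"BABIP {babip:.3f} -> corrected wOBA {babip_corrected_woba(0.310, babip, 100):.3f}")
```

With these placeholders, a .260 BABIP lifts the projection to about .331 and a .340 BABIP drops it to about .289: the same-sized swing in either direction, decided by one box-score number.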

Yakult pitchers (2021, IP ≥ 40)

Pitcher	Actual ERA	Marcel	Stan	Stan change
Takahashi Keiji	2.87	4.55	4.22	–0.33
Konno Ryuta	2.76	3.47	2.92	–0.56
McGough	2.52	3.59	3.32	–0.26
Kanakubo Yuto	2.74	3.98	3.69	–0.29
Shimizu Noboru	2.39	4.21	4.11	–0.10

Orix pitchers (2021, IP ≥ 40)

Pitcher	Actual ERA	Marcel	Stan	Stan change
Yamamoto Yoshinobu	1.39	2.46	2.23	–0.23
Miyagi Hiromu	2.51	3.78	3.40	–0.38
Higgins	2.53	3.19	2.86	–0.34

Stan consistently outperformed Marcel for these high-K% pitchers, but the actual ERAs of Shimizu (2.39) and Yamamoto (1.39) were still far lower than either model projected.
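To check the two pitcher tables, the sketch below recomputes two quantities from the published values: the Stan − Marcel change and Stan's remaining error against the actual ERA. Recomputed from the rounded table values, the change matches the published column to within 0.01; Konno, McGough, and Higgins differ in the last digit, presumably because the article worked from unrounded projections. The `summarize` helper is hypothetical; the numbers are the article's.

```python
# (actual ERA, Marcel, Stan) copied from the Yakult and Orix tables above.
PITCHERS = {
    "Takahashi Keiji": (2.87, 4.55, 4.22),
    "Konno Ryuta": (2.76, 3.47, 2.92),
    "McGough": (2.52, 3.59, 3.32),
    "Kanakubo Yuto": (2.74, 3.98, 3.69),
    "Shimizu Noboru": (2.39, 4.21, 4.11),
    "Yamamoto Yoshinobu": (1.39, 2.46, 2.23),
    "Miyagi Hiromu": (2.51, 3.78, 3.40),
    "Higgins": (2.53, 3.19, 2.86),
}


def summarize(pitchers: dict[str, tuple[float, float, float]]) -> list[tuple[str, float, float]]:
    """Return (name, Stan - Marcel, Stan - actual) rows, largest remaining error first."""
    rows = [
        (name, stan - marcel, stan - actual)
        for name, (actual, marcel, stan) in pitchers.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, change, remaining in summarize(PITCHERS):
    print(f"{name:20s} Stan - Marcel {change:+.2f}   Stan - actual {remaining:+.2f}")
```

The remaining-error column shows Stan still projected every listed pitcher too high, from 0.16 (Konno) up to 1.72 (Shimizu).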

Model Limitations Summary

Factor	Marcel limitation	Stan improvement
Rookie starter breakout (Okugawa, Miyagi)	Out of scope (no data)	None (same limitation)
Foreign‑hitter jackpot (Santana)	League‑average substitution	None (initialization gap remains)
Established hitter breakout (Sugimoto)	Anchored to weak prior-year stats	None (actually worse)
Pitcher improvement (Takahashi, etc.)	Anchored to prior-year stats	Yes: K% correction helps

The Bayesian K%/BB% correction is useful for pitchers. For batters, first-year foreign players, and rookies, the bottleneck is the same as Marcel's: a lack of relevant data.

Conclusions

Both models give the same bottom line for 2021: the upsets were caused by changes that past‑performance data can’t capture. The missing piece is batted‑ball quality data. In MLB, Statcast records exit velocity and launch angle for every batted ball; a hitter with a poor BABIP but strong exit velocity is a candidate for positive regression. NPB does not publish this level of data, so the ceiling for projection accuracy from box scores alone is structural, not algorithmic.
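NPB publishes no such data, so the following only sketches what the Statcast-style check described above could look like if it did: flag a hitter when his BABIP trails the league by some margin while his average exit velocity leads it. All field names, league baselines, thresholds, and players here are hypothetical.

```python
from dataclasses import dataclass

LEAGUE_BABIP = 0.300      # hypothetical league baseline
LEAGUE_EXIT_VELO = 88.0   # hypothetical league average exit velocity, mph
BABIP_GAP = 0.030         # hypothetical: how far below league BABIP counts as "poor"
EXIT_VELO_GAP = 2.0       # hypothetical: how far above league exit velocity counts as "strong", mph


@dataclass
class BattedBallLine:
    name: str
    babip: float
    avg_exit_velo_mph: float  # Statcast-style field; NPB does not publish this


def positive_regression_candidates(lines: list[BattedBallLine]) -> list[str]:
    """Names of hitters whose BABIP trailed the league while their contact quality led it."""
    return [
        line.name
        for line in lines
        if line.babip <= LEAGUE_BABIP - BABIP_GAP
        and line.avg_exit_velo_mph >= LEAGUE_EXIT_VELO + EXIT_VELO_GAP
    ]


if __name__ == "__main__":
    hitters = [
        BattedBallLine("Hitter A", babip=0.250, avg_exit_velo_mph=91.0),  # low BABIP, hard contact
        BattedBallLine("Hitter B", babip=0.250, avg_exit_velo_mph=86.0),  # low BABIP, weak contact
        BattedBallLine("Hitter C", babip=0.330, avg_exit_velo_mph=92.0),  # already producing
    ]
    print(positive_regression_candidates(hitters))  # ['Hitter A']
```

Hitters A and B have identical BABIP, so a box-score-only correction treats them the same; only the exit-velocity column separates them.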

References

Previous article (Marcel analysis): [link → …]
Bayes model development log: [link → …]
Prediction app (Marcel + Stan projections): [link → …]
GitHub (Marcel): [link → …]
GitHub (Bayes model): [link → …]
Data sources: baseball-data.com, npb.jp
