What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

Published: 1 month ago (March 13, 2026 at 09:18 PM EDT)

Source: Dev.to

Introduction

As a practical testing ground for inference optimization and model handling, I first tried an open-source (OSS) shogi engine in January 2026.

As a result, I reached 1st place on Floodgate (an online server for computer shogi matches), with a rating above 4500 across more than 200 games. I started programming in December 2025, so this came roughly two months after I first touched the OSS engine.

This article is not an implementation how‑to. Instead, it discusses what I learned through shogi AI and how it applies to LLM research, from the perspective of an LLM/RAG researcher.

Why Shogi AI?

In LLM research, challenges such as inference optimization and model selection are common, but evaluation can be ambiguous: the question “Is the answer good?” often involves subjective judgment.
Shogi AI provides clear wins, losses, and ratings, so the effect of a strategy can be verified numerically right away.
Skills such as CUDA/TensorRT builds and batch‑processing optimization carry over between LLM work and shogi AI, which makes shogi AI an ideal experimental ground with a strict win/loss feedback loop.
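To make "numerical verification" concrete, here is a minimal sketch of how a rating gap maps to an expected score, and how a match result maps back to an implied rating gap. It assumes the standard Elo logistic model with a 400-point scale; Floodgate's actual rating computation may differ, and the function names are illustrative.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model (assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def implied_rating_gap(wins: int, losses: int, draws: int = 0) -> float:
    """Rating gap implied by a match result; a draw counts as half a win."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score implies an unbounded gap")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # A 400-point gap corresponds to roughly a 91% expected score.
    print(round(expected_score(4500, 4100), 3))
    # A 60-40 result over 100 games implies a gap of roughly 70 points.
    print(round(implied_rating_gap(60, 40), 1))
```

Under this model a configuration change can be judged by whether the implied gap moves after a batch of games, rather than by impression.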

Overall Architecture: 3‑Layer Hybrid

The constructed system follows a three‑layer architecture:

Phase 1 – Book (Opening Database)
- Immediate move via Python dictionary lookup.
- No C++ engine startup, zero GPU/CPU load.
Phase 2 – MCTS + DL Model
- Inference of a large 40‑block ResNet using TensorRT.
- Quantized to fit within RTX 5090’s 32 GB VRAM.
Phase 3 – α‑β + NNUE
- Fast position evaluation via CPU search.
- Handles the endgame, where deep reading converts an advantage into a win.

A Python wrapper manages phase switching and protocol communication, selecting an engine based on the characteristics of the position. This idea of “winning with the entire architecture rather than a single model” mirrors how a RAG system is composed as a multi‑stage pipeline (search → ranking → generation).
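The phase switching described above can be sketched roughly as follows. This is not the actual wrapper: the `Position` fields, the `HybridPlayer` class, and the ply‑based endgame threshold are all assumptions for illustration, and the engines are plain callables standing in for the real processes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Position:
    key: str   # position identifier (e.g. an SFEN string); hypothetical field
    ply: int   # number of moves played so far; hypothetical field


Engine = Callable[[Position], str]  # takes a position, returns a move


class HybridPlayer:
    """Picks a phase per position: book first, then DL or NNUE search."""

    def __init__(self, book: Dict[str, str], dl_engine: Engine,
                 nnue_engine: Engine, endgame_ply: int = 100):
        self.book = book
        self.dl_engine = dl_engine
        self.nnue_engine = nnue_engine
        # Assumed switching rule: hand over to NNUE after a fixed ply count.
        self.endgame_ply = endgame_ply

    def choose_move(self, pos: Position) -> Tuple[str, str]:
        """Return (phase_name, move)."""
        # Phase 1: dictionary lookup, no engine call at all.
        move = self.book.get(pos.key)
        if move is not None:
            return "book", move
        # Phase 3: late positions go to the alpha-beta + NNUE engine.
        if pos.ply >= self.endgame_ply:
            return "nnue", self.nnue_engine(pos)
        # Phase 2: everything else goes to the MCTS + DL engine.
        return "dl", self.dl_engine(pos)
```

A real switching rule would likely look at more than the ply count (evaluation, remaining time, mate threats); the fixed threshold here only shows where such a decision would sit.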

OSS Modification: The Value of Cutting

I forked two OSS engines (DL and NNUE) and removed unnecessary features at the source level.

- DL engine: removed multi‑GPU support, multiple backend branching, and various mate search routines to specialize for RTX 5090 × TensorRT. USI options were reduced from 63 to 43 (a 32% reduction).
- NNUE engine: stripped test commands, book generation commands, and learning‑related code, shrinking the binary from 916 KB to 514 KB (a 44% reduction).

This “cutting” work applies directly to LLM operations. Rather than adding functionality to distilled models through LoRA or fine‑tuning, remove unnecessary branches and control behavior through prompts. This policy is consistent with the article An Era Without LoRA or FT: How to Approach Distilled Models.

Real‑Time Book Rewriting: RAG‑Inspired Approach

On the Python side, I managed a database of about 7 million book positions.
I also sped up book loading.
Finally, I implemented real‑time rewriting of the book during a run of matches: after a loss, the early‑game branching points are identified, and the book is modified so that a different move is chosen in the next match. The book keeps refining itself as matches accumulate.

This cycle of “updating the database from experience and reflecting it in subsequent inference” mirrors the feedback loop in RAG, where search result quality is improved from dialogue logs.
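The book rewriting after a loss might look like the sketch below. The data layout (a dictionary from position key to an ordered list of candidate moves) and the rule for choosing the branching point (the latest early‑game position where the losing move was the book's first choice and an alternative exists) are assumptions for illustration, not the author's actual criteria.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: position key -> candidate moves, preferred move first.
Book = Dict[str, List[str]]


def rewrite_book_after_loss(book: Book,
                            our_moves: List[Tuple[str, str]],
                            max_moves: int = 20) -> Optional[str]:
    """Demote the losing book move at the latest early-game branching point.

    our_moves holds (position_key, move_played) for our side, in game order.
    Returns the key of the rewritten position, or None if nothing changed.
    """
    # Only the first max_moves of our moves count as "early game" (assumed).
    for key, played in reversed(our_moves[:max_moves]):
        candidates = book.get(key)
        # Skip positions outside the book or without an alternative.
        if not candidates or len(candidates) < 2:
            continue
        # Skip positions where the played move was not the book's first choice.
        if candidates[0] != played:
            continue
        # Move the losing choice to the back so the next game tries another.
        candidates.append(candidates.pop(0))
        return key
    return None
```

Because the losing move is demoted rather than deleted, it remains available if every alternative later turns out to be worse.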

LLM Utilization and Limitations

During development, I used Claude Opus as a coding partner. For niche, specialized tools such as dlshogi and YaneuraOu, LLM hallucinations are frequent. Blindly trusting confidently generated code can lead to modifications that not only fail but also weaken the engine's playing strength.

Key lesson: an LLM is for translation, not reasoning. Use specialized engines (e.g., shogi search engines, domain‑specific logic) for the calculations, and use the LLM to translate inputs and outputs to and from natural language. This aligns with a RAG design principle: “Don’t give the LLM knowledge; have it generate from facts obtained from external sources.”
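One way to picture this split is a function that takes facts already computed by the engine and only asks the LLM to phrase them. The sketch below stops at building the prompt; `EngineResult`, its fields, and the prompt wording are hypothetical, and no particular LLM API is assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineResult:
    """Facts computed by the search engine (hypothetical structure)."""
    best_move: str   # move in USI notation
    score_cp: int    # evaluation in centipawns, from the side to move
    pv: List[str]    # principal variation as USI moves


def build_explanation_prompt(result: EngineResult) -> str:
    """Build a prompt that asks the LLM to rephrase facts, not to evaluate."""
    facts = [
        f"best move: {result.best_move}",
        f"evaluation: {result.score_cp:+d} centipawns",
        f"expected line: {' '.join(result.pv)}",
    ]
    return (
        "Explain the following engine output in plain language. "
        "Use only the facts listed and do not evaluate the position yourself.\n"
        + "\n".join(facts)
    )


if __name__ == "__main__":
    print(build_explanation_prompt(EngineResult("7g7f", 120, ["7g7f", "3c3d"])))
```

The engine remains the only source of moves and evaluations; whatever model receives the prompt is limited to wording.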

Conclusion: Research is Cyclical

Organizing the insights from two months of shogi AI development gives the following:

- Additional training on distilled models is ineffective or leads to overfitting → Prompt control is the better approach.
- Winning with the entire architecture, not a single model → RAG pipeline design philosophy.
- Updating the database from experience and reflecting it in subsequent inference → RAG feedback loop.
- An LLM is for translation, not reasoning → Domain logic should be handled by specialized engines.

This shogi AI experience has fed back into my LLM research, and insights from LLM research shaped the shogi AI architecture. That cycle is the greatest value of venturing into a different field. I’m currently back to researching local LLMs (building systems using NVIDIA’s Nemotron models) and will return to Floodgate when the GPU is free. It was very enjoyable.

Hardware Used

GPU: NVIDIA RTX 5090 (32 GB GDDR7)
CPU: Intel Core Ultra 9 285K
RAM: 64 GB
OS: Linux (WSL2)
