[Paper] Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Published: February 26, 2026 at 01:25 PM EST
4 min read
Source: arXiv - 2602.23318v1

Overview

The paper tackles a long‑standing bottleneck of Generalized Rapid Action Value Estimation (GRAVE)—its heavy memory footprint when used inside Monte‑Carlo Tree Search (MCTS) for General Game Playing (GGP). By redesigning the algorithm to recycle nodes and to operate in a two‑level search hierarchy, the authors deliver GRAVE2, GRAVER, and GRAVER2, which keep the same playing strength while slashing the amount of memory required.

Key Contributions

  • GRAVE2 – a two‑level (high‑level/low‑level) search that limits the depth of stored statistics to a shallow “core” tree, delegating deeper exploration to a lightweight secondary search.
  • GRAVER – a node‑recycling scheme that reuses memory from pruned or rarely visited nodes, effectively turning the tree into a bounded‑size cache.
  • GRAVER2 – a hybrid that combines the two‑level hierarchy with node recycling, achieving the best trade‑off between memory usage and search quality.
  • Empirical validation across a suite of GGP benchmarks showing that all three variants match or exceed vanilla GRAVE’s win rates while using up to 90 % less memory.
  • An open‑source reference implementation that integrates seamlessly with existing GGP frameworks (e.g., GGP‑Base, Ludii).

Methodology

  1. Two‑Level Search (GRAVE2)

    • The algorithm builds a core tree limited to a configurable node budget (e.g., 10 k nodes).
    • When the core tree is full, additional simulations are run in a secondary shallow tree that shares the same policy but does not store per‑node statistics; results are folded back into the core via aggregated back‑propagation.
  2. Node Recycling (GRAVER)

    • Nodes are stored in a fixed‑size pool.
    • When the pool is exhausted, the least‑used nodes (based on a combined visit‑count and recency metric) are evicted and repurposed for new branches.
    • Evicted statistics are merged into parent nodes to preserve useful information.
  3. Hybrid (GRAVER2)

    • Couples the bounded core tree of GRAVE2 with the recycling pool of GRAVER, allowing the core to stay compact while the pool handles transient expansions.
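The node-recycling idea in step 2 can be sketched as a fixed-size pool with a combined visit-count/recency eviction score. The class names, the linear scoring function, and the `recency_weight` constant below are illustrative assumptions, not the paper's exact policy:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0
        self.last_access = 0

class NodePool:
    """Fixed-size pool: when full, evict the least-useful node
    (few visits, not touched recently) and fold its statistics
    into its parent before reusing the slot."""

    def __init__(self, capacity, recency_weight=0.5):
        self.capacity = capacity
        self.recency_weight = recency_weight
        self.nodes = []
        self.clock = 0  # logical time for recency

    def _score(self, node):
        # Lower score = better eviction candidate.
        age = self.clock - node.last_access
        return node.visits - self.recency_weight * age

    def allocate(self, parent):
        self.clock += 1
        if len(self.nodes) < self.capacity:
            node = Node(parent)
            self.nodes.append(node)
        else:
            victim = min(self.nodes, key=self._score)
            # Merge evicted statistics upward so information survives.
            if victim.parent is not None:
                victim.parent.visits += victim.visits
                victim.parent.wins += victim.wins
            victim.__init__(parent)  # repurpose the slot in place
            node = victim
        node.last_access = self.clock
        return node
```

With `capacity` reached, every new expansion reuses an existing slot, so the tree behaves like a bounded-size cache rather than growing without limit.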

All variants retain the RAVE update rule—all-moves-as-first (AMAF) win/visit statistics gathered from simulated playouts—but adapt it to work with partial statistics and periodic merges.
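The RAVE blend itself is conventionally a weighted mix of a node's own mean value and its AMAF mean, with the weight shifting toward real statistics as visits accumulate. A minimal sketch of that standard formula (the `bias` constant is illustrative; the paper's exact schedule under partial statistics may differ):

```python
def grave_value(wins, visits, amaf_wins, amaf_visits, bias=1e-5):
    """Blend per-node and AMAF (all-moves-as-first) statistics.

    beta -> 1 when the node has few real visits (trust AMAF),
    beta -> 0 as visits accumulate (trust the node's own mean).
    """
    if amaf_visits == 0:
        return wins / visits if visits else 0.0
    beta = amaf_visits / (amaf_visits + visits
                          + bias * amaf_visits * visits)
    q = wins / visits if visits else 0.0
    q_amaf = amaf_wins / amaf_visits
    return (1.0 - beta) * q + beta * q_amaf
```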

Results & Findings

| Algorithm | Avg. Memory (nodes) | Win % vs. vanilla GRAVE* |
| --- | --- | --- |
| GRAVE (baseline) | 100 k | 0 % (reference) |
| GRAVE2 | 12 k | +1.2 % |
| GRAVER | 15 k | +0.8 % |
| GRAVER2 | 10 k | +1.5 % |

*Measured on a standard GGP test set (Tic‑Tac‑Toe, Connect‑Four, Amazons, etc.) with 1 s per move.
Key takeaways:

  • Memory reduction of 85–90 % with no statistically significant loss in playing strength.
  • In some games (e.g., Connect‑Four), the reduced‑memory variants even outperformed vanilla GRAVE, likely due to fresher statistics from more aggressive recycling.

Practical Implications

  • Embedded AI agents (e.g., on IoT devices, mobile robots, or low‑cost game consoles) can now run strong GGP‑style MCTS without needing megabytes of RAM.
  • Cloud‑scale GGP servers can host many more concurrent matches per machine, lowering operational costs.
  • The node‑recycling idea is generic and can be dropped into any tree‑search algorithm (AlphaZero‑style MCTS, UCT, etc.) where memory is a limiting factor.
  • Developers building general‑purpose game AI libraries can expose a “memory‑budget” knob, letting users trade off between speed and memory automatically.
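A library-level "memory-budget" knob could look like the hypothetical configuration below; all names and the 64-bytes-per-node estimate are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    time_per_move_s: float = 1.0
    node_budget: int = 10_000   # max tree nodes resident at once
    recycle: bool = True        # evict + merge when the budget is hit

def max_resident_bytes(cfg: SearchConfig, bytes_per_node: int = 64) -> int:
    """Upper bound on tree memory implied by the node budget."""
    return cfg.node_budget * bytes_per_node
```

Exposing the budget as a single integer lets users dial memory against search quality without touching the search internals.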

Limitations & Future Work

  • The current evaluation focuses on classic deterministic board games; stochastic or imperfect‑information games may expose different memory‑access patterns that challenge the recycling heuristics.
  • The node‑recycling policy is based on a simple visit‑count + recency metric; more sophisticated cache‑replacement strategies (e.g., learned importance scores) could further improve efficiency.
  • Integration with deep‑learning‑augmented MCTS (e.g., policy/value networks) remains unexplored—future work could investigate how to compress neural‑network‑guided statistics alongside the tree.

Overall, the paper delivers a pragmatic solution that bridges the gap between cutting‑edge MCTS research and real‑world deployment constraints.

Authors

  • Aloïs Rautureau
  • Tristan Cazenave
  • Éric Piette

Paper Information

  • arXiv ID: 2602.23318v1
  • Categories: cs.AI
  • Published: February 26, 2026
  • PDF: Download PDF
