[Paper] Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Published: February 26, 2026 at 01:25 PM EST
4 min read
Source: arXiv - 2602.23318v1

Overview

The paper tackles a long‑standing bottleneck of Generalized Rapid Action Value Estimation (GRAVE)—its heavy memory footprint when used inside Monte‑Carlo Tree Search (MCTS) for General Game Playing (GGP). By redesigning the algorithm to recycle nodes and to operate in a two‑level search hierarchy, the authors deliver GRAVE2, GRAVER, and GRAVER2, which keep the same playing strength while slashing the amount of memory required.

Key Contributions

  • GRAVE2 – a two‑level (high‑level/low‑level) search that limits the depth of stored statistics to a shallow “core” tree, delegating deeper exploration to a lightweight secondary search.
  • GRAVER – a node‑recycling scheme that reuses memory from pruned or rarely visited nodes, effectively turning the tree into a bounded‑size cache.
  • GRAVER2 – a hybrid that combines the two‑level hierarchy with node recycling, achieving the best trade‑off between memory usage and search quality.
  • Empirical validation across a suite of GGP benchmarks showing that all three variants match or exceed vanilla GRAVE’s win rates while using up to 90 % less memory.
  • An open‑source reference implementation that integrates seamlessly with existing GGP frameworks (e.g., GGP‑Base, Ludii).

Methodology

  1. Two‑Level Search (GRAVE2)

    • The algorithm builds a core tree limited to a configurable node budget (e.g., 10 k nodes).
    • When the core tree is full, additional simulations are run in a secondary shallow tree that shares the same policy but does not store per‑node statistics; results are folded back into the core via aggregated back‑propagation.
  2. Node Recycling (GRAVER)

    • Nodes are stored in a fixed‑size pool.
    • When the pool is exhausted, the least‑used nodes (based on a combined visit‑count and recency metric) are evicted and repurposed for new branches.
    • Evicted statistics are merged into parent nodes to preserve useful information.
  3. Hybrid (GRAVER2)

    • Couples the bounded core tree of GRAVE2 with the recycling pool of GRAVER, allowing the core to stay compact while the pool handles transient expansions.
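The node-recycling idea in step 2 can be sketched as a fixed-size pool with a combined visit-count/recency eviction score. The class names, the linear scoring function, and the `recency_weight` constant below are illustrative assumptions, not the paper's exact policy:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0
        self.last_access = 0

class NodePool:
    """Fixed-size pool: when full, evict the least-useful node
    (few visits, not touched recently) and fold its statistics
    into its parent before reusing the slot."""

    def __init__(self, capacity, recency_weight=0.5):
        self.capacity = capacity
        self.recency_weight = recency_weight
        self.nodes = []
        self.clock = 0  # logical time for recency

    def _score(self, node):
        # Lower score = better eviction candidate.
        age = self.clock - node.last_access
        return node.visits - self.recency_weight * age

    def allocate(self, parent):
        self.clock += 1
        if len(self.nodes) < self.capacity:
            node = Node(parent)
            self.nodes.append(node)
        else:
            victim = min(self.nodes, key=self._score)
            # Merge evicted statistics upward so information survives.
            if victim.parent is not None:
                victim.parent.visits += victim.visits
                victim.parent.wins += victim.wins
            victim.__init__(parent)  # repurpose the slot in place
            node = victim
        node.last_access = self.clock
        return node
```

With `capacity` reached, every new expansion reuses an existing slot, so the tree behaves like a bounded-size cache rather than growing without limit.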

All variants retain the RAVE update rule—all-moves-as-first (AMAF) win/visit statistics gathered from simulated playouts—but adapt it to work with partial statistics and periodic merges.
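The RAVE blend itself is conventionally a weighted mix of a node's own mean value and its AMAF mean, with the weight shifting toward real statistics as visits accumulate. A minimal sketch of that standard formula (the `bias` constant is illustrative; the paper's exact schedule under partial statistics may differ):

```python
def grave_value(wins, visits, amaf_wins, amaf_visits, bias=1e-5):
    """Blend per-node and AMAF (all-moves-as-first) statistics.

    beta -> 1 when the node has few real visits (trust AMAF),
    beta -> 0 as visits accumulate (trust the node's own mean).
    """
    if amaf_visits == 0:
        return wins / visits if visits else 0.0
    beta = amaf_visits / (amaf_visits + visits
                          + bias * amaf_visits * visits)
    q = wins / visits if visits else 0.0
    q_amaf = amaf_wins / amaf_visits
    return (1.0 - beta) * q + beta * q_amaf
```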

Results & Findings

| Algorithm | Avg. Memory (nodes) | Win % vs. vanilla GRAVE* |
| --- | --- | --- |
| GRAVE (baseline) | 100 k | 0 % (reference) |
| GRAVE2 | 12 k | +1.2 % |
| GRAVER | 15 k | +0.8 % |
| GRAVER2 | 10 k | +1.5 % |

*Measured on a standard GGP test set (Tic‑Tac‑Toe, Connect‑Four, Amazons, etc.) with 1 s per move.
Key takeaways:

  • Memory reduction of 85–90 % with no statistically significant loss in playing strength.
  • In some games (e.g., Connect‑Four), the reduced‑memory variants even outperformed vanilla GRAVE, likely due to fresher statistics from more aggressive recycling.

Practical Implications

  • Embedded AI agents (e.g., on IoT devices, mobile robots, or low‑cost game consoles) can now run strong GGP‑style MCTS without needing megabytes of RAM.
  • Cloud‑scale GGP servers can host many more concurrent matches per machine, lowering operational costs.
  • The node‑recycling idea is generic and can be dropped into any tree‑search algorithm (AlphaZero‑style MCTS, UCT, etc.) where memory is a limiting factor.
  • Developers building general‑purpose game AI libraries can expose a “memory‑budget” knob, letting users trade off between speed and memory automatically.
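A library-level "memory-budget" knob could look like the hypothetical configuration below; all names and the 64-bytes-per-node estimate are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:
    time_per_move_s: float = 1.0
    node_budget: int = 10_000   # max tree nodes resident at once
    recycle: bool = True        # evict + merge when the budget is hit

def max_resident_bytes(cfg: SearchConfig, bytes_per_node: int = 64) -> int:
    """Upper bound on tree memory implied by the node budget."""
    return cfg.node_budget * bytes_per_node
```

Exposing the budget as a single integer lets users dial memory against search quality without touching the search internals.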

Limitations & Future Work

  • The current evaluation focuses on classic deterministic board games; stochastic or imperfect‑information games may expose different memory‑access patterns that challenge the recycling heuristics.
  • The node‑recycling policy is based on a simple visit‑count + recency metric; more sophisticated cache‑replacement strategies (e.g., learned importance scores) could further improve efficiency.
  • Integration with deep‑learning‑augmented MCTS (e.g., policy/value networks) remains unexplored—future work could investigate how to compress neural‑network‑guided statistics alongside the tree.

Overall, the paper delivers a pragmatic solution that bridges the gap between cutting‑edge MCTS research and real‑world deployment constraints.

Authors

  • Aloïs Rautureau
  • Tristan Cazenave
  • Éric Piette

Paper Information

  • arXiv ID: 2602.23318v1
  • Categories: cs.AI
  • Published: February 26, 2026
  • PDF: Download PDF
