How I built an AI model that plays the Whot! card game

Published: December 21, 2025 at 05:13 PM EST
6 min read
Source: Dev.to

Ndubuisi Okorie

Problem Statement

The task is to build a practical AI model that can play the Whot! card game at a human level.
Whot! is a blend of skill and luck: skill if you know how to utilize your available cards effectively, and luck if you are fortunate enough to draw more win‑bound special cards than your opponent.

Introduction

As artificial intelligence keeps advancing, it seems there is virtually nothing that cannot be affected by AI.
We wish to build an AI model that can be deployed to the public, equipped with the capability of playing the Whot! card game. This vision has given birth to Whot! AI.

The project can also be used to demystify AI. You might think AI is an unreachable, monolithic thing, but you can build your own AI. If you study statistics a bit, then machine learning (a subset of AI) is similar to predicting the future using an available dataset and your knowledge of statistics. For example, if a price is $1000 under certain conditions, how will the price look given new conditions?
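
As a toy illustration of that idea (separate from the Whot! project itself), the sketch below fits a straight line to a few made‑up price observations with scikit‑learn and predicts the price under a new condition:

```python
# Toy example: "predict the future" from past data with simple regression.
# The numbers below are made up purely for illustration.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]        # observed conditions (e.g., demand level)
y = [500, 750, 1000, 1250]      # observed prices in dollars

model = LinearRegression().fit(X, y)
print(model.predict([[5]]))     # estimated price under a new condition (~1500)
```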

The Whot! AI

The new Whot! AI is a model that plays the Whot! card game. This model is trained using a machine‑learning approach known as Reinforcement Learning (RL).

  • Reinforcement learning exposes an AI agent to an environment and lets it learn by directly interacting with that environment.
  • Unlike supervised learning, it does not require labeled data for training.

In typical supervised learning, we gather labeled data (i.e., we know the questions and answers about the data). For instance, we might ask:

  • Does the animal have a low weight?
  • Does it have whiskers?
  • Does it have a long tail?
  • Is its face round or oval?
  • What is its weight in kilograms?

By answering these questions we can build a model that classifies an image of an animal as either a dog or a cat. This works well when we have enough data; otherwise, building a trustworthy AI model becomes very difficult.
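
To make that concrete, here is a minimal supervised‑learning sketch along those lines; the features, labels, and numbers are entirely made up for illustration:

```python
# Supervised learning needs labeled data: features plus the known answer.
# Features (all hypothetical): [weight_kg, has_whiskers, long_tail, round_face]
from sklearn.tree import DecisionTreeClassifier

X = [
    [4.0, 1, 1, 1],   # cat
    [3.5, 1, 1, 1],   # cat
    [20.0, 0, 0, 0],  # dog
    [25.0, 0, 1, 0],  # dog
]
y = ["cat", "cat", "dog", "dog"]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[5.0, 1, 1, 1]]))  # likely "cat"
```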

For a rule‑based game like Whot!, gathering such data is practically impossible, and training an AI that follows the rules 100 % correctly using traditional supervised learning is infeasible.
If the call card is Circle 7 and the model predicts a card that matches neither the shape nor the number (say, Star 3), the game rule is broken. Hence, supervised learning is not suitable for this application.
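
The rule itself is easy to express in code. The sketch below assumes the common Whot! matching rule (a card is playable if it matches the call card’s shape or number, with the Whot card as a wild card) and ignores special‑card effects:

```python
# A sketch of the constraint a supervised model has no way to guarantee:
# a card is playable only if it matches the call card's shape or number
# (with the Whot card as a wild card). Simplified; special cards ignored.
def is_valid_move(card, call_card):
    shape, number = card
    call_shape, call_number = call_card
    if shape == "Whot":                               # wild card
        return True
    return shape == call_shape or number == call_number

print(is_valid_move(("Star", 7), ("Circle", 7)))      # True: numbers match
print(is_valid_move(("Star", 3), ("Circle", 7)))      # False: breaks the rule
```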

Reinforcement Learning and Whot! AI

This barrier can be broken using Reinforcement Learning! In 2016, AlphaGo, developed by Google DeepMind using RL, defeated world champion Lee Sedol. Reinforcement learning is applied not only in gaming but also in finance (e.g., fraud detection) and robotics.

One beautiful thing about RL is that we don’t need to tell the agent how to do something; we only tell it what to achieve, and it learns how to achieve it by itself.

How it works

  1. Environment – The AI agent is placed in an environment (e.g., a Whot! game).
  2. State – The current condition of the environment (e.g., the current call card, draw‑pile size, your hand size, opponent’s hand size).
  3. Action – A move the agent can take in the current state; the goal is to learn which action is best.

Example
If the call card is Pick Two and you have a Pick Two card in your hand, the two valid actions are:

Action   Description
1        Play a Pick Two card to defend.
2        Draw two cards from the draw pile.
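
As a rough sketch, the state in this example and its valid actions could be represented like this; the field names and numbers are hypothetical, not the project’s actual encoding:

```python
# Hypothetical encoding of the "Pick Two" situation described above.
from dataclasses import dataclass

@dataclass
class State:
    call_card: tuple          # e.g. ("Circle", 2): a Pick Two is on the table
    hand_size: int            # number of cards in the agent's hand
    opponent_hand_size: int
    has_pick_two: bool        # can the agent defend?

DEFEND, DRAW_TWO = 0, 1       # the two actions in this situation

def valid_actions(state: State):
    return [DEFEND, DRAW_TWO] if state.has_pick_two else [DRAW_TWO]

s = State(call_card=("Circle", 2), hand_size=5, opponent_hand_size=4, has_pick_two=True)
print(valid_actions(s))       # [0, 1]: defend or draw two
```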

From these states and actions we can define a state‑value function and a state‑action (Q) value function. Our goal is to derive a policy, denoted π (Pi), that maps a given state to an action. Once we have such a policy, we can model an AI that plays Whot!.

Initially, the AI may not know the right action for any state, so it is allowed to take random valid actions. Over time, it stops acting randomly and starts selecting actions that maximize its rewards. This process builds a Q‑table (think of it as a database table) that the agent consults when choosing actions: for any state, it looks up the action with the maximum expected reward.
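
Here is a minimal sketch of that lookup, with a dictionary as the Q‑table and an ε‑greedy rule that occasionally explores at random; the states and values are made up:

```python
import random

# Q-table: maps (state, action) -> learned value. Values below are made up.
Q = {
    ("pick_two_can_defend", "defend"):   0.9,
    ("pick_two_can_defend", "draw_two"): -0.4,
}

def choose_action(state, actions, epsilon=0.1):
    # Explore with probability epsilon, otherwise exploit the best known action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

print(choose_action("pick_two_can_defend", ["defend", "draw_two"]))  # usually "defend"
```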

When the state space becomes large and continuous, a Q‑table becomes ineffective, so we adopt a Q‑network approach using deep learning. The Q‑network can generalize better than a static table.
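
A Q‑network along those lines could look like the PyTorch sketch below; the state size (8 features) and action count (14) are assumptions rather than the project’s actual dimensions:

```python
import torch
import torch.nn as nn

STATE_DIM = 8     # assumed number of features describing a game state
N_ACTIONS = 14    # assumed number of possible actions

# The Q-network maps a state vector to one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

state = torch.randn(1, STATE_DIM)        # a dummy state vector
q_values = q_net(state)                  # one value per action
best_action = q_values.argmax(dim=1)     # greedy action
print(q_values.shape, best_action.item())
```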

Reward design

The core idea is to design a reward function that guides the agent’s learning journey:

  • Positive reward (e.g., +1) – given when the agent takes a good action, such as defending a Pick Two when it has a Pick Two card.
  • Negative reward (e.g., ‑1) – given when the agent takes a bad action, such as drawing two cards when it could have defended.

These rewards act like feedback: “Good boy! You are doing well!” for positive actions, and a gentle “nope” for negative ones.
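
A sketch of such a reward function for the Pick Two situation; the exact values and conditions are illustrative, not necessarily the ones used in training:

```python
# Illustrative reward shaping for the "Pick Two" situation described above.
def reward(action, could_defend):
    if could_defend and action == "defend":
        return +1.0   # good: defended with a Pick Two card
    if could_defend and action == "draw_two":
        return -1.0   # bad: drew two cards when a defence was available
    return 0.0        # neutral: drawing was the only legal option

print(reward("draw_two", could_defend=True))   # -1.0
```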

Reinforcement Learning for Whot!

The agent receives rewards (positive) or punishments (negative) after each move. Its learning policy is updated using the Bellman Equation. The agent keeps playing, receiving feedback, and improving its strategy. This process can be repeated for 10 000 episodes or more.
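
A commonly used form of this update is the tabular Q‑learning rule, which follows from the Bellman equation:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

where α is the learning rate, γ is the discount factor, r is the reward just received, and s' is the next state.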

For the stochastic Whot! game, the model was trained for 50 000 episodes. During training the agent learns to:

  • Avoid actions that give negative rewards.
  • Prefer actions that give positive rewards.

After the training episodes, the agent should be able to interact with the game environment optimally.
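
Putting the pieces together, the training loop can be sketched as below. To keep it self‑contained and runnable, the full Whot! game is replaced by a single made‑up “Pick Two” decision; the real training loop plays whole games against an opponent instead:

```python
import random

# A deliberately tiny, self-contained sketch of the training loop. The real
# environment is a full Whot! game; here it is reduced to the single
# "Pick Two" decision so the whole loop stays runnable.
ACTIONS = ["defend", "draw_two"]
ALPHA, EPSILON = 0.1, 0.1
Q = {}                                    # (state, action) -> learned value

def play_one_step(state, action):
    # Reward +1 for the sensible choice, -1 otherwise (illustrative values).
    if state == "can_defend":
        return 1.0 if action == "defend" else -1.0
    return 1.0 if action == "draw_two" else -1.0

def choose(state):
    if random.random() < EPSILON:                                # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))    # exploit

for episode in range(10_000):
    state = random.choice(["can_defend", "cannot_defend"])
    action = choose(state)
    reward = play_one_step(state, action)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward - old)    # one-step update

print(max(ACTIONS, key=lambda a: Q.get(("can_defend", a), 0.0)))     # "defend"
print(max(ACTIONS, key=lambda a: Q.get(("cannot_defend", a), 0.0)))  # "draw_two"
```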

Example: Reinforcement Learning in Robotics

Consider training a robot to follow a leader’s dance moves:

  1. A human dancer (the leader) performs a step (e.g., raises hands).
  2. The robot initially takes a random step.
  3. If the robot’s step doesn’t match the leader’s, it receives a negative reward.
  4. If the robot’s step matches the leader’s, it receives a positive reward.

Repeating this process thousands of times enables the robot to learn the correct dance movements.

The Traditional Rule‑Based Engine and RL

To train an AI model to play Whot! we need a set of game rules, which we obtain from a rule‑based engine.

  • The current Whot! game you’ve been playing is a rule‑based engine: the rules are explicitly programmed, and the computer follows them.
  • We use this engine as the opponent during training. The RL agent first plays against the rule‑based engine, then plays against itself, continuously gathering experience through the reward function.
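
For context, the card choice of such a rule‑based engine could look roughly like the heuristic below. This is only an illustrative sketch with an assumed priority order; the post does not show the engine’s actual logic:

```python
# A sketch of a simple rule-based opponent: among the legal cards, prefer
# special cards using an assumed priority, otherwise play any legal card.
def is_valid_move(card, call_card):
    # Same matching rule as the earlier sketch.
    shape, number = card
    call_shape, call_number = call_card
    return shape == "Whot" or shape == call_shape or number == call_number

SPECIAL_PRIORITY = {20: 5, 14: 4, 1: 3, 8: 2, 2: 1}   # assumed ranking of special cards

def choose_card(hand, call_card):
    legal = [card for card in hand if is_valid_move(card, call_card)]
    if not legal:
        return None                                   # nothing legal: draw from market
    return max(legal, key=lambda card: SPECIAL_PRIORITY.get(card[1], 0))

hand = [("Circle", 3), ("Star", 7), ("Circle", 14)]
print(choose_card(hand, ("Circle", 7)))               # ("Circle", 14): General Market
```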

Training results (50 000 episodes):

Outcome                        Wins
First to discard all cards     28 315
Winning by counting            18 095
Total                          46 410 (a 92.8 % win rate)

The AI agent outperformed the rule‑based engine that trained it, indicating that it has learned deeper patterns and strategies than the handcrafted opponent. The traditional engine is a strong opponent—it explicitly computes the best card to play for the longest streak (e.g., HoldOn, Suspension, General Market, PickTwo, Check up!). Surpassing such an opponent demonstrates the agent’s ability to generalize beyond the engineered rules.

Download and Test the Model

You can try the model by downloading the Android app:

Whot! AI on Google Play

The app includes:

  • The traditional rule‑based engine (classic computer opponent).
  • Multiplayer support via Bluetooth, Wi‑Fi, and online play.

We hope you enjoy Whot! in a modern way. Please play, rate, and review the app. Thank you!

For Developers

  • Training notebook:

  • Card image‑recognition notebook:
