Great news for xAI: Grok is now pretty good at answering questions about Baldur’s Gate

Published: February 20, 2026 at 01:26 PM EST

Source: TechCrunch

Background

Different AI labs have different priorities. OpenAI has traditionally focused on consumer users, while its rival Anthropic tends to target enterprises. Elon Musk’s xAI, we discovered recently, has been placing particular emphasis on video‑game walkthroughs.

On Friday, Business Insider’s Grace Kay published a detailed report about xAI — the AI startup recently acquired by SpaceX — with particular emphasis on how Musk is making life difficult for employees. One anecdote stood out:

In one instance last year, a model release was delayed for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High‑level engineers were pulled from other projects to improve the responses before launch.

The story raises a pressing question: Did Musk end up getting the gaming skills he wanted?

BaldurBench Test

Our resident RPG enthusiast, Ram Iyer, put together a set of five general questions about Baldur’s Gate and ran them against xAI’s Grok and three major competing models in a quasi‑benchmark we’ve called BaldurBench.

For full transparency, the chat transcripts are public.

Results

Grok

Provides fairly good information, though its answers are dense with gamer jargon (e.g., “save‑scumming” instead of saving, “DPS” instead of damage).
Answers are useful and well‑informed if you understand the terminology.
Frequently uses tables and theorycraft (the mathematical analysis of game mechanics).

ChatGPT

Prefers bulleted lists and sentence fragments.
Draws from the same guide sources as the other models, resulting in similar factual content.

Gemini

Likes to bold important words for emphasis.
Stylistically distinct but otherwise comparable in substance.

Claude

Shows a strong reluctance to provide spoilers.
When asked about good party compositions, it closed with a generic encouragement: “don’t stress too much and just play what sounds fun to you.”

Overall, the biggest differences among the models are stylistic rather than informational. All models reference the same set of Baldur’s Gate guides, so the core advice is largely the same.

Takeaway

According to Business Insider’s report, xAI has specifically focused on reaching parity with other leading models in niche domains. Grok’s performance, which matched that of ChatGPT, Claude, and Gemini after a focused sprint, suggests that xAI can deliver competent results when it puts effort into a particular subject area.
