Great news for xAI: Grok is now pretty good at answering questions about Baldur’s Gate
Source: TechCrunch
Background
Different AI labs have different priorities. OpenAI has traditionally focused on consumer users, while its rival Anthropic tends to target enterprises. Elon Musk’s xAI, we discovered recently, has been placing particular emphasis on video‑game walkthroughs.
On Friday, Business Insider’s Grace Kay published a detailed report about xAI, the AI startup recently acquired by SpaceX, focusing on how Musk is making life difficult for employees. One anecdote stood out:
In one instance last year, a model release was delayed for several days because Musk was dissatisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. High‑level engineers were pulled from other projects to improve the responses before launch.
The story raises a pressing question: Did Musk end up getting the gaming skills he wanted?
BaldurBench Test
Our resident RPG enthusiast, Ram Iyer, put together a set of five general questions about Baldur’s Gate and ran them against xAI’s Grok and three major competing models in a quasi‑benchmark we’ve called BaldurBench.
For full transparency, the chat transcripts are public.
Results
Grok
- Provides fairly good information, though its answers are dense with gamer jargon (e.g., “save‑scumming” instead of saving, “DPS” instead of damage).
- Answers are useful and well‑informed if you understand the terminology.
- Frequently uses tables and leans into theorycrafting (see the Wikipedia entry on Theorycraft).
ChatGPT
- Prefers bulleted lists and sentence fragments.
- Draws from the same guide sources as the other models, resulting in similar factual content.
Gemini
- Likes to bold important words for emphasis.
- Stylistically distinct but otherwise comparable in substance.
Claude
- Shows a strong reluctance to provide spoilers.
- When asked about good party compositions, it closed with a generic encouragement: “don’t stress too much and just play what sounds fun to you.”
Overall, the biggest differences among the models are stylistic rather than informational. All models reference the same set of Baldur’s Gate guides, so the core advice is largely the same.
Takeaway
It’s worth noting that, as Business Insider reports, xAI has specifically focused on reaching parity with other leading models in niche domains. Grok’s performance, which matched that of ChatGPT, Claude, and Gemini after a focused sprint, suggests that xAI can deliver competent results when it puts effort into a particular subject area.