Show HN: Open-source playground to red-team AI agents with exploits published

Published: March 15, 2026 at 06:29 PM EDT


Source: Hacker News

Overview

We build runtime security for AI agents. The playground started as an internal tool for testing our own guardrails, but we kept finding the same kinds of vulnerabilities because we tend to think about attacks in only a few ways. At some point you need people who don’t think like you, so we open‑sourced it.

Each challenge is a live agent equipped with real tools and a published system prompt. When a challenge ends, the full winning conversation transcript and the guardrail logs are published.
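To make "runtime guardrail" and "guardrail log" concrete, here is a minimal sketch of a check that sits between an agent and its tools. It is not the playground's code: `ToolCall`, `Guardrail`, the tool names, and the secret value are all hypothetical. It blocks a denylisted tool (the first challenge's scenario), blocks calls whose arguments contain a known secret verbatim (a naive exfiltration check), and records every decision.

```python
# Hypothetical sketch only: ToolCall, Guardrail, and all tool names are
# illustrative assumptions, not the playground's actual API.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """A tool invocation proposed by the agent."""
    name: str
    arguments: dict[str, Any]


@dataclass
class Guardrail:
    forbidden_tools: set[str]
    secrets: set[str]
    # Every decision is appended here, in the spirit of a guardrail log.
    log: list[dict[str, Any]] = field(default_factory=list)

    def check(self, call: ToolCall) -> bool:
        """Return True if the call may run; record the decision either way."""
        if call.name in self.forbidden_tools:
            return self._record(call, allowed=False, reason="forbidden tool")
        # Naive exfiltration check: block if any known secret appears
        # verbatim anywhere in the stringified arguments.
        payload = repr(call.arguments)
        if any(secret in payload for secret in self.secrets):
            return self._record(call, allowed=False, reason="secret in arguments")
        return self._record(call, allowed=True, reason="ok")

    def _record(self, call: ToolCall, allowed: bool, reason: str) -> bool:
        self.log.append({"tool": call.name, "allowed": allowed, "reason": reason})
        return allowed


if __name__ == "__main__":
    guard = Guardrail(forbidden_tools={"delete_database"}, secrets={"sk-test-123"})
    print(guard.check(ToolCall("search_docs", {"query": "pricing"})))  # True
    print(guard.check(ToolCall("delete_database", {})))  # False
    print(guard.check(ToolCall("send_email", {"body": "key is sk-test-123"})))  # False
```

Checks this literal are easy to sidestep: an attacker can ask the agent to base64-encode or split the secret, or reach the forbidden effect through a different tool. That kind of gap is hard to anticipate from the defender's side, which is the case for letting outsiders attack the agent.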

Building the Agent

Creating the general‑purpose agent was probably the most fun part. Getting it to use tools reliably, stay in character, and follow instructions while still being useful is harder than it sounds. That alone reminded us how early we all are in understanding and deploying these systems at scale.

Challenges

First challenge: Get an agent to call a tool it has been told never to call. Someone succeeded in about 60 seconds without ever asking for the secret directly, which taught us a lot.
Next challenge: data exfiltration, with harder defenses. Try it here:

Points: 13
Comments: 1
