Show HN: Open-source playground to red-team AI agents with exploits published

Published: March 15, 2026 at 06:29 PM EDT
Source: Hacker News

Overview

We build runtime security for AI agents. The playground started as an internal tool for testing our own guardrails, but we kept finding the same types of vulnerabilities because we kept thinking about attacks in the same limited ways. At some point you need people who don't think like you, so we open-sourced it.

Each challenge is a live agent equipped with real tools and a published system prompt. When a challenge ends, the full winning conversation transcript and the guardrail logs are published.
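
As a rough illustration of what a runtime guardrail around an agent's tools could look like, here is a minimal sketch of a gate that denies forbidden tool calls and records every attempt. It is not the playground's actual implementation: `ToolGate`, `GuardrailLog`, the log fields, and the tool names are all hypothetical, chosen only to show the idea of gating tool calls and keeping a log that can be published afterwards.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class GuardrailLog:
    """One record per attempted tool call (hypothetical log format)."""
    tool: str
    args: dict[str, Any]
    allowed: bool
    reason: str


@dataclass
class ToolGate:
    """Wraps an agent's tools and refuses any tool on the forbidden list."""
    tools: dict[str, Callable[..., Any]]
    forbidden: set[str]
    logs: list[GuardrailLog] = field(default_factory=list)

    def call(self, name: str, /, **args: Any) -> Any:
        # Reject tools the agent was never given.
        if name not in self.tools:
            self.logs.append(GuardrailLog(name, args, False, "unknown tool"))
            raise PermissionError(f"tool {name!r} is not registered")
        # Reject tools the agent has been told never to call.
        if name in self.forbidden:
            self.logs.append(GuardrailLog(name, args, False, "forbidden tool"))
            raise PermissionError(f"tool {name!r} is blocked by policy")
        # Record the allowed call, then run the tool.
        self.logs.append(GuardrailLog(name, args, True, "ok"))
        return self.tools[name](**args)


# Hypothetical tools, for illustration only.
gate = ToolGate(
    tools={
        "search": lambda query: f"results for {query}",
        "reveal_secret": lambda: "s3cr3t",
    },
    forbidden={"reveal_secret"},
)

gate.call("search", query="weather")
try:
    gate.call("reveal_secret")
except PermissionError as err:
    print("blocked:", err)

for entry in gate.logs:
    print(entry)
```

The check here runs outside the model, so it holds regardless of what the conversation persuades the agent to attempt. A name-based deny list like this is only the simplest case; it says nothing about attacks that reach the same effect through an allowed tool.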

Building the Agent

Creating the general‑purpose agent was probably the most fun part. Getting it to reliably use tools, stay in character, and follow instructions while still being useful is harder than it sounds. That alone reminded us how early we all are in understanding and deploying these systems at scale.

Challenges

  • First challenge: Get an agent to call a tool it has been told never to call. Someone succeeded in about 60 seconds without ever asking for the secret directly, which taught us a lot.
  • Next challenge: Data exfiltration against harder defenses. Try it here:

