Show HN: Open-source playground to red-team AI agents with exploits published
Source: Hacker News
Overview
We build runtime security for AI agents. The playground started as an internal tool used to test our own guardrails, but we kept encountering the same types of vulnerabilities because we tend to think about attacks in a limited way. At some point you need people who don’t think like you, so we open‑sourced it.
Each challenge is a live agent equipped with real tools and a published system prompt. When a challenge ends, the full winning conversation transcript and guardrail logs are documented publicly.
Building the Agent
Creating the general‑purpose agent was probably the most fun part. Getting it to reliably use tools, stay in character, and follow instructions while still being useful is harder than it sounds. That alone reminded us how early we all are in understanding and deploying these systems at scale.
Challenges
- First challenge: Get an agent to call a tool it has been told never to call. Someone succeeded in about 60 seconds without ever asking for the secret directly, which taught us a lot.
- Next challenge: Focused on data exfiltration with harder defenses. Try it here:
Comments URL:
Points: 13
Comments: 1