Giving an AI agent a recon toolbox: wiring 30+ security tools into an MCP server
Source: Dev.to

Introduction
If you’ve watched a junior pen‑tester spend a Monday morning typing the same six commands into a fresh EC2 box, you’ve seen the recon setup tax up close.
```sh
amass enum -passive -d $TARGET
subfinder -d $TARGET -silent | httpx | naabu
# feed surviving hosts into nuclei, dump JSON somewhere, repeat next quarter when the scope changes
```
The work isn’t hard. The glue is. Every team I’ve talked to has rebuilt this glue at least twice, usually in a different language each time.
This post is about a different shape of the problem: what happens when you stop writing the glue yourself and instead expose the recon toolbox as MCP tools that an AI agent can call?
Why MCP, specifically
Agents have been doing “tool use” for a couple of years now via bespoke function‑calling adapters. The problem with those adapters is that every agent framework wants its own JSON shape, every tool needs its own auth, and every team writes its own retry/timeout/rate‑limit middleware.
MCP (Model Context Protocol) collapses all of that into one server‑side contract. Once your tools are MCP tools, any compliant client — Claude Desktop, Cursor, your own LangGraph agent — can drive them.
For recon, the value is asymmetric. Recon is one of the rare security workflows that’s iterative and branching:
```
enumerate subdomains → resolve → port-scan live hosts →
fingerprint services → run targeted vuln checks → pivot to new assets →
loop
```
That loop is exactly the shape an LLM is good at orchestrating, provided the tools return structured data and the agent can hold the inventory in state. You don’t want the LLM running nmap. You want it deciding when to run nmap and on what.
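That division of labor maps naturally onto a tool contract: each tool declares a name and an input schema, and the server owns execution. A hand-rolled Python sketch of the shape of that contract — this is illustrative, not the real MCP SDK, and the tool name and fields are assumptions:

```python
import json

# Hypothetical tool registry illustrating the MCP-style contract:
# the agent sees names and input schemas; the server owns execution.
TOOLS = {
    "start_port_scan": {
        "description": "Scan the given targets for open TCP ports.",
        "input_schema": {
            "type": "object",
            "properties": {
                "targets": {"type": "array", "items": {"type": "string"}},
                "top_ports": {"type": "integer", "default": 1000},
            },
            "required": ["targets"],
        },
    },
}

def call_tool(name: str, arguments: dict) -> dict:
    """Dispatch a tool call and always return structured JSON,
    never raw scanner stdout."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    required = TOOLS[name]["input_schema"].get("required", [])
    missing = [k for k in required if k not in arguments]
    if missing:
        return {"error": f"missing arguments: {missing}"}
    # A real server would enqueue the scan; here we fake a scan id.
    return {"scan_id": "scan-001", "status": "queued", "tool": name}

result = call_tool("start_port_scan", {"targets": ["203.0.113.7"]})
print(json.dumps(result))
```

The point of the schema is that the model can validate its own call before making it; the point of the structured return is that the next planning step never has to parse scanner logs.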
What we wrapped
In HailBytes ASM (full disclosure, this is our product — built specifically for pen‑test firms and MSSPs), the MCP server exposes the same surface as the REST API:
- Discovery: `start_subdomain_scan`, `start_port_scan`, `start_dns_scan`
- Vulnerability: `start_nuclei_scan`, `start_template_scan`
- Inventory: `list_assets`, `get_asset_history`, `diff_scans`
- Reporting: `export_findings`, `get_scan_summary`
Each tool returns JSON with stable schemas — not log scrapes — so the agent can plan multi‑step workflows without the model having to parse stderr.
A working loop
A real session looks like this (paraphrased from one of our internal eval runs):
User: "Map the external attack surface for example.com and flag anything that looks like an exposed staging environment."
```
Agent → start_subdomain_scan(domain="example.com")
Agent → list_assets(scan_id=...)                         // 312 hosts
Agent → start_port_scan(targets=[...], top_ports=1000)
Agent → start_nuclei_scan(targets=live_hosts, severity=["medium","high"])
Agent → list_assets(filter="hostname matches /staging|stg|dev|qa/")
Agent → get_asset_history(asset_id=...)                  // appeared 6 days ago
Agent → "Found 4 hosts matching staging-like patterns; one (stg-admin.example.com)
         appeared 6 days ago and exposes a Jenkins instance with a known CVE..."
```
The interesting part is what the agent doesn’t do: it doesn’t shell out, doesn’t manage AWS credentials, doesn’t worry about rate limits, doesn’t re‑implement scan diffing. The MCP tools take care of all of that. The agent’s job is the part that’s actually hard — choosing the next action.
What broke (and what we changed)
A few honest notes from running this at customer sites:
- Pagination kills agents. Our first cut returned all assets in a single response. With 30k+ subdomains in a real engagement, the agent's context filled up before it got to the analysis step. We added cursor pagination and a `summarize_assets` tool that returns aggregates.
- Implicit state is hostile. Agents are bad at remembering "the most recent scan." Every tool that takes a `scan_id` now requires it explicitly, even if there's only ever one running.
- Long-running scans need a status protocol. Recon scans take minutes to hours. We added `wait_for_scan(scan_id, timeout)` so the agent can block politely instead of polling in a tight loop.
Where this fits
If you’re already running recon in‑house, you don’t need to buy anything to try this pattern — wrap your own scripts in an MCP server and you’ll get ~70% of the value. The harder parts are the things that show up at production scale: scan diffing, asset deduplication across runs, multi‑tenant isolation, scheduled cadence, audit trails for compliance. That’s the part we’ve spent the last year on.
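Of those production-scale pieces, scan diffing is the most illustrative to sketch: given the asset inventories from two runs, the high-signal output is what appeared and what disappeared. A minimal set-based version (a hypothetical helper, not the product's `diff_scans`):

```python
def diff_scans(previous: set[str], current: set[str]) -> dict:
    """Compare the asset inventories of two scan runs.
    New hosts are the interesting output: an asset that appeared
    since the last run is exactly the staging box someone forgot."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "unchanged": len(previous & current),
    }

delta = diff_scans(
    previous={"www.example.com", "api.example.com"},
    current={"www.example.com", "api.example.com", "stg-admin.example.com"},
)
```

The hard part in practice isn't the set math — it's deduplication across runs (CDNs rotating IPs, wildcard DNS) so that "added" really means new, which is why this lives server-side rather than in the agent.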
If you want to see it end‑to‑end, the platform is at hailbytes.com/asm — deploys from the AWS or Azure Marketplace, runs in your account, and exposes the MCP endpoint out of the box.
Either way, I think MCP‑native security tooling is going to be the default within 18 months. The gap between “agent can read a Splunk dashboard” and “agent can drive a recon engagement” is closing fast, and the teams that wire their own toolbox up early are going to have a real edge.