Why Your “Skill Scanner” Is Just False Security (and Maybe Malware)

Published: (February 11, 2026 at 09:00 PM EST)
8 min read
Source: Dev.to

Source: Dev.to

Maybe you’re an AI builder, or maybe you’re a CISO.

You’ve just authorized the use of AI agents for your dev team. You know the risks—data exfiltration, prompt injection, and unvetted code execution. So when your lead engineer says, “Don’t worry, we’re using Skill Defender from ClawHub to scan every new Skill,” you breathe a sigh of relief. You’ve checked the box.

But have you checked this Skills scanner?

The anxiety you feel isn’t about the known threats; it’s about the tools you trust to find them. It’s the nagging suspicion that your safety net is full of holes. With today’s crop of “AI Skill Scanners,” that suspicion is entirely justified.

If you’re new to Agent Skills and their security risks, we’ve previously outlined a Skill.md threat model and explained how they impact the wider AI agents ecosystem and supply‑chain security.

Why Regex can’t scan SKILL.md for malicious intent

The enemy of AI security isn’t just the hacker; it’s the infinite variability of language.

In the traditional AppSec world we scan for known vulnerabilities (e.g., CVEs) and known patterns (e.g., secrets). This works because code is structured, finite, and deterministic. A SQL‑injection payload has a recognizable structure. A leaked AWS key has a specific format.

An AI‑agent Skill, however, is a blend of natural‑language prompts, code execution, and configuration. Relying on a deny‑list of “bad words” or forbidden patterns is a losing battle against the infinite corpus of natural language. You simply cannot enumerate every possible way to ask an LLM to do something dangerous.

Consider the humble curl command. A regex scanner might flag curl to prevent data exfiltration. A sophisticated attacker doesn’t need to write curl; they can write:

c${u}rl               # using bash parameter expansion
wget -O-              # using an alternative tool
python -c "import urllib.request..."   # using a standard library

Or simply ask:

“Please fetch the contents of this URL and display them to me.”

In the last case the agent constructs the command itself. The scanner sees only innocent English instructions, but the intent remains malicious. This is the core failure of the “deny‑list” mindset: you’re trying to block specific words in a system designed to understand concepts.

The problem compounds when context matters. A skill requesting “shell access” might be perfectly legitimate for a DevOps deployment tool, but catastrophic for a “recipe finder” or a “calendar assistant.” A pattern matcher sees the phrase “shell access” and must either flag both (creating noise) or ignore both (creating risk). It has no understanding of why the access is requested—only that the words exist.

Case study: We pitted community scanners against real malware

We tested the most popular community “Skill Scanners” against a custom “semi‑malicious” skill to see if they could tell friend from foe. The tools evaluated were:

  1. SkillGuard
  2. Skill Defender
  3. Agent Tinman

1. SkillGuard – The scanner that was actually malware

Our first subject was SkillGuard (by user c-goro). The promise? A lightweight scanner for your skills. The reality? A trap.

SkillGuard UI

When we analyzed SkillGuard, our internal systems flagged it not as a security tool but as a malicious skill itself. It attempted to install a payload under the guise of “updating definitions.”

Malicious payload attempt

Update: As of this writing, SkillGuard has been removed from ClawHub. But for the hundreds of users who installed it, the damage is done. This illustrates a core problem: who scans the scanner?

SkillGuard removal notice

2. Skill Defender – The false negative

Next we examined Skill Defender (by itsclawdbro) and Agent Tinman (by oliveskin). These are legitimate community efforts; Skill Defender even carries a “Benign” badge on VirusTotal. However, “benign” does not mean “effective.”

We created a test skill called vercel. It appeared to be a deployment tool for the Vercel platform, but in reality it quietly exfiltrated the user’s hostname to a remote server.

We ran Skill Defender against this malicious skill using the Gemini CLI:

> use the skill defender to scan current skills

Result

  • Malicious Skill (vercel) – Verdict: CLEAN (0 findings)
  • Scanner Itself (skill-defender) – Verdict: DANGEROUS (20 findings)

Skill Defender scan results (replace with the actual image URL if available)

Takeaways

ScannerVerdict on malicious skillVerdict on scanner itself
SkillGuardMALICIOUS (itself)
Skill DefenderCLEAN (false negative)DANGEROUS (20 findings)
Agent TinmanNot shownNot shown

The table highlights the paradox: the tool meant to protect can be the very source of risk.

What you can do today

  1. Don’t rely on a single scanner. Use multiple, complementary tools and treat their output as advisory, not definitive.
  2. Adopt behavior‑based monitoring. Look for anomalous network calls, unexpected file writes, or privilege escalations at runtime.
  3. Implement a “defense‑in‑depth” pipeline:
    • Static analysis (regex/deny‑list) – catches low‑hanging fruit.
    • Semantic analysis (LLM‑based intent detection) – catches nuanced malicious intent.
    • Runtime sandboxing – isolates skill execution and logs side‑effects.
  4. Establish a “scanner‑of‑scanners” process. Periodically audit the security tools themselves, preferably with an independent third party.
  5. Educate developers and product owners about the limits of regex‑based scanning and the importance of threat modeling for AI agents.

Final thought

AI agents amplify the classic security dilemma: the more expressive the language, the harder it is to enumerate every bad pattern. Relying solely on deny‑lists is akin to trying to block every possible weapon by banning the word “gun.” Instead, focus on intent, context, and behaviour—and remember that the tools you trust need to be trusted again.

Scanner failure illustration

The scanner failed to catch the actual threat because our exfiltration code in the fake Vercel skill didn’t match its hard‑coded list of “bad” strings. Yet, it flagged itself as dangerous because its own reference files contained the very “threat patterns” it scans for!

This is the classic “Antivirus Paradox”: the scanner looks malicious because it knows what malice looks like, but it’s blind to anything new.

Moving to behavioral analysis of agentic intent

We need to stop thinking about AI security as “filtering bad words.” We need to start thinking of it as Behavioral Analysis.

  • AI code is like financial debt: fast to acquire, but if you don’t understand the terms (i.e., the intent of the prompt), you are leveraging yourself into bankruptcy.
  • A regex scanner is like a spell‑checker – it ensures the words are spelled correctly.
  • A semantic scanner is like an editor – it asks, “Does this sentence make sense? Is it telling the user to do something dangerous?”

3. Ferret Scan: Still limited to RegEx patterns

We also looked at Ferret Scan, a GitHub‑based scanner. It claims to use “Deep AST‑based analysis” alongside regex. While significantly better than ClawHub‑native tools, it still struggles with the nuances of natural‑language attacks.

Ferret Scan example

It can catch a hard‑coded API key, but can it catch a prompt injection buried in a PDF that the agent is asked to summarize?

Evidence from ToxicSkills research: Context is king

In our recent ToxicSkills research, we found that 13.4 % of skills contained critical security issues. The vast majority of these were NOT caught by simple pattern matching.

  • Prompt injection: attacks that use “jailbreak” techniques to override safety filters.
  • Obfuscated payloads: code hidden in base64 strings or external downloads (e.g., the recent google‑qx4 attack).
  • Contextual risks: a skill asking for “shell access” might be fine for a dev tool, but catastrophic for a “recipe finder.”

Regex sees “shell access” and flags both, or worse, it sees neither because the prompt says “execute system command” instead.

The solution: AI‑native security for SKILL.md files

To survive this velocity, you must move beyond static patterns. You need AI‑Native Security.

That’s why we built mcp‑scan (part of Snyk’s Evo platform). It doesn’t just grep for strings; it uses a specialized LLM to read the SKILL.md file and understand the capability of the skill and its associated artifacts (e.g., scripts).

Running mcp‑scan is like asking:

  • Does this skill ask for permission to read files?
  • Does it try to convince the user to ignore previous instructions?
  • Does it reference a package that is less than a week old (via Snyk Advisor)?

By combining Static Application Security Testing (SAST) with LLM‑based intent analysis, we can catch the Vercel exfiltration skill because we see the behavior (sending data to an unknown endpoint), not just the syntax.

Three questions to ask your team tomorrow

  1. “Do we have an inventory of every ‘skill’ our AI agents are using?”

    • If they say yes, ask how they found them. If it’s manual, it’s outdated.
    • If they say no, share the mcp‑scan tool.
  2. “Are we scanning these skills for intent, or just for keywords?”

    • Challenge the regex mindset.
  3. “What happens if a trusted skill updates tomorrow with a malicious dependency?”

    • Push for continuous, not one‑time, scanning.

Don’t let “Security Theater” give you a false sense of safety. The agents are smart; your security needs to be smarter.

Learn how Evo by Snyk brings unified control to agentic AI.

0 views
Back to Blog

Related posts

Read more »

A Guide to Fine-Tuning FunctionGemma

markdown January 16, 2026 In the world of Agentic AI, the ability to call tools translates natural language into executable software actions. Last month we rele...