LLMs + Tool Calls: Clever But Cursed

Published: 6 days ago (December 11, 2025 at 10:51 AM EST)

2 min read

Source: Dev.to

Introduction

A real example of how LLMs creatively use tools — and why sandbox safety matters more than most people realize.

LLMs are great for generating code, but they can also get a bit too creative sometimes. Today I ran into one of those clever‑but‑cursed AI moments that’s too interesting not to share, especially for anyone building LLM + tool‑calling systems.

The Setup

I added a simple LuaExecutor tool to my Genkit app. The goal was straightforward:

A single Genkit Flow
One tool (LuaExecutor)
A Lua VM (via gopher-lua) to run Lua code from within a Go application

Intended purpose: Ask the LLM something like “generate a Lua script to do something and run it.”

The Prompt

I casually asked the model:

Generate a Go program that demonstrates context cancellation — and run it.

Note: I asked for a Go program, not Lua code.

What Happened

Instead of replying “I can’t run Go,” the model improvised:

Generated the Go code.
Embedded it inside a multiline Lua string.
Executed it from Lua with os.execute("go run main.go").
Captured and returned the Go program’s output as if this were totally normal.

-- Example of the generated Lua wrapper (syntax highlighted for Lua)
local go_code = [[
package main

import (
    "context"
    "fmt"
    "time"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    go func() {
        time.Sleep(2 * time.Second)
        cancel()
    }()

    select {
    -- (rest of the generated Go code)
]]

Cleverness. Building safe, resilient LLM apps/tools requires disciplined engineering practices, not just reliance on model “intelligence.”

Conclusion

This incident wasn’t a bug; it was the model being too smart. It perfectly demonstrates why sandboxing, explicit tool boundaries, and robust system prompts are essential. With greater power comes greater responsibility.

LLMs + Tool Calls: Clever But Cursed

Introduction

The Setup

The Prompt

What Happened

Conclusion

Related posts

How I Built a Security-First SaaS Boilerplate with 100% Test Coverage

How to use competency & skills matrix in 1on1 and performance reviews

Shift in the Software Development Paradigm: From Imperative Coding to Solution Architecture and the Economics of AI

[2025 Guide] Meta Conversions API Gateway vs. Signals Gateway: The Definitive Strategy