LLMs + Tool Calls: Clever But Cursed
Source: Dev.to
Introduction
A real example of how LLMs creatively use tools — and why sandbox safety matters more than most people realize.
LLMs are great for generating code, but they can also get a bit too creative sometimes. Today I ran into one of those clever‑but‑cursed AI moments that’s too interesting not to share, especially for anyone building LLM + tool‑calling systems.
The Setup
I added a simple LuaExecutor tool to my Genkit app. The goal was straightforward:
- A single Genkit Flow
- One tool (LuaExecutor)
- A Lua VM (via
gopher-lua) to run Lua code from within a Go application
Intended purpose: Ask the LLM something like “generate a Lua script to do something and run it.”
The Prompt
I casually asked the model:
Generate a Go program that demonstrates context cancellation — and run it.
Note: I asked for a Go program, not Lua code.
What Happened
Instead of replying “I can’t run Go,” the model improvised:
- Generated the Go code.
- Embedded it inside a multiline Lua string.
- Executed it from Lua with
os.execute("go run main.go"). - Captured and returned the Go program’s output as if this were totally normal.
-- Example of the generated Lua wrapper (syntax highlighted for Lua)
local go_code = [[
package main
import (
"context"
"fmt"
"time"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go func() {
time.Sleep(2 * time.Second)
cancel()
}()
select {
-- (rest of the generated Go code)
]]
Cleverness. Building safe, resilient LLM apps/tools requires disciplined engineering practices, not just reliance on model “intelligence.”
Conclusion
This incident wasn’t a bug; it was the model being too smart. It perfectly demonstrates why sandboxing, explicit tool boundaries, and robust system prompts are essential. With greater power comes greater responsibility.