How to Prevent AI Agent Cron Jobs from Silently Looping Forever

Published: 14 hours ago (March 5, 2026, 22:32 GMT+8)

4 min read

Original article: Dev.to

TL;DR

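When an AI agent runs a recurring cron task with no explicit exit condition, it can loop indefinitely, wasting tokens and API calls.
Fixes include defining clear completion criteria, capping iterations, adding idempotency checks, and using OS-level timeouts.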

Problem

An autonomous AI agent (OpenClaw) runs a heartbeat task that checks business metrics:

Revenue Watch
- Check MRR via the RevenueCat API
- Alert on Slack if an anomaly is detected

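During a single heartbeat session, the agent called the RevenueCat API 30+ times, and every call returned the same result (MRR $28, 5 subscribers). The agent kept cycling: report → "Next: check Revenue Watch" → repeat.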

Root Cause

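Missing completion criteria: "check X" with no "stop after checking"
No state persistence: the agent does not remember that it already checked a few seconds ago
Implicit continuation: many agent frameworks keep running until the task list is empty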

Humans assume "check once and move on"; an agent executes the instruction literally.
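
The implicit-continuation failure can be reproduced with a toy loop. This is a minimal sketch, not OpenClaw's actual scheduler: `run_until_empty`, `check_revenue`, and the `safety_cap` parameter are hypothetical names, and the cap exists only so the demonstration terminates.

```python
from collections import deque
from typing import Callable, Deque, List


def check_revenue(calls: List[str]) -> str:
    """Stand-in for the API call; records each call and returns fixed data."""
    calls.append("GET mrr")
    return "MRR $28, 5 subscribers"


def run_until_empty(follow_up: bool, safety_cap: int = 30) -> int:
    """Run tasks until the queue is empty; return how many API calls were made.

    follow_up=True models an agent that ends each report with
    "Next: check Revenue Watch", which re-enqueues the same task.
    """
    calls: List[str] = []
    queue: Deque[Callable[[List[str]], str]] = deque([check_revenue])
    steps = 0
    while queue and steps < safety_cap:  # cap only so this demo terminates
        task = queue.popleft()
        task(calls)
        steps += 1
        if follow_up:
            queue.append(check_revenue)  # the queue never drains
    return len(calls)


if __name__ == "__main__":
    print(run_until_empty(follow_up=True))   # stops only at the demo cap
    print(run_until_empty(follow_up=False))  # one call, then the queue is empty
```

With `follow_up=True` the only thing that ends the loop is the cap, which is the situation the fixes in the next section address.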

Solution

Add explicit exit conditions

## Revenue Watch
- Check MRR via RevenueCat API
- **Exit:** After one successful API response, log the result and stop
- Only alert Slack if values changed from last check
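
If the task is driven from code and not from a prompt, the same exit rule might look like the sketch below. The names `revenue_watch`, `WatchResult`, `fetch_mrr`, and `alert` are hypothetical; the fetch and alert functions are passed in as plain callables so the example does not assume any particular RevenueCat or Slack client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WatchResult:
    mrr: float
    alerted: bool


def revenue_watch(
    fetch_mrr: Callable[[], float],
    alert: Callable[[str], None],
    previous_mrr: Optional[float],
) -> WatchResult:
    """Check MRR exactly once, alert only on change, then return."""
    mrr = fetch_mrr()  # one API call; no retry or "next step" scheduling here
    changed = previous_mrr is not None and mrr != previous_mrr
    if changed:
        alert(f"MRR changed: {previous_mrr} -> {mrr}")
    print(f"Revenue Watch: MRR={mrr}")  # log the result
    return WatchResult(mrr=mrr, alerted=changed)  # explicit exit point
```

The function has a single return and no loop, so "done" is defined by the code path and not left for the agent to infer.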

Cap loop iterations

## Revenue Watch
max_iterations: 1
# Fetch MRR → log → done
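
How a runner enforces `max_iterations` depends on the framework. As a rough illustration only, a hand-rolled guard could look like this; `run_with_cap`, `step`, and `IterationLimitReached` are hypothetical names, and `step` is assumed to return `True` once the task is done.

```python
from typing import Callable


class IterationLimitReached(RuntimeError):
    """Raised when a task does not finish within its iteration budget."""


def run_with_cap(step: Callable[[], bool], max_iterations: int = 1) -> int:
    """Call step() until it returns True (done) or the cap is reached.

    Returns the number of iterations used.
    """
    for iteration in range(1, max_iterations + 1):
        if step():
            return iteration
    raise IterationLimitReached(
        f"task not finished after {max_iterations} iteration(s)"
    )
```

Raising an exception at the cap, instead of returning quietly, turns a silent loop into a visible failure that monitoring can pick up.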

Framework-level safety net (LangChain example)

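# LangChain example
from langchain.agents import AgentExecutor

# `agent` and `tools` are assumed to be defined earlier
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=5,  # hard cap on reasoning/tool steps
    early_stopping_method="force",  # return a fixed "stopped" message at the cap
)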

Even if the task's completion criteria are flawed, the agent stops after max_iterations steps.

Implement idempotency (caching)

## Revenue Watch
- Previous result: {cached_result}
- If current == previous: report "no change" and exit immediately
- If changed: report delta and update cache

This prevents redundant API calls and lets the task converge naturally.
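
The `{cached_result}` placeholder needs something to fill it. One possible approach, sketched here under the assumption that the last result is kept in a small JSON file, is shown below; the function names, the file layout, and the `mrr` key are illustrative and not part of OpenClaw.

```python
import json
from pathlib import Path
from typing import Optional


def load_cached(cache_path: Path) -> Optional[dict]:
    """Return the previously stored result, or None on the first run."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    return None


def report_if_changed(current: dict, cache_path: Path) -> str:
    """Compare the current result with the cache and update it on change."""
    previous = load_cached(cache_path)
    if previous == current:
        return "no change"  # exit immediately; nothing to write or alert
    cache_path.write_text(json.dumps(current), encoding="utf-8")
    if previous is None:
        return "first run: baseline stored"
    delta = current["mrr"] - previous["mrr"]
    return f"MRR changed by {delta:+.2f}"
```

Because the cache lives on disk, the comparison survives between heartbeat runs, which addresses the "no state persistence" root cause as well.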

OS-level timeout (last line of defense)

# Terminate after 60 seconds (timeout sends SIGTERM; with GNU timeout, add -k 5 to follow up with SIGKILL)
timeout 60 openclaw heartbeat run

If every other safeguard fails, the operating system terminates the process.
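
When the job is launched from a Python wrapper and not directly from the shell, the standard library offers a comparable limit. The sketch below is one way to do it; `run_with_timeout` is a hypothetical helper, and returning 124 on timeout simply mirrors the exit code GNU `timeout` uses.

```python
import subprocess
from typing import Sequence


def run_with_timeout(command: Sequence[str], seconds: float) -> int:
    """Run command and kill it if it exceeds the time limit.

    Returns the exit code, or 124 on timeout (the code GNU timeout uses).
    """
    try:
        completed = subprocess.run(command, timeout=seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return 124


# Example call, using the command from the article:
# run_with_timeout(["openclaw", "heartbeat", "run"], seconds=60)
```

A nonzero return value gives the cron wrapper something to log or alert on, so a killed run does not go unnoticed.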

Metrics (Before vs. After)

Metric	Before	After
API calls per execution	30+	1
Token consumption	10× normal	1× normal
Execution time	Minutes (looping)	< 30 seconds
Accuracy	Repeated same data	Reports only on changes

Lessons Learned

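Always define when a task ends. An agent will not infer that "once is enough."
max_iterations is non-negotiable. It provides a safety net for flawed exit conditions.
Idempotency eliminates waste. Skip re-execution when the input has not changed.
An OS-level timeout is the last resort. It catches any runaway loop the other safeguards miss.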

Takeaway

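When designing cron tasks for AI agents, ask "When should it stop?" and not just "What should it do?" Without an explicit stop condition, an agent can keep calling the same API indefinitely.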