Agent Series (3): Plan-and-Solve, Think Before You Act

Published: May 24, 2026, 10:20 AM GMT+9

8 min read

Source: Dev.to

Where does ReAct hit its limits?

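The previous article introduced ReAct's greedy strategy: each step decides the next action by looking only at the current state. That works well most of the time, but one class of tasks causes trouble.
For example, suppose you ask an agent to do the following: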

Search for the release years of Python, Java, and Go.
Sort the years chronologically.
Compute the difference in years between Python and Go.

A typical ReAct run might look like this:

Action: web_search("Python release year")
Action: web_search("Java release year")
Action: web_search("Go release year")
Action: calculator("...")
(sometimes it repeats a search or adds unnecessary steps)

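That alone isn't a disaster, but there is a hidden problem: ReAct has no global plan before it acts. It doesn't know how many steps the task needs, which steps depend on which, or where it currently stands in the overall task. Each step is locally optimal but not globally optimal.
For multi-step tasks with clear dependencies, this is like finding your way without a map: you get there eventually, but with plenty of detours.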

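Plan-and-Solve's fix: have the LLM draw up a complete action plan first, then execute it step by step.
The approach builds on the 2023 paper Plan-and-Solve Prompting, which has the model first devise a plan and then carry it out step by step. Turned into an agent, the core idea splits into two phases: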

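Plan phase: ask the LLM to analyze the whole task up front and output an ordered list of steps. No tools are called in this phase; it is pure reasoning.
Solve phase: execute the steps in the plan one after another. Each step may call tools, and the results of earlier steps are injected into the context of later ones.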
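Without any framework, the two phases come down to a single planning call followed by a loop. The sketch below is a minimal illustration that makes one assumption: plain Python callables stand in for the LLM planner and the tool-using executor. `toy_planner` and `toy_executor` are hypothetical stand-ins, not part of the article's code.

```python
from typing import Callable


def plan_and_solve(
    task: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str, list[str]], str],
) -> list[str]:
    """Plan once, then run each step with the earlier results as context."""
    # Plan phase: one call that only produces an ordered list of steps (no tools)
    plan = planner(task)

    completed: list[str] = []
    # Solve phase: run the steps in order; each step sees what earlier steps produced
    for step in plan:
        result = executor(step, completed)
        completed.append(f"{step} -> {result}")
    return completed


# Hypothetical stand-ins for the LLM planner and the tool-calling executor
def toy_planner(task: str) -> list[str]:
    return ["look up A", "look up B", "add A and B"]


def toy_executor(step: str, context: list[str]) -> str:
    lookups = {"look up A": "2", "look up B": "3"}
    if step in lookups:
        return lookups[step]
    # The final step reads the earlier results back out of the accumulated context
    values = [int(line.rsplit("-> ", 1)[1]) for line in context]
    return str(sum(values))


print(plan_and_solve("compute A + B", toy_planner, toy_executor))
```

The last step can only add the numbers because the loop threads `completed` into every executor call. This is the "results of earlier steps are injected into later steps" part of the Solve phase.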

With the fault-tolerance mechanisms a production setup needs, the full architecture looks like this:

Task
 │
 ▼
[Plan Node]     ← LLM generates a 3-7 step plan (planning only, no execution)
 │
 ▼
[Execute Node]  ← runs the current step (embedded ReAct, may call tools)
 │
 ├─ Step failed? ─→ [Replan Node] ← replans the remaining steps based on progress so far
 │                      │
 │                      └──────────────┐
 │                                     ▼
 ├─ More steps? ─→ back to Execute    Execute (continue)
 │
 └─ All done? ─→ [Finalize Node] ← outputs the final answer
                       │
                       ▼
                      END

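The key difference from ReAct: ReAct is an open-ended loop, whereas Plan-and-Solve is a sequential structure with a defined end point.
LangGraph is a good fit for this architecture: it models the agent as a state machine (StateGraph) and passes the state from node to node.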

from typing import TypedDict

class PlanSolveState(TypedDict):
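    task: str                    # original user task
    plan: list[str]              # current plan (list of steps)
    completed_steps: list[str]   # completed steps, each with a summary of its result
    current_step_index: int      # index of the next step to execute (0-based)
    step_result: str             # result of the most recently executed step
    replan_count: int            # number of replans so far
    final_answer: str            # final answer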

State is the lifeblood of the graph: every node reads it and writes to it. Get the state design right and the job is half done.
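Each node returns only the keys it changes, and LangGraph merges that partial dict into the shared state. PlanSolveState declares its keys without a reducer, so for those keys the returned value replaces the old one. That is why execute_node returns `state["completed_steps"] + [...]` instead of just the new entry.

The sketch below is a simplified standard-library model of that merge. The `apply_update` helper is hypothetical and is not how LangGraph implements it internally.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Simplified model: a node's partial update landing on keys that have no reducer."""
    merged = dict(state)   # keys the node did not return keep their old values
    merged.update(update)  # keys the node returned are replaced wholesale, not appended to
    return merged


state = {"task": "demo task", "plan": [], "completed_steps": [], "current_step_index": 0,
         "step_result": "", "replan_count": 0, "final_answer": ""}

# Shape of what plan_node returns
state = apply_update(state, {"plan": ["step 1", "step 2"], "current_step_index": 0,
                             "completed_steps": []})

# Shape of what execute_node returns after step 1: the whole new list, because the old one is replaced
state = apply_update(state, {"step_result": "ok",
                             "completed_steps": state["completed_steps"] + ["step 1 -> ok"],
                             "current_step_index": 1})

print(state["current_step_index"], state["completed_steps"], state["replan_count"])
```

Keys the nodes never touch, such as `replan_count` here, simply carry over from one update to the next.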

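from langchain_core.messages import HumanMessage, SystemMessage

# PLANNER_SYSTEM is shown below; llm and parse_plan are defined elsewhere in the full source
def plan_node(state: PlanSolveState) -> dict:
    messages = [
        SystemMessage(content=PLANNER_SYSTEM),  # planner-expert prompt
        HumanMessage(content=f"Task: {state['task']}"),
    ]
    response = llm.invoke(messages)
    plan = parse_plan(response.content)  # parse the "1. xxx\n2. xxx" format

    return {
        "plan": plan,
        "current_step_index": 0,
        "completed_steps": [],
        "replan_count": 0,  # initialize the counter that should_continue reads
    }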
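parse_plan is not shown in the article. Below is a minimal hypothetical version. It assumes the planner follows the "1. [step description]" output format and tolerates "2)"-style numbering and chatty lines around the list, which it skips.

```python
import re

# A numbered line such as "1. Do X" or "2) Do Y", optionally indented
_STEP_RE = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_plan(text: str) -> list[str]:
    """Hypothetical parser: pull the step descriptions out of a numbered list."""
    steps = []
    for line in text.splitlines():
        match = _STEP_RE.match(line)
        if match:  # non-numbered lines (preambles, blank lines) are ignored
            steps.append(match.group(1))
    return steps


print(parse_plan("Here is the plan:\n1. Search A\n2) Add   \n\nDone."))
```

An empty result is a useful signal: it means the model ignored the output format, and a real agent would probably want to retry the planning call rather than run an empty plan.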

The planner's system prompt is critical:

PLANNER_SYSTEM = """You are a task planning expert.
Rules:
1. Break the task into 3-7 independent steps
2. Each step must be concrete and actionable
3. Steps must have clear dependencies (later steps can use earlier results)
4. The final step should be "synthesize all information and deliver the answer"

Output format (only the step list, nothing else):
1. [step description]
2. [step description]
...
"""

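from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.prebuilt import create_react_agent

# llm, EXECUTOR_SYSTEM, format_completed_steps, calculator and web_search are defined elsewhere
def execute_node(state: PlanSolveState) -> dict:
    idx = state["current_step_index"]
    current_step = state["plan"][idx]

    # Build the execution context (including the results of completed steps)
    system_prompt = EXECUTOR_SYSTEM.format(
        completed_steps=format_completed_steps(state["completed_steps"]),
        current_step=current_step,
    )

    # Run this single step with a ReAct sub-agent (it calls tools when needed)
    sub_agent = create_react_agent(model=llm, tools=[calculator, web_search])
    result = sub_agent.invoke(
        {"messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"Execute this step: {current_step}"),
        ]},
        config={"recursion_limit": 8},  # cap the sub-agent loop; exceeding it raises GraphRecursionError
    )

    step_result = result["messages"][-1].content  # the sub-agent's final message
    new_completed = state["completed_steps"] + [
        f"{current_step} → {step_result[:100]}"  # only the first 100 characters are kept as context
    ]

    return {
        "step_result": step_result,
        "completed_steps": new_completed,
        "current_step_index": idx + 1,
    }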

There is an important design choice here: the Execute node embeds a ReAct sub-agent. Plan-and-Solve and ReAct are not mutually exclusive. Plan-and-Solve provides the global structure, and ReAct handles tool calls within each step.
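The executor prompt is filled in by format_completed_steps, which the article does not show. A minimal hypothetical version just numbers the entries.

The second half of the sketch shows a side effect of the `step_result[:100]` slice in execute_node. When one step bundles several lookups, anything after the 100th character of its result never reaches the context that later steps see. The sample result string is made up for illustration.

```python
def format_completed_steps(completed_steps: list[str]) -> str:
    """Hypothetical helper: number the completed-step summaries for the executor prompt."""
    if not completed_steps:
        return "(no steps completed yet)"
    return "\n".join(f"{i}. {entry}" for i, entry in enumerate(completed_steps, start=1))


# Same truncation as execute_node: only result[:100] is stored in completed_steps
step = "Search the three population figures"
result = ("According to the latest estimates, China's population is about 1.40 billion "
          "and the US population is about 341 million, while India's is about 1.45 billion.")
entry = f"{step} → {result[:100]}"

print(format_completed_steps([entry]))
print("India" in result, "India" in entry)  # True False
```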

from typing import Literal

MAX_REPLAN = 2

def should_continue(state: PlanSolveState) -> Literal["execute", "replan", "finalize"]:
    idx = state["current_step_index"]
    total = len(state["plan"])

    if idx >= total:
        return "finalize"  # all steps done

    # Detect a failed step (a simple keyword heuristic on the latest result)
    result = state.get("step_result", "")
    failed = any(kw in result for kw in ["Calculation error", "Search failed", "Error"])

    if failed and state.get("replan_count", 0) < MAX_REPLAN:
        return "replan"  # failed, but replan budget remains

    return "execute"  # keep going
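should_continue only reads the state dict, so its routing can be checked without calling an LLM. The copy below applies the same rules and exercises each branch. It adds `.get()` defaults, which is an assumption, so that partially filled states also work.

The last assertion shows a limitation of plain substring matching: a successful result that merely contains the word "Error" is also routed to replan.

```python
from typing import Literal

MAX_REPLAN = 2
FAILURE_MARKERS = ("Calculation error", "Search failed", "Error")


def should_continue(state: dict) -> Literal["execute", "replan", "finalize"]:
    # Same routing rules as the graph's router, with defaults for missing keys
    if state["current_step_index"] >= len(state["plan"]):
        return "finalize"
    result = state.get("step_result", "")
    failed = any(marker in result for marker in FAILURE_MARKERS)
    if failed and state.get("replan_count", 0) < MAX_REPLAN:
        return "replan"
    return "execute"


plan = ["search", "add", "answer"]
assert should_continue({"plan": plan, "current_step_index": 3}) == "finalize"
assert should_continue({"plan": plan, "current_step_index": 1,
                        "step_result": "Search failed: timeout"}) == "replan"
# Replan budget exhausted: execution continues despite the failure
assert should_continue({"plan": plan, "current_step_index": 1,
                        "step_result": "Search failed", "replan_count": 2}) == "execute"
assert should_continue({"plan": plan, "current_step_index": 1,
                        "step_result": "Population: 1.45 billion"}) == "execute"
# False positive: a successful result that happens to contain "Error"
assert should_continue({"plan": plan, "current_step_index": 1,
                        "step_result": "Error bars are +/-2%"}) == "replan"
```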

from langgraph.graph import END, START, StateGraph

# replan_node, finalize_node and after_replan are defined in the full source
graph = StateGraph(PlanSolveState)

graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("replan", replan_node)
graph.add_node("finalize", finalize_node)

graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges(
    "execute",
    should_continue,
    {"execute": "execute", "replan": "replan", "finalize": "finalize"},
)
graph.add_conditional_edges(
    "replan", after_replan,
    {"execute": "execute", "finalize": "finalize"},
)
graph.add_edge("finalize", END)

agent = graph.compile()
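The graph wires in replan_node, after_replan and finalize_node, whose bodies aren't shown here. Below is one hypothetical way to write the first two. The LLM call is replaced by an injected `replanner` function so that the sketch runs without a model. This is an assumption about the design, not the author's code.

```python
from typing import Callable, Literal


def after_replan(state: dict) -> Literal["execute", "finalize"]:
    """Hypothetical router: keep executing while the revised plan still has steps left."""
    if state["current_step_index"] < len(state["plan"]):
        return "execute"
    return "finalize"


def make_replan_node(replanner: Callable[[str, list[str]], list[str]]):
    """Hypothetical factory; `replanner` stands in for an LLM call that proposes new remaining steps."""
    def replan_node(state: dict) -> dict:
        idx = state["current_step_index"]  # execute_node already advanced past the failed step
        new_remaining = replanner(state["task"], state["completed_steps"])
        return {
            # Keep the executed prefix (the failure stays visible in completed_steps), swap the rest
            "plan": state["plan"][:idx] + new_remaining,
            "current_step_index": idx,
            "step_result": "",  # clear the failed result
            "replan_count": state.get("replan_count", 0) + 1,
        }
    return replan_node


replan_node = make_replan_node(lambda task, done: ["search again with a different query", "answer"])
state = {"task": "demo", "plan": ["search", "add", "answer"], "current_step_index": 1,
         "completed_steps": ["search -> Search failed"], "step_result": "Search failed",
         "replan_count": 0}
state = {**state, **replan_node(state)}
print(state["plan"], state["replan_count"], after_replan(state))
```

Keeping the failed step in the executed prefix lets the replanner see the error message in `completed_steps`. The alternative, rewinding `current_step_index` to retry the failed step, would also work but needs the failed entry removed from the history.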

Full code: agent-02-plan-and-solve/plan_and_solve_agent.py

Example task

Task: Search for the populations of China, the US, and India, then calculate their total and China's share of it.

The plan produced by the planner:

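Search for "China population", "US population", and "India population" to get the latest figures.
Record the population figures for China, the US, and India.
Add up the three populations to get the total.
Calculate China's share of the three-country total.
Synthesize all the information and deliver the final answer.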

Execution trace

[Step 1] web_search("China population") → 1.40489 billion
         web_search("US population")    → 341 million
         web_search("India population") → 1.451 billion

[Step 2] Record results (no tool call, model consolidates)
         → China 1.40489B, US 341M, India: no data available ← ⚠️

[Step 3] calculator("14048900000.0 + 3400000000.0")
         → 17448900000 ← ⚠️ India missing!

[Step 4] calculator("14.0489 / 17.4489 * 100")
         → 80.5145%

[Final answer] Three-country total: 1.74489B, China's share: 80
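For reference, the sketch below redoes Steps 3 and 4 in plain Python using all three figures returned by the searches in Step 1, taking those search results as given. Note also that the calculator inputs in the trace are ten times too large: 14048900000.0 is about 14.05 billion, not 1.40489 billion.

```python
# Figures returned by the three searches in Step 1 of the trace
china = 1.40489e9
us = 341e6
india = 1.451e9

total = china + us + india          # Step 3, with India included
share = china / total * 100         # Step 4: China's share of the total, in percent

print(f"total = {total / 1e9:.5f} billion, China's share = {share:.2f}%")
```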
