스크래치에서 LangGraph RAG 에이전트 구축 — 모든 단계가 실시간 UI에 표시

발행: 4일 전 (2026년 6월 7일 AM 05:06 GMT+9)

7 분 소요

Source: Dev.to

저는 LangChain과 LangGraph를 단계별로 가르치는 학습 프로젝트를 만들었습니다 — 원시 LLM 호출부터 시작해 RAG가 지원되는 완전한 ReAct 에이전트까지, 이를 SSE를 통해 스트리밍하고 React UI에 에이전트 루프의 모든 노드를 실시간으로 시각화하도록 구현했습니다.

이 글에서는 전체 과정을 살펴봅니다: 각 개념이 무엇을 하는지, 다음 단계와 어떻게 연결되는지, 그리고 실시간 파이프라인 뷰가 어떻게 동작하는지.

우리가 만들고 있는 것

frontend/   ← React + Vite chat UI (live agent loop visualisation)
backend/    ← FastAPI server wrapping the RAG agent
step*.py    ← 6 progressive learning files

에이전트는 레이트 리밋 알고리즘에 관한 질문에 답합니다. 이는 단지 도메인일 뿐이며, 진짜 목표는 LangChain과 LangGraph가 어떻게 함께 작동하는지 이해하는 것입니다.

6단계 학습 경로

파일	소개된 개념
`step1_llm_basics.py`	챗 모델, 메시지, `.invoke()`, 무상태성
`step2_prompts_and_chains.py`	프롬프트 템플릿, LCEL `\`
`step3_tools.py`	`@tool` 데코레이터, `bind_tools()`, 수동 툴 루프
`step4_langgraph_intro.py`	`StateGraph`, 노드, 엣지, 조건부 라우팅
`step5_full_agent.py`	`ToolNode`가 포함된 전체 ReAct 루프
`step6_rag_agent.py`	RAG — FAISS, HuggingFace 임베딩, 리트리버 툴

Step 1 — 원시 LLM 호출

가능한 가장 간단한 예시: 모델을 호출하고 응답을 읽어옵니다.

from langchain_groq import ChatGroq
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatGroq(model="llama-3.3-70b-versatile")

messages = [
    SystemMessage(content="You are a rate limiting expert."),
    HumanMessage(content="What is token bucket?"),
]

response = llm.invoke(messages)
print(response.content)

핵심 인사이트: LLM은 무상태입니다. 모든 호출은 독립적이며, 대화 기록은 매번 전체 메시지 리스트를 전달함으로써 직접 관리해야 합니다.

Step 2 — 프롬프트 템플릿과 LCEL 체인

LangChain Expression Language(LCEL)는 | 파이프 연산자를 사용해 컴포넌트를 조합합니다 — 마치 Unix 파이프와 같은 방식입니다.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a rate limiting expert."),
    ("human", "{question}"),
])

# Chain: prompt → LLM
chain = prompt | llm

# Invoke
response = chain.invoke({"question": "Compare token bucket and leaky bucket"})

# Stream tokens as they arrive
for chunk in chain.stream({"question": "What is sliding window log?"}):
    print(chunk.content, end="", flush=True)

핵심 인사이트: LCEL 체인은 지연(lazy)됩니다. .stream()과 .batch()는 일급 객체이며, 별도의 코드가 필요하지 않습니다.

Step 3 — 툴과 툴 호출

툴은 LLM이 행동을 취하도록 합니다. @tool 데코레이터는 파이썬 함수를 모델이 호출할 수 있는 형태로 변환합니다.

from langchain_core.tools import tool
from langchain_groq import ChatGroq

@tool
def get_algorithm_info(algorithm: str) -> str:
    """Return a brief description of a rate limiting algorithm."""
    descriptions = {
        "token_bucket":    "Tokens refill at a fixed rate up to a capacity cap. Allows bursts.",
        "fixed_window":    "Counts requests in fixed time windows. Simple but has boundary spikes.",
        "sliding_window":  "Precise per-request log. High memory, no boundary spikes.",
        "leaky_bucket":    "Queue drains at a constant rate. Smooths traffic, no bursts allowed.",
    }
    return descriptions.get(algorithm, "Unknown algorithm.")

# Bind tools to the model — it now knows what tools exist and their signatures
llm_with_tools = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct").bind_tools(
    [get_algorithm_info]
)

response = llm_with_tools.invoke("Tell me about token bucket")
# response.tool_calls → [{"name": "get_algorithm_info", "args": {"algorithm": "token_bucket"}}]

핵심 인사이트: bind_tools()는 툴 스키마를 모델에 전달합니다. 모델은 구조화된 tool_calls 리스트를 반환할 뿐이며, 실제 툴 실행은 사용자가 수행하고 결과를 다시 모델에 전달합니다.

Step 4 — LangGraph 기본

LangGraph는 에이전트를 상태 머신으로 모델링합니다. 정의해야 할 요소는 다음과 같습니다.

State — 그래프를 흐르는 타입이 지정된 딕셔너리
Nodes — 상태를 받아 업데이트를 반환하는 파이썬 함수
Edges — 노드 간 연결(조건부 분기 포함)

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from typing import Annotated
from typing_extensions import TypedDict

class State(TypedDict):
    messages: Annotated[list, add_messages]  # reducer: appends, never replaces

def node_a(state: State):
    return {"messages": ["Hello from node A"]}

def node_b(state: State):
    return {"messages": ["Hello from node B"]}

def route(state: State):
    return "b" if len(state["messages"])  str:
    """Search the rate limiting knowledge base for relevant information."""
    docs = retriever.invoke(query)
    return "\n---\n".join(d.page_content for d in docs)

핵심 인사이트: RAG는 에이전트 관점에서 보면 단순히 하나의 툴일 뿐입니다. LLM이 질문에 따라 언제 호출할지 결정하고, 리트리버는 쿼리를 임베딩으로 변환해 FAISS에서 가장 가까운 청크를 찾아 컨텍스트로 반환합니다.

FastAPI 백엔드 — SSE 스트리밍

백엔드는 에이전트를 FastAPI 서버에 래핑합니다. 핵심은 agent.astream_events()를 이용한 스트리밍 엔드포인트로, 그래프 내부의 모든 상태 변화마다 이벤트를 발생시키는 세밀한 비동기 제너레이터입니다.

from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage
import json

@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    async def generate():
        llm_call_count = 0
        graph_started  = False

        async for event in agent.astream_events(
            {"messages": [HumanMessage(content=request.message)]},
            version="v2",
        ):
            kind = event["event"]
            node = event.get("metadata", {}).get("langgraph_node", "")

            # LLM node starting
            if kind == "on_chat_model_start" and node == "llm":
                if not graph_started:
                    graph_started = True
                    yield sse({"type": "pipeline", "phase": "graph_start"})
                llm_call_count += 1
                yield sse({"type": "pipeline", "phase": "llm_start", "call": llm_call_count})

            # LLM done — emit routing decision
            elif kind == "on_chat_model_end" and node == "llm":
                output     = event["data"].get("output")
                tool_calls = getattr(output, "tool_calls", []) if output else []
                yield sse({
                    "type":       "pipeline",
                    "phase":      "llm_end",
                    "decision":   "tools" if tool_calls else "answer",
                    "tool_names": [tc["name"] for tc in tool_calls],
                })

            # Tool executing
            elif kind == "on_tool_start":
                yield sse({"type": "pipeline", "phase": "tool_start",
                           "tool": event["name"], "args": event["data"].get("input", {})})

            # Tool done
            elif kind == "on_tool_end":
                out     = event["data"].get("output", "")
                content = out.content if hasattr(out, "content") else str(out)
                yield sse({"type": "pipeline", "phase": "tool_end",
                           "tool": event["name"], "preview": content[:120]})

            # Individual LLM output tokens (final answer only)
            el

스크래치에서 LangGraph RAG 에이전트 구축 — 모든 단계가 실시간 UI에 표시

우리가 만들고 있는 것

6단계 학습 경로

Step 1 — 원시 LLM 호출

Step 2 — 프롬프트 템플릿과 LCEL 체인

Step 3 — 툴과 툴 호출

Step 4 — LangGraph 기본

FastAPI 백엔드 — SSE 스트리밍

관련 글

애자일 옥토퍼스 가격제는 실제로 어떻게 작동하고, 번거로움에 비해 가치가 있을까?

모바일 한여름 열풍

저자는 엔지니어일 필요 없다: 하네스가 품질을 유지하는 방법 (시리즈 5)

하드웨어 영감을 받은 UI 컴포넌트 라이브러리를 순수 바닐라 JS로 만들었습니다—방법 공개