Mastering LangChain Expression Language (LCEL): Branching, Parallelism, and Streaming

Published: January 16, 2026, 10:00 PM GMT+9

5 min read

Source: Dev.to

Introduction

Building AI applications often feels like writing "glue code": endless if/else statements and loops that manage the flow of data between prompts, LLMs, and output parsers.
LangChain Expression Language (LCEL) lets you build chains in a declarative, composable way. Think of it as the Unix pipe (|) for AI.

This demo uses LangChain, Ollama, and a Gemma model to show three advanced LCEL features:

Routing (dynamic branching)
Parallel execution (fan-out retrieval)
Streaming middleware (real-time token sanitization)

Routing with `RunnableBranch`

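When you have a single chatbot but want it to behave differently depending on the user's intent (for example, code vs. data), you can build a router chain instead of writing imperative if statements.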

Classify Intent

# A chain that outputs "code", "data", or "general"
classifier_chain = classifier_prompt | llm | parser
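
The router compares the classifier's output against exact labels, so it can help to normalize whatever text the model returns. Below is a minimal sketch, assuming the parser yields a plain string; `normalize_intent` and `ALLOWED_INTENTS` are hypothetical names, not part of LangChain or the original demo.

```python
ALLOWED_INTENTS = {"code", "data", "general"}  # labels the classifier is asked to emit

def normalize_intent(raw: str) -> str:
    """Map raw LLM text to one of the allowed labels, defaulting to 'general'."""
    # Drop surrounding whitespace, quotes, and trailing periods, then lowercase
    label = raw.strip().strip("\"'.").lower()
    return label if label in ALLOWED_INTENTS else "general"
```

A helper like this could be piped after `parser`, so that unexpected model output falls through to the general branch instead of matching nothing.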

Branch to the appropriate sub‑chain

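from langchain_core.runnables import RunnableBranch

# Each branch is a (condition, runnable) pair; the first match wins.
# The input is expected to be a dict carrying the classified "intent".
routing_chain = RunnableBranch(
    (lambda x: x["intent"] == "code", code_chain),
    (lambda x: x["intent"] == "data", data_chain),
    general_chain,                     # fallback
)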
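
`RunnableBranch` checks its (condition, runnable) pairs in order, runs the first one whose condition is true, and otherwise falls back to the default. The plain-Python sketch below models only that selection rule with ordinary functions; it is not LangChain's implementation. The `route` helper and the stand-in chains are illustrative, and the input shape (a dict with an `"intent"` key) is what the lambdas above assume.

```python
from typing import Any, Callable

Condition = Callable[[dict], bool]
Handler = Callable[[dict], Any]

def route(branches: list[tuple[Condition, Handler]], default: Handler, x: dict) -> Any:
    """Run the first handler whose condition matches, else the default."""
    for condition, handler in branches:
        if condition(x):
            return handler(x)
    return default(x)

# Stand-ins for code_chain, data_chain, and general_chain
def code_chain(x: dict) -> str:
    return f"[code] {x['query']}"

def data_chain(x: dict) -> str:
    return f"[data] {x['query']}"

def general_chain(x: dict) -> str:
    return f"[general] {x['query']}"

branches = [
    (lambda x: x["intent"] == "code", code_chain),
    (lambda x: x["intent"] == "data", data_chain),
]

result = route(branches, general_chain, {"intent": "code", "query": "binary search"})
```

Because the order of the pairs decides which branch wins, more specific conditions belong before broader ones.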

Example

python main.py routing --query "Write a binary search in Python"

Output

[Router] Detected 'code'

def binary_search(arr, target):
    # ... concise, professional code output ...

The system automatically detected the intent and switched to a "Senior Engineer" persona.

Parallel Retrieval with `RunnableParallel`

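When a question needs information from several separate sources (an internal wiki, API docs, general notes), querying them one after another is slow. RunnableParallel runs multiple retrievers concurrently.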

Define Parallel Retrievers

from langchain_core.runnables import RunnableParallel

# Every retriever receives the same input; the result is a dict with these keys
parallel_retrievers = RunnableParallel({
    "lc_docs": retriever_langchain,
    "ollama_docs": retriever_ollama,
    "misc_docs": retriever_misc,
})
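
The fan-out idea can be sketched with `asyncio.gather`. This illustrates concurrent retrieval in general rather than how `RunnableParallel` works internally; the `fake_retriever` coroutine, its delay, and the returned strings are invented for the example.

```python
import asyncio
import time

async def fake_retriever(name: str, delay: float, query: str) -> list[str]:
    """Hypothetical retriever: sleeps for `delay` seconds to simulate I/O."""
    await asyncio.sleep(delay)
    return [f"{name}: result for {query!r}"]

async def fan_out(query: str) -> dict[str, list[str]]:
    names = ["lc_docs", "ollama_docs", "misc_docs"]
    # Start all three lookups at once and wait for every one to finish
    results = await asyncio.gather(
        *(fake_retriever(name, 0.1, query) for name in names)
    )
    return dict(zip(names, results))

start = time.perf_counter()
docs = asyncio.run(fan_out("What is LCEL?"))
elapsed = time.perf_counter() - start  # expected to be close to one delay, not three
```

With sequential calls, the total wait would be the sum of the three delays; run concurrently, it is bounded by the slowest one.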

Example

python main.py parallel_rag --query "What is LCEL?"

Output

The "Merger" step receives the results from all three databases at once and combines them, and the LLM then answers using the full context.
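
One way such a merge step might look is sketched below. It assumes the parallel step returns a dict of plain-text snippets per source (real retrievers typically return document objects), and `merge_docs` is a hypothetical helper rather than code from the demo.

```python
def merge_docs(results: dict[str, list[str]]) -> str:
    """Join retrieved snippets into one context string, labeled by source."""
    sections = []
    for source, snippets in results.items():
        if snippets:  # skip sources that returned nothing
            sections.append(f"[{source}]\n" + "\n".join(snippets))
    return "\n\n".join(sections)

context = merge_docs({
    "lc_docs": ["LCEL composes runnables with the | operator."],
    "ollama_docs": [],
    "misc_docs": ["Chains can run locally."],
})
```

Labeling each section by source keeps the provenance visible to the LLM, which can help when sources disagree.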

Streaming Middleware for Real‑Time Sanitization

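Sometimes you need to stream an LLM's response to the user token by token while intercepting sensitive information (for example, PII) before it reaches the screen. Wrapping the standard .astream() iterator in an async generator gives you a middleware layer that can buffer, sanitize, and log tokens in real time.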

Middleware Implementation

async def middleware_stream(iterable):
    buffer = ""
    async for chunk in iterable:
        buffer += chunk
        # Simple example: redact any token containing '@'
        if "@" in buffer:
            yield "[REDACTED_EMAIL]"
        else:
            yield buffer
        # Reset after every yield; otherwise earlier text would be emitted again
        buffer = ""

Note: A production implementation would use smarter buffering to handle split tokens.
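
One possible shape for that smarter buffering is sketched here: hold back the trailing partial word until whitespace arrives, so an address split across chunks is still matched as a whole. The regex is deliberately crude, and `buffered_redact` and `fake_stream` are illustrative names, with the fake stream standing in for `.astream()` output that has already been parsed to strings.

```python
import asyncio
import re
from typing import AsyncIterator

EMAIL_RE = re.compile(r"\S+@\S+")  # crude on purpose: any word containing '@'

async def buffered_redact(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hold back the trailing partial word so split emails are still caught."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        # Everything up to the last whitespace consists of complete words
        cut = max(buffer.rfind(" "), buffer.rfind("\n")) + 1
        ready, buffer = buffer[:cut], buffer[cut:]
        if ready:
            yield EMAIL_RE.sub("[REDACTED_EMAIL]", ready)
    if buffer:  # flush whatever is left when the stream ends
        yield EMAIL_RE.sub("[REDACTED_EMAIL]", buffer)

async def fake_stream() -> AsyncIterator[str]:
    # The email arrives split across three chunks
    for chunk in ["My email is te", "st@exam", "ple.com today"]:
        yield chunk

async def collect() -> str:
    return "".join([part async for part in buffered_redact(fake_stream())])

output = asyncio.run(collect())
```

The trade-off is latency: output lags by up to one word, since nothing is shown until the word it belongs to is complete.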

Example

python main.py stream_middleware --query "My email is test@example.com"

Output

Even though the LLM generated a real email address, the middleware caught it and replaced it before the user could see it.

Takeaways

LCEL is more than syntactic sugar; it is a capable framework for building complex, production-grade AI flows:

Dynamic Logic: routing based on the intent the LLM detects
Performance: searching multiple knowledge bases in parallel
Safety: streaming middleware for real-time content redaction

All of this can be built from standard, composable components running locally with Ollama.

GitHub repository:
