You Don't Need an Agent Framework
Source: Towards Data Science
So you want to build an LLM application. Your first thought is probably: let's build a powerful agent! But right after that, you start asking yourself which agent framework to use. CrewAI? LangGraph? Microsoft Agent Framework? Or something else?
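Next, you open a few documentation pages, compare examples, and try to work out which framework best fits your problem. A few hours pass, you feel overwhelmed, and you still haven't written a single line of code.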
But wait: do you really need an agent framework here? For that matter, do you even need to build an LLM agent?
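Over the past two years, I have built a number of LLM applications across different domains. The lesson I have learned is that for most useful LLM applications, a workflow works better in practice than a truly autonomous agent.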
And to build a workflow, you may not need a framework at all.
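In this post, I'll show how to prototype an LLM workflow with plain Python, local functions, structured output, and the OpenAI Responses API (the same pattern carries over to other LLM providers). As a hands-on example, we'll tackle the problem of explaining outliers.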
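My goal is not to argue that frameworks are bad. They are clearly useful in many situations, and I'll discuss those use cases at the end of this post. Rather, I want to show that for many applications, a clear workflow is the first abstraction you actually need.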
1. Try Workflows Before Jumping to Agents
An LLM agent usually refers to an autonomous system that can decide for itself what to do next.
You give it a goal and a set of tools. It will then plan, act, observe the result, and continue iterating. Instead of following a fixed path, the agent dynamically chooses the next step based on the current situation. For open‑ended problems, this is powerful.
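To make the plan-act-observe loop concrete, here is a minimal sketch in plain Python. The `policy` function is a hypothetical stand-in for the LLM: given the goal and the history so far, it returns either a tool call or a final answer. `run_agent` and `scripted_policy` are illustrative names, not any framework's API.

```python
from typing import Any, Callable

# Hypothetical stand-in for an LLM: (goal, history) -> decision dict, either
# {"action": "call", "tool": name, "args": {...}} or {"action": "finish", "answer": ...}
Policy = Callable[[str, list], dict]


def run_agent(goal: str, tools: dict, policy: Policy, max_steps: int = 10) -> Any:
    """Plan, act, observe, repeat. The policy, not the code, chooses every next step."""
    history: list = []
    for _ in range(max_steps):
        decision = policy(goal, history)                       # plan
        if decision["action"] == "finish":
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])   # act
        history.append({"tool": decision["tool"],              # observe
                        "args": decision["args"], "result": result})
    raise RuntimeError(f"no answer after {max_steps} steps")


def scripted_policy(goal: str, history: list) -> dict:
    """A fake 'model' that calls one tool and then reports its result."""
    if not history:
        return {"action": "call", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"action": "finish", "answer": history[-1]["result"]}


print(run_agent("add 2 and 3", {"add": lambda a, b: a + b}, scripted_policy))  # 5
```

With a real model behind `policy`, the number of steps and the order of tool calls can change from run to run; the `max_steps` cap is the only part of the path the code still controls.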
Many real‑world problems are not that open‑ended.
In many cases, we do know roughly how the problem should be solved. But if we know that already, why ask the model to reinvent the wheel?
This is where workflows come in.
In an LLM workflow, we developers define the main steps, and the LLM is used inside selected steps as a reasoning engine. This is still an LLM‑powered application. The difference is that the LLM is not treated as the whole system. It is treated only as a decision node inside a larger process.
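In contrast to the agent loop described above, here is a minimal sketch of a workflow, assuming a toy three-step text task (the step names are hypothetical). The order of the steps lives in `run_workflow`; the LLM is passed in as an ordinary function and is only called inside `summarize`, so it acts as one decision node rather than the whole system.

```python
from typing import Callable

LLMStep = Callable[[str], str]  # any function: prompt in, model text out


def clean(text: str) -> str:
    """Step 1 (deterministic): normalize whitespace."""
    return " ".join(text.split())


def summarize(text: str, llm: LLMStep) -> str:
    """Step 2 (LLM node): the model reasons inside this one step only."""
    return llm(f"Summarize in one sentence:\n{text}")


def format_report(summary: str) -> str:
    """Step 3 (deterministic): wrap the result for the caller."""
    return f"SUMMARY: {summary}"


def run_workflow(text: str, llm: LLMStep) -> str:
    # The path is always clean -> summarize -> format_report, whatever the LLM says.
    return format_report(summarize(clean(text), llm))


def fake_llm(prompt: str) -> str:
    """Stub model that echoes the last line of its prompt."""
    return prompt.splitlines()[-1]


print(run_workflow("  raw   input  text ", fake_llm))  # SUMMARY: raw input text
```

Swapping `fake_llm` for a function that calls a real model changes what happens inside step 2, but not which steps run or in what order.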
So, what are the benefits of adopting an LLM workflow? The following points are important in my opinion:
- A workflow is transparent. You can easily inspect it, since each step has a well-defined role and a clear input and output contract.
- A workflow is modular, meaning that you can change one step without rewriting the entire application.
- Most importantly, a workflow has deterministic control flow. The LLM’s reasoning and decisions can still vary inside a step, but the overall path is owned by code. That alone removes a lot of uncertainty when we are trying to build something reliable.
I know, this sounds less exciting than building a fully autonomous agent. But your goal is to deliver an effective solution. If boring stuff works, then we should happily use it.
2. What We Actually Need to Design
In my experience, there are four key ingredients:
- Control flow
- Role instructions (system prompts)
- Prompt builders
- Structured output
Let me unpack each of them.
2.1 Control flow
Control flow defines how the application moves from input to output.
A useful way to think about control flow is to view it as a graph. In this graph, we have nodes and edges:
- Nodes: Each node represents one step in the application. It can be a deterministic processing step performed by the code (e.g., loading data, calling a local function). It can also be an LLM step, where an LLM is employed to make a decision, extract information, or write an explanation.
- Edges: Each edge represents how information moves from one step to the next. Like nodes, edges come in different types. An edge can be static, i.e., it always calls a predefined next step once the current step finishes; or it can be conditional. For instance, if the LLM in the current step says more evidence is needed, the edge links to a local tool node; if the LLM believes enough evidence has been gathered, the edge points to the final explanation.
The key point: in a workflow, code owns this graph. LLMs are bound to specific nodes rather than running freely.
I’d start here because it forces you to think through, up front, what the application actually has to do: which tasks should be handled by deterministic code, which go to the LLM, where the workflow branches, and when it should stop.
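One way to express a code-owned graph without a framework is to write nodes as functions over a state dict and edges as functions that return the next node's name. The sketch below assumes a toy graph with one conditional edge (more evidence needed or not) and one static edge; `investigate` and `explain` are hypothetical stubs where real LLM calls would go.

```python
from typing import Callable, Optional

State = dict
Node = Callable[[State], State]          # one step: deterministic or LLM-backed
Edge = Callable[[State], Optional[str]]  # returns the next node name, or None to stop


def run_graph(nodes: dict[str, Node], edges: dict[str, Edge], start: str,
              state: State, max_steps: int = 20) -> State:
    """Walk the graph from `start`; only edges defined in code can be taken."""
    current: Optional[str] = start
    steps = 0
    while current is not None:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        state = nodes[current](state)
        current = edges[current](state)
        steps += 1
    return state


def investigate(state: State) -> State:   # stub for an LLM decision node
    return {**state, "need_more": len(state["evidence"]) < 2}


def gather(state: State) -> State:        # deterministic tool node
    n = len(state["evidence"]) + 1
    return {**state, "evidence": state["evidence"] + [f"fact {n}"]}


def explain(state: State) -> State:       # stub for an LLM writer node
    return {**state, "explanation": "; ".join(state["evidence"])}


NODES = {"investigate": investigate, "tool": gather, "explain": explain}
EDGES = {
    "investigate": lambda s: "tool" if s["need_more"] else "explain",  # conditional
    "tool": lambda s: "investigate",                                    # static
    "explain": lambda s: None,                                          # stop
}

final = run_graph(NODES, EDGES, "investigate", {"evidence": []})
print(final["explanation"])  # fact 1; fact 2
```

Because every possible transition is listed in `EDGES`, the whole control flow can be read in one place, and the LLM can only influence which of the predefined branches is taken.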
2.2 Role Instructions
A workflow typically uses LLMs in several roles, and each role needs its own instruction (system prompt).
A role instruction defines how the LLM should behave inside one specific node. It typically tells the LLM its persona, what task it is performing, what it should pay attention to, and what to avoid. Any domain-specific rules should also be specified here.
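Since a role instruction is just text, a small helper that assembles it from the parts listed above keeps the instructions for different roles consistent. This is a sketch; the helper name and the example wording for the investigator role are hypothetical.

```python
def bullets(items: list[str]) -> str:
    """Render a list of strings as '- ' bullet lines."""
    return "\n".join(f"- {item}" for item in items)


def role_instruction(persona: str, task: str, focus: list[str],
                     avoid: list[str], rules: list[str]) -> str:
    """Assemble a system prompt from persona, task, focus, things to avoid, domain rules."""
    return (f"You are {persona}.\n\n"
            f"Task: {task}\n\n"
            f"Pay attention to:\n{bullets(focus)}\n\n"
            f"Avoid:\n{bullets(avoid)}\n\n"
            f"Domain rules:\n{bullets(rules)}")


# Hypothetical instruction for the investigator role used in the case study.
INVESTIGATOR_INSTRUCTIONS = role_instruction(
    persona="a data-quality investigator",
    task="decide which diagnostic tool to call next for a flagged record",
    focus=["which features look unusual", "evidence you have not collected yet"],
    avoid=["guessing a final verdict", "calling tools that are not listed"],
    rules=["measurements are in centimeters"],
)
print(INVESTIGATOR_INSTRUCTIONS)
```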
2.3 Prompt Builders
While role instructions define how an LLM should behave, prompt builders decide what the LLM can see.
Prompt builders assemble the context for an LLM call based on that call’s objective and the current workflow state.
In practice, it’s just a function: it takes in the dynamic values from the current workflow, optionally preprocesses them, and feeds them into a prompt template. The output is the final prompt payload sent to the LLM.
Prompt builders are where we control the context window. The context must be tailored to the task, yet sufficient to support the LLM’s reasoning.
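Here is a minimal prompt builder for the assessor role, assuming the workflow state holds the flagged record and a list of evidence strings. The template wording, the `max_evidence_items` cap, and the function name are illustrative. The returned dict uses the `instructions` and `input` parameter names of the OpenAI Responses API, so it could be passed as `client.responses.create(model=..., **payload)`.

```python
import json

# Illustrative template; the placeholders are filled from the current workflow state.
ASSESSOR_TEMPLATE = """Flagged record:
{record}

Evidence collected so far:
{evidence}

Decide whether this is a genuine data-quality issue."""


def build_assessor_prompt(record: dict, evidence: list, instructions: str,
                          max_evidence_items: int = 5) -> dict:
    """Turn workflow state into the payload for one LLM call."""
    # Preprocess: keep only the most recent evidence so the context stays bounded.
    kept = evidence[-max_evidence_items:] if max_evidence_items > 0 else []
    prompt = ASSESSOR_TEMPLATE.format(
        record=json.dumps(record, sort_keys=True),
        evidence="\n".join(f"- {item}" for item in kept) or "- (none yet)",
    )
    # Keys match the `instructions` and `input` parameters of responses.create.
    return {"instructions": instructions, "input": prompt}


payload = build_assessor_prompt({"label": "versicolor", "sepal_length": 9.9},
                                ["a", "b", "c"], "You are a data-quality assessor.",
                                max_evidence_items=2)
print(payload["input"])
```

Keeping the builder a pure function makes it easy to print and review exactly what the model will see before any API call is made.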
2.4 Structured Output
In a workflow, LLM outputs are often intermediate results consumed by the next step. Letting the LLM return free-form text is not a good idea here, as it makes downstream parsing fragile.
A better approach is to ask the LLM to return structured output that follows a predefined data schema, commonly represented as a JSON object or a Pydantic model.
That schema is the contract between LLM steps. When designing the schema, you need to think through what fields should exist, what types they should have, and what values are allowed. When the model output follows that schema, the next step can read the fields directly.
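To illustrate the contract, here is a sketch of an assessment schema using a standard-library dataclass plus explicit validation, so it runs without extra dependencies. In practice you would more likely declare the same fields as a Pydantic model and let the provider's structured-output support enforce them. The field names and the allowed `issue_type` values are assumptions chosen for the outlier case study.

```python
import json
from dataclasses import dataclass

# Assumed set of allowed values for the issue_type field.
ALLOWED_ISSUE_TYPES = {"measurement_error", "label_error", "genuine_outlier", "none"}


@dataclass(frozen=True)
class Assessment:
    is_data_quality_issue: bool
    issue_type: str      # one of ALLOWED_ISSUE_TYPES
    confidence: float    # 0.0 to 1.0
    evidence: list[str]


def parse_assessment(raw: str) -> Assessment:
    """Validate the model's JSON against the contract before the next step reads it."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("is_data_quality_issue"), bool):
        raise ValueError("is_data_quality_issue must be a boolean")
    if data.get("issue_type") not in ALLOWED_ISSUE_TYPES:
        raise ValueError(f"issue_type must be one of {sorted(ALLOWED_ISSUE_TYPES)}")
    conf = data.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("confidence must be a number in [0, 1]")
    ev = data.get("evidence")
    if not isinstance(ev, list) or not all(isinstance(e, str) for e in ev):
        raise ValueError("evidence must be a list of strings")
    return Assessment(data["is_data_quality_issue"], data["issue_type"], float(conf), ev)
```

Once parsing succeeds, the next step reads `assessment.issue_type` directly instead of searching free text for a verdict.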
In fact, structured output is also what enables tool/function calling. Specifically, we give the schema one field for a function name and another for its arguments, and the next step can read both and let Python execute the corresponding local function.
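The function-calling idea can be sketched in a few lines: the model returns JSON with a `tool_name` field and an `arguments` field, and plain Python looks the name up in a registry of local functions. The two tools here are hypothetical stubs. Provider-native tool calling handles the schema side for you, but executing the function is still your code's job.

```python
import json
from typing import Any, Callable


# Hypothetical local tools; in the case study they would inspect the dataset.
def feature_range(feature: str) -> str:
    return f"range of {feature}: (stub)"


def class_count(label: str) -> str:
    return f"count of {label}: (stub)"


TOOLS: dict[str, Callable[..., Any]] = {
    "feature_range": feature_range,
    "class_count": class_count,
}


def dispatch(raw_decision: str) -> Any:
    """Read the two schema fields and run the matching local function."""
    decision = json.loads(raw_decision)
    name, args = decision["tool_name"], decision["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")  # never execute arbitrary names
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)


print(dispatch('{"tool_name": "class_count", "arguments": {"label": "versicolor"}}'))
```

The registry doubles as an allow-list: the model can only trigger functions you have explicitly exposed.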
Now that we have seen the four core workflow ingredients, it’s time to see them in action.
3. Let’s Build a Real Workflow
Here, we build a small data-quality investigation workflow using the Iris dataset (CC BY 4.0).
In practice, we often need to screen datasets and flag suspicious records before using them for model training. But flagging alone is rarely enough; we often also want to understand the why: is it a real data-quality issue, and what evidence supports that assessment?
This is where LLMs can help.
3.1 Problem Setup
For this case study, we use the Iris dataset. Of course, Iris is a classification dataset with no labeled anomalies. To make things more interesting, we deliberately perturb one sample: it keeps its versicolor label, but we change its sepal measurements to unusual values. This gives us one clear feature-level data-quality anomaly.
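The perturbation step can be sketched as a function that copies the records and overwrites chosen fields of one row while leaving its label alone. Rows are assumed to be dicts (for example, from pandas `DataFrame.to_dict("records")`); the field names and replacement values below are hypothetical placeholders.

```python
import copy


def perturb_sample(rows: list, index: int, changes: dict) -> list:
    """Return a copy of `rows` with one record's fields overwritten by `changes`.
    Only keys in `changes` are touched, so leaving the label out keeps its class."""
    perturbed = copy.deepcopy(rows)            # never modify the clean data in place
    unknown = set(changes) - set(perturbed[index])
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    perturbed[index].update(changes)
    return perturbed


# Hypothetical example: a versicolor row given implausible sepal measurements.
clean_rows = [{"sepal_length": 5.7, "sepal_width": 2.8, "label": "versicolor"}]
dirty_rows = perturb_sample(clean_rows, 0, {"sepal_length": 12.0, "sepal_width": 0.4})
print(dirty_rows[0])
```

Copying instead of mutating keeps the clean dataset available for comparison later in the investigation.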
After that, we follow a normal screening workflow. First, we leverage simple logic to flag suspicious samples. Then, we use LLMs to diagnose the flagged sample and produce an assessment with evidence.
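One common choice for this kind of simple screening, assumed here, is a per-class z-score: flag any value that lies more than `threshold` sample standard deviations from its class mean. This sketch uses only the standard library; the function name and default threshold are illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_outliers(rows: list, features: list, label_key: str = "label",
                  threshold: float = 3.0) -> list:
    """Return sorted (row index, feature, z-score) tuples for values far from their class mean."""
    by_class = defaultdict(list)
    for i, row in enumerate(rows):
        by_class[row[label_key]].append(i)
    flags = []
    for idxs in by_class.values():
        if len(idxs) < 2:
            continue  # stdev needs at least two values
        for feat in features:
            values = [rows[i][feat] for i in idxs]
            mu, sd = mean(values), stdev(values)
            if sd == 0:
                continue  # all values identical: nothing stands out
            for i, v in zip(idxs, values):
                z = (v - mu) / sd
                if abs(z) > threshold:
                    flags.append((i, feat, round(z, 2)))
    return sorted(flags)
```

One caveat with this choice: the outlier itself inflates its class's standard deviation, so with small classes a robust variant based on the median and the median absolute deviation may flag more reliably.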
Our workflow has two LLM roles. The first one is an LLM investigator who determines what diagnostic evidence to gather next. This step involves calling tools that we have pre‑defined.
The second role is an LLM assessor, which makes the diagnosis based on the evidence collected.
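Putting the two roles together, here is a sketch of the investigation loop in plain Python. Each role is modeled as a function that takes a prompt string and returns JSON text, so a real model call (for example through the Responses API) could be dropped in later. The JSON fields, the `max_rounds` cap, the stub tool, and the scripted fake models are all assumptions for illustration.

```python
import json
from typing import Any, Callable

LLM = Callable[[str], str]  # any function: prompt text in, model's JSON text out


def diagnose(record: dict, investigator: LLM, assessor: LLM,
             tools: dict[str, Callable[..., Any]], max_rounds: int = 3) -> dict:
    """Investigator gathers evidence via local tools; assessor judges the evidence."""
    evidence: list[str] = []
    for _ in range(max_rounds):  # the code, not the model, caps the investigation
        decision = json.loads(investigator(json.dumps({"record": record, "evidence": evidence})))
        if decision["action"] == "stop":
            break
        name = decision["tool_name"]
        if name not in tools:
            evidence.append(f"error: unknown tool {name}")
            continue
        evidence.append(f"{name}: {tools[name](**decision['arguments'])}")
    verdict = json.loads(assessor(json.dumps({"record": record, "evidence": evidence})))
    return {"evidence": evidence, "assessment": verdict}


# Hypothetical tool and scripted fake models, so the loop runs without an API key.
STUB_TOOLS = {"class_mean": lambda feature: f"mean {feature} = (stub)"}


def fake_investigator(prompt: str) -> str:
    state = json.loads(prompt)
    if not state["evidence"]:
        return json.dumps({"action": "call_tool", "tool_name": "class_mean",
                           "arguments": {"feature": "sepal_length"}})
    return json.dumps({"action": "stop"})


def fake_assessor(prompt: str) -> str:
    state = json.loads(prompt)
    return json.dumps({"is_data_quality_issue": bool(state["evidence"]),
                       "evidence": state["evidence"]})


result = diagnose({"label": "versicolor", "sepal_length": 12.0},
                  fake_investigator, fake_assessor, STUB_TOOLS)
print(result["assessment"])
```

The investigator has some freedom in which evidence it gathers, but the overall path (investigate, then assess) and the round limit stay fixed in code.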