[Paper] El Agente Gráfico: Structured Execution Graphs for Scientific Agents

Published: February 19, 2026 at 06:47 PM EST

Source: arXiv

Overview

The paper introduces El Agente Gráfico, a single‑agent framework that couples large language models (LLMs) with a type‑safe execution environment and a dynamic knowledge‑graph “memory”.

Typed scientific state – Scientific state is represented as Python objects with explicit types.
Graph persistence – Objects are stored in a graph database, so the agent refers to them through structured, auditable identifiers instead of fragile free‑form text prompts.

The authors demonstrate that this approach can drive complex, multi‑step scientific workflows more reliably than existing multi‑agent, prompt‑centric pipelines. Example applications include:

Quantum‑chemistry benchmarking
Conformer generation
Metal‑organic framework (MOF) design

Key Contributions

Structured Execution Graphs – Introduces an object‑graph mapper that converts computational state into typed Python objects, stored either in‑memory or in an external knowledge graph.
Type‑Safe Context Management – Replaces raw textual context with symbolic, type‑checked identifiers, improving consistency and provenance tracking.
Single‑Agent Architecture – Demonstrates that a single LLM‑driven agent, when paired with a reliable execution engine, can replace fragile multi‑agent orchestration.
Benchmark Suite – Provides an automated benchmarking framework for university‑level quantum‑chemistry tasks, reproducing results previously obtained with a multi‑agent system.
Domain Extensions – Shows the paradigm applied to two additional scientific domains—conformer ensemble generation and MOF design—using the knowledge graph as both memory and reasoning substrate.
Open‑Source Prototype – Releases a reference implementation (Python library + Neo4j‑backed graph) to encourage community adoption and further research.
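To make the object‑graph mapper idea concrete, here is a minimal sketch of how typed Python objects could be turned into nodes and edges of an in‑memory graph. It is an illustration only, not the paper's implementation: the class names (`Molecule`, `Calculation`, `InMemoryGraph`), the `uid` field, and the `save` method are all hypothetical, and a plain dictionary stands in for the graph database.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, is_dataclass
from typing import Any


@dataclass
class Molecule:
    uid: str
    smiles: str


@dataclass
class Calculation:
    uid: str
    method: str
    molecule: Molecule  # reference to another typed object


class InMemoryGraph:
    """Hypothetical stand-in for a graph store: nodes keyed by symbolic ID."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        self.edges: list[tuple[str, str, str]] = []  # (source, relation, target)

    def save(self, obj: Any) -> str:
        """Store a dataclass instance; nested dataclasses become edges."""
        node_id = f"{type(obj).__name__}:{obj.uid}"
        props: dict[str, Any] = {}
        for f in fields(obj):
            value = getattr(obj, f.name)
            if is_dataclass(value):
                # Nested typed object: save it as its own node and link to it.
                target_id = self.save(value)
                self.edges.append((node_id, f.name, target_id))
            else:
                props[f.name] = value
        self.nodes[node_id] = props
        return node_id


graph = InMemoryGraph()
water = Molecule(uid="123", smiles="O")
calc_id = graph.save(
    Calculation(uid="7", method="geometry_optimization", molecule=water)
)
print(calc_id, graph.edges)
```

The returned string (for example `Molecule:123`) is the kind of symbolic identifier an agent could pass around instead of pasting the full object into a prompt.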

Methodology

Abstraction Layer – Scientific concepts (e.g., molecules, calculations, results) are defined as Python classes with explicit type annotations.
Object‑Graph Mapper (OGM) – Instances of these classes are automatically serialized into nodes/relationships in a graph database (Neo4j). The OGM maintains a bidirectional link between in‑memory objects and persisted graph entities.
LLM Decision Engine – An LLM (e.g., GPT‑4) receives a concise, typed prompt that references objects by their symbolic IDs (e.g., Molecule:123). The model decides which tool to invoke next (e.g., geometry optimization, TD‑DFT).
Typed Execution Engine – A thin Python wrapper validates the LLM’s suggested action against the object’s type signature, then dispatches the appropriate external tool (Gaussian, ORCA, RDKit, etc.).
Provenance Capture – Every tool invocation, input, and output is recorded as graph edges, enabling full audit trails and reproducible pipelines.
Evaluation – The authors built three pipelines (quantum‑chemistry benchmarking, conformer ensemble generation, and MOF design), each executing dozens of parallel jobs, and compared success rate, runtime, and reproducibility against a prior multi‑agent baseline.
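The typed execution step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' engine: `dispatch`, `TOOLS`, `STATE`, and `optimize_geometry` are hypothetical names, the tool is a placeholder that does not call any external program, and every tool is assumed to take exactly one typed argument. The point is only that a suggested action such as ("optimize_geometry", "Molecule:123") is checked against the tool's type annotations before anything runs.

```python
from dataclasses import dataclass
from typing import Callable, get_type_hints


@dataclass
class Molecule:
    uid: str
    smiles: str


@dataclass
class OptimizedGeometry:
    uid: str
    source: str


def optimize_geometry(mol: Molecule) -> OptimizedGeometry:
    """Placeholder tool; a real wrapper would call an external program."""
    return OptimizedGeometry(uid=f"opt-{mol.uid}", source=f"Molecule:{mol.uid}")


# Hypothetical tool table and object store, keyed by name and symbolic ID.
TOOLS: dict[str, Callable] = {"optimize_geometry": optimize_geometry}
STATE: dict[str, object] = {"Molecule:123": Molecule(uid="123", smiles="O")}


def dispatch(tool_name: str, object_id: str) -> object:
    """Validate a suggested (tool, object) pair, then run the tool."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    if object_id not in STATE:
        raise KeyError(f"unknown object: {object_id}")

    tool = TOOLS[tool_name]
    hints = get_type_hints(tool)
    hints.pop("return", None)
    (expected_type,) = hints.values()  # sketch assumes single-argument tools

    obj = STATE[object_id]
    if not isinstance(obj, expected_type):
        raise TypeError(
            f"{tool_name} expects {expected_type.__name__}, "
            f"got {type(obj).__name__}"
        )

    result = tool(obj)
    # Register the output under its own symbolic ID for later steps.
    STATE[f"{type(result).__name__}:{result.uid}"] = result
    return result


result = dispatch("optimize_geometry", "Molecule:123")
print(result)
```

Calling `dispatch("optimize_geometry", "OptimizedGeometry:opt-123")` afterwards raises a `TypeError`, which is the kind of mismatch a type check can catch before an external job is launched.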

Results & Findings

Domain	Success Rate (vs. baseline)	Avg. Runtime Reduction	Provenance Overhead
Quantum‑chemistry benchmark (≈30 tasks)	96 % (↑ 8 pts)	22 % faster	< 2 %
Conformer ensemble generation (100 mol.)	94 % (↑ 10 pts)	18 % faster	< 3 %
MOF design (20 candidate frameworks)	92 % (↑ 12 pts)	25 % faster	< 2 %
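As a quick reading aid for the table above, the baseline success rates can be backed out from the reported numbers. The short calculation below assumes that "pts" means percentage points, so the implied baseline is the reported rate minus the gain; the function names are illustrative only.

```python
# (success rate in %, gain over baseline in percentage points), from the table above
REPORTED = {
    "Quantum-chemistry benchmark": (96, 8),
    "Conformer ensemble generation": (94, 10),
    "MOF design": (92, 12),
}


def implied_baseline(rate: int, gain_pts: int) -> int:
    """Baseline success rate in %, assuming the gain is in percentage points."""
    return rate - gain_pts


def relative_gain(rate: int, gain_pts: int) -> float:
    """Gain expressed as a fraction of the implied baseline."""
    return gain_pts / implied_baseline(rate, gain_pts)


for domain, (rate, gain) in REPORTED.items():
    base = implied_baseline(rate, gain)
    print(f"{domain}: baseline {base} %, relative gain {relative_gain(rate, gain):.1%}")
```

Under that assumption the implied multi‑agent baselines are 88 %, 84 %, and 80 % respectively.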

Robustness – The single‑agent system completed all pipelines without the deadlocks or context‑drift issues that plagued the multi‑agent version.
Scalability – Parallel execution of up to 12 concurrent jobs was handled cleanly, with the knowledge graph efficiently indexing intermediate results.
Auditability – Researchers could query the graph to retrieve the exact sequence of decisions, inputs, and tool versions that produced any result, facilitating reproducibility.
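The auditability point can be illustrated with a small provenance walk. The sketch below is an assumption-laden toy, not the paper's schema: provenance is modelled as a flat list of (output, tool, version, input) records, the IDs and version strings are invented, and each object is assumed to have a single producing step with a single input. A real graph database would express the same traversal as a graph query.

```python
# (output_id, tool, tool_version, input_id); all values are invented examples.
PROVENANCE = [
    ("Geometry:1", "optimize_geometry", "v1", "Molecule:123"),
    ("Spectrum:1", "td_dft", "v2", "Geometry:1"),
]


def trace(result_id, records):
    """Return the steps that led to result_id, earliest first.

    Each step is (input_id, tool, tool_version, output_id).
    """
    produced_by = {out: (tool, version, inp) for out, tool, version, inp in records}
    steps = []
    seen = set()  # guards against accidental cycles in the records
    current = result_id
    while current in produced_by and current not in seen:
        seen.add(current)
        tool, version, inp = produced_by[current]
        steps.append((inp, tool, version, current))
        current = inp
    return list(reversed(steps))


for step in trace("Spectrum:1", PROVENANCE):
    print(step)
```

Tracing an object that was never produced by a tool, such as the starting molecule, simply returns an empty list.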

Practical Implications

Developer‑Friendly Automation – Exposing a typed API (instead of raw prompt engineering) lets developers embed LLM‑driven decision logic directly into existing CI/CD pipelines for scientific software.
Tool Orchestration Platforms – Cloud providers and workflow engines such as Airflow or Prefect can adopt the OGM pattern to give LLMs a reliable control plane for launching domain‑specific tools.
Regulatory & Auditing Needs – Industries like pharmaceuticals, materials, and chemicals can satisfy compliance requirements because every computational step is recorded in a queryable graph.
Reduced Maintenance – A single, well‑defined agent eliminates the need to synchronize multiple prompt‑tuned bots, lowering operational overhead and simplifying debugging.
Extensibility – New scientific domains can be onboarded by defining additional typed classes and registering corresponding tool wrappers; no redesign of the core agent is required.
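The extensibility claim can be pictured as a registry keyed by type. The following is a hedged sketch of that pattern rather than the released library's API: `REGISTRY`, `register_tool`, `available_tools`, the `Framework` class, and the example tool are all hypothetical. Adding a domain here means declaring a new dataclass and decorating its tool wrappers, while the lookup logic stays unchanged.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: input type -> {tool name -> wrapper function}
REGISTRY: dict[type, dict[str, Callable]] = {}


def register_tool(input_type: type):
    """Decorator that files a tool wrapper under the type it accepts."""

    def decorator(func: Callable) -> Callable:
        REGISTRY.setdefault(input_type, {})[func.__name__] = func
        return func

    return decorator


def available_tools(obj: object) -> list[str]:
    """List the tools registered for this object's exact type."""
    return sorted(REGISTRY.get(type(obj), {}))


# Onboarding a new domain: one typed class plus its tool wrappers.
@dataclass
class Framework:
    uid: str
    metal: str
    linker: str


@register_tool(Framework)
def describe_framework(fw: Framework) -> str:
    """Placeholder tool; a real wrapper would launch domain software."""
    return f"{fw.metal} nodes with {fw.linker} linkers"


mof = Framework(uid="1", metal="Zn", linker="BDC")
print(available_tools(mof))
print(REGISTRY[Framework]["describe_framework"](mof))
```

An agent could be shown only the names returned by `available_tools` for the object in hand, which narrows the set of actions it can propose.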

Limitations & Future Work

LLM Dependency: The system still relies on the quality of the underlying LLM; hallucinations in tool selection can propagate errors despite type checks.
Graph Overhead: While modest, persisting every intermediate object may become costly for extremely large datasets (e.g., high‑throughput screening of millions of compounds).
Tool Integration Scope: Current prototypes support a limited set of quantum‑chemistry packages; broader adoption will require wrappers for more diverse scientific software.
User‑Facing Interfaces: The paper focuses on backend orchestration; future work should explore UI/UX layers that let domain scientists interact with the knowledge graph without programming.
Distributed Execution: Scaling beyond a single node (e.g., across HPC clusters) and handling graph consistency in a distributed setting remain open challenges.

El Agente Gráfico showcases how marrying LLM reasoning with typed, graph‑backed state can turn fragile prompt‑centric bots into reliable scientific assistants—an approach that could reshape automation across computational research domains.

Authors

Alán Aspuru‑Guzik
Abdulrahman Aldossary
Jiaru Bai
Marcel Müller
Thomas Swanick
Yeonghun Kang
Zijian Zhang
Jin Won Lee
Tsz Wai Ko
Mohammad Ghazi Vakili
Varinia Bernales

Paper Information

Field	Details
arXiv ID	`2602.17902v1`
Categories	`cs.AI`, `cs.MA`, `cs.SE`, `physics.chem-ph`
Published	February 19, 2026
