[Paper] Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems
Source: arXiv - 2604.20795v1
Overview
The paper proposes a hybrid AI architecture that couples large language models (LLMs) with an external, structured ontology stored as an RDF/OWL knowledge graph. By automatically building and continuously updating this graph from documents, APIs, and dialogue logs, the system gives LLMs a persistent, verifiable memory layer that boosts multi‑step reasoning, planning, and explainability.
Key Contributions
- Automated ontology pipeline: end‑to‑end extraction (entity & relation detection, normalization, triple generation) from heterogeneous sources, followed by SHACL/OWL validation.
- Hybrid inference engine: combines traditional vector‑based retrieval‑augmented generation (RAG) with graph‑based reasoning and tool use during LLM prompting.
- Generation‑Verification‑Correction loop: outputs are checked against ontology constraints, enabling automatic correction or rejection of invalid results.
- Empirical validation: demonstrates measurable gains on classic planning benchmarks (e.g., Tower of Hanoi) and on tasks requiring long‑term, structured knowledge.
- Blueprint for real‑world agents: outlines how the architecture can be plugged into robotics, enterprise assistants, and autonomous software agents that need reliable, explainable decisions.
Methodology
1. Data Ingestion – The system pulls raw material from three channels:
- Unstructured text (PDFs, web pages)
- Structured API specifications (OpenAPI, GraphQL)
- Conversational logs (chat transcripts, voice‑assistant interactions)
2. Information Extraction – A fine‑tuned LLM (or a dedicated NER/RE model) tags entities and relations, then normalizes them to a shared schema (e.g., using CURIEs).
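The normalization step can be illustrated with a minimal CURIE compaction helper (the prefix map and namespace IRIs below are assumptions for illustration, not the paper's actual schema):

```python
# Illustrative prefix map: full namespace IRI -> CURIE prefix (assumed, not from the paper).
PREFIXES = {
    "http://example.org/ontology/": "ex",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
}

def to_curie(iri: str) -> str:
    """Compact a full IRI into prefix:local_name form; pass through unchanged if no prefix matches."""
    for base, prefix in PREFIXES.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri
```

In a real pipeline this mapping would come from the ontology's declared namespaces rather than a hard-coded dictionary.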
3. Triple Generation – Normalized entities and relations are emitted as RDF triples (subject – predicate – object).
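Emitting triples in a standard serialization such as N‑Triples is straightforward; here is a toy sketch (the entity IRIs are hypothetical):

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

# Hypothetical extracted fact: a disk resting on a peg.
facts = [("http://example.org/DiskA",
          "http://example.org/isOn",
          "http://example.org/Peg1")]
```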
4. Ontology Construction & Validation:
- The triples are merged into an OWL ontology.
- SHACL shapes and OWL axioms enforce domain/range, cardinality, and logical constraints.
- Invalid triples are either rejected or sent back for re‑generation.
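Full SHACL validation requires a dedicated engine (e.g., pySHACL), but the kind of domain and cardinality checks described above can be sketched with a toy validator (the constraint table and type map are illustrative assumptions):

```python
from collections import Counter

# Toy constraint table (assumed): predicate -> (required subject type, max cardinality per subject).
CONSTRAINTS = {
    "isOn": ("Disk", 1),  # a disk may rest on at most one peg
}

def validate(triples, types):
    """Return the triples that violate a domain or cardinality constraint."""
    violations = []
    counts = Counter((s, p) for s, p, _ in triples)
    for s, p, o in triples:
        if p not in CONSTRAINTS:
            continue  # unconstrained predicates always pass
        domain, max_card = CONSTRAINTS[p]
        if types.get(s) != domain or counts[(s, p)] > max_card:
            violations.append((s, p, o))
    return violations
```

Rejected triples would then be dropped or, as the paper describes, sent back to the extraction stage for re‑generation.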
5. Hybrid Retrieval at Inference Time – When a user query arrives:
- A vector store returns top‑k relevant passages (RAG).
- A SPARQL engine fetches related graph sub‑structures.
- Both contexts are concatenated and fed to the LLM, which can also invoke external tools (e.g., planners, calculators).
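The concatenation of the two retrieval paths might look roughly like this sketch, where `vector_store` and `graph` stand in for a real vector index and SPARQL endpoint (both are placeholders, not the paper's API):

```python
def build_context(query, vector_store, graph):
    """Merge top-k RAG passages and graph facts into one prompt context for the LLM."""
    passages = vector_store(query)                        # stand-in for top-k vector search
    facts = [f"{s} {p} {o}" for s, p, o in graph(query)]  # stand-in for a SPARQL lookup
    return "PASSAGES:\n" + "\n".join(passages) + "\nFACTS:\n" + "\n".join(facts)
```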
6. Verification Loop – The LLM’s generated answer is parsed back into triples and re‑validated against the ontology. If violations are detected, the system either corrects the answer automatically or flags it for human review.
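The overall generation–verification–correction loop can be sketched schematically; the `generate`, `parse_triples`, and `validate` hooks below are placeholders for the paper's actual components:

```python
def verified_answer(query, generate, parse_triples, validate, max_retries=2):
    """Regenerate until the parsed answer passes ontology validation, else flag for human review."""
    answer = None
    for attempt in range(max_retries + 1):
        answer = generate(query, attempt)
        if not validate(parse_triples(answer)):  # empty violation list means valid
            return answer, "accepted"
    return answer, "needs_human_review"
```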
Results & Findings
| Metric | Baseline LLM (RAG only) | Hybrid LLM + Ontology |
|---|---|---|
| Success rate on Tower of Hanoi (≤ 7 disks) | 62 % | 84 % |
| Average planning steps error | 1.9 steps | 0.6 steps |
| Ontology‑based validation pass rate | 71 % (post‑hoc) | 96 % |
| Latency increase (per query) | — | + 120 ms (due to SPARQL lookup) |
What it means: Adding a verified knowledge graph reduces hallucinations and improves the LLM’s ability to keep track of objects and constraints across many reasoning steps. The modest latency overhead is outweighed by the gain in reliability and explainability.
Practical Implications
- Enterprise AI assistants can reference a single source of truth (the ontology) for product catalogs, compliance rules, or internal processes, sharply reducing the risk that generated advice violates policy.
- Robotics & automation: planners can query the graph for object affordances, safety constraints, or workspace layouts, enabling safer task execution without hard‑coding every rule.
- Developer tooling: IDE plugins could auto‑populate a project’s knowledge graph from code, documentation, and issue trackers, letting LLM‑based code assistants reason about API contracts and dependency graphs.
- Explainability & auditability: Every answer can be traced back to the specific triples that justified it, satisfying regulatory requirements in finance, healthcare, and legal tech.
- Scalable long‑term memory: Unlike pure RAG, the graph persists across sessions, allowing agents to accumulate and refine knowledge over weeks or months without re‑training the LLM.
Limitations & Future Work
- Ontology quality depends on extraction accuracy; noisy source data can still propagate errors despite SHACL checks.
- The current pipeline assumes a relatively static schema; rapid schema evolution (e.g., micro‑service churn) may require more dynamic alignment mechanisms.
- Scalability: SPARQL queries on very large graphs can become a bottleneck; the authors suggest incremental indexing and graph partitioning as next steps.
- Generalization: Experiments focus on planning benchmarks; broader evaluation on open‑domain QA, code generation, or multimodal tasks remains open.
The authors plan to explore self‑supervised ontology refinement, tighter integration with LLM‑based tool use (e.g., function calling), and real‑world deployments in warehouse robotics and compliance‑heavy enterprise settings.
Authors
- Pavel Salovskii
- Iuliia Gorshkova
Paper Information
- arXiv ID: 2604.20795v1
- Categories: cs.AI
- Published: April 22, 2026