[Paper] SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance

Published: December 24, 2025 at 11:59 AM EST
3 min read

Source: arXiv - 2512.21280v1

Overview

The SMART SLM (Structured Memory and Reasoning Transformer) tackles a common pain point for engineers: extracting accurate, numeric information from massive, densely formatted engineering manuals. By turning the raw text into a hierarchy of structured facts and pairing it with a lightweight memory‑augmented transformer, SMART delivers higher accuracy than larger models like GPT‑2 while using far fewer parameters.

Key Contributions

  • Hierarchical fact extraction via a syntax‑aware Tree‑LSTM (“Grammarian”) that converts sentences into subject‑relation‑object triples.
  • Compact indexed memory (a 384‑dimensional vector store) that links each fact to its source location, enabling fast look‑ups (see the data‑structure sketch after this list).
  • Six‑layer transformer decoder that fuses retrieved facts to generate context‑aware answers.
  • Dual‑mode inference:
    1. Fast‑path for known, pre‑indexed manuals (sub‑second latency).
    2. Dynamic‑path for newly uploaded documents using a RAG‑style FAISS top‑20 retrieval with a 64‑slot memory buffer.
  • Parameter efficiency: 45.5 M parameters (≈ 64 % fewer than GPT‑2) with a 21.3 % boost in accuracy on engineering‑manual QA tasks.
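
To make the memory layout concrete, here is a minimal sketch of what a single entry in the fact store could look like. The paper specifies the SRO triple, the 384‑dimensional embedding, and the source link; the class and field names below are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class IndexedFact:
    """One entry in the structured memory: an SRO triple, its 384-d
    embedding, and a pointer back to the source manual (hypothetical
    layout; the paper specifies the contents, not this exact class)."""
    subject: str            # e.g. "Pump"
    relation: str           # e.g. "operates-at"
    obj: str                # e.g. "150 psi"
    embedding: np.ndarray   # 384-d vector, matching the paper's store
    page: int               # source page, kept for traceability
    section: str            # source section, kept for traceability

    def as_text(self) -> str:
        """Flatten the triple for embedding or display."""
        return f"{self.subject} {self.relation} {self.obj}"


# Toy entry; a real pipeline would fill `embedding` from a 384-d
# sentence encoder (e.g. a MiniLM-style model).
fact = IndexedFact("Pump", "operates-at", "150 psi",
                   np.zeros(384, dtype="float32"), page=42, section="4.2")
print(fact.as_text())  # -> Pump operates-at 150 psi
```

Keeping the page/section reference alongside the vector is what makes the answers auditable later on.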

Methodology

  1. Fact Extraction (Grammarian)

    • Each sentence from an engineering manual is parsed by a Tree‑LSTM that respects the grammatical tree.
    • The model outputs subject‑relation‑object (SRO) triples, e.g., (Pump, operates‑at, 150 psi).
  2. Structured Memory Indexing

    • Every SRO triple is embedded into a 384‑dimensional vector.
    • Vectors are stored in a Memory‑Augmented Neural Network (MANN) that also records the original page/section reference.
  3. Retrieval & Fusion

    • At query time, the user’s question is encoded and used to retrieve the most relevant fact vectors (FAISS nearest‑neighbor search).
    • Retrieved vectors are fed into a 6‑layer transformer that attends over them and the query, producing a concise, fact‑grounded answer.
  4. Inference Paths

    • Fast‑path: For manuals already indexed, the system bypasses the heavy retrieval step and directly fetches the pre‑computed fact vectors.
    • Dynamic‑path: For new documents, a lightweight RAG‑style pipeline builds a temporary index on‑the‑fly (max 64 slots) and then proceeds as above (both extraction and retrieval are sketched in code below).
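
Step 1 is the part SMART delegates to the Tree‑LSTM. Purely to show the shape of its output, here is a naive pattern‑matching stand‑in; the real Grammarian parses the full grammatical tree, which this sketch does not attempt.

```python
import re

# Naive stand-in for the Grammarian: the paper uses a syntax-aware
# Tree-LSTM, while this regex only handles "<subject> <verb> at <value>"
# spec sentences, to illustrate the subject-relation-object output shape.
TRIPLE_RE = re.compile(
    r"^(?P<subj>[A-Z][\w\s]*?)\s+(?P<rel>operates|runs|is rated)\s+at\s+(?P<obj>.+)$"
)


def naive_triple(sentence: str):
    """Return (subject, relation, object) for simple spec sentences."""
    m = TRIPLE_RE.match(sentence.strip().rstrip("."))
    if m is None:
        return None
    return (m["subj"].strip(), f"{m['rel'].replace(' ', '-')}-at", m["obj"].strip())


print(naive_triple("The pump operates at 150 psi."))
# -> ('The pump', 'operates-at', '150 psi')
```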
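
Steps 2–4 amount to a small retrieval routine. Below is a minimal sketch of the two inference paths, assuming FAISS inner‑product search; the function names and the decoder stub are my assumptions, while the 384‑d embeddings, top‑20 retrieval, and 64‑slot dynamic buffer come from the paper.

```python
import faiss
import numpy as np

DIM, TOP_K, DYNAMIC_SLOTS = 384, 20, 64


def build_index(fact_vectors: np.ndarray) -> faiss.IndexFlatIP:
    """Normalize and index fact embeddings (cosine via inner product)."""
    vectors = np.ascontiguousarray(fact_vectors, dtype="float32")
    faiss.normalize_L2(vectors)
    index = faiss.IndexFlatIP(DIM)
    index.add(vectors)
    return index


def retrieve(index: faiss.IndexFlatIP, query_vec: np.ndarray, k: int = TOP_K):
    """Return indices of the k facts most similar to the query."""
    q = np.ascontiguousarray(query_vec.reshape(1, DIM), dtype="float32")
    faiss.normalize_L2(q)
    _, idx = index.search(q, k)
    return idx[0]


def answer(query_vec, fact_vectors, prebuilt_index=None):
    if prebuilt_index is not None:
        # Fast-path: the manual was indexed ahead of time, so we
        # search the persistent store directly (sub-second).
        index = prebuilt_index
    else:
        # Dynamic-path: build a temporary index for a new document,
        # capped at the 64-slot memory buffer.
        index = build_index(fact_vectors[:DYNAMIC_SLOTS])
    hits = retrieve(index, query_vec)
    # In the real system the retrieved fact vectors are fused with the
    # query by the 6-layer transformer decoder; stubbed out here.
    return hits


# Toy usage with random embeddings standing in for real fact vectors.
facts = np.random.rand(200, DIM).astype("float32")
query = np.random.rand(DIM).astype("float32")
print(answer(query, facts))  # indices of top-20 facts from the 64-slot buffer
```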

Results & Findings

Model          Parameters   QA Accuracy (Engineering Manuals)   Avg. Latency
BERT (base)    133 M        68.1 %                              1.8 s
GPT‑2          124 M        71.4 %                              2.1 s
SMART SLM      45.5 M       86.7 %                              0.9 s (fast‑path)
  • Accuracy gain: SMART outperforms GPT‑2 by 21.3 % relative (86.7 % vs. 71.4 %) while using roughly a third of the parameters.
  • Hallucination reduction: Structured fact grounding cuts spurious numeric answers by ~40 % compared to baseline transformers.
  • Scalability: Adding new manuals incurs only a brief indexing cost (≈ 2 seconds) before the fast‑path becomes available.

Practical Implications

  • Engineering support tools: Integrate SMART into maintenance portals, allowing technicians to query manuals instantly for specs, tolerances, or step‑by‑step procedures.
  • Compliance & safety: Because answers are traceable to source sections, auditors can verify that the model’s output matches documented standards.
  • Edge deployment: The modest 45 M‑parameter footprint fits on modern GPUs or even high‑end CPUs, enabling on‑premise installations where data privacy is critical.
  • Reduced development cost: Companies can replace larger, more expensive LLM APIs with a self‑hosted SMART instance, cutting both inference spend and latency.

Limitations & Future Work

  • Domain specificity: SMART is tuned for engineering manuals; performance on other technical domains (e.g., medical guidelines) remains untested.
  • Memory size bound: The dynamic path caps the memory at 64 slots, which may truncate information for extremely large new documents.
  • Fact extraction errors: The Tree‑LSTM parser can mis‑identify relations in poorly formatted PDFs, leading to downstream inaccuracies.
  • Future directions suggested by the authors include: expanding the memory to a hierarchical, multi‑level index, adapting the Grammarian to multimodal inputs (tables, diagrams), and evaluating cross‑domain transfer with minimal re‑training.

Authors

  • Divij Dudeja
  • Mayukha Pal

Paper Information

  • arXiv ID: 2512.21280v1
  • Categories: cs.CL, cs.AI
  • Published: December 24, 2025
