[Paper] SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance
Source: arXiv - 2512.21280v1
Overview
The SMART SLM (Structured Memory and Reasoning Transformer) tackles a common pain point for engineers: extracting accurate, numeric information from massive, densely formatted engineering manuals. By turning the raw text into a hierarchy of structured facts and pairing it with a lightweight memory‑augmented transformer, SMART delivers higher accuracy than larger models like GPT‑2 while using far fewer parameters.
Key Contributions
- Hierarchical fact extraction via a syntax‑aware Tree‑LSTM (“Grammarian”) that converts sentences into subject‑relation‑object triples.
- Compact indexed memory (a 384‑dimensional vector store) that links each fact to its source location, enabling fast look‑ups.
- Six‑layer transformer decoder that fuses retrieved facts to generate context‑aware answers.
- Dual‑mode inference:
  - Fast‑path for known, pre‑indexed manuals (sub‑second latency).
  - Dynamic‑path for newly uploaded documents, using RAG‑style FAISS top‑20 retrieval with a 64‑slot memory buffer.
- Parameter efficiency: 45.5 M parameters (≈ 64 % fewer than GPT‑2) with a 21.3 % boost in accuracy on engineering‑manual QA tasks.
Methodology
1. Fact Extraction (Grammarian)
   - Each sentence from an engineering manual is parsed by a syntax‑aware Tree‑LSTM that follows the sentence's grammatical parse tree.
   - The model outputs subject‑relation‑object (SRO) triples, e.g., (Pump, operates‑at, 150 psi).
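The SRO representation can be sketched as a small data structure. The pattern‑matcher below is only a toy stand‑in for the paper's Tree‑LSTM Grammarian; the regex, function names, and example sentence are all illustrative assumptions, not the authors' method:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SROTriple:
    subject: str
    relation: str
    obj: str
    source: str  # page/section reference, kept for traceability

def naive_extract(sentence: str, source: str) -> Optional[SROTriple]:
    # Toy pattern-matcher standing in for the Tree-LSTM Grammarian:
    # handles "<subject> <verb> at <value>" sentences such as
    # "The pump operates at 150 psi."
    m = re.match(r"(?:The\s+)?(\w+)\s+(\w+\s+at)\s+(.+?)\.?$", sentence)
    if m is None:
        return None
    subj, rel, obj = m.groups()
    return SROTriple(subj, rel.replace(" ", "-"), obj, source)

triple = naive_extract("The pump operates at 150 psi.", "manual-A, p. 12")
# triple.relation == "operates-at", triple.obj == "150 psi"
```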
2. Structured Memory Indexing
   - Every SRO triple is embedded into a 384‑dimensional vector.
   - Vectors are stored in a Memory‑Augmented Neural Network (MANN) that also records the original page/section reference.
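A minimal sketch of the indexed memory, assuming a toy hash‑based trigram embedding in place of the paper's sentence encoder; only the 384‑dimensional width and the stored source reference come from the paper, everything else is illustrative:

```python
import hashlib
import math

DIM = 384  # embedding width reported in the paper

def embed(text: str, dim: int = DIM) -> list:
    # Deterministic toy embedding: hash character trigrams into `dim`
    # buckets, then L2-normalize the resulting count vector.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class FactMemory:
    """Indexed fact store: each vector keeps its source location."""
    def __init__(self) -> None:
        self.vectors = []  # 384-d fact embeddings
        self.facts = []    # (fact_text, page/section reference) pairs

    def add(self, fact_text: str, source: str) -> None:
        self.vectors.append(embed(fact_text))
        self.facts.append((fact_text, source))
```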
3. Retrieval & Fusion
   - At query time, the user's question is encoded and used to retrieve the most relevant fact vectors via FAISS nearest‑neighbor search.
   - The retrieved vectors are fed into the six‑layer transformer decoder, which attends over them and the query to produce a concise, fact‑grounded answer.
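The retrieval step reduces to a top‑k inner‑product search. The sketch below reproduces the ranking an exact FAISS index (IndexFlatIP) would give, on plain Python lists; the function names are assumptions, and k=20 mirrors the paper's top‑20 setting:

```python
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def top_k(query: Sequence[float],
          memory: List[Sequence[float]], k: int = 20) -> List[int]:
    # Exact nearest-neighbor ranking by inner product (the same
    # ordering FAISS's IndexFlatIP computes), returning the indices
    # of the k best-matching fact vectors.
    order = sorted(range(len(memory)),
                   key=lambda i: dot(query, memory[i]), reverse=True)
    return order[:k]

# The selected fact vectors, together with the encoded query, would then
# be handed to the six-layer transformer decoder to generate the answer.
```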
4. Inference Paths
   - Fast‑path: for manuals already indexed, the system bypasses on‑the‑fly index construction and directly fetches the pre‑computed fact vectors.
   - Dynamic‑path: for new documents, a lightweight RAG‑style pipeline builds a temporary index on the fly (capped at 64 memory slots) and then proceeds as above.
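The dual‑mode dispatch can be summarized as a small router. Class and method names here are illustrative; only the 64‑slot dynamic‑path cap comes from the paper:

```python
from typing import Dict, List, Optional, Tuple

class DualModeRouter:
    """Dispatch sketch: pre-indexed manuals take the fast path; new
    documents get a temporary, size-capped index (dynamic path)."""

    MAX_SLOTS = 64  # dynamic-path memory cap from the paper

    def __init__(self) -> None:
        self._indexes: Dict[str, List[str]] = {}

    def register(self, manual_id: str, facts: List[str]) -> None:
        # Pre-compute and store the fact index for a known manual.
        self._indexes[manual_id] = list(facts)

    def route(self, manual_id: str,
              new_facts: Optional[List[str]] = None) -> Tuple[str, List[str]]:
        if manual_id in self._indexes:
            return "fast", self._indexes[manual_id]
        # Dynamic path: build a temporary index, truncated to 64 slots.
        return "dynamic", list(new_facts or [])[: self.MAX_SLOTS]
```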
Results & Findings
| Model | Parameters | QA Accuracy (Engineering Manuals) | Avg. Latency |
|---|---|---|---|
| BERT (base) | 133 M | 68.1 % | 1.8 s |
| GPT‑2 (124 M) | 124 M | 71.4 % | 2.1 s |
| SMART SLM | 45.5 M | 86.7 % | 0.9 s (fast‑path) |
- Accuracy gain: SMART outperforms GPT‑2 by 15.3 percentage points (a 21.3 % relative improvement) while using less than half the parameters.
- Hallucination reduction: Structured fact grounding cuts spurious numeric answers by ~40 % compared to baseline transformers.
- Scalability: Adding new manuals incurs only a brief indexing cost (≈ 2 seconds) before the fast‑path becomes available.
Practical Implications
- Engineering support tools: Integrate SMART into maintenance portals, allowing technicians to query manuals instantly for specs, tolerances, or step‑by‑step procedures.
- Compliance & safety: Because answers are traceable to source sections, auditors can verify that the model’s output matches documented standards.
- Edge deployment: The modest 45 M‑parameter footprint fits on modern GPUs or even high‑end CPUs, enabling on‑premise installations where data privacy is critical.
- Reduced development cost: Companies can replace larger, more expensive LLM APIs with a self‑hosted SMART instance, cutting both inference spend and latency.
Limitations & Future Work
- Domain specificity: SMART is tuned for engineering manuals; performance on other technical domains (e.g., medical guidelines) remains untested.
- Memory size bound: The dynamic path caps the memory at 64 slots, which may truncate information for extremely large new documents.
- Fact extraction errors: The Tree‑LSTM parser can mis‑identify relations in poorly formatted PDFs, leading to downstream inaccuracies.
- Future directions suggested by the authors include: expanding the memory to a hierarchical, multi‑level index, adapting the Grammarian to multimodal inputs (tables, diagrams), and evaluating cross‑domain transfer with minimal re‑training.
Authors
- Divij Dudeja
- Mayukha Pal
Paper Information
- arXiv ID: 2512.21280v1
- Categories: cs.CL, cs.AI
- Published: December 24, 2025