[Paper] Explainable Statute Prediction via Attention-based Model and LLM Prompting
Source: arXiv - 2512.21902v1
Overview
This paper tackles statute prediction – automatically suggesting the legal provisions (sections, subsections, or articles) that apply to a given case description. The authors argue that for AI‑assisted legal tools to be trusted, the system must not only output the right statutes but also explain why each statute is relevant. To that end, they introduce two complementary approaches: an attention‑based model that works with modest‑size language models, and a prompting strategy that leverages large language models (LLMs) in a zero‑shot fashion.
Key Contributions
- Attention‑over‑Sentences (AoS) model: uses sentence‑level attention on the case text to rank relevant statutes, trained end‑to‑end with supervised data.
- LLM Prompting (LLMPrompt) framework: designs zero‑shot prompts (including Chain‑of‑Thought) for large models (e.g., GPT‑4) to both predict statutes and generate natural‑language rationales.
- Dual‑evaluation pipeline: measures statute prediction accuracy against strong baselines on two benchmark legal datasets, and assesses explanation quality via automated counter‑factual tests and human judgments.
- Explainability focus: provides human‑readable explanations (sentence excerpts, logical steps) rather than opaque confidence scores.
- Empirical comparison of a lightweight supervised model vs. a heavyweight zero‑shot LLM, highlighting trade‑offs in performance, compute cost, and interpretability.
Methodology
- Data preprocessing – Case narratives are split into sentences; each sentence is embedded with a sentence transformer (e.g., SBERT).
- AoS model (the sentence-embedding and attention steps are sketched after this list):
  - A trainable attention layer learns a weight for each sentence, indicating that sentence's relevance to each candidate statute.
  - The weighted sentence embeddings are aggregated and fed into a classifier that outputs a multi-label prediction (one or more statutes may apply).
  - The attention weights themselves serve as the explanation: the top-scoring sentences are presented as the rationale.
- LLMPrompt framework (a prompt-construction sketch also follows the list):
  - Constructs a prompt that includes the case description, a brief instruction to list the applicable statutes, and a request for a natural-language justification.
  - Two prompting styles are compared: standard (a direct ask) and Chain-of-Thought (CoT, step-by-step reasoning before the final answer).
  - No fine-tuning is performed; the LLM (e.g., GPT-4, Claude) produces predictions and explanations in a single zero-shot call.
- Evaluation (metric sketches close out this subsection):
  - Statute prediction: micro-averaged F1 and precision@k against the gold statutes.
  - Explanation quality: (a) a counter-factual test – replace a highlighted sentence and check whether the predicted statutes change; (b) human ratings of relevance, completeness, and readability on a Likert scale.
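A minimal PyTorch sketch of the preprocessing and AoS steps described above: sentences are embedded with an SBERT-style encoder, a label-specific attention layer scores each sentence, and a multi-label head turns the pooled embeddings into statute probabilities. The attention formulation, dimensions, and checkpoint name are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of sentence embedding + attention-over-sentences statute prediction.
# Architecture details (label-specific attention, dimensions, SBERT checkpoint) are
# illustrative assumptions; the paper's exact implementation may differ.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer


class AttentionOverSentences(nn.Module):
    def __init__(self, embed_dim: int, num_statutes: int):
        super().__init__()
        self.attn = nn.Linear(embed_dim, num_statutes)        # sentence relevance per statute
        self.classifier = nn.Linear(embed_dim, num_statutes)  # statute scorer over pooled text

    def forward(self, sent_embs: torch.Tensor):
        # sent_embs: (num_sentences, embed_dim) for one case description
        weights = torch.softmax(self.attn(sent_embs), dim=0)  # (S, L), normalised over sentences
        pooled = weights.T @ sent_embs                        # (L, D) label-specific case vectors
        logits = (self.classifier.weight * pooled).sum(-1) + self.classifier.bias
        return torch.sigmoid(logits), weights                 # multi-label probs + explanation weights


# Preprocessing: sentences of a (toy) case description, each embedded with an SBERT model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint
sentences = [
    "The accused entered the complainant's house at night.",
    "Property was removed without the owner's consent.",
]
embs = torch.tensor(encoder.encode(sentences))

model = AttentionOverSentences(embed_dim=embs.shape[1], num_statutes=5)
probs, attn = model(embs)
rationale = sentences[int(attn[:, int(probs.argmax())].argmax())]  # top-attended sentence for the top statute
```
During training, the sigmoid outputs would feed a binary cross-entropy loss against the gold statute set; at inference, the attention column of each predicted statute selects the rationale sentences.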
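For LLMPrompt, a sketch of zero-shot prompt construction in both styles; the prompt wording paraphrases the paper's description rather than reproducing the authors' templates, and the OpenAI client with a GPT-4 model name is just one example of a hosted backend.
```python
# Illustrative zero-shot prompts for statute prediction with a hosted LLM.
# Templates paraphrase the approach described in the paper; they are not the authors' exact prompts.
from openai import OpenAI

STANDARD_TEMPLATE = (
    "Case description:\n{case}\n\n"
    "List the statutes (Act and Section) that apply to this case, and for each one "
    "give a brief natural-language justification grounded in the case facts."
)

COT_TEMPLATE = (
    "Case description:\n{case}\n\n"
    "Reason step by step: first identify the legally salient facts, then map each fact "
    "to candidate statutes, and finally list the applicable statutes with a justification for each."
)

def predict_statutes(case_text: str, chain_of_thought: bool = False, model: str = "gpt-4") -> str:
    """Single zero-shot call: no fine-tuning, predictions and rationale in one response."""
    template = COT_TEMPLATE if chain_of_thought else STANDARD_TEMPLATE
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": template.format(case=case_text)}],
        temperature=0.0,  # keep outputs as deterministic as possible for evaluation
    )
    return response.choices[0].message.content
```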
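The prediction metrics and the counter-factual explanation check can be written compactly; the IPC-style statute labels and the [MASK] replacement string are illustrative placeholders.
```python
# Sketches of micro-averaged F1, precision@k, and the counter-factual explanation test.
# Statute labels and the replacement string are illustrative placeholders.
from typing import Callable, Iterable, List, Sequence, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Pool true/false positives and false negatives across all cases, then compute F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_at_k(gold: Set[str], ranked_pred: Sequence[str], k: int) -> float:
    """Fraction of the top-k ranked statutes that appear in the gold set."""
    return sum(1 for s in ranked_pred[:k] if s in gold) / k


def counterfactual_flip(predict: Callable[[List[str]], Set[str]],
                        sentences: List[str], top_idx: int) -> bool:
    """Replace the top-attended sentence and report whether the predicted statutes change."""
    perturbed = list(sentences)
    perturbed[top_idx] = "[MASK]"  # assumed perturbation; the paper only says "replace"
    return predict(perturbed) != predict(sentences)


print(micro_f1([{"IPC 302", "IPC 34"}, {"IPC 420"}],
               [{"IPC 302"}, {"IPC 420", "IPC 406"}]))                      # ~0.667
print(precision_at_k({"IPC 302", "IPC 34"}, ["IPC 302", "IPC 420"], k=2))   # 0.5
```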
Results & Findings
| Model | Statute F1 (Dataset 1) | Statute F1 (Dataset 2) | Avg. Explanation Score (Human) |
|---|---|---|---|
| AoS (sentence‑transformer) | 0.71 | 0.68 | 4.1 / 5 |
| LLMPrompt – Standard | 0.66 | 0.64 | 3.8 / 5 |
| LLMPrompt – CoT | 0.68 | 0.66 | 4.3 / 5 |
| Strong baseline (BERT‑CLS) | 0.62 | 0.60 | 3.2 / 5 |
| Random | 0.12 | 0.10 | — |
- AoS achieves the best statute F1 on both datasets while delivering transparent sentence-level explanations.
- CoT prompting narrows the accuracy gap and earns the highest human-rated explanation quality (4.3/5), though its statute F1 remains slightly below that of AoS.
- Counter‑factual tests confirm that the highlighted sentences truly influence the model’s decisions: swapping them often flips the predicted statutes.
- Computationally, AoS requires a modest GPU for training and inference, whereas LLMPrompt incurs higher latency and API cost but needs no training data.
Practical Implications
- Legal AI assistants: Developers can integrate AoS for on‑device, low‑latency statute suggestions with built‑in justification, ideal for internal firm tools where data privacy is critical.
- Zero‑shot rapid prototyping: LLMPrompt offers a plug‑and‑play solution when labeled training data are scarce—simply craft the right prompt and let a hosted LLM do the heavy lifting.
- Explainability as a product feature: The sentence‑level attention maps or CoT reasoning can be surfaced directly to lawyers, increasing trust and facilitating compliance with emerging AI‑transparency regulations.
- Hybrid pipelines: A practical system could first run AoS for fast, high-accuracy predictions and fall back to LLMPrompt for edge cases or to generate richer narrative explanations (see the dispatcher sketch after this list).
- Extensibility: The same attention‑over‑sentences architecture can be repurposed for other multi‑label legal tasks (e.g., issue spotting, precedent retrieval) with minimal changes.
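A rough illustration of that hybrid routing, assuming hypothetical predictor callables and an arbitrary 0.5 confidence threshold (neither is specified in the paper):
```python
# Hypothetical hybrid dispatcher: fast supervised AoS predictions first, zero-shot
# LLM prompting as a fallback for low-confidence or edge cases. The predictor
# signatures and the 0.5 threshold are illustrative assumptions.
from typing import Callable, List, Tuple

AosResult = Tuple[List[str], List[float], List[str]]  # (statutes, confidences, rationale sentences)
LlmResult = Tuple[List[str], str]                     # (statutes, narrative explanation)


def hybrid_predict(case_text: str,
                   aos_predict: Callable[[str], AosResult],
                   llm_predict: Callable[[str], LlmResult],
                   confidence_threshold: float = 0.5):
    """Prefer the low-latency local model; defer to the LLM when it is unsure."""
    statutes, confidences, rationale = aos_predict(case_text)
    if not statutes or max(confidences) < confidence_threshold:
        # Accept the extra latency/API cost in exchange for a richer narrative explanation.
        return llm_predict(case_text)
    return statutes, rationale
```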
Limitations & Future Work
- Domain coverage: Experiments are limited to two Indian legal corpora; performance on other jurisdictions (U.S., EU) remains untested.
- Statute granularity: The models treat each statute as an atomic label; hierarchical relationships (e.g., Act → Section → Sub‑section) are not exploited.
- Explanation depth: While human‑readable, the explanations are still surface‑level (sentence excerpts or CoT steps) and may not satisfy rigorous legal reasoning standards.
- LLM cost & latency: Zero‑shot prompting incurs API fees and slower response times, which could be prohibitive for high‑throughput services.
- Future directions suggested by the authors include: (1) incorporating hierarchical label structures, (2) fine‑tuning LLMs on legal corpora to improve both accuracy and explanation fidelity, and (3) exploring multimodal inputs (e.g., PDFs, scanned documents) to broaden real‑world applicability.
Authors
- Sachin Pawar
- Girish Keshav Palshikar
- Anindita Sinha Banerjee
- Nitin Ramrakhiyani
- Basit Ali
Paper Information
- arXiv ID: 2512.21902v1
- Categories: cs.CL
- Published: December 26, 2025
- PDF: https://arxiv.org/pdf/2512.21902v1