[Paper] Explainable Statute Prediction via Attention-based Model and LLM Prompting
Source: arXiv - 2512.21902v1
Overview
This paper tackles statute prediction – automatically suggesting the legal provisions (sections, subsections, or articles) that apply to a given case description. The authors argue that for AI‑assisted legal tools to be trusted, the system must not only output the right statutes but also explain why each statute is relevant. To that end, they introduce two complementary approaches: an attention‑based model that works with modest‑size language models, and a prompting strategy that leverages large language models (LLMs) in a zero‑shot fashion.
Key Contributions
- Attention‑over‑Sentences (AoS) model: uses sentence‑level attention on the case text to rank relevant statutes, trained end‑to‑end with supervised data.
- LLM Prompting (LLMPrompt) framework: designs zero‑shot prompts (including Chain‑of‑Thought) for large models (e.g., GPT‑4) to both predict statutes and generate natural‑language rationales.
- Dual‑evaluation pipeline: measures statute prediction accuracy against strong baselines on two benchmark legal datasets, and assesses explanation quality via automated counter‑factual tests and human judgments.
- Explainability focus: provides human‑readable explanations (sentence excerpts, logical steps) rather than opaque confidence scores.
- Empirical comparison of a lightweight supervised model vs. a heavyweight zero‑shot LLM, highlighting trade‑offs in performance, compute cost, and interpretability.
Methodology
- Data preprocessing – Case narratives are split into sentences; each sentence is embedded with a sentence transformer (e.g., SBERT).
- AoS model (the sentence-embedding and attention steps are sketched after this list):
  - A trainable attention layer learns a weight for each sentence, indicating that sentence's relevance to each candidate statute.
  - The weighted sentence embeddings are aggregated and fed into a classifier that outputs a multi-label prediction (one or more statutes may apply).
  - The attention weights themselves serve as the explanation: the top-scoring sentences are presented as the rationale.
- LLMPrompt framework (a prompt-construction sketch also follows the list):
  - Constructs a prompt that includes the case description, a brief instruction to list the applicable statutes, and a request for a natural-language justification.
  - Two prompting styles are compared: standard (a direct ask) and Chain-of-Thought (CoT, step-by-step reasoning before the final answer).
  - No fine-tuning is performed; the LLM (e.g., GPT-4, Claude) produces predictions and explanations in a single zero-shot call.
- Evaluation (metric sketches close out this subsection):
  - Statute prediction: micro-averaged F1 and precision@k against the gold statutes.
  - Explanation quality: (a) a counter-factual test – replace a highlighted sentence and check whether the predicted statutes change; (b) human ratings of relevance, completeness, and readability on a Likert scale.
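A minimal PyTorch sketch of the preprocessing and AoS steps described above: sentences are embedded with an SBERT-style encoder, a label-specific attention layer scores each sentence, and a multi-label head turns the pooled embeddings into statute probabilities. The attention formulation, dimensions, and checkpoint name are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of sentence embedding + attention-over-sentences statute prediction.
# Architecture details (label-specific attention, dimensions, SBERT checkpoint) are
# illustrative assumptions; the paper's exact implementation may differ.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer


class AttentionOverSentences(nn.Module):
    def __init__(self, embed_dim: int, num_statutes: int):
        super().__init__()
        self.attn = nn.Linear(embed_dim, num_statutes)        # sentence relevance per statute
        self.classifier = nn.Linear(embed_dim, num_statutes)  # statute scorer over pooled text

    def forward(self, sent_embs: torch.Tensor):
        # sent_embs: (num_sentences, embed_dim) for one case description
        weights = torch.softmax(self.attn(sent_embs), dim=0)  # (S, L), normalised over sentences
        pooled = weights.T @ sent_embs                        # (L, D) label-specific case vectors
        logits = (self.classifier.weight * pooled).sum(-1) + self.classifier.bias
        return torch.sigmoid(logits), weights                 # multi-label probs + explanation weights


# Preprocessing: sentences of a (toy) case description, each embedded with an SBERT model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint
sentences = [
    "The accused entered the complainant's house at night.",
    "Property was removed without the owner's consent.",
]
embs = torch.tensor(encoder.encode(sentences))

model = AttentionOverSentences(embed_dim=embs.shape[1], num_statutes=5)
probs, attn = model(embs)
rationale = sentences[int(attn[:, int(probs.argmax())].argmax())]  # top-attended sentence for the top statute
```
During training, the sigmoid outputs would feed a binary cross-entropy loss against the gold statute set; at inference, the attention column of each predicted statute selects the rationale sentences.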
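For LLMPrompt, a sketch of zero-shot prompt construction in both styles; the prompt wording paraphrases the paper's description rather than reproducing the authors' templates, and the OpenAI client with a GPT-4 model name is just one example of a hosted backend.
```python
# Illustrative zero-shot prompts for statute prediction with a hosted LLM.
# Templates paraphrase the approach described in the paper; they are not the authors' exact prompts.
from openai import OpenAI

STANDARD_TEMPLATE = (
    "Case description:\n{case}\n\n"
    "List the statutes (Act and Section) that apply to this case, and for each one "
    "give a brief natural-language justification grounded in the case facts."
)

COT_TEMPLATE = (
    "Case description:\n{case}\n\n"
    "Reason step by step: first identify the legally salient facts, then map each fact "
    "to candidate statutes, and finally list the applicable statutes with a justification for each."
)

def predict_statutes(case_text: str, chain_of_thought: bool = False, model: str = "gpt-4") -> str:
    """Single zero-shot call: no fine-tuning, predictions and rationale in one response."""
    template = COT_TEMPLATE if chain_of_thought else STANDARD_TEMPLATE
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": template.format(case=case_text)}],
        temperature=0.0,  # keep outputs as deterministic as possible for evaluation
    )
    return response.choices[0].message.content
```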
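The prediction metrics and the counter-factual explanation check can be written compactly; the IPC-style statute labels and the [MASK] replacement string are illustrative placeholders.
```python
# Sketches of micro-averaged F1, precision@k, and the counter-factual explanation test.
# Statute labels and the replacement string are illustrative placeholders.
from typing import Callable, Iterable, List, Sequence, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Pool true/false positives and false negatives across all cases, then compute F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_at_k(gold: Set[str], ranked_pred: Sequence[str], k: int) -> float:
    """Fraction of the top-k ranked statutes that appear in the gold set."""
    return sum(1 for s in ranked_pred[:k] if s in gold) / k


def counterfactual_flip(predict: Callable[[List[str]], Set[str]],
                        sentences: List[str], top_idx: int) -> bool:
    """Replace the top-attended sentence and report whether the predicted statutes change."""
    perturbed = list(sentences)
    perturbed[top_idx] = "[MASK]"  # assumed perturbation; the paper only says "replace"
    return predict(perturbed) != predict(sentences)


print(micro_f1([{"IPC 302", "IPC 34"}, {"IPC 420"}],
               [{"IPC 302"}, {"IPC 420", "IPC 406"}]))                      # ~0.667
print(precision_at_k({"IPC 302", "IPC 34"}, ["IPC 302", "IPC 420"], k=2))   # 0.5
```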
Results & Findings
| Model | Statute F1 (Dataset 1) | Statute F1 (Dataset 2) | Avg. Explanation Score (Human) |
|---|---|---|---|
| AoS (sentence‑transformer) | 0.71 | 0.68 | 4.1 / 5 |
| LLMPrompt – Standard | 0.66 | 0.64 | 3.8 / 5 |
| LLMPrompt – CoT | 0.68 | 0.66 | 4.3 / 5 |
| Strong baseline (BERT‑CLS) | 0.62 | 0.60 | 3.2 / 5 |
| Random | 0.12 | 0.10 | — |
- AoS achieves the best statute F1 on both datasets while delivering transparent sentence-level explanations.
- CoT prompting narrows the accuracy gap and earns the highest human-rated explanation quality (4.3/5), though its statute F1 remains slightly below that of AoS.
- Counter‑factual tests confirm that the highlighted sentences truly influence the model’s decisions: swapping them often flips the predicted statutes.
- Computationally, AoS requires a modest GPU for training and inference, whereas LLMPrompt incurs higher latency and API cost but needs no training data.
Practical Implications
- Legal AI assistants: Developers can integrate AoS for on‑device, low‑latency statute suggestions with built‑in justification, ideal for internal firm tools where data privacy is critical.
- Zero‑shot rapid prototyping: LLMPrompt offers a plug‑and‑play solution when labeled training data are scarce—simply craft the right prompt and let a hosted LLM do the heavy lifting.
- Explainability as a product feature: The sentence‑level attention maps or CoT reasoning can be surfaced directly to lawyers, increasing trust and facilitating compliance with emerging AI‑transparency regulations.
- Hybrid pipelines: A practical system could first run AoS for fast, high-accuracy predictions and fall back to LLMPrompt for edge cases or to generate richer narrative explanations (see the dispatcher sketch after this list).
- Extensibility: The same attention‑over‑sentences architecture can be repurposed for other multi‑label legal tasks (e.g., issue spotting, precedent retrieval) with minimal changes.
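A rough illustration of that hybrid routing, assuming hypothetical predictor callables and an arbitrary 0.5 confidence threshold (neither is specified in the paper):
```python
# Hypothetical hybrid dispatcher: fast supervised AoS predictions first, zero-shot
# LLM prompting as a fallback for low-confidence or edge cases. The predictor
# signatures and the 0.5 threshold are illustrative assumptions.
from typing import Callable, List, Tuple

AosResult = Tuple[List[str], List[float], List[str]]  # (statutes, confidences, rationale sentences)
LlmResult = Tuple[List[str], str]                     # (statutes, narrative explanation)


def hybrid_predict(case_text: str,
                   aos_predict: Callable[[str], AosResult],
                   llm_predict: Callable[[str], LlmResult],
                   confidence_threshold: float = 0.5):
    """Prefer the low-latency local model; defer to the LLM when it is unsure."""
    statutes, confidences, rationale = aos_predict(case_text)
    if not statutes or max(confidences) < confidence_threshold:
        # Accept the extra latency/API cost in exchange for a richer narrative explanation.
        return llm_predict(case_text)
    return statutes, rationale
```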
Limitations & Future Work
- Domain coverage: Experiments are limited to two Indian legal corpora; performance on other jurisdictions (U.S., EU) remains untested.
- Statute granularity: The models treat each statute as an atomic label; hierarchical relationships (e.g., Act → Section → Sub‑section) are not exploited.
- Explanation depth: While human‑readable, the explanations are still surface‑level (sentence excerpts or CoT steps) and may not satisfy rigorous legal reasoning standards.
- LLM cost & latency: Zero‑shot prompting incurs API fees and slower response times, which could be prohibitive for high‑throughput services.
- Future directions suggested by the authors include: (1) incorporating hierarchical label structures, (2) fine‑tuning LLMs on legal corpora to improve both accuracy and explanation fidelity, and (3) exploring multimodal inputs (e.g., PDFs, scanned documents) to broaden real‑world applicability.
Authors
- Sachin Pawar
- Girish Keshav Palshikar
- Anindita Sinha Banerjee
- Nitin Ramrakhiyani
- Basit Ali
Paper Information
- arXiv ID: 2512.21902v1
- Categories: cs.CL
- Published: December 26, 2025
- PDF: https://arxiv.org/pdf/2512.21902v1