[Paper] Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

Published: February 20, 2026 at 11:57 AM EST
4 min read

Source: arXiv - 2602.18346v1

Overview

The paper introduces Vichara, a new AI‑driven framework that can both predict the outcome of Indian appellate cases and explain its reasoning in a format familiar to lawyers. By breaking down case documents into granular “decision points” and using large language models (LLMs) to reason over them, Vichara pushes judgment‑prediction accuracy past existing benchmarks while delivering human‑readable explanations.

Key Contributions

  • Decision‑point decomposition: Converts raw appellate proceedings into structured units (issue, authority, outcome, reasoning, temporal context).
  • IRAC‑style explanations: Generates explanations that follow the Issue‑Rule‑Application‑Conclusion template, customized for Indian jurisprudence.
  • Multi‑model evaluation: Benchmarks four LLMs (GPT‑4o mini, Llama‑3.1‑8B, Mistral‑7B, Qwen2.5‑7B) on two curated datasets (PredEx, ILDC_expert).
  • State‑of‑the‑art performance: GPT‑4o mini attains F1 scores of 81.5 (PredEx) and 80.3 (ILDC_expert), outperforming prior judgment‑prediction baselines.
  • Human‑centric evaluation: Assesses explanation quality on Clarity, Linking, and Usefulness, showing GPT‑4o mini’s explanations are the most interpretable.
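The decision‑point structure described above can be pictured as a small record type. The sketch below is a hypothetical Python representation (the field names and example values are assumptions, not the paper's actual data model):

```python
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    issue: str      # the legal issue being decided
    authority: str  # the deciding judge or bench
    outcome: str    # "affirm", "reverse", or "modify"
    reasoning: str  # key rationale snippet from the judgment
    temporal: str   # when in the proceedings the point was raised

# Hypothetical example for illustration only
dp = DecisionPoint(
    issue="Validity of the arbitration clause",
    authority="Division Bench",
    outcome="affirm",
    reasoning="Clause satisfies the Section 7 writing requirement",
    temporal="First appeal, 2019",
)
```

Representing each determination this way, rather than as raw text, is what lets the prompts downstream stay short and targeted.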

Methodology

  1. Document Ingestion – Vichara reads English‑language appellate case files (court orders, transcripts, etc.).
  2. Decision‑Point Extraction – A rule‑based + neural pipeline identifies discrete legal determinations, each tagged with:
    • Legal issue (what is being decided)
    • Deciding authority (which judge or bench)
    • Outcome (affirm, reverse, modify)
    • Reasoning snippet (key rationale)
    • Temporal context (when the point was raised)
  3. Prompt Construction – For each decision point, a prompt is built that feeds the structured data into an LLM. The prompt explicitly asks the model to:
    • Predict the appellate outcome (binary or multi‑class).
    • Generate an IRAC‑style (Issue‑Rule‑Application‑Conclusion) explanation.
  4. Model Ensemble – Four LLMs are run on the same prompts; results are compared both quantitatively (F1, accuracy) and qualitatively (human rating of explanations).
  5. Evaluation Datasets
    • PredEx: A publicly available appellate judgment‑prediction benchmark.
    • ILDC_expert: A subset of the Indian Legal Documents Corpus manually annotated by legal experts for decision points and outcomes.
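Step 3 (prompt construction) can be sketched as a simple template fill. The template wording below is an assumption for illustration; the paper's actual prompts are not reproduced here:

```python
# Hypothetical prompt template; field names mirror the decision-point structure.
IRAC_PROMPT = """You are analysing an Indian appellate case.

Decision point:
- Issue: {issue}
- Deciding authority: {authority}
- Reasoning snippet: {reasoning}
- Temporal context: {temporal}

1. Predict the appellate outcome (affirm / reverse / modify).
2. Explain the prediction in IRAC form: Issue, Rule, Application, Conclusion.
"""

def build_prompt(point: dict) -> str:
    """Fill the template with one extracted decision point."""
    return IRAC_PROMPT.format(**point)

prompt = build_prompt({
    "issue": "Maintainability of the second appeal",
    "authority": "Single Judge, High Court",
    "reasoning": "No substantial question of law was framed",
    "temporal": "Second appeal, 2021",
})
```

The same prompt is then sent to each of the four models, which is what makes the quantitative and qualitative comparisons in step 4 apples-to-apples.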

Results & Findings

| Model | Dataset | F1 Score | Avg. Explanation Rating* |
|---|---|---|---|
| GPT‑4o mini | PredEx | 81.5 | 4.6 / 5 |
| GPT‑4o mini | ILDC_expert | 80.3 | 4.5 / 5 |
| Llama‑3.1‑8B | PredEx | 78.2 | 4.1 / 5 |
| Llama‑3.1‑8B | ILDC_expert | 77.0 | 4.0 / 5 |
| Mistral‑7B | PredEx | 73.4 | 3.7 / 5 |
| Qwen2.5‑7B | PredEx | 71.9 | 3.5 / 5 |

*Ratings are averages across Clarity, Linking (how well the explanation ties back to the decision point), and Usefulness (practical value for a lawyer).
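Assuming the starred rating is a plain unweighted mean of the three dimensions, it could be computed as below (the per-dimension scores shown are illustrative, not from the paper):

```python
def avg_rating(clarity: float, linking: float, usefulness: float) -> float:
    """Unweighted mean of the three human-rated dimensions, rounded to 1 dp."""
    return round((clarity + linking + usefulness) / 3, 1)

# Hypothetical per-dimension scores that would yield a 4.6 / 5 average:
score = avg_rating(clarity=4.7, linking=4.5, usefulness=4.6)
```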

Takeaways

  • Structured decision‑point representation dramatically improves prediction fidelity compared to feeding raw text to the LLM.
  • The IRAC‑style explanations are not just “plausible text”; they are consistently rated higher for legal relevance and transparency.
  • Even the 8B‑parameter Llama‑3.1 competes closely with GPT‑4o mini, suggesting that the framework can be adapted to open‑source models for cost‑sensitive deployments.

Practical Implications

  • Case triage for courts – Judges and clerks can use Vichara to flag high‑probability reversals early, helping prioritize backlog reduction.
  • Legal research assistants – Law firms can integrate Vichara into document‑review pipelines to auto‑summarize appellate decisions and surface the reasoning behind likely outcomes.
  • Training junior lawyers – The IRAC‑style explanations serve as teaching material, illustrating how appellate courts structure their judgments.
  • Policy analytics – Government bodies can aggregate prediction trends to identify systemic patterns (e.g., over‑reliance on certain precedents).
  • Open‑source feasibility – Because the framework works with models as small as 7‑8 B parameters, smaller firms can deploy a cost‑effective, on‑premise version without relying on proprietary APIs.

Limitations & Future Work

  • Language scope – Vichara currently handles only English‑language documents; many Indian judgments are in regional languages, limiting coverage.
  • Dataset bias – The evaluation datasets are skewed toward higher‑court decisions; performance on lower‑court appeals remains untested.
  • Explainability depth – While IRAC explanations are structured, they do not yet provide citations to specific statutes or precedent paragraphs, which lawyers often demand.
  • Future directions – The authors propose extending the pipeline to multilingual inputs, enriching explanations with legal citations, and exploring few‑shot fine‑tuning to adapt the model to niche domains (e.g., tax or intellectual‑property appellate law).

Authors

  • Pavithra PM Nair
  • Preethu Rose Anish

Paper Information

  • arXiv ID: 2602.18346v1
  • Categories: cs.CL, cs.AI
  • Published: February 20, 2026