[Paper] Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

Published: February 20, 2026 at 11:57 AM EST
4 min read

Source: arXiv - 2602.18346v1

Overview

The paper introduces Vichara, a new AI‑driven framework that can both predict the outcome of Indian appellate cases and explain its reasoning in a format familiar to lawyers. By breaking down case documents into granular “decision points” and using large language models (LLMs) to reason over them, Vichara pushes judgment‑prediction accuracy past existing benchmarks while delivering human‑readable explanations.

Key Contributions

  • Decision‑point decomposition: Converts raw appellate proceedings into structured units (issue, authority, outcome, reasoning, temporal context).
  • IRAC‑style explanations: Generates explanations that follow the Issue‑Rule‑Application‑Conclusion template, customized for Indian jurisprudence.
  • Multi‑model evaluation: Benchmarks four LLMs (GPT‑4o mini, Llama‑3.1‑8B, Mistral‑7B, Qwen2.5‑7B) on two curated datasets (PredEx, ILDC_expert).
  • State‑of‑the‑art performance: GPT‑4o mini attains F1 scores of 81.5 (PredEx) and 80.3 (ILDC_expert), outperforming prior judgment‑prediction baselines.
  • Human‑centric evaluation: Assesses explanation quality on Clarity, Linking, and Usefulness, showing GPT‑4o mini’s explanations are the most interpretable.
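The decision‑point structure described above can be pictured as a small record type. The sketch below is a hypothetical Python representation (the field names and example values are assumptions, not the paper's actual data model):

```python
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    issue: str      # the legal issue being decided
    authority: str  # the deciding judge or bench
    outcome: str    # "affirm", "reverse", or "modify"
    reasoning: str  # key rationale snippet from the judgment
    temporal: str   # when in the proceedings the point was raised

# Hypothetical example for illustration only
dp = DecisionPoint(
    issue="Validity of the arbitration clause",
    authority="Division Bench",
    outcome="affirm",
    reasoning="Clause satisfies the Section 7 writing requirement",
    temporal="First appeal, 2019",
)
```

Representing each determination this way, rather than as raw text, is what lets the prompts downstream stay short and targeted.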

Methodology

  1. Document Ingestion – Vichara reads English‑language appellate case files (court orders, transcripts, etc.).
  2. Decision‑Point Extraction – A rule‑based + neural pipeline identifies discrete legal determinations, each tagged with:
    • Legal issue (what is being decided)
    • Deciding authority (which judge or bench)
    • Outcome (affirm, reverse, modify)
    • Reasoning snippet (key rationale)
    • Temporal context (when the point was raised)
  3. Prompt Construction – For each decision point, a prompt is built that feeds the structured data into an LLM. The prompt explicitly asks the model to:
    • Predict the appellate outcome (binary or multi‑class).
    • Generate an IRAC‑style (Issue‑Rule‑Application‑Conclusion) explanation.
  4. Model Ensemble – Four LLMs are run on the same prompts; results are compared both quantitatively (F1, accuracy) and qualitatively (human rating of explanations).
  5. Evaluation Datasets
    • PredEx: A publicly available appellate judgment‑prediction benchmark.
    • ILDC_expert: A subset of the Indian Legal Documents Corpus manually annotated by legal experts for decision points and outcomes.
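Step 3 (prompt construction) can be sketched as a simple template fill. The template wording below is an assumption for illustration; the paper's actual prompts are not reproduced here:

```python
# Hypothetical prompt template; field names mirror the decision-point structure.
IRAC_PROMPT = """You are analysing an Indian appellate case.

Decision point:
- Issue: {issue}
- Deciding authority: {authority}
- Reasoning snippet: {reasoning}
- Temporal context: {temporal}

1. Predict the appellate outcome (affirm / reverse / modify).
2. Explain the prediction in IRAC form: Issue, Rule, Application, Conclusion.
"""

def build_prompt(point: dict) -> str:
    """Fill the template with one extracted decision point."""
    return IRAC_PROMPT.format(**point)

prompt = build_prompt({
    "issue": "Maintainability of the second appeal",
    "authority": "Single Judge, High Court",
    "reasoning": "No substantial question of law was framed",
    "temporal": "Second appeal, 2021",
})
```

The same prompt is then sent to each of the four models, which is what makes the quantitative and qualitative comparisons in step 4 apples-to-apples.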

Results & Findings

| Model | Dataset | F1 Score | Avg. Explanation Rating* |
|---|---|---|---|
| GPT‑4o mini | PredEx | 81.5 | 4.6 / 5 |
| GPT‑4o mini | ILDC_expert | 80.3 | 4.5 / 5 |
| Llama‑3.1‑8B | PredEx | 78.2 | 4.1 / 5 |
| Llama‑3.1‑8B | ILDC_expert | 77.0 | 4.0 / 5 |
| Mistral‑7B | PredEx | 73.4 | 3.7 / 5 |
| Qwen2.5‑7B | PredEx | 71.9 | 3.5 / 5 |

*Ratings are averages across Clarity, Linking (how well the explanation ties back to the decision point), and Usefulness (practical value for a lawyer).
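Assuming the starred rating is a plain unweighted mean of the three dimensions, it could be computed as below (the per-dimension scores shown are illustrative, not from the paper):

```python
def avg_rating(clarity: float, linking: float, usefulness: float) -> float:
    """Unweighted mean of the three human-rated dimensions, rounded to 1 dp."""
    return round((clarity + linking + usefulness) / 3, 1)

# Hypothetical per-dimension scores that would yield a 4.6 / 5 average:
score = avg_rating(clarity=4.7, linking=4.5, usefulness=4.6)
```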

Takeaways

  • Structured decision‑point representation dramatically improves prediction fidelity compared to feeding raw text to the LLM.
  • The IRAC‑style explanations are not just “plausible text”; they are consistently rated higher for legal relevance and transparency.
  • Even the 8B‑parameter Llama‑3.1 competes closely with GPT‑4o mini, suggesting that the framework can be adapted to open‑source models for cost‑sensitive deployments.

Practical Implications

  • Case triage for courts – Judges and clerks can use Vichara to flag high‑probability reversals early, helping prioritize backlog reduction.
  • Legal research assistants – Law firms can integrate Vichara into document‑review pipelines to auto‑summarize appellate decisions and surface the reasoning behind likely outcomes.
  • Training junior lawyers – The IRAC‑style explanations serve as teaching material, illustrating how appellate courts structure their judgments.
  • Policy analytics – Government bodies can aggregate prediction trends to identify systemic patterns (e.g., over‑reliance on certain precedents).
  • Open‑source feasibility – Because the framework works with models as small as 7‑8 B parameters, smaller firms can deploy a cost‑effective, on‑premise version without relying on proprietary APIs.

Limitations & Future Work

  • Language scope – Vichara currently handles only English‑language documents; many Indian judgments are in regional languages, limiting coverage.
  • Dataset bias – The evaluation datasets are skewed toward higher‑court decisions; performance on lower‑court appeals remains untested.
  • Explainability depth – While IRAC explanations are structured, they do not yet provide citations to specific statutes or precedent paragraphs, which lawyers often demand.
  • Future directions – The authors propose extending the pipeline to multilingual inputs, enriching explanations with legal citations, and exploring few‑shot fine‑tuning to adapt the model to niche domains (e.g., tax or intellectual‑property appellate law).

Authors

  • Pavithra PM Nair
  • Preethu Rose Anish

Paper Information

  • arXiv ID: 2602.18346v1
  • Categories: cs.CL, cs.AI
  • Published: February 20, 2026