[Paper] VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning
Source: arXiv - 2602.18429v1
Overview
The paper introduces VIRAASAT, a new benchmark that challenges large language models (LLMs) to reason about Indian culture through multi‑hop questions. By semi‑automatically generating over 3,200 culturally rich QA pairs from a curated knowledge graph, the authors expose a blind spot in today’s LLMs: the inability to reliably chain low‑frequency, region‑specific facts.
Key Contributions
- VIRAASAT dataset – a semi‑automated, multi‑hop QA collection covering 13 cultural attributes across all 28 Indian states and 8 Union Territories, built on an expert‑curated knowledge graph of more than 700 nodes.
- Empirical gap analysis – systematic evaluation of state‑of‑the‑art LLMs (including CoT‑fine‑tuned models) that reveals poor performance on chained cultural reasoning.
- Symbolic Chain‑of‑Manipulation (SCoM) – a novel training framework that teaches models to simulate explicit graph operations (traversals, merges, look‑ups) rather than relying on free‑form textual reasoning.
- Performance boost – SCoM‑enhanced models achieve up to 20 percentage points higher accuracy than standard Chain‑of‑Thought (CoT) baselines on VIRAASAT.
- Open resources – the dataset, knowledge graph, and training scripts are publicly released to foster culturally aware AI research.
Methodology
- Knowledge Graph Construction – Domain experts compiled a graph of >700 cultural artifacts (festivals, historical events, cuisines, etc.) linked by 13 attribute types (e.g., “celebrated‑in”, “origin‑year”).
- Semi‑automated QA Generation – Using graph traversal algorithms, the authors sampled multi‑hop paths (e.g., State → Festival → Historical Origin) and automatically templated questions that require chaining those hops. Human reviewers validated a subset to ensure linguistic naturalness.
- Baseline Evaluation – Prominent LLMs (GPT‑4, LLaMA‑2, PaLM‑2) were tested in zero‑shot, few‑shot, and CoT‑fine‑tuned settings. Accuracy, reasoning trace quality, and fact grounding were measured.
- SCoM Framework – Instead of prompting the model to “think step‑by‑step,” SCoM provides a symbolic instruction set that mirrors graph operations (e.g., SELECT node where attribute = “festival” → FOLLOW edge “celebrated‑in” → RETURN state). The model is fine‑tuned to output these symbolic traces before producing the final answer, encouraging internal graph‑like reasoning.
- Supervised Fine‑Tuning (SFT) – The authors fine‑tuned LLMs on the SCoM traces using the VIRAASAT training split, then evaluated on a held‑out test set.
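The semi‑automated QA generation step can be pictured as sampling a fixed multi‑hop path through the graph and filling a question template. The following sketch uses a toy two‑hop graph; the node names, edge labels, and template wording are illustrative assumptions, not the released pipeline.

```python
# Toy cultural knowledge graph: node -> {edge_label: neighbor}.
# (Illustrative entries only; the real graph has >700 nodes and 13 attribute types.)
GRAPH = {
    "Onam": {"celebrated-in": "Kerala"},
    "Kerala": {"signature-dish": "Sadya"},
}

# Hypothetical question template for the path Festival -> State -> Dish.
TEMPLATE = ("What is the signature dish of the state "
            "where {festival} is celebrated?")

def sample_two_hop(festival):
    """Follow a fixed 2-hop path and emit a (question, answer) pair."""
    state = GRAPH[festival]["celebrated-in"]     # hop 1
    dish = GRAPH[state]["signature-dish"]        # hop 2
    question = TEMPLATE.format(festival=festival)
    return question, dish

q, a = sample_two_hop("Onam")
print(q)  # What is the signature dish of the state where Onam is celebrated?
print(a)  # Sadya
```

In the paper's pipeline, human reviewers then validate a subset of such templated questions for linguistic naturalness.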
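The SCoM idea of outputting explicit graph operations before the final answer can be sketched as a tiny interpreter over symbolic steps. The instruction names (SELECT / FOLLOW / RETURN) follow the example in the methodology above; the graph contents and execution details are assumptions for illustration.

```python
# Toy graph: node -> {edge_label: neighbor}.
GRAPH = {
    "Hornbill Festival": {"celebrated-in": "Nagaland", "origin-year": "2000"},
    "Nagaland": {"capital": "Kohima"},
}

def run_trace(trace):
    """Execute a list of (op, arg) symbolic steps and return the final node."""
    current = None
    for op, arg in trace:
        if op == "SELECT":    # bind a starting node by name
            current = arg
        elif op == "FOLLOW":  # traverse one labeled edge from the current node
            current = GRAPH[current][arg]
        elif op == "RETURN":  # emit the current node as the answer
            return current
    return current

# Two-hop question: "In which state is the Hornbill Festival celebrated?"
answer = run_trace([
    ("SELECT", "Hornbill Festival"),
    ("FOLLOW", "celebrated-in"),
    ("RETURN", None),
])
print(answer)  # Nagaland
```

In SCoM fine‑tuning, the model itself generates such a trace as text; it is this discrete, checkable structure (rather than free‑form prose) that the authors credit for the accuracy gains.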
Results & Findings
| Model | Zero‑Shot | CoT‑Fine‑Tuned | SCoM‑Fine‑Tuned |
|---|---|---|---|
| GPT‑4 | 38 % | 49 % | 61 % |
| LLaMA‑2‑13B | 32 % | 44 % | 58 % |
| PaLM‑2‑Bison | 35 % | 46 % | 60 % |
- Chain‑of‑Thought improves performance but still fails on low‑frequency facts (e.g., obscure regional festivals).
- SCoM consistently outperforms CoT by 12–20 percentage points of absolute accuracy, demonstrating that explicit symbolic manipulation helps the model navigate the graph’s topology.
- Error analysis shows SCoM reduces “hallucination” of unrelated facts and improves traceability (the model’s intermediate steps align with the actual graph path 78 % of the time vs. 42 % for CoT).
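The traceability figures above suggest a simple alignment metric: the fraction of questions whose predicted intermediate path exactly matches the gold graph path. The exact criterion the authors use is not specified here, so the strict exact‑match version below, along with the example paths, is an assumption.

```python
def path_match_rate(predicted_paths, gold_paths):
    """Fraction of examples where the predicted hop sequence equals the gold one."""
    matches = sum(p == g for p, g in zip(predicted_paths, gold_paths))
    return matches / len(gold_paths)

# Invented example: one of two predicted paths takes a wrong hop.
gold = [["Pongal", "Tamil Nadu"], ["Bihu", "Assam"]]
pred = [["Pongal", "Tamil Nadu"], ["Bihu", "West Bengal"]]
print(path_match_rate(pred, gold))  # 0.5
```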
Practical Implications
- Culturally aware assistants – Developers building chatbots for Indian markets can integrate SCoM‑style fine‑tuning to avoid misrepresentations of local customs, festivals, or legal nuances.
- Domain‑specific QA systems – Enterprises (e.g., tourism boards, heritage museums) can leverage the VIRAASAT graph and SCoM training to power question‑answering interfaces that require multi‑step cultural reasoning.
- Reduced annotation cost – The semi‑automated pipeline shows a scalable way to generate high‑quality, multi‑hop QA data for any region, lowering the barrier for creating localized AI benchmarks.
- Improved model interpretability – Symbolic traces give developers a concrete debugging artifact (the “path” the model took), which is valuable for compliance and bias audits in culturally sensitive applications.
Limitations & Future Work
- Coverage bias – Although the graph spans all Indian states, the depth of each cultural attribute varies; some niche traditions remain under‑represented.
- Language diversity – VIRAASAT is currently English‑only; extending to Hindi, Tamil, Bengali, etc., would better reflect India’s multilingual reality.
- Scalability of expert curation – The initial knowledge graph required substantial manual effort; future work could explore fully automated KG construction from regional corpora.
- Generalization beyond India – The authors plan to adapt the SCoM framework to other cultural domains (e.g., African folklore, Latin American festivals) to test cross‑cultural transfer.
VIRAASAT opens a practical pathway for developers to build AI systems that respect and understand the rich tapestry of Indian culture, moving us a step closer to truly global, culturally competent language models.
Authors
- Harshul Raj Surana
- Arijit Maji
- Aryan Vats
- Akash Ghosh
- Sriparna Saha
- Amit Sheth
Paper Information
- arXiv ID: 2602.18429v1
- Categories: cs.CL, cs.IR
- Published: February 20, 2026