[Paper] VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning
Source: arXiv - 2602.18429v1
Overview
The paper introduces VIRAASAT, a new benchmark that challenges large language models (LLMs) to reason about Indian culture through multi‑hop questions. By semi‑automatically generating over 3,200 culturally rich QA pairs from a curated knowledge graph, the authors expose a blind spot in today’s LLMs: the inability to reliably chain low‑frequency, region‑specific facts.
Key Contributions
- VIRAASAT dataset – a semi‑automated, multi‑hop QA collection covering 13 cultural attributes across all 28 Indian states and 8 Union Territories, built on an expert‑curated knowledge graph of more than 700 nodes.
- Empirical gap analysis – systematic evaluation of state‑of‑the‑art LLMs (including CoT‑fine‑tuned models) that reveals poor performance on chained cultural reasoning.
- Symbolic Chain‑of‑Manipulation (SCoM) – a novel training framework that teaches models to simulate explicit graph operations (traversals, merges, look‑ups) rather than relying on free‑form textual reasoning.
- Performance boost – SCoM‑enhanced models achieve up to 20 percentage points higher accuracy than standard Chain‑of‑Thought (CoT) baselines on VIRAASAT.
- Open resources – the dataset, knowledge graph, and training scripts are publicly released to foster culturally aware AI research.
Methodology
- Knowledge Graph Construction – Domain experts compiled a graph of >700 cultural artifacts (festivals, historical events, cuisines, etc.) linked by 13 attribute types (e.g., “celebrated‑in”, “origin‑year”).
- Semi‑automated QA Generation – Using graph traversal algorithms, the authors sampled multi‑hop paths (e.g., State → Festival → Historical Origin) and automatically templated questions that require chaining those hops. Human reviewers validated a subset to ensure linguistic naturalness.
- Baseline Evaluation – Prominent LLMs (GPT‑4, LLaMA‑2, PaLM‑2) were tested in zero‑shot, few‑shot, and CoT‑fine‑tuned settings. Accuracy, reasoning trace quality, and fact grounding were measured.
- SCoM Framework – Instead of prompting the model to “think step‑by‑step,” SCoM provides a symbolic instruction set that mirrors graph operations (e.g., SELECT node where attribute = “festival” → FOLLOW edge “celebrated‑in” → RETURN state). The model is fine‑tuned to output these symbolic traces before producing the final answer, encouraging internal graph‑like reasoning.
- Supervised Fine‑Tuning (SFT) – The authors fine‑tuned LLMs on the SCoM traces using the VIRAASAT training split, then evaluated on a held‑out test set.
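The semi‑automated QA generation step can be pictured as sampling a fixed multi‑hop path through the graph and filling a question template. The following sketch uses a toy two‑hop graph; the node names, edge labels, and template wording are illustrative assumptions, not the released pipeline.

```python
# Toy cultural knowledge graph: node -> {edge_label: neighbor}.
# (Illustrative entries only; the real graph has >700 nodes and 13 attribute types.)
GRAPH = {
    "Onam": {"celebrated-in": "Kerala"},
    "Kerala": {"signature-dish": "Sadya"},
}

# Hypothetical question template for the path Festival -> State -> Dish.
TEMPLATE = ("What is the signature dish of the state "
            "where {festival} is celebrated?")

def sample_two_hop(festival):
    """Follow a fixed 2-hop path and emit a (question, answer) pair."""
    state = GRAPH[festival]["celebrated-in"]     # hop 1
    dish = GRAPH[state]["signature-dish"]        # hop 2
    question = TEMPLATE.format(festival=festival)
    return question, dish

q, a = sample_two_hop("Onam")
print(q)  # What is the signature dish of the state where Onam is celebrated?
print(a)  # Sadya
```

In the paper's pipeline, human reviewers then validate a subset of such templated questions for linguistic naturalness.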
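The SCoM idea of outputting explicit graph operations before the final answer can be sketched as a tiny interpreter over symbolic steps. The instruction names (SELECT / FOLLOW / RETURN) follow the example in the methodology above; the graph contents and execution details are assumptions for illustration.

```python
# Toy graph: node -> {edge_label: neighbor}.
GRAPH = {
    "Hornbill Festival": {"celebrated-in": "Nagaland", "origin-year": "2000"},
    "Nagaland": {"capital": "Kohima"},
}

def run_trace(trace):
    """Execute a list of (op, arg) symbolic steps and return the final node."""
    current = None
    for op, arg in trace:
        if op == "SELECT":    # bind a starting node by name
            current = arg
        elif op == "FOLLOW":  # traverse one labeled edge from the current node
            current = GRAPH[current][arg]
        elif op == "RETURN":  # emit the current node as the answer
            return current
    return current

# Two-hop question: "In which state is the Hornbill Festival celebrated?"
answer = run_trace([
    ("SELECT", "Hornbill Festival"),
    ("FOLLOW", "celebrated-in"),
    ("RETURN", None),
])
print(answer)  # Nagaland
```

In SCoM fine‑tuning, the model itself generates such a trace as text; it is this discrete, checkable structure (rather than free‑form prose) that the authors credit for the accuracy gains.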
Results & Findings
| Model | Zero‑Shot | CoT‑Fine‑Tuned | SCoM‑Fine‑Tuned |
|---|---|---|---|
| GPT‑4 | 38 % | 49 % | 61 % |
| LLaMA‑2‑13B | 32 % | 44 % | 58 % |
| PaLM‑2‑Bison | 35 % | 46 % | 60 % |
- Chain‑of‑Thought improves performance but still fails on low‑frequency facts (e.g., obscure regional festivals).
- SCoM consistently outperforms CoT by 12–20 percentage points of absolute accuracy, demonstrating that explicit symbolic manipulation helps the model navigate the graph’s topology.
- Error analysis shows SCoM reduces “hallucination” of unrelated facts and improves traceability (the model’s intermediate steps align with the actual graph path 78 % of the time vs. 42 % for CoT).
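The traceability figures above suggest a simple alignment metric: the fraction of questions whose predicted intermediate path exactly matches the gold graph path. The exact criterion the authors use is not specified here, so the strict exact‑match version below, along with the example paths, is an assumption.

```python
def path_match_rate(predicted_paths, gold_paths):
    """Fraction of examples where the predicted hop sequence equals the gold one."""
    matches = sum(p == g for p, g in zip(predicted_paths, gold_paths))
    return matches / len(gold_paths)

# Invented example: one of two predicted paths takes a wrong hop.
gold = [["Pongal", "Tamil Nadu"], ["Bihu", "Assam"]]
pred = [["Pongal", "Tamil Nadu"], ["Bihu", "West Bengal"]]
print(path_match_rate(pred, gold))  # 0.5
```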
Practical Implications
- Culturally aware assistants – Developers building chatbots for Indian markets can integrate SCoM‑style fine‑tuning to avoid misrepresentations of local customs, festivals, or legal nuances.
- Domain‑specific QA systems – Enterprises (e.g., tourism boards, heritage museums) can leverage the VIRAASAT graph and SCoM training to power question‑answering interfaces that require multi‑step cultural reasoning.
- Reduced annotation cost – The semi‑automated pipeline shows a scalable way to generate high‑quality, multi‑hop QA data for any region, lowering the barrier for creating localized AI benchmarks.
- Improved model interpretability – Symbolic traces give developers a concrete debugging artifact (the “path” the model took), which is valuable for compliance and bias audits in culturally sensitive applications.
Limitations & Future Work
- Coverage bias – Although the graph spans all Indian states, the depth of each cultural attribute varies; some niche traditions remain under‑represented.
- Language diversity – VIRAASAT is currently English‑only; extending to Hindi, Tamil, Bengali, etc., would better reflect India’s multilingual reality.
- Scalability of expert curation – The initial knowledge graph required substantial manual effort; future work could explore fully automated KG construction from regional corpora.
- Generalization beyond India – The authors plan to adapt the SCoM framework to other cultural domains (e.g., African folklore, Latin American festivals) to test cross‑cultural transfer.
VIRAASAT opens a practical pathway for developers to build AI systems that respect and understand the rich tapestry of Indian culture, moving us a step closer to truly global, culturally competent language models.
Authors
- Harshul Raj Surana
- Arijit Maji
- Aryan Vats
- Akash Ghosh
- Sriparna Saha
- Amit Sheth
Paper Information
- arXiv ID: 2602.18429v1
- Categories: cs.CL, cs.IR
- Published: February 20, 2026