[Paper] Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People
Source: arXiv - 2603.09964v1
Overview
A new study explores how a large language model (LLM)–powered “sighted guide” can help blind and low‑vision (BLV) users navigate social virtual‑reality (VR) spaces. By pairing the guide with a small user study (16 participants), the authors uncover how BLV users treat the AI both as a functional tool and as a social companion, offering fresh design insights for building inclusive VR experiences.
Key Contributions
- LLM‑driven guide prototype that answers navigation queries and describes the virtual environment in real time.
- Empirical user study with 16 BLV participants interacting with the guide in solo and socially‑rich VR scenarios.
- Behavioral insights showing a shift from “tool” mindset (solo) to “companion” mindset (group) – e.g., nicknaming the guide, rationalizing errors, and prompting interaction with other avatars.
- Design recommendations for future AI‑based accessibility agents (e.g., personality cues, error transparency, multimodal feedback).
- Cross‑disciplinary contribution linking human‑computer interaction (HCI), AI, and accessibility research.
Methodology
Guide Architecture
- A large language model (GPT‑4‑style) receives a stream of scene metadata (object positions, avatar locations, audio cues).
- The model generates concise, spoken descriptions and answers ad‑hoc questions (“Where is the door?”).
- Output is rendered through a text‑to‑speech engine and delivered via the user’s headset.
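The metadata-to-speech loop above can be sketched in Python. The paper does not publish its implementation, so the prompt format, the `build_scene_prompt` helper, and the scene dictionary shape are all assumptions; the actual LLM and TTS calls are indicated only as comments.

```python
import json

def build_scene_prompt(scene, question=None):
    """Serialize scene metadata into a prompt for the LLM guide.

    `scene` is a hypothetical dict of object/avatar entries with
    positions in meters, the user assumed to be at the origin.
    """
    lines = [
        "You are a sighted guide for a blind user in a social VR space.",
        "Answer concisely, in short speech-friendly sentences.",
        "Scene metadata (positions in meters, user at origin):",
        json.dumps(scene, indent=2),
    ]
    if question:
        lines.append(f"User question: {question}")
    else:
        lines.append("Give a brief overview of the surroundings.")
    return "\n".join(lines)

scene = {
    "objects": [{"name": "door", "position": [3.0, 0.0, 1.0]}],
    "avatars": [{"name": "Alex", "position": [-2.0, 0.0, 4.0]}],
}
prompt = build_scene_prompt(scene, "Where is the door?")

# The prompt would then be sent to an LLM API and the reply piped
# to a text-to-speech engine on the headset, e.g.:
# reply = llm_client.chat(prompt)   # OpenAI / Anthropic API call
# tts_engine.speak(reply)           # spoken output in the headset
```

The key design point is that the model never sees pixels, only a structured text view of the scene, which keeps the prompt small enough for real-time queries.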
Study Design
- Participants: 16 adults with varying degrees of blindness or low vision.
- Scenarios:
- Solo navigation: participants explored a virtual lobby alone, relying solely on the guide.
- Social interaction: a confederate avatar (controlled by a researcher) joined the scene, prompting participants to coordinate with both the avatar and the guide.
- Data collection: think‑aloud protocols, screen‑recorded logs, post‑session interviews, and sentiment coding of guide‑related language.
Analysis
- Qualitative coding identified patterns of tool‑like vs. companion‑like behavior.
- Quantitative metrics (e.g., task completion time, number of guide queries) complemented the qualitative insights.
Results & Findings
| Finding | What it Means |
|---|---|
| Tool mindset in solo mode – participants asked direct, task‑oriented questions and treated the guide as a utility. | The LLM guide can effectively serve as an on‑demand spatial description engine. |
| Companion mindset in social mode – participants gave the guide nicknames, apologized for its mistakes, and encouraged the confederate to “talk” to the guide. | Users anthropomorphize the AI when social cues are present, seeking a sense of shared presence. |
| Error rationalization – participants blamed the guide’s “voice” or “personality” for inaccurate descriptions rather than the underlying system. | Transparent error handling (e.g., indicating confidence levels) could reduce misattribution. |
| Increased engagement – participants queried the guide more frequently when another avatar was present. | Social contexts raise the perceived value of an “assistant” that can mediate between multiple participants. |
Overall, the guide enabled successful navigation and interaction, but its perceived reliability hinged on how users framed its role.
Practical Implications
For VR developers:
- Embedding an LLM‑backed narration layer can make existing 3D worlds instantly more accessible without redesigning geometry.
- Provide configurable personality settings (tone, name, verbosity) to let BLV users tailor the guide to a “tool” or “companion” role.
For AI product teams:
- Leverage confidence scores or “I’m not sure” prompts to avoid users over‑trusting erroneous descriptions.
- Design multimodal feedback (haptic cues for proximity, audio for object identity) that complements the LLM’s speech.
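A minimal sketch of the confidence-gating idea, assuming a per-utterance confidence score in [0, 1] is available from the model or a separate verifier (the paper recommends such cues but does not specify a mechanism; `hedge_description` and the threshold are hypothetical):

```python
def hedge_description(text, confidence, threshold=0.6):
    """Prefix low-confidence guide output with an explicit hedge.

    `confidence` is an assumed score in [0, 1]; below `threshold`,
    the spoken description is marked as uncertain so users do not
    over-trust an erroneous answer.
    """
    if confidence < threshold:
        return f"I'm not sure, but {text[0].lower()}{text[1:]}"
    return text

print(hedge_description("The door is about three meters ahead.", 0.9))
print(hedge_description("The door is about three meters ahead.", 0.4))
```

Spoken hedges like this map the system's internal uncertainty onto the same conversational channel the user already attends to, which matters for BLV users who cannot fall back on a visual confidence indicator.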
For accessibility consultants:
- Use the study’s design recommendations (e.g., explicit error explanations, consistent voice identity) when auditing VR platforms for BLV compliance.
For open‑source communities:
- The prototype can be built on top of existing LLM APIs (OpenAI, Anthropic) and integrated with Unity/Unreal via simple metadata hooks, lowering the barrier for inclusive VR tooling.
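A metadata hook of the kind described might convert engine coordinates into speech-friendly relative directions before they ever reach the LLM. The helper below is a hypothetical sketch (Unity-style coordinates assumed: y up, 0° yaw facing +z); a real integration would feed it live transforms from the engine.

```python
import math

def describe_relative(user_pos, user_yaw_deg, obj_pos, name):
    """Turn an object's world position into a relative spoken description.

    user_pos / obj_pos: (x, y, z) tuples in meters.
    user_yaw_deg: heading in degrees, 0 = facing the +z axis.
    """
    dx = obj_pos[0] - user_pos[0]
    dz = obj_pos[2] - user_pos[2]
    dist = math.hypot(dx, dz)
    # Bearing of the object relative to the user's facing direction.
    bearing = math.degrees(math.atan2(dx, dz)) - user_yaw_deg
    bearing = (bearing + 180) % 360 - 180  # normalize to [-180, 180)
    if abs(bearing) <= 30:
        side = "ahead"
    elif abs(bearing) >= 150:
        side = "behind you"
    elif bearing > 0:
        side = "to your right"
    else:
        side = "to your left"
    return f"The {name} is about {dist:.0f} meters {side}."

print(describe_relative((0, 0, 0), 0, (0, 0, 3), "door"))
# → "The door is about 3 meters ahead."
```

Pre-computing egocentric phrases like this keeps the LLM's job to composition and dialogue rather than spatial arithmetic, which also reduces the chance of numerically wrong descriptions.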
Limitations & Future Work
- Sample size & diversity: 16 participants provide rich qualitative data but limit statistical generalization across the full spectrum of BLV abilities.
- Guide’s knowledge scope: The prototype relied on pre‑processed scene metadata; real‑world VR apps may have dynamic, procedurally generated content that is harder to describe.
- Latency & bandwidth: Real‑time LLM inference can introduce delays, especially on mobile headsets; future work should explore edge‑computing or distilled models.
- Long‑term interaction: The study covered a single session; longitudinal studies are needed to see how relationships with the guide evolve over weeks or months.
Future research directions include adaptive personality models, error‑aware dialogue management, and cross‑modal integration (e.g., vibrotactile maps) to further close the accessibility gap in immersive social VR.
Authors
- Jazmin Collins
- Sharon Y Lin
- Tianqi Liu
- Andrea Stevenson Won
- Shiri Azenkot
Paper Information
- arXiv ID: 2603.09964v1
- Categories: cs.HC, cs.AI, cs.ET
- Published: March 10, 2026