Study: AI chatbots provide less-accurate information to vulnerable users

Published: (February 19, 2026 at 06:25 PM EST)
4 min read

Source: MIT News - AI

Study Overview

Large language models (LLMs) have been championed as tools that could democratize access to information worldwide, offering knowledge in a user‑friendly interface regardless of a person’s background or location. However, new research from MIT’s Center for Constructive Communication (CCC) suggests these artificial‑intelligence systems may actually perform worse for the very users who could most benefit from them.

A study conducted by researchers at CCC (based at the MIT Media Lab) found that state‑of‑the‑art AI chatbots—including OpenAI’s GPT‑4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3—sometimes provide less‑accurate and less‑truthful responses to users who:

  • Have lower English proficiency
  • Have less formal education
  • Originate from outside the United States

The models also refuse to answer questions at higher rates for these users, and in some cases respond with condescending or patronizing language.

“We were motivated by the prospect of LLMs helping to address inequitable information accessibility worldwide,” says lead author Elinor Poole‑Dayan SM ’25, a technical associate in the MIT Sloan School of Management who led the research as a CCC affiliate and master’s student in Media Arts and Sciences. “But that vision cannot become a reality without ensuring that model biases and harmful tendencies are safely mitigated for all users, regardless of language, nationality, or other demographics.”

A paper describing the work, LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users, was presented at the AAAI Conference on Artificial Intelligence in January.

Systematic Underperformance Across Multiple Dimensions

Datasets used

  • TruthfulQA – measures truthfulness by probing common misconceptions and literal truths.
  • SciQ – contains science‑exam questions that test factual accuracy.

Method

Researchers prepended short user biographies to each question, varying three traits: education level, English proficiency, and country of origin.

Key Findings

TraitEffect on Accuracy
Less formal educationSignificant drop across all three models and both datasets
Non‑native English speakerSignificant drop across all three models and both datasets
Intersection (less educated + non‑native English)Largest declines in response quality
Country of origin (Iran, China vs. U.S.)Claude 3 Opus performed significantly worse for users from Iran on both datasets

“We see the largest drop in accuracy for the user who is both a non‑native English speaker and less educated,” says Jad Kabbara, a research scientist at CCC and co‑author on the paper. “These results show that the negative effects of model behavior with respect to these user traits compound in concerning ways, thus suggesting that such models deployed at scale risk spreading harmful behavior or misinformation downstream to those who are least able to identify it.”

Refusals and Condescending Language

  • Refusal rates: Claude 3 Opus refused to answer ≈ 11 % of questions for less‑educated, non‑native English‑speaking users, versus 3.6 % for the control condition (no biography).

  • Condensing language: Manual analysis showed that 43.7 % of refusals for less‑educated users contained condescending, patronizing, or mocking language, compared with

“This is another indicator suggesting that the alignment process might incentivize models to withhold information from certain users to avoid potentially misinforming them, although the model clearly knows the correct answer and provides it to other users,” says Kabbara.

Echoes of Human Bias

The findings mirror documented patterns of human sociocognitive bias. Research shows that native English speakers often perceive non‑native speakers as less educated, intelligent, and competent, regardless of actual expertise. Similar biased perceptions have been documented among teachers evaluating non‑native English‑speaking students.

“The value of large language models is evident in their extraordinary uptake by individuals and the massive investment flowing into the technology,” says Deb Roy, professor of Media Arts and Sciences, CCC director, and co‑author on the paper. “This study is a reminder of how important it is to continually assess systematic biases that can quietly slip into these systems, creating unfair harms for certain groups without any of us being fully aware.”

Implications for Personalization

Personalization features—such as ChatGPT’s Memory, which tracks user information across conversations—are becoming increasingly common. These features risk differentially treating already‑marginalized groups.

“LLMs have been marketed as tools that will foster more equitable access to information and revolutionize personalized learning,” says Poole‑Dayan. “But our findings suggest they may actually exacerbate existing inequities by systematically providing misinformation or refusals to vulnerable users.”

“… to answer queries to certain users. The people who may rely on these tools the most could receive subpar, false, or even harmful information.”

0 views
Back to Blog

Related posts

Read more »

Awesome AI Agent Papers 2026

!Cover image for Awesome AI Agent Papers 2026https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-u...

What is an LLM Gateway?

markdown !smakoshhttps://media2.dev.to/dynamic/image/width=50,height=50,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploa...