[Paper] Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally

Published: December 19, 2025 at 01:57 PM EST
4 min read
Source: arXiv - 2512.17898v1

Overview

The paper investigates how making AI agents look and behave more “human‑like” influences users’ tendency to anthropomorphize them, and whether that translates into higher engagement and trust. Drawing on two large‑scale cross‑national experiments (≈3,500 participants across 10 countries), the authors show that human‑like design does boost anthropomorphism, but its impact on engagement and trust is far from universal: cultural context matters.

Key Contributions

  • Causal evidence: Demonstrates that specific human‑like design levers (e.g., conversational flow, perspective‑taking cues) directly increase users’ anthropomorphic attributions.
  • Cross‑cultural nuance: Shows that the same design cues can raise trust in some regions (e.g., Brazil) while lowering it in others (e.g., Japan), contradicting the “one‑size‑fits‑all” safety assumptions.
  • User‑centric evaluation criteria: Finds that users judge AI humanity based on interactional cues rather than abstract notions like sentience, informing more practical design checklists.
  • Large‑scale, realistic interaction study: Conducts real‑time, open‑ended dialogues with a deployed AI system, moving beyond lab‑only or survey‑only methods.
  • Policy‑relevant insights: Provides evidence that current AI governance frameworks, which largely extrapolate from Western samples, may misjudge risk for non‑Western users.

Methodology

  1. Participants & Setting – 3,500 volunteers from 10 culturally diverse nations (e.g., Brazil, Japan, Germany, Kenya) were recruited via online panels.
  2. AI System – A conversational agent was built with two configurable “design levers”:
    • Human‑like language: smoother turn‑taking, use of empathy phrases, and perspective‑taking statements.
    • Mechanical language: terse, task‑focused replies without social niceties.
  3. Experimental Design – Participants were randomly assigned to interact with either the human‑like or mechanical version in a 10‑minute free‑form chat. After the session they completed:
    • Anthropomorphism scale (e.g., “The AI seemed to understand my feelings”).
    • Engagement metrics (time spent, number of turns, self‑reported enjoyment).
    • Trust questionnaire (e.g., willingness to rely on the AI’s advice).
  4. Cross‑national analysis – Mixed‑effects models incorporated country as a random effect, allowing the team to isolate cultural moderation of design effects (a minimal model sketch follows this list).
  5. Qualitative follow‑up – Open‑ended responses were coded to identify which cues participants actually used to judge “human‑likeness”.
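
To make the analysis step concrete, here is a minimal sketch of a mixed‑effects model with country as a random effect, written in Python with statsmodels. The column names (`condition`, `trust`, `country`) and the data file are illustrative assumptions, not the paper's actual specification.

```python
# Minimal sketch: cross-national mixed-effects analysis.
# Assumed columns: participant_id, country, condition, trust (all hypothetical).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("responses.csv")  # one row per participant

# Country enters as a random effect; a random slope on the condition term
# lets the design effect vary by nation, which is the cultural moderation
# the paper isolates.
model = smf.mixedlm(
    "trust ~ C(condition)",        # fixed effect of the design manipulation
    data=df,
    groups=df["country"],          # country as the grouping (random) factor
    re_formula="~C(condition)",    # random slope: effect may differ by country
)
result = model.fit()
print(result.summary())
```

The same formula could be refit with anthropomorphism or an engagement measure as the outcome; the fixed‑effect coefficient gives the overall design effect, while the random‑effect variance indicates how much countries diverge.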

Results & Findings

| Outcome | Human‑like design vs. Mechanical | Cultural moderation |
| --- | --- | --- |
| Anthropomorphism | ↑ significant increase (Cohen’s d ≈ 0.45) | Consistent across all countries |
| Engagement (turns, time) | Small, non‑significant lift overall | Positive in Brazil & Mexico; neutral/negative in Japan & South Korea |
| Self‑reported trust | Mixed: overall effect ≈ 0 (no clear lift) | Trust ↑ in Brazil, Philippines; ↓ in Japan, Germany |
| Cue importance | Users highlighted conversation flow and perspective‑taking as key human‑like signals | Same cues valued globally, but their trust impact diverged |

In short, making an AI feel more conversational makes people see it as more human, but whether that translates into “I like it more” or “I trust it” depends heavily on cultural background.
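
For readers less familiar with the effect size in the table above, Cohen’s d is the difference between two group means divided by their pooled standard deviation; d ≈ 0.45 is a moderate effect. The snippet below is a generic illustration with simulated scores (not the paper's data), constructed so the true standardized difference is about 0.45.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference between two independent groups."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                 / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Simulated anthropomorphism ratings with a built-in standardized gap of 0.45.
rng = np.random.default_rng(0)
humanlike  = rng.normal(loc=5.00, scale=1.0, size=500)
mechanical = rng.normal(loc=4.55, scale=1.0, size=500)

print(round(cohens_d(humanlike, mechanical), 2))  # close to 0.45, up to sampling noise
```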

Practical Implications

  • Design checklists: Prioritize interactional cues (smooth turn‑taking, empathy language) when the goal is to increase perceived humanity; don’t rely on abstract “sentient” features.
  • Localization strategy: Deploy different conversational styles per market. For example, a more formal, less overtly empathetic tone may be safer in Japan, while a warm, expressive style could boost trust in Brazil.
  • Metrics selection: Trust and engagement should be measured post‑deployment per region; a single global KPI can mask opposite effects (see the sketch after this list).
  • Regulatory compliance: When documenting “human‑like” features for AI audits, include cultural impact assessments rather than assuming uniform risk.
  • Product roadmaps: Teams building customer‑support bots, virtual assistants, or health chatbots can use these findings to decide when human‑like traits are worth the engineering effort versus when a more utilitarian tone is preferable.
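
To make the per‑region metrics point concrete (see “Metrics selection” above), here is a minimal sketch with made‑up post‑deployment logs; the `country`, `variant`, and `trust` columns are hypothetical. It shows how a pooled trust lift can sit near zero while individual markets move in opposite directions.

```python
import pandas as pd

# Hypothetical session logs: market, deployed variant, and a trust rating.
# Values are illustrative, not the paper's data.
logs = pd.DataFrame({
    "country": ["BR", "BR", "JP", "JP", "DE", "DE", "PH", "PH"],
    "variant": ["humanlike", "mechanical"] * 4,
    "trust":   [4.6, 3.9, 3.2, 3.9, 3.4, 3.8, 4.5, 4.0],
})

# Single global KPI: the lift is ~0 and hides the regional story.
global_lift = (logs.loc[logs.variant == "humanlike", "trust"].mean()
               - logs.loc[logs.variant == "mechanical", "trust"].mean())

# Per-region KPIs: positive lifts in BR/PH and negative lifts in JP/DE become visible.
per_region = (logs.pivot_table(index="country", columns="variant", values="trust")
                  .assign(lift=lambda t: t["humanlike"] - t["mechanical"]))

print(f"global trust lift: {global_lift:+.2f}")
print(per_region["lift"].sort_values(ascending=False))
```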

Limitations & Future Work

  • Scope of tasks: The study used a generic open‑ended chat; results may differ for domain‑specific interactions (e.g., finance, medical advice).
  • Depth of cultural variables: While country was used as a proxy, finer‑grained factors (e.g., individualism vs. collectivism, power distance) were not directly modeled.
  • Long‑term effects: Experiments captured a single session; it remains unclear how anthropomorphism and trust evolve over weeks or months of repeated use.
  • AI model constraints: The conversational agent was a rule‑based system with limited language‑generation capabilities; modern large language models could amplify or dampen the observed effects.

Future research should explore longitudinal studies, test domain‑specific agents, and integrate richer cultural psychometrics to refine design guidelines for globally deployed human‑like AI.

Authors

  • Robin Schimmelpfennig
  • Mark Díaz
  • Vinodkumar Prabhakaran
  • Aida Davani

Paper Information

  • arXiv ID: 2512.17898v1
  • Categories: cs.AI
  • Published: December 19, 2025
  • PDF: Download PDF