[Paper] Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Published: February 5, 2026
5 min read
Source: arXiv - 2602.05971v1

Overview

This paper proposes a novel way to look at how people generate concepts—think of naming animals, listing properties, or doing a verbal‑fluency test—by treating each word they utter as a step through a high‑dimensional embedding space (the same kind of space that modern transformer models like BERT or RoBERTa use). By turning a sequence of spoken words into a semantic trajectory, the authors can measure “movement” (distance, speed, direction) and compare those dynamics across languages, tasks, and even clinical populations.

Key Contributions

  • Trajectory‑based framework: Introduces cumulative word‑embedding vectors as points in a continuous space, turning any concept‑production task into a geometric path.
  • Rich set of metrics: Defines scalar and vectorial measures (e.g., distance to next word, distance to centroid, entropy, velocity, acceleration) that capture both what is being said and how it is being navigated.
  • Cross‑lingual, cross‑task validation: Applies the method to four datasets (English neurodegenerative fluency, English swear‑word fluency, Italian property listing, German property listing) and shows consistent patterns.
  • Clinical relevance: Demonstrates that trajectory metrics can separate clinical groups (e.g., patients with neurodegenerative disease vs. healthy controls) without labor‑intensive manual annotation.
  • Model‑agnostic robustness: Finds that different transformer‑based embedding models (e.g., BERT, RoBERTa, multilingual models) yield highly similar trajectory statistics, suggesting the approach is not tied to a specific architecture.
  • Cumulative vs. non‑cumulative embeddings: Shows that cumulative embeddings (adding each new word's embedding to the running sum) work best for longer utterances, while a non‑cumulative (single‑word) view may be preferable for very short sequences.
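The cumulative/non-cumulative distinction is easy to see in code. A minimal pure-Python sketch, using toy low-dimensional vectors in place of real ~768-dimensional transformer embeddings (function names are illustrative, not the authors' code):

```python
def cumulative_trajectory(embeddings):
    """Point p_i is the running sum of embeddings e_1..e_i,
    so each point carries the accumulated semantic context."""
    points, running = [], [0.0] * len(embeddings[0])
    for e in embeddings:
        running = [r + x for r, x in zip(running, e)]
        points.append(list(running))
    return points

def non_cumulative_trajectory(embeddings):
    """Point p_i is just the embedding of the i-th word alone."""
    return [list(e) for e in embeddings]

# Toy 3-d "embeddings" for three produced words:
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(cumulative_trajectory(words)[-1])  # [1.0, 1.0, 1.0]
```

The cumulative variant makes each step smaller relative to the accumulated context, which is one intuition for why it stabilizes trajectories on longer lists.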

Methodology

  1. Data collection – Participants perform a concept production task (e.g., “list as many animals as you can”). The spoken or typed responses are tokenized into a chronological list of words.
  2. Embedding extraction – Each word is fed to a pre‑trained transformer text encoder (BERT, RoBERTa, multilingual BERT, etc.) to obtain a dense vector (typically 768‑dimensional).
  3. Cumulative representation – For the i‑th word, the authors compute the sum (or average) of embeddings from the first word up to the i‑th word, producing a point pᵢ in the embedding space. The full list [p₁, p₂, …, pₙ] is the semantic trajectory.
  4. Metric computation – From the trajectory they derive:
    • Δ‑distance: Euclidean distance between consecutive points (how far the meaning jumps).
    • Centroid distance: Distance from each point to the overall mean of the trajectory (how “central” the current concept is).
    • Entropy: Shannon entropy of the distribution of distances, reflecting variability.
    • Velocity & acceleration: First and second temporal derivatives of the trajectory (rate of change and its change).
  5. Statistical analysis – Metrics are compared across groups (e.g., patients vs. controls) and across languages using standard tests (t‑tests, ANOVAs) and effect‑size calculations.
  6. Baseline comparison – A non‑cumulative version (each point = embedding of a single word) is evaluated to understand the contribution of context accumulation.
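Steps 4's metrics can be sketched in a few lines of pure Python. This is a hedged illustration of the general quantities described above (the paper's exact binning and derivative conventions are not specified here, so the entropy histogram and first-difference velocity below are assumptions):

```python
import math

def delta_distances(traj):
    """Euclidean distance between consecutive points (semantic jump size)."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]

def centroid_distances(traj):
    """Distance from each point to the trajectory's overall mean."""
    n = len(traj)
    centroid = [sum(dim) / n for dim in zip(*traj)]
    return [math.dist(p, centroid) for p in traj]

def shannon_entropy(values, bins=5):
    """Shannon entropy of a histogram of the values (variability measure)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def velocity(traj):
    """First temporal difference of the trajectory (per dimension);
    applying it twice gives an acceleration analogue."""
    return [[bi - ai for ai, bi in zip(a, b)] for a, b in zip(traj, traj[1:])]
```

With word order as the only clock, "velocity" here is just the per-step displacement; richer timing (pauses, articulation rate) would refine it, as the authors note under future work.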

Results & Findings

  • Group discrimination – In the neurodegenerative fluency dataset, patients showed significantly lower average velocity and higher centroid distance, indicating slower, more scattered semantic jumps compared with healthy participants.
  • Cross‑language consistency – Italian and German property‑listing tasks produced similar metric patterns, confirming that the trajectory approach generalizes beyond English.
  • Embedding model parity – Whether using BERT‑base, RoBERTa‑large, or multilingual BERT, the resulting trajectory statistics were statistically indistinguishable, underscoring that the method taps into a shared semantic geometry learned by these models.
  • Cumulative advantage – For sequences longer than ~10 words, cumulative embeddings yielded higher classification accuracy (≈ 78 % vs. 65 % for non‑cumulative) because the accumulated context stabilizes the trajectory. For very short lists (< 5 words), the non‑cumulative approach performed marginally better.
  • Entropy as a diagnostic cue – Higher entropy correlated with greater lexical diversity in healthy speakers, while lower entropy was a hallmark of constrained or impaired output (e.g., in the swear‑word fluency task where participants quickly exhausted the limited lexical pool).


Practical Implications

  • Rapid clinical screening – Developers of digital health tools can embed this trajectory analysis into speech- or text-based apps to flag early signs of cognitive decline without needing expert linguists to annotate responses.
  • Multilingual AI diagnostics – Because the method works with multilingual transformer models, the same pipeline can be deployed across countries, enabling consistent cross‑cultural assessments.
  • Human‑AI interaction research – Understanding “semantic navigation speed” could inform adaptive chatbots that adjust response complexity based on a user’s current trajectory (e.g., slowing down when the user’s velocity drops).
  • Benchmarking artificial cognition – Researchers can compare the trajectories of language models (e.g., GPT‑4 generating a list) against human trajectories to quantify how “human‑like” a model’s semantic search is.
  • Feature engineering for downstream ML – The trajectory metrics (velocity, entropy, etc.) can serve as compact, interpretable features for classification models in neuropsychology, psycholinguistics, or even user‑behavior analytics.
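For the feature-engineering use case, the trajectory summaries can be packed into a compact, interpretable feature vector. A minimal sketch with hypothetical feature names (not the paper's exact feature set):

```python
import math
from statistics import mean

def trajectory_features(traj):
    """Summarize a semantic trajectory into interpretable scalar features
    suitable as input to a downstream classifier."""
    steps = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    n = len(traj)
    centroid = [sum(dim) / n for dim in zip(*traj)]
    spread = [math.dist(p, centroid) for p in traj]
    return {
        "mean_step": mean(steps) if steps else 0.0,        # average jump size
        "max_step": max(steps) if steps else 0.0,          # largest jump
        "mean_centroid_dist": mean(spread),                # scatter around center
        "path_length": sum(steps),                         # total distance traveled
    }

feats = trajectory_features([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(feats["path_length"])  # 10.0
```

Each value has a direct behavioral reading (jump size, scatter, total ground covered), which is what makes these features attractive for clinical and psycholinguistic models compared with opaque learned representations.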

Limitations & Future Work

  • Data sparsity for rare languages – The study only covered English, Italian, and German; extending to low‑resource languages may require larger multilingual models or fine‑tuning.
  • Assumption of linear accumulation – Summing embeddings treats each word as equally contributing to the semantic context; alternative compositional schemes (e.g., attention‑weighted accumulation) could capture nuance better.
  • Temporal granularity – The current approach uses word order only; incorporating actual speech timing (pauses, articulation rate) could enrich velocity/acceleration measures.
  • Clinical validation – While promising, the method needs prospective validation on larger, more diverse patient cohorts before deployment in medical settings.
  • Explainability for clinicians – Translating abstract geometric metrics into actionable clinical insights will require user‑friendly visualizations and domain‑specific thresholds.

Bottom line: By turning a simple list of words into a navigable path through a transformer‑derived semantic space, this work offers developers a ready‑to‑use, language‑agnostic toolkit for quantifying how people think, speak, and—crucially—how those processes change when cognition is impaired.

Authors

  • Felipe D. Toro-Hernández
  • Jesuino Vieira Filho
  • Rodrigo M. Cabral-Carvalho

Paper Information

  • arXiv ID: 2602.05971v1
  • Categories: cs.CL, cs.LG, q-bio.NC
  • Published: February 5, 2026