[Paper] Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
Source: arXiv - 2602.05971v1
Overview
This paper proposes a novel way to look at how people generate concepts—think of naming animals, listing properties, or doing a verbal‑fluency test—by treating each word they utter as a step through a high‑dimensional embedding space (the same kind of space that modern transformer models like BERT or RoBERTa use). By turning a sequence of spoken words into a semantic trajectory, the authors can measure “movement” (distance, speed, direction) and compare those dynamics across languages, tasks, and even clinical populations.
Key Contributions
- Trajectory‑based framework: Introduces cumulative word‑embedding vectors as points in a continuous space, turning any concept‑production task into a geometric path.
- Rich set of metrics: Defines scalar and vectorial measures (e.g., distance to next word, distance to centroid, entropy, velocity, acceleration) that capture both what is being said and how the semantic space is being navigated.
- Cross‑lingual, cross‑task validation: Applies the method to four datasets (English neurodegenerative fluency, English swear‑word fluency, Italian property listing, German property listing) and shows consistent patterns.
- Clinical relevance: Demonstrates that trajectory metrics can separate clinical groups (e.g., patients with neurodegenerative disease vs. healthy controls) without labor‑intensive manual annotation.
- Model‑agnostic robustness: Finds that different transformer‑based embedding models (e.g., BERT, RoBERTa, multilingual models) yield highly similar trajectory statistics, suggesting the approach is not tied to a specific architecture.
- Cumulative vs. non‑cumulative embeddings: Shows that cumulative embeddings (adding each new word to the running sum) work best for longer utterances, while a non‑cumulative (single‑word) view may be preferable for very short sequences.
Methodology
- Data collection – Participants perform a concept production task (e.g., “list as many animals as you can”). The spoken or typed responses are tokenized into a chronological list of words.
- Embedding extraction – Each word is fed to a pre‑trained transformer text encoder (BERT, RoBERTa, multilingual BERT, etc.) to obtain a dense vector (typically 768‑dimensional).
- Cumulative representation – For the i‑th word, the authors compute the sum (or average) of embeddings from the first word up to the i‑th word, producing a point pᵢ in the embedding space. The full list [p₁, p₂, …, pₙ] is the semantic trajectory.
- Metric computation – From the trajectory they derive:
  - Δ‑distance: Euclidean distance between consecutive points (how far the meaning jumps).
  - Centroid distance: Distance from each point to the overall mean of the trajectory (how “central” the current concept is).
  - Entropy: Shannon entropy of the distribution of step distances, reflecting variability.
  - Velocity & acceleration: First and second temporal derivatives of the trajectory (rate of change and its change).
- Statistical analysis – Metrics are compared across groups (e.g., patients vs. controls) and across languages using standard tests (t‑tests, ANOVAs) and effect‑size calculations.
- Baseline comparison – A non‑cumulative version (each point = embedding of a single word) is evaluated to understand the contribution of context accumulation.
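The pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: it assumes the per-word embeddings have already been extracted (e.g., 768‑dimensional BERT vectors), and the function name `trajectory_metrics` and the exact entropy normalization are illustrative choices.

```python
import numpy as np

def trajectory_metrics(embeddings, cumulative=True):
    """Compute trajectory metrics from a chronological list of word embeddings.

    embeddings: (n_words, dim) array of per-word vectors.
    cumulative=True follows the paper's cumulative scheme: point i is the
    sum of embeddings 1..i; otherwise each point is a single-word embedding.
    """
    E = np.asarray(embeddings, dtype=float)
    points = np.cumsum(E, axis=0) if cumulative else E

    # Delta-distance: Euclidean distance between consecutive points.
    deltas = np.linalg.norm(np.diff(points, axis=0), axis=1)

    # Centroid distance: distance from each point to the trajectory mean.
    centroid = points.mean(axis=0)
    centroid_dist = np.linalg.norm(points - centroid, axis=1)

    # Shannon entropy of the normalized step-distance distribution.
    p = deltas / deltas.sum()
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))

    # Velocity and acceleration: first and second discrete derivatives
    # with respect to word order (no speech timing in this sketch).
    velocity = np.diff(points, axis=0)        # (n-1, dim)
    acceleration = np.diff(velocity, axis=0)  # (n-2, dim)

    return {
        "delta_distance": deltas,
        "centroid_distance": centroid_dist,
        "entropy": float(entropy),
        "speed": np.linalg.norm(velocity, axis=1),
        "accel_magnitude": np.linalg.norm(acceleration, axis=1),
    }

# Random vectors stand in for real transformer embeddings of a 12-word list.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(12, 768))
m = trajectory_metrics(fake_embeddings)
```

Note that with cumulative points, each Δ‑distance reduces to the norm of the newly added word embedding; the non‑cumulative baseline (`cumulative=False`) instead measures jumps between individual word vectors.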
Results & Findings
- Group discrimination – In the neurodegenerative fluency dataset, patients showed significantly lower average velocity and higher centroid distance, indicating slower, more scattered semantic jumps compared with healthy participants.
- Cross‑language consistency – Italian and German property‑listing tasks produced similar metric patterns, confirming that the trajectory approach generalizes beyond English.
- Embedding model parity – Whether using BERT‑base, RoBERTa‑large, or multilingual BERT, the resulting trajectory statistics were statistically indistinguishable, underscoring that the method taps into a shared semantic geometry learned by these models.
- Cumulative advantage – For sequences longer than ~10 words, cumulative embeddings yielded higher classification accuracy (≈ 78 % vs. 65 % for non‑cumulative) because the accumulated context stabilizes the trajectory. For very short lists (< 5 words), the non‑cumulative approach performed marginally better.
- Entropy as a diagnostic cue – Higher entropy correlated with greater lexical diversity in healthy speakers, while lower entropy was a hallmark of constrained or impaired output (e.g., in the swear‑word fluency task where participants quickly exhausted the limited lexical pool).
Practical Implications
- Rapid clinical screening – Developers of digital health tools can embed this trajectory analysis into speech- or text-based apps to flag early signs of cognitive decline without needing expert linguists to annotate responses.
- Multilingual AI diagnostics – Because the method works with multilingual transformer models, the same pipeline can be deployed across countries, enabling consistent cross‑cultural assessments.
- Human‑AI interaction research – Understanding “semantic navigation speed” could inform adaptive chatbots that adjust response complexity based on a user’s current trajectory (e.g., slowing down when the user’s velocity drops).
- Benchmarking artificial cognition – Researchers can compare the trajectories of language models (e.g., GPT‑4 generating a list) against human trajectories to quantify how “human‑like” a model’s semantic search is.
- Feature engineering for downstream ML – The trajectory metrics (velocity, entropy, etc.) can serve as compact, interpretable features for classification models in neuropsychology, psycholinguistics, or even user‑behavior analytics.
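As a concrete example of the feature-engineering use case, variable-length metric series can be summarized into a fixed-length vector for a downstream classifier. The summary statistics chosen here (mean and standard deviation per series) are an illustrative choice, not prescribed by the paper, and the metric values below are hypothetical:

```python
import numpy as np

def trajectory_features(metrics):
    """Summarize variable-length trajectory metrics into a fixed-length
    feature vector: mean and std per series, plus the scalar entropy."""
    feats = []
    for key in ("delta_distance", "centroid_distance", "speed", "accel_magnitude"):
        series = np.asarray(metrics[key], dtype=float)
        feats.extend([series.mean(), series.std()])
    feats.append(metrics["entropy"])
    return np.array(feats)

# Hypothetical per-participant metric values:
metrics = {
    "delta_distance": [1.2, 0.8, 1.5],
    "centroid_distance": [0.5, 0.7, 0.6, 0.9],
    "speed": [1.2, 0.8, 1.5],
    "accel_magnitude": [0.4, 0.7],
    "entropy": 1.05,
}
x = trajectory_features(metrics)  # 9-dimensional feature vector
```

A vector like `x` can feed any standard classifier (logistic regression, random forest) while each dimension remains interpretable for clinicians.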
Limitations & Future Work
- Data sparsity for rare languages – The study only covered English, Italian, and German; extending to low‑resource languages may require larger multilingual models or fine‑tuning.
- Assumption of linear accumulation – Summing embeddings treats each word as equally contributing to the semantic context; alternative compositional schemes (e.g., attention‑weighted accumulation) could capture nuance better.
- Temporal granularity – The current approach uses word order only; incorporating actual speech timing (pauses, articulation rate) could enrich velocity/acceleration measures.
- Clinical validation – While promising, the method needs prospective validation on larger, more diverse patient cohorts before deployment in medical settings.
- Explainability for clinicians – Translating abstract geometric metrics into actionable clinical insights will require user‑friendly visualizations and domain‑specific thresholds.
Bottom line: By turning a simple list of words into a navigable path through a transformer‑derived semantic space, this work offers developers a ready‑to‑use, language‑agnostic toolkit for quantifying how people think, speak, and—crucially—how those processes change when cognition is impaired.
Authors
- Felipe D. Toro-Hernández
- Jesuino Vieira Filho
- Rodrigo M. Cabral-Carvalho
Paper Information
- arXiv ID: 2602.05971v1
- Categories: cs.CL, cs.LG, q-bio.NC
- Published: February 5, 2026