[Paper] Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Published: February 5, 2026
5 min read
Source: arXiv - 2602.05971v1

Overview

This paper proposes a novel way to look at how people generate concepts—think of naming animals, listing properties, or doing a verbal‑fluency test—by treating each word they utter as a step through a high‑dimensional embedding space (the same kind of space that modern transformer models like BERT or RoBERTa use). By turning a sequence of spoken words into a semantic trajectory, the authors can measure “movement” (distance, speed, direction) and compare those dynamics across languages, tasks, and even clinical populations.

Key Contributions

  • Trajectory‑based framework: Introduces cumulative word‑embedding vectors as points in a continuous space, turning any concept‑production task into a geometric path.
  • Rich set of metrics: Defines scalar and vectorial measures (e.g., distance to next word, distance to centroid, entropy, velocity, acceleration) that capture both what is being said and how it is being navigated.
  • Cross‑lingual, cross‑task validation: Applies the method to four datasets (English neurodegenerative fluency, English swear‑word fluency, Italian property listing, German property listing) and shows consistent patterns.
  • Clinical relevance: Demonstrates that trajectory metrics can separate clinical groups (e.g., patients with neurodegenerative disease vs. healthy controls) without labor‑intensive manual annotation.
  • Model‑agnostic robustness: Finds that different transformer‑based embedding models (e.g., BERT, RoBERTa, multilingual models) yield highly similar trajectory statistics, suggesting the approach is not tied to a specific architecture.
  • Cumulative vs. non‑cumulative embeddings: Shows that cumulative embeddings (adding each new word's embedding to the running sum) work best for longer utterances, while a non‑cumulative (single‑word) view may be preferable for very short sequences.
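The cumulative/non-cumulative distinction is easy to see in code. A minimal pure-Python sketch, using toy low-dimensional vectors in place of real ~768-dimensional transformer embeddings (function names are illustrative, not the authors' code):

```python
def cumulative_trajectory(embeddings):
    """Point p_i is the running sum of embeddings e_1..e_i,
    so each point carries the accumulated semantic context."""
    points, running = [], [0.0] * len(embeddings[0])
    for e in embeddings:
        running = [r + x for r, x in zip(running, e)]
        points.append(list(running))
    return points

def non_cumulative_trajectory(embeddings):
    """Point p_i is just the embedding of the i-th word alone."""
    return [list(e) for e in embeddings]

# Toy 3-d "embeddings" for three produced words:
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(cumulative_trajectory(words)[-1])  # [1.0, 1.0, 1.0]
```

The cumulative variant makes each step smaller relative to the accumulated context, which is one intuition for why it stabilizes trajectories on longer lists.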

Methodology

  1. Data collection – Participants perform a concept production task (e.g., “list as many animals as you can”). The spoken or typed responses are tokenized into a chronological list of words.
  2. Embedding extraction – Each word is fed to a pre‑trained transformer text encoder (BERT, RoBERTa, multilingual BERT, etc.) to obtain a dense vector (typically 768‑dimensional).
  3. Cumulative representation – For the i‑th word, the authors compute the sum (or average) of embeddings from the first word up to the i‑th word, producing a point pᵢ in the embedding space. The full list [p₁, p₂, …, pₙ] is the semantic trajectory.
  4. Metric computation – From the trajectory they derive:
    • Δ‑distance: Euclidean distance between consecutive points (how far the meaning jumps).
    • Centroid distance: Distance from each point to the overall mean of the trajectory (how “central” the current concept is).
    • Entropy: Shannon entropy of the distribution of distances, reflecting variability.
    • Velocity & acceleration: First and second temporal derivatives of the trajectory (rate of change and its change).
  5. Statistical analysis – Metrics are compared across groups (e.g., patients vs. controls) and across languages using standard tests (t‑tests, ANOVAs) and effect‑size calculations.
  6. Baseline comparison – A non‑cumulative version (each point = embedding of a single word) is evaluated to understand the contribution of context accumulation.
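Steps 4's metrics can be sketched in a few lines of pure Python. This is a hedged illustration of the general quantities described above (the paper's exact binning and derivative conventions are not specified here, so the entropy histogram and first-difference velocity below are assumptions):

```python
import math

def delta_distances(traj):
    """Euclidean distance between consecutive points (semantic jump size)."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]

def centroid_distances(traj):
    """Distance from each point to the trajectory's overall mean."""
    n = len(traj)
    centroid = [sum(dim) / n for dim in zip(*traj)]
    return [math.dist(p, centroid) for p in traj]

def shannon_entropy(values, bins=5):
    """Shannon entropy of a histogram of the values (variability measure)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def velocity(traj):
    """First temporal difference of the trajectory (per dimension);
    applying it twice gives an acceleration analogue."""
    return [[bi - ai for ai, bi in zip(a, b)] for a, b in zip(traj, traj[1:])]
```

With word order as the only clock, "velocity" here is just the per-step displacement; richer timing (pauses, articulation rate) would refine it, as the authors note under future work.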

Results & Findings

  • Group discrimination – In the neurodegenerative fluency dataset, patients showed significantly lower average velocity and higher centroid distance, indicating slower, more scattered semantic jumps compared with healthy participants.
  • Cross‑language consistency – Italian and German property‑listing tasks produced similar metric patterns, confirming that the trajectory approach generalizes beyond English.
  • Embedding model parity – Whether using BERT‑base, RoBERTa‑large, or multilingual BERT, the resulting trajectory statistics were statistically indistinguishable, underscoring that the method taps into a shared semantic geometry learned by these models.
  • Cumulative advantage – For sequences longer than ~10 words, cumulative embeddings yielded higher classification accuracy (≈ 78 % vs. 65 % for non‑cumulative) because the accumulated context stabilizes the trajectory. For very short lists (< 5 words), the non‑cumulative approach performed marginally better.
  • Entropy as a diagnostic cue – Higher entropy correlated with greater lexical diversity in healthy speakers, while lower entropy was a hallmark of constrained or impaired output (e.g., in the swear‑word fluency task where participants quickly exhausted the limited lexical pool).


Practical Implications

  • Rapid clinical screening – Developers of digital health tools can embed this trajectory analysis into speech- or text-based apps to flag early signs of cognitive decline without needing expert linguists to annotate responses.
  • Multilingual AI diagnostics – Because the method works with multilingual transformer models, the same pipeline can be deployed across countries, enabling consistent cross‑cultural assessments.
  • Human‑AI interaction research – Understanding “semantic navigation speed” could inform adaptive chatbots that adjust response complexity based on a user’s current trajectory (e.g., slowing down when the user’s velocity drops).
  • Benchmarking artificial cognition – Researchers can compare the trajectories of language models (e.g., GPT‑4 generating a list) against human trajectories to quantify how “human‑like” a model’s semantic search is.
  • Feature engineering for downstream ML – The trajectory metrics (velocity, entropy, etc.) can serve as compact, interpretable features for classification models in neuropsychology, psycholinguistics, or even user‑behavior analytics.
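For the feature-engineering use case, the trajectory summaries can be packed into a compact, interpretable feature vector. A minimal sketch with hypothetical feature names (not the paper's exact feature set):

```python
import math
from statistics import mean

def trajectory_features(traj):
    """Summarize a semantic trajectory into interpretable scalar features
    suitable as input to a downstream classifier."""
    steps = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    n = len(traj)
    centroid = [sum(dim) / n for dim in zip(*traj)]
    spread = [math.dist(p, centroid) for p in traj]
    return {
        "mean_step": mean(steps) if steps else 0.0,        # average jump size
        "max_step": max(steps) if steps else 0.0,          # largest jump
        "mean_centroid_dist": mean(spread),                # scatter around center
        "path_length": sum(steps),                         # total distance traveled
    }

feats = trajectory_features([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(feats["path_length"])  # 10.0
```

Each value has a direct behavioral reading (jump size, scatter, total ground covered), which is what makes these features attractive for clinical and psycholinguistic models compared with opaque learned representations.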

Limitations & Future Work

  • Data sparsity for rare languages – The study only covered English, Italian, and German; extending to low‑resource languages may require larger multilingual models or fine‑tuning.
  • Assumption of linear accumulation – Summing embeddings treats each word as equally contributing to the semantic context; alternative compositional schemes (e.g., attention‑weighted accumulation) could capture nuance better.
  • Temporal granularity – The current approach uses word order only; incorporating actual speech timing (pauses, articulation rate) could enrich velocity/acceleration measures.
  • Clinical validation – While promising, the method needs prospective validation on larger, more diverse patient cohorts before deployment in medical settings.
  • Explainability for clinicians – Translating abstract geometric metrics into actionable clinical insights will require user‑friendly visualizations and domain‑specific thresholds.

Bottom line: By turning a simple list of words into a navigable path through a transformer‑derived semantic space, this work offers developers a ready‑to‑use, language‑agnostic toolkit for quantifying how people think, speak, and—crucially—how those processes change when cognition is impaired.

Authors

  • Felipe D. Toro-Hernández
  • Jesuino Vieira Filho
  • Rodrigo M. Cabral-Carvalho

Paper Information

  • arXiv ID: 2602.05971v1
  • Categories: cs.CL, cs.LG, q-bio.NC
  • Published: February 5, 2026