[Paper] MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training

Published: February 2, 2026 at 01:59 PM EST
Source: arXiv - 2602.02494v1

Overview

The paper MEG‑XL tackles a core bottleneck in brain‑to‑text systems for people with severe motor impairments: the scarcity of training data. By pre‑training a neural model on much longer stretches of magnetoencephalography (MEG) recordings (up to 2.5 minutes per example), the authors show that the model can learn richer statistical priors across subjects and dramatically reduce the amount of labeled data needed for accurate word decoding.

Key Contributions

  • Long‑context pre‑training: Introduces a pre‑training regime that uses 5–300× longer MEG context than prior work (≈ 2.5 minutes of MEG per example, roughly 191 k tokens).
  • Data‑efficient fine‑tuning: Demonstrates that MEG‑XL reaches supervised‑level performance with as little as 1 hour of fine‑tuning data, compared to the ≈ 50 hours required by conventional models.
  • Empirical evidence of transfer: Shows that representations learned from long contexts transfer better to downstream word‑decoding tasks than those from short‑context baselines.
  • Open‑source release: Provides code, pretrained weights, and detailed instructions, enabling reproducibility and community extensions.

Methodology

  1. Dataset & Pre‑processing – The authors collected MEG recordings from multiple participants listening to natural speech. Each sample consists of a continuous 2.5‑minute window of neural activity aligned with the corresponding speech transcript.
  2. Model Architecture – MEG‑XL builds on a transformer encoder adapted for time‑series data (e.g., 1‑D convolutions for the initial embedding). The model is trained to predict the next token in the transcript given the preceding neural signal, effectively learning a language‑model‑style objective over brain data (a minimal sketch of this recipe follows the list).
  3. Long‑context Pre‑training – Instead of the typical 2‑5 second windows, the model sees the full 2.5‑minute context, allowing it to capture slow‑changing neural dynamics, attention shifts, and higher‑level linguistic structures.
  4. Fine‑tuning for Word Decoding – After pre‑training, a lightweight classification head is attached to map the learned representations to a vocabulary of target words. The head is trained on a small, labeled subset (as little as 1 hour of recordings).
  5. Baselines & Evaluation – Comparisons are made against state‑of‑the‑art brain foundation models that use short contexts, as well as a fully supervised model trained from scratch on the same fine‑tuning data.

Results & Findings

| Setting | Training data (fine‑tuning) | Word‑decoding accuracy |
| --- | --- | --- |
| Fully supervised (no pre‑training) | 50 h | 78 % |
| Short‑context pre‑training + fine‑tune | 1 h | 71 % |
| MEG‑XL (long‑context) + fine‑tune | 1 h | 77 % |
| MEG‑XL (long‑context) + fine‑tune | 5 h | 80 % (best) |

  • Data efficiency: With only 1 hour of labeled data, MEG‑XL matches the performance of a model that required 50 hours of supervision.
  • Representation quality: Probing experiments reveal that long‑context pre‑training yields embeddings that encode higher‑level linguistic cues (syntax, semantics) more robustly than short‑context models (an illustrative probe setup is sketched after this list).
  • Generalisation across subjects: Because the pre‑training aggregates data from many participants, the model transfers well to new subjects with minimal adaptation.
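
The probing protocol itself isn't spelled out in this summary; a common way to run such a comparison is to freeze each pre‑trained encoder, extract embeddings, and fit a simple linear probe for a linguistic property. The sketch below (scikit‑learn, with hypothetical array names) is illustrative only, not the authors' exact setup.

```python
# Illustrative linear-probe comparison (assumed protocol, not the paper's):
# freeze the encoder, extract pooled embeddings, and fit a linear classifier
# for a linguistic property (e.g., part of speech) to gauge representation quality.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def probe_accuracy(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-validated accuracy of a linear probe on frozen embeddings."""
    probe = LogisticRegression(max_iter=1000)
    return float(cross_val_score(probe, embeddings, labels, cv=5).mean())


# Hypothetical usage, comparing long- vs. short-context encoders on the same labels:
# long_emb, short_emb: (n_samples, d_model) pooled encoder outputs; pos_labels: (n_samples,)
# print(probe_accuracy(long_emb, pos_labels), probe_accuracy(short_emb, pos_labels))
```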

Practical Implications

  • Faster deployment of assistive communication devices: Clinics could calibrate a brain‑to‑text system for a new patient in a matter of hours rather than days or weeks, lowering the barrier for real‑world use.
  • Reduced data collection burden: Researchers and hospitals can avoid lengthy recording sessions, which are costly, tiring for patients, and prone to motion artefacts.
  • Scalable foundation models for neuro‑tech: The open‑source MEG‑XL can serve as a starting point for downstream tasks such as sentence reconstruction, intent detection, or multimodal brain‑computer interfaces (BCIs).
  • Potential for edge inference: Since the fine‑tuning head is lightweight, the final model can be compressed for on‑device inference, enabling portable, low‑latency communication aids (a minimal compression sketch follows).
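
The paper, as summarised here, doesn't describe a deployment pipeline; as one illustration of why a lightweight head helps, the sketch below applies PyTorch dynamic quantization to a stand‑in classifier so its linear layers run in int8 on device. The model, shapes, and lexicon size are hypothetical.

```python
# Illustrative on-device compression step (an assumption, not part of the paper):
# dynamically quantize the linear layers of a lightweight decoding head to int8.
import torch
import torch.nn as nn

# Hypothetical stand-in for a fine-tuned decoding head over pooled MEG features.
head = nn.Sequential(
    nn.Linear(512, 512),
    nn.GELU(),
    nn.Linear(512, 250),   # 250-word lexicon, assumed
)

quantized_head = torch.ao.quantization.quantize_dynamic(
    head, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    pooled_features = torch.randn(1, 512)  # pooled MEG representation (assumed shape)
    word_id = quantized_head(pooled_features).argmax(dim=-1)
    print(word_id)
```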

Limitations & Future Work

  • MEG‑specificity: The approach is demonstrated on MEG data; extending to other modalities (EEG, fNIRS) may require architectural tweaks and additional pre‑training.
  • Computational cost of long‑context pre‑training: Training on 2.5‑minute windows demands more GPU memory and longer training times, which could be a hurdle for smaller labs.
  • Vocabulary scope: The current word‑decoding task uses a limited lexicon; scaling to open‑vocabulary or sentence‑level generation remains an open challenge.
  • Real‑time constraints: While fine‑tuning is data‑efficient, the inference latency for continuous, real‑time decoding has not been fully evaluated.

The authors invite the community to build on MEG‑XL, explore cross‑modal pre‑training, and push toward truly conversational brain‑to‑text interfaces.

Authors

  • Dulhan Jayalath
  • Oiwi Parker Jones

Paper Information

  • arXiv ID: 2602.02494v1
  • Categories: cs.LG, q-bio.NC
  • Published: February 2, 2026