[Paper] Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks
Source: arXiv - 2602.18426v1
Overview
A new unsupervised deep‑learning framework tackles the massive, multi‑dimensional data produced by modern Integral Field Spectroscopy (IFS) surveys. By marrying convolutional layers with Long Short‑Term Memory (LSTM) units, the authors automatically learn compact representations of both spatial and spectral information for ~9 000 galaxies from the MaNGA survey—opening the door to scalable, data‑driven discovery in astrophysics.
Key Contributions
- Hybrid Conv‑LSTM Autoencoder: First application of a convolutional LSTM autoencoder to IFS cubes, preserving spatial context while modeling spectral sequences.
- Fully Unsupervised Feature Learning: No hand‑crafted labels are required; the network discovers latent structures directly from raw data.
- Cross‑Dimensional Embedding: Generates a single low‑dimensional vector that captures information from 19 optical emission lines across the full galaxy image.
- Anomaly Detection in AGN: Demonstrates the model’s ability to flag unusual active galactic nuclei (AGN) by comparing reconstruction errors and latent‑space distances.
- Scalable Pipeline: End‑to‑end training on thousands of IFS cubes using commodity GPUs, showing feasibility for upcoming larger surveys (e.g., HECTOR, SDSS‑V).
Methodology
- Data Preparation – Each galaxy’s IFS cube (spatial × spectral) is sliced into a stack of 19 narrow‑band images, one per emission line. The stack is treated as a temporal sequence where the “time” axis corresponds to wavelength.
- Network Architecture –
  - Encoder – A series of 2‑D convolutional layers extracts spatial features from each slice, feeding into an LSTM that learns how these features evolve across the spectral dimension.
  - Latent Space – The LSTM’s final hidden state is compressed to a 128‑dimensional vector (the learned representation).
  - Decoder – Mirrors the encoder: the latent vector is expanded by a reverse LSTM, then deconvolution layers reconstruct the original 19‑channel image stack.
- Training Objective – Mean‑squared reconstruction loss across all pixels and channels; no labels are used.
- Evaluation – Reconstruction error and latent‑space clustering are used to identify outliers. A subset of 290 known AGN is examined to illustrate the model’s diagnostic power.
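The pipeline above can be sketched as a small PyTorch module. This is a minimal illustration, not the authors' implementation: the 19 spectral channels and 128‑D latent come from the summary, while the cutout size, channel counts, and kernel sizes are assumed placeholder values.

```python
import torch
import torch.nn as nn

class ConvLSTMAutoencoder(nn.Module):
    """Sketch: shared 2-D conv encoder per emission-line slice, an LSTM over
    the spectral ("time") axis, a 128-D latent, and a mirrored decoder.
    Layer sizes are illustrative assumptions, not the paper's configuration."""

    def __init__(self, n_lines=19, img_size=32, latent_dim=128):
        super().__init__()
        self.img_size = img_size
        # Spatial feature extractor applied to each narrow-band image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        feat = 32 * (img_size // 4) ** 2
        # The 19 emission lines are treated as a sequence ("time" = wavelength).
        self.enc_lstm = nn.LSTM(feat, latent_dim, batch_first=True)
        self.dec_lstm = nn.LSTM(latent_dim, feat, batch_first=True)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):                         # x: (B, 19, H, W)
        B, T, H, W = x.shape
        f = self.conv(x.reshape(B * T, 1, H, W)).reshape(B, T, -1)
        _, (h, _) = self.enc_lstm(f)              # final hidden state
        z = h[-1]                                 # (B, 128) learned embedding
        # Repeat the latent along the spectral axis and decode each slice.
        d, _ = self.dec_lstm(z.unsqueeze(1).repeat(1, T, 1))
        d = d.reshape(B * T, 32, self.img_size // 4, self.img_size // 4)
        return self.deconv(d).reshape(B, T, H, W), z

model = ConvLSTMAutoencoder()
cube = torch.randn(2, 19, 32, 32)                 # two toy "IFS" stacks
recon, z = model(cube)
loss = nn.functional.mse_loss(recon, cube)        # the unsupervised objective
```

Training minimizes `loss` alone; no labels enter the pipeline, matching the paper's fully unsupervised setup.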
Results & Findings
- High‑Fidelity Reconstructions – The autoencoder reproduces >95 % of the variance in the original cubes, preserving subtle line‑ratio gradients that are astrophysically meaningful.
- Meaningful Latent Structure – t‑SNE/UMAP visualizations of the 128‑D embeddings separate galaxies by morphology, star‑formation rate, and metallicity, despite the model never seeing these labels.
- Anomalous AGN Detection – A handful of AGN exhibit unusually large reconstruction errors or occupy isolated regions in latent space. Follow‑up inspection reveals rare spectral features (e.g., extreme line broadening, off‑nuclear emission) that merit further scientific study.
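The two outlier criteria described above—large reconstruction error and isolation in latent space—can be combined in a few lines of NumPy. The data here are synthetic stand-ins for the autoencoder's outputs, and the specific cuts (a median + 5 × MAD error threshold, a 98th-percentile k‑NN distance cut) are illustrative assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
# Synthetic per-galaxy reconstruction errors and 128-D embeddings; in
# practice these come from the trained autoencoder.
recon_err = rng.normal(1.0, 0.1, size=n)
emb = rng.normal(size=(n, 128))
recon_err[:5] += 2.0          # inject five "anomalous" objects...
emb[:5] += 3.0                # ...that are also isolated in latent space

# Criterion 1: robust reconstruction-error cut (median + 5 * MAD, assumed).
med = np.median(recon_err)
mad = np.median(np.abs(recon_err - med))
err_flag = recon_err > med + 5 * mad

# Criterion 2: mean distance to the 5 nearest latent-space neighbours,
# using the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.
sq = (emb ** 2).sum(axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * emb @ emb.T, 0.0)
np.fill_diagonal(d2, np.inf)                       # ignore self-distance
knn_dist = np.sqrt(np.sort(d2, axis=1)[:, :5]).mean(axis=1)
iso_flag = knn_dist > np.percentile(knn_dist, 98)

candidates = np.flatnonzero(err_flag | iso_flag)   # objects for follow-up
```

Either flag alone recovers the injected outliers here; in a real survey the union of the two lists would go to visual or spectroscopic follow-up.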
Practical Implications
- Automated Pre‑Processing – The encoder can serve as a fast, learned compressor for IFS data, reducing storage and I/O costs for downstream pipelines.
- Feature Extraction for ML – The latent vectors can be fed directly into classification, regression, or clustering models, bypassing expensive handcrafted feature engineering.
- Real‑Time Anomaly Alerts – In survey operations, the reconstruction error can trigger alerts for unusual objects, enabling rapid follow‑up with telescopes or other instruments.
- Transferable Architecture – The Conv‑LSTM design is applicable to any 3‑D scientific data where one axis behaves like a sequence (e.g., hyperspectral imaging, medical MRI time‑series).
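The compression benefit is easy to quantify. A back-of-envelope sketch, assuming a 32 × 32 spatial cutout (the summary does not state the cutout size) and float32 storage; the 19 lines, 128‑D latent, and ~9 000-galaxy sample come from the paper:

```python
# Storage for the 19-line image stacks vs. the learned 128-D embeddings.
n_galaxies = 9_000
n_lines = 19
spatial_pixels = 32 * 32                 # assumed cutout size, not from paper
bytes_per_float = 4                      # float32

cube_floats = spatial_pixels * n_lines   # values per 19-channel stack
latent_floats = 128                      # values per embedding
ratio = cube_floats / latent_floats      # per-galaxy compression factor

raw_mb = n_galaxies * cube_floats * bytes_per_float / 1e6
compressed_mb = n_galaxies * latent_floats * bytes_per_float / 1e6
print(f"compression ~{ratio:.0f}x: {raw_mb:.0f} MB -> {compressed_mb:.1f} MB")
# prints: compression ~152x: 700 MB -> 4.6 MB
```

Even under these modest assumptions the embeddings shrink the dataset by two orders of magnitude, which is what makes latent vectors attractive as inputs to downstream classifiers and clustering runs.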
Limitations & Future Work
- Spectral Resolution Constraint – The model treats each emission line as a discrete “time step,” which may miss fine‑grained velocity information within lines.
- Interpretability – While the latent space clusters meaningfully, mapping individual dimensions to physical parameters remains an open challenge.
- Scalability to Larger Surveys – Training on >100 000 cubes will require distributed training strategies and memory‑efficient data loaders.
- Extension to Multi‑Instrument Data – Future work could fuse IFS with complementary modalities (e.g., photometry, radio maps) to build truly multimodal galaxy representations.
Bottom line: By leveraging a convolutional LSTM autoencoder, this work shows that unsupervised deep learning can turn the high‑dimensional, spectro‑spatial data of modern galaxy surveys into compact, actionable representations—paving the way for faster, more automated discovery pipelines in astronomy and beyond.
Authors
- Kameswara Bharadwaj Mantha
- Lucy Fortson
- Ramanakumar Sankar
- Claudia Scarlata
- Chris Lintott
- Sandor Kruk
- Mike Walmsley
- Hugh Dickinson
- Karen Masters
- Brooke Simmons
- Rebecca Smethurst
Paper Information
- arXiv ID: 2602.18426v1
- Categories: astro-ph.GA, cs.CV
- Published: February 20, 2026