[Paper] Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks
Source: arXiv - 2602.18426v1
Overview
A new unsupervised deep‑learning framework tackles the massive, multi‑dimensional data produced by modern Integral Field Spectroscopy (IFS) surveys. By marrying convolutional layers with Long Short‑Term Memory (LSTM) units, the authors automatically learn compact representations of both spatial and spectral information for ~9 000 galaxies from the MaNGA survey—opening the door to scalable, data‑driven discovery in astrophysics.
Key Contributions
- Hybrid Conv‑LSTM Autoencoder: First application of a convolutional LSTM autoencoder to IFS cubes, preserving spatial context while modeling spectral sequences.
- Fully Unsupervised Feature Learning: No hand‑crafted labels are required; the network discovers latent structures directly from raw data.
- Cross‑Dimensional Embedding: Generates a single low‑dimensional vector that captures information from 19 optical emission lines across the full galaxy image.
- Anomaly Detection in AGN: Demonstrates the model’s ability to flag unusual active galactic nuclei (AGN) by comparing reconstruction errors and latent‑space distances.
- Scalable Pipeline: End‑to‑end training on thousands of IFS cubes using commodity GPUs, showing feasibility for upcoming larger surveys (e.g., HECTOR, SDSS‑V).
Methodology
- Data Preparation – Each galaxy’s IFS cube (spatial × spectral) is sliced into a stack of 19 narrow‑band images, one per emission line. The stack is treated as a temporal sequence where the “time” axis corresponds to wavelength.
- Network Architecture –
  - Encoder – A series of 2‑D convolutional layers extracts spatial features from each slice, feeding into an LSTM that learns how these features evolve across the spectral dimension.
  - Latent Space – The LSTM’s final hidden state is compressed to a 128‑dimensional vector (the learned representation).
  - Decoder – Mirrors the encoder: the latent vector is expanded by a reverse LSTM, then deconvolution layers reconstruct the original 19‑channel image stack.
- Training Objective – Mean‑squared reconstruction loss across all pixels and channels; no labels are used.
- Evaluation – Reconstruction error and latent‑space clustering are used to identify outliers. A subset of 290 known AGN is examined to illustrate the model’s diagnostic power.
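The pipeline above can be sketched as a small PyTorch module. This is a minimal illustration, not the authors' implementation: the 19 spectral channels and 128‑D latent come from the summary, while the cutout size, channel counts, and kernel sizes are assumed placeholder values.

```python
import torch
import torch.nn as nn

class ConvLSTMAutoencoder(nn.Module):
    """Sketch: shared 2-D conv encoder per emission-line slice, an LSTM over
    the spectral ("time") axis, a 128-D latent, and a mirrored decoder.
    Layer sizes are illustrative assumptions, not the paper's configuration."""

    def __init__(self, n_lines=19, img_size=32, latent_dim=128):
        super().__init__()
        self.img_size = img_size
        # Spatial feature extractor applied to each narrow-band image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        feat = 32 * (img_size // 4) ** 2
        # The 19 emission lines are treated as a sequence ("time" = wavelength).
        self.enc_lstm = nn.LSTM(feat, latent_dim, batch_first=True)
        self.dec_lstm = nn.LSTM(latent_dim, feat, batch_first=True)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):                         # x: (B, 19, H, W)
        B, T, H, W = x.shape
        f = self.conv(x.reshape(B * T, 1, H, W)).reshape(B, T, -1)
        _, (h, _) = self.enc_lstm(f)              # final hidden state
        z = h[-1]                                 # (B, 128) learned embedding
        # Repeat the latent along the spectral axis and decode each slice.
        d, _ = self.dec_lstm(z.unsqueeze(1).repeat(1, T, 1))
        d = d.reshape(B * T, 32, self.img_size // 4, self.img_size // 4)
        return self.deconv(d).reshape(B, T, H, W), z

model = ConvLSTMAutoencoder()
cube = torch.randn(2, 19, 32, 32)                 # two toy "IFS" stacks
recon, z = model(cube)
loss = nn.functional.mse_loss(recon, cube)        # the unsupervised objective
```

Training minimizes `loss` alone; no labels enter the pipeline, matching the paper's fully unsupervised setup.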
Results & Findings
- High‑Fidelity Reconstructions – The autoencoder reproduces >95 % of the variance in the original cubes, preserving subtle line‑ratio gradients that are astrophysically meaningful.
- Meaningful Latent Structure – t‑SNE/UMAP visualizations of the 128‑D embeddings separate galaxies by morphology, star‑formation rate, and metallicity, despite the model never seeing these labels.
- Anomalous AGN Detection – A handful of AGN exhibit unusually large reconstruction errors or occupy isolated regions in latent space. Follow‑up inspection reveals rare spectral features (e.g., extreme line broadening, off‑nuclear emission) that merit further scientific study.
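The two outlier criteria described above—large reconstruction error and isolation in latent space—can be combined in a few lines of NumPy. The data here are synthetic stand-ins for the autoencoder's outputs, and the specific cuts (a median + 5 × MAD error threshold, a 98th-percentile k‑NN distance cut) are illustrative assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
# Synthetic per-galaxy reconstruction errors and 128-D embeddings; in
# practice these come from the trained autoencoder.
recon_err = rng.normal(1.0, 0.1, size=n)
emb = rng.normal(size=(n, 128))
recon_err[:5] += 2.0          # inject five "anomalous" objects...
emb[:5] += 3.0                # ...that are also isolated in latent space

# Criterion 1: robust reconstruction-error cut (median + 5 * MAD, assumed).
med = np.median(recon_err)
mad = np.median(np.abs(recon_err - med))
err_flag = recon_err > med + 5 * mad

# Criterion 2: mean distance to the 5 nearest latent-space neighbours,
# using the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.
sq = (emb ** 2).sum(axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * emb @ emb.T, 0.0)
np.fill_diagonal(d2, np.inf)                       # ignore self-distance
knn_dist = np.sqrt(np.sort(d2, axis=1)[:, :5]).mean(axis=1)
iso_flag = knn_dist > np.percentile(knn_dist, 98)

candidates = np.flatnonzero(err_flag | iso_flag)   # objects for follow-up
```

Either flag alone recovers the injected outliers here; in a real survey the union of the two lists would go to visual or spectroscopic follow-up.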
Practical Implications
- Automated Pre‑Processing – The encoder can serve as a fast, learned compressor for IFS data, reducing storage and I/O costs for downstream pipelines.
- Feature Extraction for ML – The latent vectors can be fed directly into classification, regression, or clustering models, bypassing expensive handcrafted feature engineering.
- Real‑Time Anomaly Alerts – In survey operations, the reconstruction error can trigger alerts for unusual objects, enabling rapid follow‑up with telescopes or other instruments.
- Transferable Architecture – The Conv‑LSTM design is applicable to any 3‑D scientific data where one axis behaves like a sequence (e.g., hyperspectral imaging, medical MRI time‑series).
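The compression benefit is easy to quantify. A back-of-envelope sketch, assuming a 32 × 32 spatial cutout (the summary does not state the cutout size) and float32 storage; the 19 lines, 128‑D latent, and ~9 000-galaxy sample come from the paper:

```python
# Storage for the 19-line image stacks vs. the learned 128-D embeddings.
n_galaxies = 9_000
n_lines = 19
spatial_pixels = 32 * 32                 # assumed cutout size, not from paper
bytes_per_float = 4                      # float32

cube_floats = spatial_pixels * n_lines   # values per 19-channel stack
latent_floats = 128                      # values per embedding
ratio = cube_floats / latent_floats      # per-galaxy compression factor

raw_mb = n_galaxies * cube_floats * bytes_per_float / 1e6
compressed_mb = n_galaxies * latent_floats * bytes_per_float / 1e6
print(f"compression ~{ratio:.0f}x: {raw_mb:.0f} MB -> {compressed_mb:.1f} MB")
# prints: compression ~152x: 700 MB -> 4.6 MB
```

Even under these modest assumptions the embeddings shrink the dataset by two orders of magnitude, which is what makes latent vectors attractive as inputs to downstream classifiers and clustering runs.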
Limitations & Future Work
- Spectral Resolution Constraint – The model treats each emission line as a discrete “time step,” which may miss fine‑grained velocity information within lines.
- Interpretability – While the latent space clusters meaningfully, mapping individual dimensions to physical parameters remains an open challenge.
- Scalability to Larger Surveys – Training on >100 000 cubes will require distributed training strategies and memory‑efficient data loaders.
- Extension to Multi‑Instrument Data – Future work could fuse IFS with complementary modalities (e.g., photometry, radio maps) to build truly multimodal galaxy representations.
Bottom line: By leveraging a convolutional LSTM autoencoder, this work shows that unsupervised deep learning can turn the high‑dimensional, spectro‑spatial data of modern galaxy surveys into compact, actionable representations—paving the way for faster, more automated discovery pipelines in astronomy and beyond.
Authors
- Kameswara Bharadwaj Mantha
- Lucy Fortson
- Ramanakumar Sankar
- Claudia Scarlata
- Chris Lintott
- Sandor Kruk
- Mike Walmsley
- Hugh Dickinson
- Karen Masters
- Brooke Simmons
- Rebecca Smethurst
Paper Information
- arXiv ID: 2602.18426v1
- Categories: astro-ph.GA, cs.CV
- Published: February 20, 2026