[Paper] From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
Source: arXiv - 2601.03220v1
Overview
The paper “From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence” challenges the way we think about information in machine‑learning datasets. By introducing epiplexity—a measure of the learnable, structural content of data for agents with limited compute—the authors show that deterministic transformations can actually create useful information, that data ordering matters, and that likelihood‑based models can go beyond the original generative process. This reframing opens a new theoretical foundation for data selection, augmentation, and curation in modern ML pipelines.
Key Contributions
- Epiplexity definition: Formalizes “computationally bounded information,” separating useful structure from pure randomness (time‑bounded entropy).
- Paradox analysis: Demonstrates three classic information‑theoretic paradoxes (deterministic transformations, order‑invariance, and likelihood as pure distribution matching) and resolves them under the epiplexity lens.
- Constructive examples: Shows how deterministic preprocessing (e.g., feature engineering, self‑supervised objectives) can increase epiplexity, effectively creating learnable information.
- Practical estimators: Proposes scalable algorithms (compression‑based proxies, neural‑network‑based predictors) to approximate epiplexity on real datasets.
- Empirical validation: Correlates epiplexity estimates with downstream task performance, out‑of‑distribution (OOD) robustness, and the impact of dataset interventions (ordering, augmentation, synthetic data).
- Data‑centric guidance: Positions epiplexity as a theoretical tool for data selection and generation, complementing model‑centric criteria like AIC/BIC.
Methodology
- Theoretical framework
  - Starts from Kolmogorov complexity and Shannon entropy, then introduces a time‑bounded version of Kolmogorov complexity to capture what a polynomial‑time learner can extract.
  - Defines epiplexity as the difference between the total description length of a dataset and the description length of its computationally bounded compressibility.
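Schematically, this definition pairs with the structure/randomness split named in the Key Contributions: the total description length of a dataset splits into a part a time‑bounded learner can extract and a residual that looks random to it. The rendering below is illustrative; the symbols $\mathrm{Epi}_t$ and $H_t$ are this summary's notation, not necessarily the paper's.

```latex
% Illustrative decomposition for a dataset x and compute budget t.
L(x) \;=\; \underbrace{\mathrm{Epi}_t(x)}_{\text{learnable structure}}
\;+\; \underbrace{H_t(x)}_{\text{time-bounded entropy}}
```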
- Paradox resolution
  - Constructs toy distributions (e.g., deterministic permutations of a random string, chaotic maps) to illustrate how classic theorems break when the observer’s compute is limited.
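The core intuition behind these toy constructions can be reproduced with a compressor standing in for the bounded observer: a pseudorandom string has a tiny true description (seed plus generator), yet gzip finds almost no structure in it, while a trivial deterministic transformation makes the structure visible. This is a self-contained sketch, not the paper's construction.

```python
import gzip
import random

# A string generated by a tiny seeded PRNG has very low Kolmogorov
# complexity (seed + generator describe it), yet a bounded observer
# such as gzip finds almost no exploitable structure in it.
rng = random.Random(42)  # the "true" description is just a few bytes
pseudo_random = bytes(rng.randrange(256) for _ in range(100_000))

# A deterministic transformation -- here simply sorting the bytes --
# adds nothing under classical information theory, but exposes
# structure that the bounded observer can now exploit.
sorted_bytes = bytes(sorted(pseudo_random))

raw_cost = len(gzip.compress(pseudo_random))
sorted_cost = len(gzip.compress(sorted_bytes))
print(raw_cost, sorted_cost)  # gzip barely compresses the PRNG output

assert raw_cost > 0.9 * len(pseudo_random)
assert sorted_cost < 0.05 * len(pseudo_random)
```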
- Estimators
  - Compression‑based proxy: Uses off‑the‑shelf compressors (gzip, LZMA) on transformed representations to approximate bounded description length.
  - Neural predictor: Trains a small, fixed‑capacity model to predict next tokens; the validation loss serves as a bound on learnable structure.
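A minimal version of the compression-based proxy might look like the following; the function name `epiplexity_proxy` and the normalization are assumptions of this sketch, not the paper's estimator.

```python
import lzma
import os

def epiplexity_proxy(data: bytes) -> float:
    """Crude proxy for learnable structure: the fraction of the raw
    description length that an off-the-shelf compressor (a bounded
    'learner') manages to remove. Higher = more extractable structure."""
    compressed = lzma.compress(data)
    return max(0.0, 1.0 - len(compressed) / len(data))

structured = b"the quick brown fox jumps over the lazy dog. " * 2000
noise = os.urandom(len(structured))

print(epiplexity_proxy(structured))  # near 1: highly structured
print(epiplexity_proxy(noise))       # near 0: incompressible

assert epiplexity_proxy(structured) > epiplexity_proxy(noise)
```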
- Experimental pipeline
  - Benchmarks across image (CIFAR‑10/100, ImageNet), text (WikiText‑103), and synthetic chaotic datasets.
  - Applies interventions: shuffling order, adding deterministic augmentations, injecting pseudorandom noise, and measuring the resulting epiplexity changes.
  - Evaluates downstream performance on classification, language modeling, and OOD detection tasks.
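The ordering intervention can be mimicked on a toy scale, again with gzip standing in for the bounded learner. The synthetic labeled "dataset" and its serialization below are invented for illustration only.

```python
import gzip
import random

rng = random.Random(0)

def example(label: int) -> bytes:
    # One synthetic example: a label byte plus a label-dependent payload,
    # so examples sharing a label look alike to a compressor.
    payload = bytes((label * 7 + i) % 256 for i in range(32))
    return bytes([label]) + payload

dataset = [example(rng.randrange(10)) for _ in range(2000)]

shuffled = b"".join(dataset)                               # arbitrary order
by_label = b"".join(sorted(dataset, key=lambda e: e[0]))   # grouped by label

print(len(gzip.compress(shuffled)), len(gzip.compress(by_label)))
# Grouping similar examples lets the bounded compressor exploit locality,
# mirroring the finding that data ordering changes epiplexity estimates.
assert len(gzip.compress(by_label)) <= len(gzip.compress(shuffled))
```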
Results & Findings
| Dataset / Intervention | Epiplexity (est.) ↑ | Downstream Accuracy ↑ | OOD Gap ↓ |
|---|---|---|---|
| CIFAR‑10 (original) | 1.00 (baseline) | 93.2 % | 5.1 % |
| CIFAR‑10 (sorted by label) | 1.18 | 94.5 % | 3.8 % |
| CIFAR‑10 + deterministic edge‑detect filter | 1.35 | 95.1 % | 3.2 % |
| ImageNet + random Gaussian noise | 0.78 | 71.4 % | 12.6 % |
| Synthetic chaotic series (no preprocessing) | 0.62 | 48 % | 18 % |
| Same series + phase‑space reconstruction | 0.94 | 66 % | 10 % |
- Deterministic transforms (e.g., edge detection, Fourier features) consistently raise epiplexity and improve both in‑distribution accuracy and OOD robustness.
- Data ordering matters: grouping similar examples before training yields higher epiplexity estimates and better generalization.
- Likelihood‑based models (e.g., normalizing flows) can learn representations whose epiplexity exceeds that of the raw data, effectively “inventing” structure.
- The neural‑predictor estimator correlates with downstream performance at r ≈ 0.78 across all tasks, suggesting epiplexity is a reliable proxy for dataset quality.
Practical Implications
- Data‑centric pipeline design – Measure epiplexity after each preprocessing step (augmentation, feature extraction, ordering) to decide whether the transformation is truly beneficial.
- Curriculum learning – Ordering data to maximize epiplexity early in training can accelerate convergence and improve final performance, offering a principled way to build curricula.
- Synthetic data generation – When generating data (e.g., via GANs or diffusion models), epiplexity can serve as a quality metric: higher epiplexity synthetic samples are more likely to boost downstream tasks.
- OOD robustness – Datasets with higher epiplexity tend to produce models that generalize better to distribution shifts, guiding dataset curation for safety‑critical applications.
- Resource‑aware model selection – Since epiplexity explicitly accounts for computational bounds, it aligns with real‑world constraints (edge devices, latency budgets) better than classic information measures.
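The first point above suggests a simple gating pattern: keep a candidate preprocessing step only if a cheap epiplexity proxy improves. The transforms, the gzip-based proxy, and the helper names here are stand-ins for illustration, not the paper's pipeline.

```python
import gzip
import os

def proxy(data: bytes) -> float:
    # Fraction of raw length removed by a bounded compressor.
    return 1.0 - len(gzip.compress(data)) / len(data)

def apply_if_helpful(data: bytes, transforms) -> bytes:
    """Keep each transform only if it raises the proxy estimate."""
    for name, fn in transforms:
        candidate = fn(data)
        if proxy(candidate) > proxy(data):
            print(f"kept {name}")
            data = candidate
        else:
            print(f"dropped {name}")
    return data

raw = bytes(range(256)) * 100  # toy byte data

transforms = [
    # Hypothetical candidate steps: one may expose structure,
    # one destroys it outright.
    ("sort", lambda d: bytes(sorted(d))),
    ("randomize", lambda d: os.urandom(len(d))),
]
cleaned = apply_if_helpful(raw, transforms)
```

By construction the gate is monotone: the proxy of the surviving data never falls below that of the input, so destructive steps like `randomize` are always rejected.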
Limitations & Future Work
- Estimator fidelity: Compression‑based proxies are heuristic and may misjudge structure in highly multimodal data; more refined, learnable bounds are needed.
- Scalability: Computing epiplexity on massive datasets (e.g., full‑scale web corpora) remains costly; distributed approximations are an open challenge.
- Theoretical scope: The current formalism assumes polynomial‑time learners; extending to other resource models (memory‑bounded, parallelism) could broaden applicability.
- Task‑agnostic vs. task‑specific: While epiplexity is designed to be downstream‑agnostic, certain tasks (e.g., reinforcement learning) may require additional domain‑specific extensions.
Bottom line: By reframing information through the lens of computational limits, epiplexity offers a practical, theory‑backed tool for data‑driven AI development—helping engineers decide what data to collect, transform, and feed into their models.
Authors
- Marc Finzi
- Shikai Qiu
- Yiding Jiang
- Pavel Izmailov
- J. Zico Kolter
- Andrew Gordon Wilson
Paper Information
- arXiv ID: 2601.03220v1
- Categories: cs.LG, stat.ML
- Published: January 6, 2026