[Paper] Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata
Source: arXiv - 2512.08360v1
Overview
A new paper introduces Conditional Neural Cellular Automata (c‑NCA), a lightweight neural model that can grow ten different MNIST digit shapes from a single pixel seed, simply by broadcasting a one‑hot class vector. By staying true to the strictly local, translation‑equivariant rules of cellular automata, the work shows how conditional generation—usually the domain of GANs or VAEs—can emerge from purely local interactions, opening a path toward more biologically‑inspired, scalable generative systems.
Key Contributions
- c‑NCA architecture: Extends differentiable Neural Cellular Automata with a spatially broadcasted class condition, enabling a single set of local update rules to produce multiple distinct topologies.
- Class‑conditional structural generation: Demonstrates that a one‑hot digit label injected into each cell’s perception field is enough to break symmetry and steer the automaton toward ten separate geometric attractors (the MNIST digits).
- Strict locality & translation equivariance: Unlike most deep generative models, the c‑NCA never looks beyond its immediate neighbourhood, preserving the core cellular‑automata principle.
- Robust convergence from minimal seeds: Shows stable growth from a single active pixel to a full digit, with the system tolerating noise and perturbations much as biological morphogenesis does.
- Open‑source implementation & lightweight footprint: The model runs with a few thousand parameters, making it feasible for edge devices or real‑time interactive demos.
Methodology
- Base NCA – Each cell stores a hidden state vector. At every timestep, a 3×3 convolution (the “perception field”) gathers the states of a cell’s immediate neighbours, and a small MLP maps this perception vector to the cell’s next state (see the sketch after this list).
- Condition injection – A one‑hot vector representing the target digit (0–9) is concatenated to the perception vector of every cell before the MLP. This broadcast condition is the only global information the system receives.
- Training loop – Starting from a seed image (a single active pixel), the automaton is rolled out for a fixed number of steps (e.g., 64). The final canvas is compared to the ground‑truth digit using a pixel‑wise L2 loss plus a perceptual loss from a pretrained classifier that encourages correct shape formation. Gradients flow through the entire rollout, making the whole process differentiable (a training sketch follows below).
- Regularization – To keep the dynamics stable, the authors apply stochastic cell updates (randomly updating a subset of cells at each step) and a “death” rule that forces dead cells to stay dead unless revived by neighbours, mimicking biological apoptosis.
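A minimal PyTorch-style sketch of one conditioned update step is shown below. The channel counts, hidden width, fire rate, and aliveness threshold are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalNCA(nn.Module):
    """One conditioned, strictly local update step (hypothetical sizes)."""

    def __init__(self, state_ch=12, num_classes=10, hidden=32, fire_rate=0.5):
        super().__init__()
        self.fire_rate = fire_rate
        # 3x3 "perception field": each cell only sees its immediate neighbours.
        self.perceive = nn.Conv2d(state_ch, 3 * state_ch, kernel_size=3,
                                  padding=1, bias=False)
        # Per-cell MLP implemented as 1x1 convolutions; the broadcast one-hot
        # label is concatenated to every cell's perception vector.
        self.update = nn.Sequential(
            nn.Conv2d(3 * state_ch + num_classes, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden, state_ch, kernel_size=1),
        )

    def forward(self, state, onehot):
        # state: (B, state_ch, H, W); onehot: (B, num_classes)
        b, _, h, w = state.shape
        cond = onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast label
        ds = self.update(torch.cat([self.perceive(state), cond], dim=1))
        # Stochastic updates: each cell fires independently with prob. fire_rate.
        fire = (torch.rand(b, 1, h, w, device=state.device) < self.fire_rate).float()
        state = state + ds * fire
        # "Death" rule: a cell survives only if it or a neighbour is alive,
        # using channel 0 as an alpha/aliveness channel above a small threshold.
        alive = (F.max_pool2d(state[:, :1], 3, stride=1, padding=1) > 0.1).float()
        return state * alive
```

Because the conditioning enters through 1×1 convolutions applied identically at every cell, the rule stays strictly local and translation‑equivariant: the class label is the only global signal the automaton ever sees.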
All of this is implemented with standard deep‑learning libraries (PyTorch/TensorFlow) and can be trained on a single GPU in under an hour.
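A sketch of one training step, compatible with the ConditionalNCA sketch above, is given below. The seed layout, the choice of channel 1 as the visible canvas, the step count, and the optional classifier term are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def make_seed(batch, state_ch=12, size=28, device="cpu"):
    # A single active pixel in the centre of an otherwise empty grid.
    seed = torch.zeros(batch, state_ch, size, size, device=device)
    seed[:, :, size // 2, size // 2] = 1.0
    return seed

def train_step(model, optimizer, targets, labels, steps=64, classifier=None):
    # targets: (B, 1, 28, 28) ground-truth digits; labels: (B,) class indices.
    onehot = F.one_hot(labels, num_classes=10).float()
    state = make_seed(targets.shape[0], device=targets.device)
    for _ in range(steps):                 # gradients flow through every step
        state = model(state, onehot)
    canvas = state[:, 1:2]                 # assume channel 1 holds the visible pixel
    loss = F.mse_loss(canvas, targets)     # pixel-wise L2 term
    if classifier is not None:             # shape/perceptual term from a frozen classifier
        loss = loss + F.cross_entropy(classifier(canvas), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical setup would pair this with an Adam optimizer and mini-batches of MNIST digits; backpropagating through all 64 rollout steps is what makes the growth process differentiable end to end.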
Results & Findings
| Metric | Value |
|---|---|
| Final digit accuracy (classifier‑based) | ≈98% |
| Convergence speed | 40–60 steps to a stable shape |
| Parameter count | ~4k trainable weights |
| Robustness test (random pixel noise) | Shape recovered in >95% of trials |
- Distinct attractors: The same rule set reliably converges to ten different digit shapes solely based on the injected class vector.
- Locality suffices: No global receptive field is needed; the model learns to propagate the class signal through neighbour interactions.
- Biologically inspired resilience: When random cells are flipped during growth, the automaton often self‑corrects, re‑forming the intended digit (a minimal damage‑and‑recovery check is sketched below).
These findings indicate that conditional generation can emerge from the same kind of simple, purely local rules that drive natural morphogenesis.
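The resilience claim can be probed with a simple damage‑and‑recovery routine like the one below; the damage step, damage fraction, and seed layout are illustrative assumptions rather than the paper's exact protocol.

```python
import torch

@torch.no_grad()
def grow_with_damage(model, onehot, state_ch=12, size=28,
                     steps=64, damage_step=32, damage_frac=0.1):
    # Grow from a single-pixel seed and knock out random cells mid-growth.
    state = torch.zeros(onehot.shape[0], state_ch, size, size,
                        device=onehot.device)
    state[:, :, size // 2, size // 2] = 1.0
    for t in range(steps):
        state = model(state, onehot)
        if t == damage_step:
            keep = (torch.rand_like(state[:, :1]) > damage_frac).float()
            state = state * keep            # zero out ~10% of cells
    return state[:, 1:2]                    # visible canvas to compare with the target
```

Scoring the returned canvas against the target digit (for example with the same pretrained classifier used during training) gives one way to measure the recovery rate reported in the table above.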
Practical Implications
- Edge‑friendly generative AI: With only a few thousand parameters and no need for large convolutional backbones, c‑NCA can run on microcontrollers, IoT devices, or in‑browser WebGL environments for on‑the‑fly pattern synthesis.
- Procedural content creation: Game developers could use c‑NCA to grow terrain features, architectural motifs, or UI icons conditioned on high‑level tags, all while preserving a “hand‑crafted” organic feel.
- Self‑repairing visual systems: Because the automaton can recover from local damage, it could be embedded in display pipelines that need to maintain visual integrity under pixel failures (e.g., e‑ink displays, LED panels).
- Explainable generative rules: The explicit locality of the update function makes it easier to inspect and modify the underlying dynamics compared to opaque GAN generators, facilitating debugging and custom rule injection.
Limitations & Future Work
- Scale of structures: The current experiments are limited to 28×28 MNIST digits; scaling to higher‑resolution or more complex topologies may require deeper state vectors or hierarchical CA designs.
- Condition granularity: Only a one‑hot class vector was explored; richer conditioning (e.g., style vectors, textual prompts) remains an open question.
- Training stability: While convergence is stable for digits, training on highly asymmetric or multi‑object scenes could suffer from mode collapse or oscillations.
- Biological fidelity vs. performance trade‑off: Adding more biologically realistic mechanisms (e.g., diffusion gradients, mechanical forces) could improve realism but increase computational cost.
Future research directions include multi‑scale c‑NCA pipelines, integration with reinforcement learning for goal‑directed growth, and applying the framework to 3D voxel morphogenesis for printable objects.
Authors
- Ali Sakour
Paper Information
- arXiv ID: 2512.08360v1
- Categories: cs.NE, cs.AI, cs.CV, cs.LG
- Published: December 9, 2025