Beyond the Standard Model: Introducing the 'Cousins' of the Memory-Native Neural Network Family
The “Cousin” Philosophy
Unlike the standard models found in api.py, these cousins are specialized tools:
- Standalone architectures, not general‑purpose models
- Dependent on dedicated C‑compiled backends
- Designed for specific, complex temporal challenges that demand unique ways of remembering
1. DTPN — The Universal Persistence Hybrid
Dual‑Track Persistence Network
If the standard AMN is a master of context, DTPN is the master of persistence. It bridges immediate reaction and long‑term knowledge through three distinct retention tracks.
Track 1: The Echo (Temporal Fluidity)
- Retains a fraction of the immediate previous output (β factor)
- Ensures smooth transitions between time steps
Track 2: The State (Stateful Neurons)
- Individual neurons maintain a decaying internal reservoir (α factor)
- Acts as a medium‑term memory buffer
Track 3: The Manifold (Global Memory)
- A shared associative whiteboard
- Stores long‑term contextual information
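A minimal NumPy sketch of how the three tracks could interact inside a single layer; the class name, the α/β update rules, and the manifold read/write scheme are illustrative assumptions, not the project's actual API:

```python
import numpy as np

class DTPNLayer:
    """Toy layer combining the Echo, State, and Manifold tracks (illustrative only)."""

    def __init__(self, in_dim, out_dim, alpha=0.9, beta=0.3, manifold_slots=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (out_dim, in_dim))
        self.alpha = alpha                                    # reservoir decay (Track 2)
        self.beta = beta                                      # echo fraction (Track 1)
        self.prev_out = np.zeros(out_dim)                     # Track 1: the Echo
        self.reservoir = np.zeros(out_dim)                    # Track 2: the State
        self.manifold = np.zeros((manifold_slots, out_dim))   # Track 3: the Manifold
        self.keys = rng.normal(0.0, 1.0, (manifold_slots, out_dim))

    def step(self, x):
        drive = np.tanh(self.W @ x)
        # Track 2: each neuron keeps a decaying internal reservoir of its own activity
        self.reservoir = self.alpha * self.reservoir + (1.0 - self.alpha) * drive
        # Track 3: read the shared associative manifold by key similarity
        attn = np.exp(self.keys @ drive)
        attn /= attn.sum()
        recalled = attn @ self.manifold
        # Track 1: blend an echo of the previous output into the new one
        out = (1.0 - self.beta) * np.tanh(drive + self.reservoir + recalled) + self.beta * self.prev_out
        # Write the new output back into the manifold, weighted by the same attention
        self.manifold += 0.01 * np.outer(attn, out)
        self.prev_out = out
        return out
```

In this sketch, β controls how much of the previous output echoes into the current one, while α sets how slowly each neuron's reservoir forgets.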
Best For: Tracking micro‑fluctuations, medium‑term states, and long‑term facts simultaneously.
2. Hyper‑AMN — The Multi‑Head Specialist
Multi‑Head Associative Manifold
While a standard AMN uses a single global memory manifold, Hyper‑AMN introduces a multi‑head memory system, akin to a brain with specialized compartments.
Head Gating Mechanism
Information is routed into domain‑specific manifolds:
- Spatial Manifold – Positional and structural patterns
- Emotional Manifold – Sentiment and tone
- Logical Manifold – Reasoning and causal links
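A rough NumPy sketch of what such a head-gating step could look like; the head names match the list above, but the gating rule and write rule are assumptions for illustration, not the library's API:

```python
import numpy as np

class HyperAMNMemory:
    """Toy multi-head memory: a softmax gate routes each feature to named manifolds."""

    HEADS = ("spatial", "emotional", "logical")

    def __init__(self, dim, slots=32, seed=0):
        rng = np.random.default_rng(seed)
        self.gate_W = rng.normal(0.0, 0.1, (len(self.HEADS), dim))        # gating weights
        self.keys = {h: rng.normal(0.0, 1.0, (slots, dim)) for h in self.HEADS}
        self.manifolds = {h: np.zeros((slots, dim)) for h in self.HEADS}

    def write(self, feature):
        # Softmax over heads: how strongly does each domain claim this feature?
        logits = self.gate_W @ feature
        gate = np.exp(logits - logits.max())
        gate /= gate.sum()
        for g, head in zip(gate, self.HEADS):
            attn = np.exp(self.keys[head] @ feature)
            attn /= attn.sum()
            # Gate-scaled, attention-weighted write into the head's own manifold
            self.manifolds[head] += 0.05 * g * np.outer(attn, feature)
        return dict(zip(self.HEADS, gate))
```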
Best For: Complex data streams where categorical separation (e.g., how something is said vs what is said) is essential.
3. SGW‑AMN — The “Conscious” Bottleneck
Sparse Global Workspace
Inspired by Global Workspace Theory, SGW‑AMN proposes that memory is strongest when forced through a bottleneck.
- Thousands of neurons compete
- Only a few enter a tiny global workspace
- Memory becomes attention by compression
This competitive routing ensures that only the most salient features are stored.
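A minimal sketch of a top-k bottleneck in NumPy; the sizes, the magnitude-based competition rule, and the random compression matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_NEURONS, WORKSPACE_SIZE, K = 4096, 16, 8
# Fixed random projection that compresses the winners down to workspace size (an assumption)
projection = rng.normal(0.0, 1.0 / np.sqrt(N_NEURONS), (WORKSPACE_SIZE, N_NEURONS))

def workspace_step(activations, workspace, write_rate=0.1):
    winners = np.argsort(np.abs(activations))[-K:]   # competition: keep the K most salient neurons
    sparse = np.zeros_like(activations)
    sparse[winners] = activations[winners]           # every other neuron is silenced
    # Compress the sparse winners into the tiny workspace: memory as attention by compression
    return (1.0 - write_rate) * workspace + write_rate * (projection @ sparse)

workspace = np.zeros(WORKSPACE_SIZE)
workspace = workspace_step(np.tanh(rng.normal(size=N_NEURONS)), workspace)
```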
Best For: Feature extraction and high‑noise environments where identifying the signal matters more than raw data volume.
4. NDM — The Fluid Network
Neural Differential Manifolds
NDM abandons static weight updates in favor of continuous weight evolution using Ordinary Differential Equations (ODEs).
# Example of continuous weight evolution
dW/dt = f(W, x, t)
- Learning follows Hebbian traces (“neurons that fire together, wire together”)
- The network rewires itself dynamically, achieving true neuroplasticity where structure and learning are inseparable.
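A toy NumPy sketch of what integrating that equation might look like, assuming a Hebbian drive term with weight decay and a simple Euler solver (the actual NDM dynamics and solver may differ):

```python
import numpy as np

def dW_dt(W, x, decay=0.01):
    """Hebbian drive for dW/dt = f(W, x): 'fire together, wire together' plus forgetting."""
    y = np.tanh(W @ x)                    # post-synaptic activity
    return np.outer(y, x) - decay * W     # correlation term minus slow decay

def evolve(W, x_stream, dt=0.05):
    # Euler integration: the weights flow continuously as inputs stream in
    for x in x_stream:
        W = W + dt * dW_dt(W, x)
    return W

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, (8, 4))
W = evolve(W, rng.normal(size=(200, 4)))   # the weights have rewired themselves along the way
```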
Best For: Non‑stationary environments where rules change faster than traditional training can adapt.
Summary of the Cousins
| Architecture | Key Innovation | Best For |
|---|---|---|
| DTPN | Triple‑Track Persistence | Maximum data retention across all time scales |
| Hyper‑AMN | Domain‑Specific Heads | Logic vs Emotion vs Structure separation |
| SGW‑AMN | Competitive Bottleneck | Extracting signal from noise |
| NDM | ODE Weight Evolution | Constantly changing environments |
The Experiment Continues
These cousins live on the fringe of memory‑native research and demonstrate that there is no one‑size‑fits‑all intelligence:
- Sometimes you need a bottleneck (SGW‑AMN)
- Sometimes you need specialization (Hyper‑AMN)
- Sometimes your weights must flow like liquid (NDM)
The project remains open‑source:
- Code is available
- C‑libraries are ready to compile
- Exploration has only just begun
Note on Development
While these architectures were originally conceptualized with assistance from Claude Sonnet 4.5, they have been manually edited, refined, and tested to function as standalone research‑grade models.
🔗 Join the experiment: GitHub Repository