AWS re:Invent 2025 – The new AI architecture that adapts and thinks just like humans (STP108)
Source: Dev.to
Overview
In this session, Jan Chorowski and Victor Szczerba from Pathway introduce Baby Dragon Hatchling, a post‑Transformer AI architecture inspired by the brain’s sparse neural networks. They argue that Transformers cannot learn continuously, are inefficient to scale, and are ill‑suited to long‑running enterprise tasks. Baby Dragon Hatchling features sparse activation and connectivity, enabling:
- Continuous learning from thin datasets
- Extended attention spans beyond two hours
- Improved energy efficiency
- Model observability for regulated environments
The architecture is positioned as a solution for enterprise needs through “sticky inference” with corporate data, with a planned mid‑year launch in partnership with AWS.
This article is auto‑generated from the original presentation; minor typos or inaccuracies may be present.
The Transformer’s Limitations and the Brain‑Inspired Baby Dragon Hatchling Architecture
“The Transformer is on its way out. Its days are numbered.” – Jan Chorowski
Why Transformers struggle with long‑running tasks
- Limited long‑term memory – Models are trained once, released as a static snapshot, and do not adapt after deployment.
- Inefficiency – Incremental benchmark gains require ten‑fold increases in model size and data, driving up cost and data‑collection effort.
- Lack of interpretability – Bugs are hard to diagnose; fixing them often means scaling up data or switching models, which merely trades one failure for another.
These constraints make dense, one‑size‑fits‑all Transformers ill‑suited for enterprise workloads that demand continuous improvement, custom data, and regulatory compliance.
The brain as a model for continuous learning
The human brain learns continuously, integrating new information on the fly. Its key properties contrast sharply with dense Transformers:
| Brain | Transformer |
|---|---|
| Sparse activation – only relevant neurons fire | Dense activation – every layer processes every input |
| Dynamic connectivity – pathways strengthen or weaken with experience | Static connectivity – weights are fixed after training |
| Energy‑efficient – low power consumption for complex tasks | Energy‑hungry – scaling up models dramatically increases compute cost |
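To make the sparse‑versus‑dense contrast above concrete, here is a minimal NumPy sketch comparing a dense layer, where every unit contributes to every input, with a layer that keeps only a top‑k subset of activations. The top‑k rule, the layer sizes, and the ReLU are illustrative assumptions for this sketch, not Pathway’s actual mechanism; the point is only that a sparse layer touches a small fraction of its units per input.
```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(x, W):
    """Dense activation: every unit participates for every input."""
    return np.maximum(W @ x, 0.0)            # ReLU over all 4096 units

def sparse_layer(x, W, k=16):
    """Sparse activation: keep only the k strongest units, zero the rest.
    Top-k selection is an illustrative stand-in for brain-like selective
    firing; it is not Pathway's actual mechanism."""
    h = np.maximum(W @ x, 0.0)
    idx = np.argpartition(h, -k)[-k:]         # indices of the k largest activations
    out = np.zeros_like(h)
    out[idx] = h[idx]
    return out

W = rng.standard_normal((4096, 512))
x = rng.standard_normal(512)

dense = dense_layer(x, W)
sparse = sparse_layer(x, W, k=16)
print("active units, dense :", np.count_nonzero(dense))   # roughly half of 4096
print("active units, sparse:", np.count_nonzero(sparse))  # exactly 16
```
In the sparse case, downstream layers only need to process the handful of non‑zero units, which is where the compute and energy savings in the table come from.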
Baby Dragon Hatchling: core ideas
- Sparse activation & connectivity – mimics the brain’s selective firing, reducing compute and energy use.
- Continuous learning – the model updates its parameters during operation, allowing it to improve from thin, domain‑specific datasets (see the sketch after this list).
- Extended attention windows – supports coherent reasoning over tasks lasting hours, not just a few sentences.
- Observability & auditability – built‑in tooling for monitoring model behavior, essential for regulated industries.
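As a concrete, heavily simplified picture of “learning during operation” versus a frozen snapshot, the sketch below runs a tiny model that takes one gradient step on each example it serves. The linear model, squared‑error loss, and per‑example SGD are stand‑ins chosen for brevity; they are not Baby Dragon Hatchling’s update rule.
```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a small task head that keeps learning while it serves
# requests instead of being frozen at deployment. The linear model and plain
# per-example SGD are illustrative stand-ins, not Baby Dragon Hatchling's
# actual update rule.
dim = 32
lr = 0.05
w = np.zeros(dim)                  # parameters that keep updating in production
true_w = rng.standard_normal(dim)  # unknown target the data stream reveals

def serve_and_learn(w, x, y):
    """Answer the request, then take one gradient step on that same example,
    so every served request also improves the model a little."""
    y_hat = w @ x                  # inference
    err = y_hat - y                # feedback observed after serving
    w = w - lr * err * x           # squared-error gradient step for this example
    return w, err

# A thin stream of domain-specific examples arriving one at a time.
for step in range(1, 501):
    x = rng.standard_normal(dim)
    y = true_w @ x
    w, err = serve_and_learn(w, x, y)
    if step % 100 == 0:
        print(f"step {step:3d}  |error| = {abs(err):.4f}")  # shrinks as the model adapts
```
The contrast with a static Transformer snapshot is that here the error keeps falling as new examples arrive, without a separate retraining run.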
Enterprise implications
- Sticky inference – models can retain corporate knowledge securely, reducing the need to re‑upload large datasets.
- Mid‑year launch with AWS – the planned integration with AWS services (e.g., SageMaker, Bedrock) is intended to make deployment and scaling straightforward.
- Energy and cost savings – sparsity translates to lower inference costs, making large‑scale, long‑running AI applications financially viable.