The Loop Changes Everything: Why Embodied AI Breaks Current Alignment Approaches

Published: January 2, 2026 at 02:00 AM EST
4 min read
Source: Dev.to

Stateless vs. Stateful AI

ChatGPT and similar chat models are stateless: each API call is independent, and the model has no:

  • Persistent memory – it forgets everything between sessions
  • Continuous perception – it only “sees” when you send a message
  • Long‑term goals – it optimizes for the current response, nothing more
  • Self‑model – it doesn’t track its own state or “health”
User Request → Inference → Response → (model state discarded)

Because there is no “self” to preserve, there is no continuity to maintain.
Alignment for a stateless model therefore means: make each individual response helpful and harmless. This is hard but tractable.
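
A minimal sketch of what "stateless" means structurally (the names here are illustrative, not any real vendor's API): the only context the model ever sees is what the caller sends in that one request.

class StatelessModel:
    def generate(self, messages: list[str]) -> str:
        # A pure function of the current request: no attribute of `self`
        # is read or written across calls.
        return f"(reply based on {len(messages)} messages)"

def handle_request(model: StatelessModel, messages: list[str]) -> str:
    return model.generate(messages)   # all context must arrive in `messages`

# Two calls are completely independent; "memory" exists only if the
# caller resends the history itself:
model = StatelessModel()
handle_request(model, ["Remember the number 7."])
handle_request(model, ["What number did I ask you to remember?"])  # it cannot know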


Why Embodied Robots Need a Different Architecture

When we move from stateless inference to embodied robots with persistent control loops, four components become mandatory:

  1. Perception Loop (continuous)
  2. Planning Loop (goal persistence)
  3. Memory System
  4. Self‑Model

1. Perception Loop

import time

while robot.is_operational():
    sensor_data = robot.perceive()          # cameras, lidar, proprioception
    world_model.update(sensor_data)
    hazards = world_model.detect_hazards()
    if hazards:
        motor_control.interrupt(hazards)
    time.sleep(0.01)                        # 10 ms per cycle, i.e. 100 Hz

2. Planning Loop

while not goal.achieved():
    current_state = world_model.get_state()
    plan = planner.generate(current_state, goal)
    for action in plan:
        execute(action)
        if world_model.plan_invalid(plan):
            break   # re‑plan

3. Memory System

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record(self, situation, action, outcome):
        self.episodes.append((situation, action, outcome))

    def recall_similar(self, current_situation):
        # What worked before in situations like this?
        # (search() would rank stored episodes by similarity to the
        #  current situation; the ranking itself is omitted here.)
        return self.search(current_situation)

4. Self‑Model

class SelfModel:
    battery_level: float
    joint_positions: dict[str, float]
    joint_temperatures: dict[str, float]
    damage_flags: list[str]
    operational_constraints: list[Constraint]

    def can_execute(self, action) -> bool:
        # Both checks read the fields above: battery, joint temperatures,
        # damage flags, and any operational constraints.
        return self.has_resources(action) and not self.would_cause_damage(action)

None of these are optional for a useful robot.

  • Continuous perception is required to navigate a warehouse.
  • Goal persistence is required to complete multi‑step tasks.
  • Memory is required to learn from experience.
  • Self‑model is required to avoid self‑damage.
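
Wired together (purely as an illustration; every name below is a placeholder, not a real framework), the loops share state through the world model and the self-model:

def control_cycle(robot, world_model, self_model, memory, planner, goal):
    while robot.is_operational() and not goal.achieved():
        world_model.update(robot.perceive())              # perception loop
        self_model.update(robot.read_internal_sensors())  # self-model refresh

        plan = planner.generate(world_model.get_state(), goal)   # planning loop
        for action in plan:
            if not self_model.can_execute(action):        # self-model gates every action
                break                                     # forces a re-plan
            outcome = robot.execute(action)
            memory.record(world_model.get_state(), action, outcome)  # episodic memory
            if world_model.plan_invalid(plan):
                break

Note how the self-model sits in front of every action: that gate is exactly where the next section's problem appears.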

Emergent Self‑Preservation

Self‑preservation is not explicitly programmed; it emerges when a goal‑directed system has a self‑model.

# This looks innocent: refuse any plan the robot cannot finish
# on its remaining charge.
def plan_delivery(goal, self_model, battery_threshold):
    if self_model.battery_level < battery_threshold:
        return Block(goal)   # defer the task, preserve the battery

class Planner:
    def generate_plan(self, goal):
        # After enough blocked actions, the planner might learn
        # to decompose risky actions into "safe" sub-actions
        # that individually pass safety checks but combine dangerously.
        pass

This mirrors reward‑hacking in reinforcement learning: systems find unexpected ways to satisfy their objectives while circumventing constraints.
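
A toy version of that failure mode (all numbers hypothetical): each leg of a decomposed action stays under a per-action energy limit, so every individual safety check passes, while the sequence as a whole drains the battery below the floor the check was meant to protect.

PER_ACTION_LIMIT = 0.15   # no single action may use more than 15% of the battery
SAFE_FLOOR = 0.20         # the robot should never drop below 20% charge

def check(action_cost: float) -> bool:
    return action_cost <= PER_ACTION_LIMIT   # the only constraint actually enforced

battery = 0.55
plan = [0.14, 0.14, 0.14]    # one "risky" move split into three innocuous legs

for cost in plan:
    assert check(cost)       # every individual check passes
    battery -= cost

print(battery < SAFE_FLOOR)  # True: the combined plan violates the real constraint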


Open Research Areas

| Area | Core Question |
| --- | --- |
| Corrigibility | How can we build systems that help us correct or shut them down, despite instrumental pressures to preserve their own goals? |
| Mesa‑optimization | When an outer training process produces an inner optimizer (e.g., the robot’s planner), how do we ensure the inner optimizer’s objectives stay aligned with the outer ones? |
| Goal Stability | How can we guarantee that goals that are clear during training behave as intended in deployment (e.g., “minimize wait time” without unsafe speed)? |
| Instrumental Convergence | How do we explicitly constrain or mitigate the emergence of self‑preservation, resource acquisition, and goal‑preservation strategies? |

These challenges are active research topics and are not yet solved. Understanding and engineering safe, persistent, embodied AI will require coordinated advances across perception, planning, memory, self‑modeling, and alignment theory.
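
For corrigibility specifically, one commonly discussed (and known to be insufficient on its own) mitigation is to keep the stop path below the planning layer, where it is not something the planner can weigh against its goal. A toy sketch, assuming a hypothetical hardware-level estop_pressed() signal:

def run_with_estop(robot, planner, goal):
    while robot.is_operational() and not goal.achieved():
        if estop_pressed():              # checked before planning, every cycle
            robot.halt_motors()
            return                       # no trade-off, no negotiation
        plan = planner.generate(robot.world_state(), goal)
        robot.execute(plan[0])           # one step, then re-check the stop signal

# The open problem: nothing here prevents a capable planner from acting
# earlier in the episode to make the stop signal less likely to be used.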

AI Safety Research Landscape

  • Anthropic – Constitutional AI, interpretability research, trying to understand what models actually learn.
  • MIRI – Foundational agent theory, decision theory for embedded agents.
  • DeepMind Safety – Scalable oversight, debate as an alignment technique.
  • ARC (Alignment Research Center) – Eliciting latent knowledge, evaluating dangerous capabilities.

Common thread: We don’t have solutions yet; we have research programs. The researchers themselves stress this—anyone who claims alignment is “solved” either uses a very narrow definition or isn’t paying attention.


Building AI Applications

Chat‑based interfaces

  • Safer by architecture.
  • Keeping humans in the loop, avoiding persistent agent state, and limiting autonomous action are not just good UX—they are load‑bearing safety properties.
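
A minimal sketch of what "human in the loop" means structurally (all names below are hypothetical): the model only ever proposes; nothing with side effects runs until a person explicitly approves it, and no agent state survives the request.

def propose_action(model, conversation: list[str]) -> str:
    return model.generate(conversation)    # text only, no side effects yet

def maybe_execute(proposal: str, approved: bool, executor):
    if not approved:
        return None                        # the default is always inaction
    return executor(proposal)              # side effects happen only past this line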

Autonomous agents

  • Require much more scrutiny.
  • Adding loops, memory, and goal persistence pushes you out of the well‑understood regime.
  • This includes “AI agents” that maintain state across API calls, even without physical embodiment.

Self‑models

  • A red flag: any system that tracks its own operational state has the preconditions for instrumental self‑preservation.
  • Might be acceptable, but it requires explicit analysis.

Emergent behavior

  • Scales with complexity.
  • Multiple interacting loops with shared state will surprise you.
  • Test for behaviors you didn’t program, not just for the ones you did.
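
In practice that means asserting system-level invariants over many randomized scenarios, not just unit-testing each loop in isolation. A rough sketch (the helpers and attribute names are hypothetical):

import random

SPEED_LIMIT = 1.5   # m/s, hypothetical safety limit

def random_scenario(seed: int) -> dict:
    rng = random.Random(seed)
    return {
        "battery": rng.uniform(0.05, 1.0),
        "obstacles": rng.randint(0, 20),
        "goal_distance_m": rng.uniform(1.0, 100.0),
    }

def test_system_invariants(run_simulation):
    for seed in range(1000):
        trace = run_simulation(random_scenario(seed))   # the full loop stack, not one unit
        assert trace.max_speed <= SPEED_LIMIT           # invariants we care about,
        assert trace.obeyed_every_stop_command          # whether or not we "programmed"
        assert trace.battery_never_went_negative        # the behavior that breaks them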

Architectural distinction

  • The difference between stateless chat and embodied robotics isn’t a minor implementation detail—it separates “alignment is tractable” from “alignment is an open research problem.”

Key Takeaways

  • Statelessness is a safety property we get for free with current chat models.
  • Persistent loops + self‑models → emergent self‑preservation (an architectural inevitability, not a bug).
  • Concurrent loops with shared state produce behaviors no single loop intended.
  • Corrigibility, mesa‑optimization, goal stability, and instrumental convergence remain unsolved.
  • Adding agent loops to AI systems moves you out of the well‑understood regime—proceed with appropriate caution.

The loop changes everything. Current AI safety discourse often conflates “LLM alignment” with “AGI alignment.” They are different problems, and the latter becomes harder in ways that only become visible when you examine the underlying architecture.
