AWS re:Invent 2025 - Zoox: Building Machine Learning Infrastructure for Autonomous Vehicles (AMZ304)

Published: December 6, 2025 at 01:50 AM EST
Source: Dev.to

Introduction

Jim Robinson‑Bohnslav, tech lead for foundation model training at Zoox, opens the session with a brief welcome. He is joined by co‑presenters Anindya and Avinash, who will later introduce themselves.

Agenda

  1. Short introduction to Zoox.
  2. Overview of the use cases for foundation models at Zoox.
  3. Deep dive into Amazon SageMaker HyperPod and its integration into Zoox’s training and evaluation stack.

What is Zoox?

Zoox builds a purpose‑built, fully electric, fully autonomous robotaxi designed to make personal transportation safe, clean, and enjoyable. The company believes the traditional model of individually owned, human‑driven vehicles is broken due to safety risks, under‑utilization, and pollution. Zoox’s robotaxis are already operating in Las Vegas, where riders can request a free ride via a QR‑code‑enabled app.

Zoox Autonomy Overview

Zoox vehicles are equipped with sensor pods on the top corners of the vehicle. Each pod contains:

  • LiDAR
  • Radar
  • Cameras (including thermal)
  • Microphones

These sensors provide a 360° field of view with redundancy, allowing perception of objects up to and beyond 150 m.

Three Pillars of Autonomy

Perception

The perception stack ingests raw sensor data and produces a structured world model. Core outputs include 3‑D bounding boxes for agents (cars, pedestrians, trucks, etc.) and detection of traffic lights, all fused with high‑definition maps.
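
To make the shape of that output concrete, here is a minimal sketch of a 3-D bounding box record and a range filter. The `Box3D` class, its field names, and the `agents_within` helper are hypothetical illustrations, not Zoox's actual data model, which the talk summary does not describe.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box3D:
    """One detected agent. All field names here are illustrative."""
    label: str      # agent class, e.g. "car" or "pedestrian"
    x: float        # box center in the vehicle frame, meters
    y: float
    z: float
    length: float   # box extents, meters
    width: float
    height: float
    yaw: float      # heading about the vertical axis, radians

    def range_m(self) -> float:
        """Ground-plane distance from the vehicle origin to the box center."""
        return hypot(self.x, self.y)


def agents_within(boxes, max_range_m):
    """Return the boxes inside max_range_m, nearest first."""
    nearby = [b for b in boxes if b.range_m() <= max_range_m]
    return sorted(nearby, key=Box3D.range_m)


boxes = [
    Box3D("car", 30.0, 40.0, 0.8, 4.5, 1.9, 1.5, 0.0),
    Box3D("pedestrian", 3.0, 4.0, 0.9, 0.5, 0.5, 1.8, 1.57),
    Box3D("truck", 120.0, 160.0, 1.5, 8.0, 2.5, 3.0, 3.14),
]
nearby_labels = [b.label for b in agents_within(boxes, max_range_m=150.0)]
print(nearby_labels)  # ['pedestrian', 'car']
```

The truck sits 200 m away, so a 150 m cutoff drops it. A real stack would also carry velocity, track IDs, and uncertainty, which are omitted here.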

Prediction

Prediction takes the detected agents and forecasts multiple possible future trajectories for each (multimodal prediction). These hypotheses feed into the planning stack.
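
A small sketch may help show what "multimodal" means here: each agent carries several candidate futures, each with its own confidence. The `TrajectoryMode` class, the mode names, and the scores below are invented for illustration; the source does not describe how Zoox represents or scores its hypotheses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryMode:
    """One hypothesized future for a single agent (illustrative fields)."""
    name: str          # human-readable label for the hypothesis
    score: float       # unnormalized confidence, assumed non-negative
    waypoints: tuple   # future (x, y) positions in meters


def mode_probabilities(modes):
    """Normalize the scores so they sum to 1 across an agent's modes."""
    total = sum(m.score for m in modes)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {m.name: m.score / total for m in modes}


def most_likely(modes):
    """Return the single highest-scoring mode."""
    return max(modes, key=lambda m: m.score)


pedestrian_modes = [
    TrajectoryMode("continue_on_sidewalk", 6.0, ((0.0, 1.0), (0.0, 2.0))),
    TrajectoryMode("cross_street", 3.0, ((1.0, 1.0), (2.0, 1.0))),
    TrajectoryMode("stop", 1.0, ((0.0, 0.0), (0.0, 0.0))),
]
probs = mode_probabilities(pedestrian_modes)
best = most_likely(pedestrian_modes)
print(best.name, round(probs["cross_street"], 2))
```

The planner receives every mode rather than only the most likely one, so a lower-probability hypothesis such as the pedestrian crossing the street can still influence the vehicle's behavior.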

Planning & Controls

Planning combines perception, prediction, map data, and the rider’s destination to generate high‑level routes and low‑level control commands that drive the vehicle.

Edge Cases & Long‑Tail Scenarios

Zoox’s autonomy stack must handle a wide variety of unusual situations observed on the Las Vegas Strip:

  • Fire trucks blocking a lane without sirens, with hoses extending into traffic.
  • Jaywalkers crossing multiple lanes of 30 mph traffic.
  • Construction zone flagger dressed as a child, creating ambiguous signals.
  • Convoy of tanks, a class not present in the ontology.
  • Traffic cone placed on a sign, challenging conventional interpretation.
  • Vehicles on fire and handwritten construction signs that must be obeyed.
  • “Cartzilla” – a hybrid half‑tour‑bus, half‑shopping‑cart vehicle.
  • A person climbing onto the robotaxi at night while a friend records video.
  • Dog in a backpack, raising classification questions (dog vs. pedestrian vs. bicyclist).

These long‑tail cases motivate the use of foundation models to generalize across rare events.

Foundation Models Approach

Zoox tackles difficult perception and decision‑making problems by training large multimodal foundation models that ingest camera, LiDAR, and radar data. The strategy follows the “bitter lesson” described by Richard Sutton in his 2019 essay of the same name: general methods that scale with computation and data tend to outperform hand‑engineered, domain‑specific approaches.

Machine Learning Infrastructure

Training Pipeline

  • Supervised fine‑tuning on tens of thousands of driving hours.
  • Reinforcement learning using the GRPO (Group Relative Policy Optimization) and DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithms.
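
The core idea behind GRPO can be sketched in a few lines: several responses are sampled for the same input, and each one's advantage is its reward measured against the group's mean and spread, with no learned value network. The function below is a simplified illustration under assumptions (population standard deviation, a small `eps` for stability, made-up rewards). It is not Zoox's implementation, and the source does not describe their reward design.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for responses sampled from the same input.
    eps: small constant (an assumption here) to avoid division by zero.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one input; two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]

# A group where every response earns the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
print(flat)  # [0.0, 0.0, 0.0, 0.0]
```

The second case hints at one motivation for DAPO's dynamic sampling, which filters out groups whose rewards are all identical because they contribute zero gradient. DAPO's other changes, such as its decoupled clipping range, are not shown here.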

Scale & Performance

  • Petabytes of sensor data stored and processed.
  • Distributed training with HSDP (hybrid sharded data parallelism) and tensor parallelism across 64+ GPUs.
  • Achieved ≈ 95 % GPU utilization.
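
HSDP shards model state within a group of GPUs and replicates it across groups. The arithmetic below sketches how 64 ranks could be split, assuming 8 GPUs per node (as on P5 instances) with sharding inside each node and a row-major rank layout. The mesh shape and helper names are assumptions for illustration; the source gives only the "64+ GPUs" figure, not Zoox's actual layout.

```python
def hsdp_coords(rank, world_size, shard_size):
    """Map a global rank to (replica_group, shard_index), row-major."""
    if world_size % shard_size != 0:
        raise ValueError("world_size must be a multiple of shard_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return divmod(rank, shard_size)


def shard_group(rank, world_size, shard_size):
    """Ranks that together hold one full copy of the sharded model state."""
    replica, _ = hsdp_coords(rank, world_size, shard_size)
    start = replica * shard_size
    return list(range(start, start + shard_size))


def replicate_group(rank, world_size, shard_size):
    """Ranks that hold the same shard and all-reduce gradients with each other."""
    _, shard = hsdp_coords(rank, world_size, shard_size)
    return list(range(shard, world_size, shard_size))


WORLD_SIZE, SHARD_SIZE = 64, 8   # assumed: 8 nodes x 8 GPUs
print(hsdp_coords(10, WORLD_SIZE, SHARD_SIZE))      # (1, 2)
print(shard_group(10, WORLD_SIZE, SHARD_SIZE))      # [8, 9, ..., 15]
print(replicate_group(10, WORLD_SIZE, SHARD_SIZE))  # [2, 10, 18, ..., 58]
```

Keeping the shard group inside one node confines the heavier all-gather and reduce-scatter traffic to the fast intra-node links, while only the gradient all-reduce between replicas crosses the network.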

SageMaker HyperPod

  • Auto‑resume capability for interrupted jobs.
  • EFA‑enabled networking delivering up to 3200 Gbps throughput.
  • Integrated observability via Amazon CloudWatch and Grafana dashboards.
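
Auto-resume restarts an interrupted job, but the training script still has to save checkpoints and load the most recent one on startup. The sketch below shows that pattern with the standard library only. The file naming (`step_N.json`), the `train` loop, and the simulated failure are all hypothetical stand-ins and do not use any HyperPod API.

```python
import json
import tempfile
from pathlib import Path


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint file with the highest step, or None."""
    ckpts = sorted(ckpt_dir.glob("step_*.json"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return ckpts[-1] if ckpts else None


def save_checkpoint(ckpt_dir, step, state):
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = ckpt_dir / f"step_{step}.json.tmp"
    tmp.write_text(json.dumps({"step": step, "state": state}))
    tmp.replace(ckpt_dir / f"step_{step}.json")


def train(ckpt_dir, total_steps, save_every, fail_at=None):
    """Resume from the newest checkpoint if one exists, else start fresh."""
    ckpt = latest_checkpoint(ckpt_dir)
    if ckpt is None:
        step, state = 0, {"updates": 0}
    else:
        payload = json.loads(ckpt.read_text())
        step, state = payload["step"], payload["state"]
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        step += 1
        state["updates"] += 1   # stand-in for one optimizer step
        if step % save_every == 0:
            save_checkpoint(ckpt_dir, step, state)
    return step, state


with tempfile.TemporaryDirectory() as d:
    run_dir = Path(d)
    try:
        train(run_dir, total_steps=10, save_every=2, fail_at=5)
    except RuntimeError:
        pass  # an orchestrator would relaunch the job at this point
    final_step, final_state = train(run_dir, total_steps=10, save_every=2)
print(final_step, final_state["updates"])  # 10 10
```

In the simulated run the job fails after step 5, which was never checkpointed, so the relaunch resumes from step 4 and redoes one step. The checkpoint interval therefore bounds how much work a failure can cost.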

Orchestration & Multi‑Region Training

  • Migration from Slurm to Amazon EKS for greater flexibility.
  • Multi‑region training leveraging P5 and P6 instances to accelerate large‑scale experiments.

This article is auto‑generated from the original presentation content and may contain minor typographical errors.
