AWS re:Invent 2025 - Customize models for agentic AI at scale with SageMaker AI and Bedrock (AIM381)

Published: December 5, 2025 at 08:37 PM EST
3 min read
Source: Dev.to

Overview

In this session, Amit Modi and Shelbee demonstrate Amazon SageMaker’s new capabilities for building agentic AI applications at scale. They introduce serverless model customization with a broad selection of foundation models and fine‑tuning techniques—including reinforcement learning—plus serverless MLflow for unified observability, and serverless model evaluation with industry benchmarks and AI‑as‑a‑judge metrics.

The demo walks through an end‑to‑end workflow:

  • Customizing Qwen 2.5 for a medical‑triage agent
  • Tracking experiments and datasets as versioned assets
  • Evaluating against MMLU clinical benchmarks
  • Deploying to SageMaker endpoints
  • Integrating with the AgentCore runtime via the Strands SDK

Key highlights

  • Automatic lineage tracking
  • SageMaker Pipelines integration with new deployment steps for Bedrock
  • Multi‑model endpoints with adapter‑based inference (≈ 50 % cost savings)
  • Speculative decoding (≈ 2.5× latency reduction)

The session addresses four critical production challenges: lack of standardized customization tools, fragmented observability, evolving ML‑asset tracking needs, and complex inference optimization.

  • Rapid adoption of agentic AI in enterprise software: the share of applications with agentic capabilities is projected to grow from roughly 1 % in 2024 to 33 % in 2028 (about a 33× increase).
  • By 2028, roughly 15 % of day‑to‑day work decisions are expected to be made autonomously by agents, driving demand for fast, cost‑effective inference at scale.

Production Challenges

  1. No standardized model‑customization tools

    • Teams build ad‑hoc workflows with glue code, then must rewrite them for production, causing delays and manual effort.
  2. Fragmented observability

    • Disparate tools make it hard to debug failures or detect deviations in model/agent behavior.
  3. Evolving ML‑asset tracking

    • Beyond models, teams must version reward functions, prompts, and other artifacts used in reinforcement learning, adding integration overhead.
  4. Cost‑effective, high‑quality inference

    • Selecting optimal instance types, containers, and frameworks requires extensive benchmarking, often leading to expensive or delayed deployments.

SageMaker Capabilities

Amit Modi (Senior Manager, Model Operations & Inference) and Shelbee (Worldwide Specialist Senior Manager, Gen AI) outline how SageMaker tackles these challenges:

Serverless Model Customization

  • Broad foundation‑model catalog (public models plus Bedrock models)
  • Fine‑tuning techniques: supervised fine‑tuning, reinforcement learning, and more (a training‑data sketch follows this list)
  • Fully serverless: no capacity planning or GPU management; SageMaker handles infrastructure, checkpointing, and node recovery automatically
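
As a concrete illustration of the supervised path, the snippet below writes training data in a chat‑messages JSONL layout that is commonly used for supervised fine‑tuning. This is a minimal sketch: the exact schema the serverless customization flow expects is not covered in the session, so the field names and example content are assumptions.

```python
import json

# Illustrative medical-triage examples in a chat-messages JSONL layout.
# The field names ("messages", "role", "content") follow a common SFT
# convention and are an assumption about what the customization job expects.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a medical triage assistant."},
            {"role": "user", "content": "Patient reports chest pain radiating to the left arm."},
            {"role": "assistant", "content": "Triage level: emergent. Advise immediate emergency care."},
        ]
    },
]

with open("triage_sft.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```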

SageMaker Studio UI

  1. Navigate to SageMaker Studio → Models.
  2. Choose a foundation model and a fine‑tuning technique (selectable through the UI, SDK, or agent experience).
  3. Upload a dataset or select an existing, versioned dataset from SageMaker.
  4. Pick or define a reward function (a hedged Lambda sketch follows this list):
    • Write inline code, or
    • Attach a pre‑registered Lambda that implements the reward logic.
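
For the Lambda option, a minimal sketch of a reward handler is shown below. The event and response shapes that SageMaker passes to a pre‑registered reward Lambda, and the triage‑label scoring logic, are assumptions for illustration only.

```python
import json

# reward_function.py - hypothetical reward Lambda for the medical-triage use case.
# The event/response contract assumed here ("completion", "reference", "reward")
# is illustrative, not a documented SageMaker interface.

TRIAGE_LEVELS = {"emergent", "urgent", "non-urgent"}  # illustrative label set


def lambda_handler(event, context):
    completion = event["completion"].strip().lower()
    reference = event["reference"].strip().lower()

    # Full reward for an exact match on the reference triage answer,
    # partial credit if the completion at least names a valid triage level.
    if completion == reference:
        reward = 1.0
    elif any(level in completion for level in TRIAGE_LEVELS):
        reward = 0.3
    else:
        reward = 0.0

    return {"statusCode": 200, "body": json.dumps({"reward": reward})}
```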

SageMaker automatically checkpoints jobs, enabling seamless recovery from node failures and ensuring efficient compute usage.

Pipeline Integration

  • SageMaker Pipelines now include purpose‑built steps for:
    • Model customization
    • Deployment to SageMaker endpoints and Bedrock (inference‑as‑a‑service)
  • No glue code required: annotate notebook code with @step or upload via the UI to generate a fully functional pipeline (sketched below).
  • Pipelines are serverless, eliminating the need to manage underlying compute resources.
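
The @step pattern looks roughly like the sketch below, using the SageMaker Python SDK's function‑step decorator. The step bodies, S3 paths, instance types, and role ARN are placeholders; the actual customization and deployment steps shown in the session are not reproduced here.

```python
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline


# Placeholder steps: real bodies would invoke the model-customization and
# deployment capabilities demonstrated in the session.
@step(name="prepare-data", instance_type="ml.m5.xlarge")
def prepare_data(raw_uri: str) -> str:
    # ... transform raw records into the fine-tuning format ...
    return "s3://example-bucket/triage/processed/"  # placeholder output URI


@step(name="fine-tune", instance_type="ml.g5.12xlarge")
def fine_tune(processed_uri: str) -> str:
    # ... run fine-tuning against the processed dataset ...
    return "s3://example-bucket/triage/model/"  # placeholder model artifact URI


# Calling the decorated functions builds the DAG; passing the final result
# to Pipeline lets SageMaker infer the upstream dependencies.
model_uri = fine_tune(prepare_data("s3://example-bucket/triage/raw/"))

pipeline = Pipeline(name="triage-agent-customization", steps=[model_uri])
pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerExecutionRole")  # placeholder
pipeline.start()
```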

End‑to‑End Demo Highlights

  • Customizing Qwen 2.5 for a medical‑triage agent using supervised fine‑tuning.
  • Experiment tracking with MLflow, versioned datasets, and reward functions.
  • Evaluation against the MMLU clinical knowledge benchmark and AI‑as‑a‑judge metrics.
  • Deployment to a SageMaker endpoint and integration with the AgentCore runtime via the Strands SDK (an endpoint‑invocation sketch follows this list).
  • Cost‑saving features: adapter‑based multi‑model endpoints (≈ 50 % cheaper) and speculative decoding (≈ 2.5× lower latency).
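
Once the customized model is behind a SageMaker endpoint, any client, including the agent runtime, can call it over the SageMaker runtime API. The sketch below uses boto3; the endpoint name is a placeholder, the request schema assumes a text‑generation container that accepts an "inputs"/"parameters" payload, and the Strands/AgentCore wiring from the demo is not reproduced.

```python
import json

import boto3

ENDPOINT_NAME = "qwen2-5-medical-triage"  # placeholder; use the endpoint created during deployment

runtime = boto3.client("sagemaker-runtime")

# Payload shape assumes a text-generation container that accepts "inputs"/"parameters".
payload = {
    "inputs": "Patient reports chest pain radiating to the left arm. What triage level?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```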

Key Takeaways

  • Standardized, serverless customization removes the need for manual infrastructure management.
  • Unified observability via serverless MLflow simplifies debugging across models and agents (a tracking sketch follows this list).
  • Versioned ML assets (models, datasets, reward functions, prompts) support governance and compliance.
  • Optimized inference (adapter‑based multi‑model endpoints, speculative decoding) delivers significant cost and latency improvements.
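
For the observability takeaway, a minimal sketch of logging a customization run to a SageMaker managed MLflow tracking server is shown below. The tracking‑server ARN, experiment name, parameters, and metric values are illustrative; connecting by ARN assumes the sagemaker-mlflow plugin is installed alongside MLflow.

```python
import mlflow

# Placeholder tracking-server ARN; resolving an ARN as the tracking URI
# requires the sagemaker-mlflow plugin.
mlflow.set_tracking_uri(
    "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/triage-observability"
)
mlflow.set_experiment("qwen2-5-medical-triage")

with mlflow.start_run(run_name="sft-baseline"):
    # Illustrative values; a real run would log the actual dataset and
    # reward-function versions plus benchmark results.
    mlflow.log_params({
        "base_model": "Qwen2.5",
        "technique": "supervised-fine-tuning",
        "dataset_version": "v3",
    })
    mlflow.log_metric("mmlu_clinical_knowledge", 0.71)
    mlflow.log_metric("llm_judge_score", 4.2)
```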

These advancements aim to accelerate the path from prototype to production for agentic AI applications, addressing the major bottlenecks that have historically slowed adoption.
