AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)
Source: Dev.to
Introduction
In this session, the Amazon AGI Lab introduces Nova Act, an AI agent for browser automation that achieves over 90% reliability in production workflows. Nova Act is designed to interact with web interfaces the way a human would. Its reliability comes from reinforcement learning on simulated websites combined with targeted training on UI element understanding. The platform includes human‑in‑the‑loop (HITL) capabilities, full AWS integration as a managed service, and a developer ecosystem comprising a playground, SDK, IDE extension, and CLI.
The Limitations of Legacy Browser Automation
Traditional browser automation solutions are code‑based and require developers to write extensive logic for each step of a workflow. Common challenges include:
- Long setup times – many implementations take months to become operational.
- Fragility – small changes to a website often break the automation, leading to high maintenance overhead.
- Limited generalizability – workflows must be manually specified for each variation (e.g., different geographies, SKUs, or insurance companies), making scaling impractical.
These constraints contrast sharply with how humans use computers: we can quickly locate UI elements (e.g., a “Compose” button in an email client) based on visual cues and intuition built from millions of prior interactions.
Nova Act’s Human‑Like Interaction Model
Nova Act approaches computer use the way a human does:
- Perceive – capture a screenshot of the current page.
- Understand – interpret the UI elements in the context of the given task.
- Act – decide and execute the next action (click, type, select, etc.).
- Iterate – repeat the perception‑understanding‑action loop until the task is complete.
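The perceive, understand, act, iterate loop can be sketched in Python. This is a minimal, hypothetical illustration: `FakeBrowser`, `toy_policy`, and `run_agent` are invented names, not part of the Nova Act SDK. A hand‑written rule stands in for the model, and a symbolic page description stands in for a real screenshot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical page layouts: the visible element labels on each page.
LAYOUT = {"inbox": ["Search", "Compose"], "compose": ["To", "Send"], "sent": []}


@dataclass
class Action:
    kind: str         # "click", "type", or "done"
    target: str = ""  # visible label of the element to act on
    text: str = ""    # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser session (not a Nova Act API)."""
    page: str = "inbox"
    fields: Dict[str, str] = field(default_factory=dict)

    def screenshot(self) -> dict:
        # Perceive: a symbolic snapshot of what is visible; a real agent sees pixels.
        return {"page": self.page, "elements": list(LAYOUT[self.page]),
                "fields": dict(self.fields)}

    def execute(self, action: Action) -> None:
        # Act: apply the chosen action to the fake page state.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "Compose":
            self.page = "compose"
        elif action.kind == "click" and action.target == "Send" and self.fields.get("To"):
            self.page = "sent"


def toy_policy(obs: dict, task: str) -> Action:
    """Understand and decide: a hand-written rule standing in for the model."""
    recipient = task.split()[-1]
    if obs["page"] == "sent":
        return Action("done")
    if "Compose" in obs["elements"]:
        return Action("click", "Compose")  # located by label, not by position
    if "To" in obs["elements"] and not obs["fields"].get("To"):
        return Action("type", "To", recipient)
    return Action("click", "Send")


def run_agent(browser: FakeBrowser, task: str,
              policy: Callable[[dict, str], Action], max_steps: int = 10) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):          # Iterate until done or out of budget
        obs = browser.screenshot()      # Perceive
        action = policy(obs, task)      # Understand and decide
        history.append(action)
        if action.kind == "done":
            return history
        browser.execute(action)         # Act
    raise RuntimeError("step budget exhausted before the task completed")


browser = FakeBrowser()
steps = run_agent(browser, "email alice", toy_policy)
print([a.kind for a in steps])  # ['click', 'type', 'click', 'done']
```

Because the policy finds the "Compose" button by its visible label rather than by a fixed position or selector, reordering the inbox toolbar would not change its behavior.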
This approach yields:
- Robustness – minor UI changes no longer cause failures.
- Rapid onboarding – developers can describe tasks in natural language.
- Cross‑environment generalization – the same model works across diverse web applications.
Achieving High Reliability
Reliability is the primary focus of Nova Act, and the team pursued it through two key strategies:
Element Understanding
- Collected extensive training data on challenging UI components and behaviors such as date pickers, dropdowns, filters, and dynamically loaded content.
- Evaluated models specifically on these elements to ensure end‑to‑end reliability.
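Evaluating per component type can be sketched as grouping episode outcomes by the element involved. The record format, the helpers `success_by_element`, `weak_elements`, and `end_to_end_estimate`, the 0.9 threshold, and the sample records are all illustrative assumptions, not data or tooling from the session. The last helper shows why element‑level reliability matters: if every step must succeed and failures are independent (a simplifying assumption), 99% per step over 10 steps yields only about 90% end to end.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def success_by_element(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Success rate per UI component type from (element_type, succeeded) records."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for element_type, succeeded in results:
        totals[element_type] += 1
        wins[element_type] += int(succeeded)
    return {k: wins[k] / totals[k] for k in totals}


def weak_elements(rates: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Component types whose success rate falls below the illustrative threshold."""
    return sorted(k for k, rate in rates.items() if rate < threshold)


def end_to_end_estimate(per_step: float, steps: int) -> float:
    """Workflow success if every step must succeed and failures are independent."""
    return per_step ** steps


# Made-up records for illustration only.
records = [("date_picker", True), ("date_picker", False),
           ("dropdown", True), ("dropdown", True)]
rates = success_by_element(records)
print(rates)                                    # {'date_picker': 0.5, 'dropdown': 1.0}
print(weak_elements(rates))                     # ['date_picker']
print(round(end_to_end_estimate(0.99, 10), 3))  # 0.904
```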
Reinforcement Learning in Web Simulations
- Built hundreds of mock websites (“web gyms”) that replicate common UI patterns.
- Trained the model to complete tasks without prescribing the exact steps, rewarding only successful end states.
- Letting the model explore, rather than follow scripted steps, enables it to discover effective interaction strategies across varied interfaces.
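Rewarding only the end state can be illustrated with a toy "web gym". The session does not say which RL algorithm Nova Act uses. The sketch below substitutes textbook tabular Q‑learning, and `DatePickerGym` is a hypothetical mock of a month‑picker widget. The reward is 1 only when the agent selects the target month, so the click sequence is never specified and the agent must discover it.

```python
import random
from typing import List, Tuple

ACTIONS = ("prev", "next", "select")


class DatePickerGym:
    """Hypothetical mock widget: a month carousel the agent must set to a target."""

    def __init__(self, target: int, n_months: int = 12):
        self.target, self.n_months = target, n_months
        self.month = 0

    def reset(self) -> int:
        self.month = 0
        return self.month

    def step(self, action: str) -> Tuple[int, float, bool]:
        if action == "select":
            # Outcome-only reward: 1 for the correct end state, 0 otherwise.
            return self.month, float(self.month == self.target), True
        delta = 1 if action == "next" else -1
        self.month = min(max(self.month + delta, 0), self.n_months - 1)
        return self.month, 0.0, False


def train(env: DatePickerGym, episodes: int = 5000, alpha: float = 0.5,
          gamma: float = 0.9, max_steps: int = 20, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(env.n_months)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Explore with uniformly random actions; Q-learning is off-policy.
            a = rng.randrange(len(ACTIONS))
            s2, r, done = env.step(ACTIONS[a])
            target_value = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target_value - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_rollout(env: DatePickerGym, q: List[List[float]], max_steps: int = 20) -> List[str]:
    s, taken = env.reset(), []
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        taken.append(ACTIONS[a])
        s, _, done = env.step(ACTIONS[a])
        if done:
            break
    return taken


env = DatePickerGym(target=3)
q = train(env)
print(greedy_rollout(env, q))  # ['next', 'next', 'next', 'select']
```

Nothing in the reward mentions "next" or "select". The three‑clicks‑then‑select strategy emerges from the sparse end‑state signal propagating back through the value estimates.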
Human‑In‑The‑Loop (HITL) Capabilities
Nova Act integrates HITL to handle edge cases and improve safety:
- Intervention points allow operators to review and correct actions before they are executed.
- Feedback loops capture corrections, feeding them back into the training pipeline for continuous improvement.
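An intervention point with feedback capture can be sketched as a wrapper around action execution. `HumanInTheLoopExecutor`, the reviewer callback, and the review predicate are hypothetical names, not Nova Act SDK APIs. In this sketch the reviewer may approve an action unchanged, return a corrected action, or reject it by returning `None`. Every correction or rejection is logged as a (proposed, final) pair that could later feed a training pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    kind: str       # e.g. "click" or "type"
    target: str     # visible label of the element
    text: str = ""  # text for "type" actions


# A reviewer returns the action to run (possibly corrected) or None to reject it.
Reviewer = Callable[[Action], Optional[Action]]


class HumanInTheLoopExecutor:
    """Hypothetical wrapper: pause for human review before sensitive actions."""

    def __init__(self, reviewer: Reviewer, requires_review: Callable[[Action], bool]):
        self.reviewer = reviewer
        self.requires_review = requires_review
        self.executed: List[Action] = []
        # (proposed, final) pairs where the human changed or rejected the action.
        self.feedback: List[Tuple[Action, Optional[Action]]] = []

    def run(self, proposed: Action) -> Optional[Action]:
        final: Optional[Action] = proposed
        if self.requires_review(proposed):
            final = self.reviewer(proposed)              # intervention point
            if final != proposed:
                self.feedback.append((proposed, final))  # capture the correction
        if final is not None:
            self.executed.append(final)  # a real agent would act on the page here
        return final


def reviewer(action: Action) -> Optional[Action]:
    """Simulated human decisions, for illustration only."""
    if action.target == "Delete account":
        return None                                # reject outright
    if action.target == "Amount" and action.text == "1000":
        return Action("type", "Amount", "100")     # fix a mistyped value
    return action                                  # approve unchanged


executor = HumanInTheLoopExecutor(
    reviewer, requires_review=lambda a: a.target in {"Amount", "Delete account"})
executor.run(Action("click", "Search"))          # not sensitive: runs without review
executor.run(Action("type", "Amount", "1000"))   # corrected to "100"
executor.run(Action("click", "Delete account"))  # rejected
print(executor.executed)
print(executor.feedback)
```

Keeping the review predicate separate from the reviewer lets low‑risk actions run without pausing, so human attention is spent only on the steps that matter.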
Integrated Developer Platform
| Component | Description |
|---|---|
| Playground | Interactive UI for prototyping tasks and visualizing agent behavior. |
| SDK | Programmatic access to Nova Act APIs for custom integration. |
| IDE Extension | Real‑time assistance and debugging within popular development environments. |
| CLI | Command‑line tool for automation pipelines and CI/CD workflows. |
Real‑World Demonstrations
Design partners showcased how Nova Act powers large‑scale automation:
- 1Password – uses Nova Act for Universal Sign‑On across millions of websites.
- Amazon Leo – automated 200 QA scenarios, saving approximately 60 developer days.
- Sola – built an enterprise process automation platform handling complex medical and financial workflows.
Performance Benchmarks
- In the benchmark results presented, Nova Act outperforms models such as Claude Haiku and Claude Sonnet on cost‑effectiveness and throughput.
- The platform supports multi‑agent frameworks, enabling coordinated workflows across multiple agents.
Conclusion
Nova Act represents a shift from static, code‑heavy automation toward adaptive, human‑like agents that can reliably operate in dynamic web environments. By combining deep element understanding, reinforcement learning, and HITL safeguards, the service delivers a scalable, managed solution for enterprises seeking to automate complex browser‑based tasks.