Building AWS Bedrock Model Availability: Slashing AI Routing Discovery From Days to Minutes
Source: Salesforce Engineering
Engineering Energizers Q&A – Spotlight on Scott Chang
Principal Engineer, AI Infrastructure – Agentforce 360
What is your team’s mission in building Bedrock Model Availability for Agentforce?
The team’s core mission is to deliver stable, resilient, and secure infrastructure for the AI services behind Agentforce 360, with a sharp focus on operational excellence.
- As workloads powered by large language models (LLMs) expand and evolve rapidly, the reliability of the underlying infrastructure directly impacts the customer experience.
- Ensuring model routing, availability, and fallback behaviors remain predictable and safe under all operating conditions is therefore essential.
The Bedrock Model Availability capability emerged from this focus on operational excellence. Rather than treating model routing as a static configuration, the team built it as an infrastructure capability that includes automation, observability, and guardrails. The result is a robust foundation that:
- Enhances resilience
- Minimizes operational risk
- Allows Agentforce to adopt new models quickly without compromising correctness or compliance
What scalability constraints did Agentforce’s rapid regional expansion introduce for tracking model and endpoint availability?
Agentforce uses a multi‑account AWS architecture to:
- Maintain regional separation
- Satisfy data‑residency requirements
- Manage service quotas
Global expansion pushed the platform to 30+ AWS accounts across various territories. Each account requires its own routing setup for foundation models, turning availability management into a massive undertaking.
Additional challenges
- New regions are added while model providers launch fresh options (often first in hubs like Oregon or Virginia).
- Each model may have different inference profiles (in‑region or global endpoints) that change as capacity shifts.
Manual tracking quickly became impossible. Engineers spent hours finding models, checking profiles, and updating complex routing configurations for every release.
Solution: An automated system that gathers availability data through three AWS APIs.
- Regional AWS Lambda functions collect the data and publish it as CloudWatch custom metrics.
- These metrics feed internal monitoring tools and a single Grafana dashboard.
- Engineers can now identify active model endpoints by region and ID in real time.
Result: Discovery time reduced from days to minutes.
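As a rough illustration of the collection step, the sketch below shows how a regional Lambda function might convert Bedrock model listings into CloudWatch custom metrics. The article does not name the three AWS APIs involved, so this example assumes only `ListFoundationModels`. The metric name, namespace, and handler layout are hypothetical, not the team's actual implementation.

```python
from typing import Any


def build_metric_data(region: str, model_summaries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn Bedrock model summaries into CloudWatch MetricData entries.

    Emits 1.0 for a model whose lifecycle status is ACTIVE and 0.0 otherwise,
    with the region and model ID attached as dimensions.
    """
    metric_data = []
    for summary in model_summaries:
        status = summary.get("modelLifecycle", {}).get("status", "UNKNOWN")
        metric_data.append({
            "MetricName": "ModelAvailable",  # hypothetical metric name
            "Dimensions": [
                {"Name": "Region", "Value": region},
                {"Name": "ModelId", "Value": summary["modelId"]},
            ],
            "Value": 1.0 if status == "ACTIVE" else 0.0,
            "Unit": "Count",
        })
    return metric_data


def lambda_handler(event, context):
    """Hypothetical regional Lambda entry point (assumed layout)."""
    import os
    import boto3  # bundled with the AWS Lambda Python runtime

    region = os.environ["AWS_REGION"]
    bedrock = boto3.client("bedrock", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)

    summaries = bedrock.list_foundation_models()["modelSummaries"]
    metric_data = build_metric_data(region, summaries)
    # PutMetricData caps the number of entries per request, so send small batches.
    for i in range(0, len(metric_data), 20):
        cloudwatch.put_metric_data(
            Namespace="Custom/BedrockAvailability",  # hypothetical namespace
            MetricData=metric_data[i:i + 20],
        )
    return {"published": len(metric_data)}
```

Keeping the payload construction in a pure function means it can be checked without AWS credentials. A dashboard could then filter the resulting metric by the `Region` and `ModelId` dimensions.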

What correctness and compliance constraints arose when incorrect endpoints were deployed to production?
Incorrect endpoint deployment can:
- Route inference traffic to unexpected regions or non‑existent endpoints, causing increased latency or outages.
- Violate data‑residency requirements for customers who must keep data within specific regions or countries.
To prevent such failures from becoming customer‑visible incidents, the team partnered with the gateway team for external LLM providers to implement a deterministic baseline fallback mechanism:
- Baseline regions (e.g., Virginia, Oregon, Frankfurt) are identified as always‑available Bedrock capacity zones.
- If a primary endpoint returns HTTP errors or becomes unavailable, traffic is automatically rerouted to this baseline.
- Primary routing logic is separated from the fallback path, ensuring that stale or incorrect configuration data cannot cause prolonged outages.
This design dramatically improves correctness, reliability, and compliance under real‑world operating conditions.
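The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the `invoke` callable, the `EndpointError` type, and the region names used as endpoint identifiers are assumptions. It also assumes the baseline list has already been restricted to regions that satisfy the caller's residency requirements.

```python
from typing import Callable


class EndpointError(Exception):
    """Raised by the (hypothetical) invoke function when an endpoint fails."""


def invoke_with_fallback(
    invoke: Callable[[str], str],
    primary: str,
    baseline: list[str],
) -> tuple[str, str]:
    """Try the primary endpoint, then each baseline endpoint in fixed order.

    Returns (endpoint_used, response). The baseline list is static, so the
    fallback path does not depend on the primary routing data being correct.
    """
    last_error = None
    for endpoint in [primary, *baseline]:
        try:
            return endpoint, invoke(endpoint)
        except EndpointError as exc:
            last_error = exc  # remember the failure and try the next endpoint
    raise RuntimeError("all endpoints failed") from last_error


# Usage: a stand-in invoke function where the primary region is failing.
def fake_invoke(endpoint: str) -> str:
    if endpoint == "eu-west-3":
        raise EndpointError("503 from primary")
    return f"ok from {endpoint}"


used, response = invoke_with_fallback(fake_invoke, "eu-west-3", ["eu-central-1"])
print(used, response)
```

Because the order of attempts is fixed, the same failure always yields the same fallback target, which is what makes the mechanism deterministic.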
What routing and resiliency constraints affected latency, capacity, data residency, and availability when traffic was sent to the wrong geographic region?
Cross‑region routing introduces several nuanced challenges:
| Constraint | Impact |
|---|---|
| Latency | Minor compared to overall LLM inference time, but still adds overhead when traffic traverses long distances. |
| Capacity | Unintended traffic can overload production capacity in regions not sized for the load, leading to throttling or failures. |
| Data residency | Cross‑region traffic can violate client‑mandated jurisdictional limits (e.g., EU data must stay in the EU). |
| Availability | Unexpected routing can cause service degradation or outages in both source and destination regions. |
How the system addresses these constraints
- Layered routing – Primary routing selects the optimal endpoint; a deterministic fallback handles failures.
- Real‑time availability metrics – CloudWatch and Grafana provide instant visibility into endpoint health and capacity.
- Compliance‑aware routing tables – Regions are tagged with residency constraints; the router respects these tags when selecting endpoints.
- Capacity‑aware throttling – The system monitors regional load and can divert traffic to baseline regions before capacity is exhausted.
Together, these mechanisms ensure that traffic is always directed to a correct, compliant, and capacity‑sufficient endpoint, preserving latency targets and meeting regulatory obligations.
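One way to picture how residency tags and capacity awareness could combine is the selection function below. It is a sketch under stated assumptions: the `Endpoint` fields, the residency tag values, and the 0.8 load threshold are all hypothetical and are not taken from the Agentforce routing tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    region: str
    residency: str            # hypothetical tag, e.g. "EU" or "US"
    load: float               # fraction of regional capacity in use, 0.0 to 1.0
    is_baseline: bool = False


def select_endpoint(endpoints, required_residency, load_threshold=0.8):
    """Pick a compliant endpoint, preferring non-baseline regions with headroom.

    Residency is a hard filter: an endpoint outside the required jurisdiction
    is never returned, even if every compliant endpoint is heavily loaded.
    """
    compliant = [e for e in endpoints if e.residency == required_residency]
    if not compliant:
        raise LookupError(f"no endpoint satisfies residency {required_residency!r}")

    with_headroom = [
        e for e in compliant if not e.is_baseline and e.load < load_threshold
    ]
    if with_headroom:
        return min(with_headroom, key=lambda e: e.load)

    # Divert to a compliant baseline region before capacity is exhausted.
    baseline = [e for e in compliant if e.is_baseline]
    if baseline:
        return min(baseline, key=lambda e: e.load)
    return min(compliant, key=lambda e: e.load)
```

Applying the residency filter first means a capacity shortfall can degrade service but cannot push traffic outside the required jurisdiction.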
Model Routing & Inference Profiles
AWS‑managed inference profiles provide in‑region, in‑country, and global failover options to handle capacity constraints. At the same time, LLM Gateway runs its own failover logic: it detects error signals and reroutes traffic to predefined backup endpoints. Together, these mechanisms ensure that routing decisions simultaneously respect latency, capacity, data residency, and availability.

Figure: Model routing integrated with inference profiles increased resiliency.
What automation constraints limited how quickly Agentforce could adopt newly available AWS Bedrock models?
Before automation, routing updates were a cumbersome, multi‑step manual process:
- Discover model availability.
- Identify the appropriate endpoints.
- Update configuration files.
- Create pull requests.
- Redeploy services.
Each step introduced delays and opportunities for error, extending onboarding timelines to several days.
The Bedrock Model Availability capability removed these bottlenecks. The team now automatically:
- Collects, normalizes, and exposes all routing‑critical data.
- Presents the data on a Grafana dashboard for a single‑pane‑of‑glass view of model availability across regions.
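To make the "collects, normalizes, and exposes" step more concrete, the sketch below normalizes raw availability records into a per-region table and compares two snapshots to find newly available models. The record shape `(region, model_id, status)` and both function names are hypothetical, chosen only for illustration.

```python
def normalize(records):
    """Collapse raw (region, model_id, status) records into
    {region: set of ACTIVE model IDs}."""
    table = {}
    for region, model_id, status in records:
        table.setdefault(region, set())  # keep regions even if nothing is active
        if status == "ACTIVE":
            table[region].add(model_id)
    return table


def newly_available(previous, current):
    """Return {region: sorted model IDs} that are active in the current
    snapshot but were not active in the previous one."""
    changes = {}
    for region, models in current.items():
        added = models - previous.get(region, set())
        if added:
            changes[region] = sorted(added)
    return changes
```

A comparison like this could feed either a dashboard annotation or, in a more automated setup, a proposed routing update for review.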

Figure: A single‑pane‑of‑glass dashboard reduced operational time from days to minutes.
Impact
- Discovery time dropped from ≈ 3 days to under 10 minutes.
- Future plans aim for full automation, eliminating routing‑configuration delays entirely.
- What once took up to 7 days (discovery → deployment) will become effectively instantaneous, with dynamic updates propagating automatically at the service level.
This shift transforms routing from a manual operational task into a scalable infrastructure capability that evolves at Agentforce’s own pace.