How llm-d brings critical resource optimization with SoftBank’s AI-RAN orchestrator

Published: February 17, 2026 at 07:00 PM EST
4 min read

Source: Red Hat Blog

As the technical reality of AI‑RAN comes into focus, many telecommunication service providers are realizing that it’s no longer just about whether they can run AI and radio access network (RAN) workloads on the same hardware – it’s about how they manage AI at scale.

In Red Hat’s latest collaboration with SoftBank Corp., we have integrated llm‑d into SoftBank’s AI‑RAN orchestrator, AITRAS. llm‑d, an open‑source project launched by Red Hat alongside other industry leaders, is a framework designed to distribute the inference of large language models (LLMs) within a RAN dynamically and intelligently, improving both efficiency and performance.

The problem: Unifying AI and RAN workloads at the service‑provider edge

Traditional RAN applications are widely deployed by service providers at the edge on CPUs and GPUs, often utilizing Kubernetes platforms like Red Hat OpenShift. The recent surge in GenAI and transformer‑based language models is enabling new forms of computation and insights at the edge. In addition to traditional RANs, there are now AI‑powered RAN applications and agents that require runtimes and inference endpoints at the edge.

The critical question for service providers is how to enable traditional RAN and these new language models and agents to co‑exist at RAN locations in order to unlock new use cases, generate value, and open new monetization opportunities. This unification is essential for reducing operational expenditure (OpEx) and accelerating time‑to‑market for new, revenue‑generating edge services.

To make AI‑RAN commercially viable, service providers need to treat AI workloads with the same flexibility as cloud‑native network functions (CNFs) and applications. Enter the collaboration between SoftBank and Red Hat using llm‑d and vLLM for AI‑RAN.

llm‑d: the bridge between inference and orchestrators

vLLM has emerged as the open‑source leader for AI inferencing, providing high‑performance model deployment on a single GPU node. However, it is not designed to manage model deployment across a complex, multi‑node footprint. That is the specific problem llm‑d was built to solve. By leveraging Kubernetes, llm‑d orchestrates vLLM across multiple nodes to achieve production‑scale AI inference, extending vLLM’s efficiency to a distributed environment.
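To ground this, here is a minimal single‑node vLLM sketch of the serving layer that llm‑d scales out across nodes. The model name and sampling settings are illustrative placeholders, not the configuration used by SoftBank or AITRAS:

```python
# Minimal single-node vLLM inference sketch (illustrative; the model
# name and parameters are placeholders, not SoftBank's configuration).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # loads onto local GPU(s)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is AI-RAN?"], params)
print(outputs[0].outputs[0].text)
```

llm‑d’s contribution is everything this snippet does not do: placing and scaling many such workers across a Kubernetes cluster and routing each request to the best one.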

By integrating llm‑d into the SoftBank AITRAS orchestrator, service providers achieve several major breakthroughs:

  • Unified AI and RAN workloads: AITRAS orchestrates and optimizes RAN workloads and LLM requests across multiple GPU clusters, while llm‑d and vLLM route inference requests to the GPUs using prefix‑, KV‑cache‑, and load‑aware scheduling, seamlessly managing GPU resources and enabling autoscaling (see the routing sketch after this list).
  • Hardware‑aware optimization: LLM inference involves two distinct phases – prefill (compute‑intensive prompt processing) and decode (memory‑bandwidth‑bound token generation). llm‑d enables AITRAS to disaggregate these phases, dynamically assigning specialized GPU resources to each. This mitigates the risk of high‑performance AI demands starving critical RAN functions that share the same hardware, protecting network resiliency and ensuring superior quality of service (QoS) for all customers.
  • Autonomous scaling for variable demand: User requests for LLM services are highly variable. llm‑d allows AITRAS to automatically assign and scale prefill and decode worker roles based on the workload profile (see the sizing sketch after this list), reducing latency, lowering power consumption, and driving down total cost of ownership (TCO) while supporting sustainability goals.
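As a rough intuition for the routing behavior described above, the following toy scorer prefers replicas that already hold a prompt’s prefix in their KV cache and penalizes heavily loaded ones. The names, weights, and signals are hypothetical illustrations, not llm‑d’s actual API or algorithm:

```python
# Illustrative only: a toy prefix-, KV-cache-, and load-aware scorer in
# the spirit of llm-d's routing. All names and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    cached_prefix_tokens: int  # tokens of this prompt already in the KV cache
    queue_depth: int           # outstanding requests (a simple load signal)

def score(replica: Replica, prompt_tokens: int) -> float:
    # Reward KV-cache reuse: cached prefix tokens skip prefill compute.
    cache_hit = replica.cached_prefix_tokens / max(prompt_tokens, 1)
    # Penalize busy replicas so no single GPU becomes a hotspot.
    load_penalty = 1.0 / (1.0 + replica.queue_depth)
    return 0.7 * cache_hit + 0.3 * load_penalty

def route(replicas: list[Replica], prompt_tokens: int) -> Replica:
    return max(replicas, key=lambda r: score(r, prompt_tokens))

replicas = [Replica("gpu-a", 900, 4), Replica("gpu-b", 0, 1)]
print(route(replicas, prompt_tokens=1000).name)  # "gpu-a": the cache hit wins
```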
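And to make the disaggregated scaling concrete, here is a toy heuristic for splitting a GPU budget between compute‑bound prefill workers and memory‑bandwidth‑bound decode workers based on the traffic mix. The function and its logic are hypothetical, not an AITRAS or llm‑d interface:

```python
# Illustrative only: a toy heuristic for sizing disaggregated prefill and
# decode worker pools. Names and logic are hypothetical, not llm-d's API.

def plan_workers(total_gpus: int, avg_prompt_tokens: float,
                 avg_output_tokens: float) -> dict[str, int]:
    """Split a GPU budget (>= 2) between prefill and decode roles
    according to where the tokens in the current traffic mix land."""
    # Long prompts shift work toward prefill; long generations toward decode.
    prefill_share = avg_prompt_tokens / (avg_prompt_tokens + avg_output_tokens)
    prefill = min(total_gpus - 1, max(1, round(total_gpus * prefill_share)))
    return {"prefill": prefill, "decode": total_gpus - prefill}

# A prompt-heavy RAG-style mix vs. a generation-heavy chat mix:
print(plan_workers(8, avg_prompt_tokens=4000, avg_output_tokens=200))  # {'prefill': 7, 'decode': 1}
print(plan_workers(8, avg_prompt_tokens=300, avg_output_tokens=1500))  # {'prefill': 1, 'decode': 7}
```

A production autoscaler would react to live queue depths and latency targets rather than static averages, but the direction of the trade‑off is the same.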

Why this matters for the future of 5G and 6G

The integration of llm‑d into AITRAS effectively provides the operating system for AI at the edge. It allows SoftBank to run high‑performance inference and RAN workloads on power‑efficient architectures, including Arm‑based systems, proving that AI‑RAN can achieve the scalability and flexibility required for next‑generation mobile networks. By moving away from manual configurations toward an automated, llm‑d‑driven deployment model, service providers can eliminate the operational complexity that has historically held back edge AI.

Service providers are entering an era where the network doesn’t just carry data – it processes it intelligently and efficiently. Learn more about the results of this integration at the Red Hat booth at MWC Barcelona 2026, where experts will explain how llm‑d and AITRAS are making the promise of AI‑RAN a reality.

Explore the benefits of Red Hat AI
