What’s new with Red Hat OpenShift AI 3.3 UI: Moving from pilot to production

Published: March 2, 2026, 7:00 PM EST
6 min read

Source: Red Hat Blog

OpenShift AI 3.3 – Balancing Governance & Developer Velocity

With the release of Red Hat OpenShift AI, we laid the groundwork for a robust enterprise‑AI infrastructure.

Now, the OpenShift AI 3.3 release tackles a common dilemma:

  • Rigorous governance – ensuring compliance, security, and reproducibility.
  • Rapid developer access – enabling data scientists and engineers to experiment and ship models quickly.

What’s new in OpenShift AI 3.3?

  • Centralized AI asset hub – a single source of truth for models, datasets, pipelines, and prompts.
  • Multi‑model, multi‑agent support – native tooling for orchestrating diverse model families and autonomous agents.
  • Policy‑driven governance – fine‑grained controls for model provenance, licensing, and runtime security.
  • Self‑service developer portal – streamlined UI/CLI for fast onboarding, experiment tracking, and model deployment.
  • Enhanced observability – unified metrics, logs, and traces across the AI stack.

These tools empower enterprises to govern AI at scale while maintaining the agility developers need to bring innovative solutions to production.

Centralized Assets: The AI Hub

As enterprises move beyond single‑model use cases, discoverability becomes a bottleneck. Platform teams need a single source of truth for their AI assets that lets them:

  • Register and version models before they are configured for deployment.
  • View deployed models across the organization.
  • Get guidance on optimal deployment configurations, hardware requirements, latency, and throughput expectations.

The AI Hub fulfills these needs. It serves as the central repository for your organization’s AI assets, starting with large language models (LLMs) in OpenShift AI 3.3 and expanding to Model Context Protocol (MCP) servers in future releases.

What the AI Hub Provides in OpenShift AI 3.3

  • Performance insights derived from Red Hat’s AI model validation program.
  • Guidance on trade‑offs among performance, cost, and hardware requirements.
  • Recommendations that help platform teams steer developers toward the most efficient configurations before deployment begins.

By consolidating assets and offering actionable deployment advice, the AI Hub removes discoverability friction and accelerates the path from model registration to production.
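The register-and-version flow described above can be sketched in a few lines. This is a hypothetical illustration of the concept, not the AI Hub's actual schema or API; the class names, fields, and model names are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical model-registry entry; the AI Hub's real schema is richer
# and includes deployment guidance from Red Hat's validation program.
@dataclass
class ModelEntry:
    name: str
    version: str
    hardware: str          # recommended accelerator profile (illustrative)
    max_latency_ms: int    # expected p95 latency for that profile

class ModelRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, entry: ModelEntry) -> None:
        # Keep every version so deployments stay reproducible.
        self._entries.setdefault(entry.name, {})[entry.version] = entry

    def latest(self, name: str) -> ModelEntry:
        versions = self._entries[name]
        return versions[max(versions)]  # naive: lexicographic max version

registry = ModelRegistry()
registry.register(ModelEntry("granite-8b", "1.0", "1x L4", 350))
registry.register(ModelEntry("granite-8b", "1.1", "1x L4", 300))
print(registry.latest("granite-8b").version)  # → 1.1
```

The point is the separation of concerns: models are registered and versioned once, centrally, before anyone configures a deployment.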

Governance at Scale: Model‑as‑a‑Service (MaaS)

If you’re configuring and managing your own GPUs and deploying AI models on them, building AI applications can be tough. Most developers, AI engineers, and data scientists would rather start with an endpoint for a model that’s already up and running. Requiring them to handle GPU provisioning, model serving, and scaling slows them down, reduces time‑to‑value, and is neither cost‑effective nor scalable from a governance perspective.

Enabling platform teams to deliver these models to everyone—so data scientists and business teams can consume the models they need—extends the same paradigm used for traditional application platforms. In this model:

  • Platform teams handle model serving and optimization.
  • They provide a centralized catalog of AI models that can be governed through role‑based access policies, usage limits, and versioning.
  • End‑users receive a simple API endpoint and can start building immediately.

OpenShift AI 3.3 introduces a technical preview of MaaS designed to help organizations become their own internal AI model providers.

What’s in it for administrators?

  • Granular rate‑limiting policies – Define limits in the UI, e.g., grant high‑quota access for small, frequently used models while applying stricter caps to resource‑intensive frontier models.
  • Optimized routing with llm‑d – Works alongside the Kubernetes‑native distributed inference framework llm‑d. While you set the policies, llm‑d automatically routes requests to make the best use of hardware without violating service‑level agreements (SLAs).

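To make the quota idea concrete, here is a minimal token-bucket sketch of per-model rate limiting, in the spirit of the policies described above. The model names and limits are illustrative; the actual MaaS policies are defined in the OpenShift AI UI and enforced by the platform, not by application code like this.

```python
import time

# Toy token-bucket limiter: generous quota for a small model,
# a strict cap for a resource-intensive frontier model.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

policies = {
    "small-llm": TokenBucket(capacity=100, refill_per_sec=50.0),
    "frontier-llm": TokenBucket(capacity=2, refill_per_sec=0.1),
}

# Third burst request to the frontier model is rejected.
results = [policies["frontier-llm"].allow() for _ in range(3)]
print(results)  # → [True, True, False]
```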
Ready to let your platform team serve AI models at scale?

Developer Velocity: Gen AI Studio

Models or assets deployed by platform teams need to be registered and surfaced centrally so AI engineers and developers can start building with them.

Developers also need a central place to experiment with these models and assets—a plug‑and‑play environment where they can quickly discover which model, prompt, or tool works best for their use case while the underlying infrastructure complexity is abstracted away.

Our technical‑preview release of Gen AI Studio provides this playground and the tools developers need to move from a prompt to a pilot.

Features

  • AI Playground

    • Experiment with prompts, model parameters, and MCP tools.
    • In OpenShift AI 3.3 you can:
      • Import your own MCP servers.
      • Toggle specific tools on or off, giving the determinism required for reliable agentic behavior.
    • View Code: Use the “View Code” function to see and copy the playground configuration, then continue working in your local environment.
    • Roadmap (upcoming):
      • Export code directly from the playground.
      • Integrated prompt management.
      • Retrieval‑augmented generation (RAG) capabilities.
      • Refined MCP‑tool selection.
  • AI Asset Endpoints

    • Instantly retrieve API keys and endpoints.
    • Start testing assets in your local IDE without additional setup.

Gen AI Studio is designed to accelerate the developer workflow—from discovery and experimentation to production‑ready pilots—while keeping the experience simple and deterministic.
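The tool on/off toggling described above can be sketched as a small registry: only explicitly enabled tools are exposed to the agent, which is what makes its tool surface deterministic. The tool names here are hypothetical and stand in for MCP servers; this is a conceptual sketch, not the Gen AI Studio implementation.

```python
# Only enabled tools are visible to the agent, in a stable order.
class ToolRegistry:
    def __init__(self):
        self._tools = {}  # name -> [callable, enabled]

    def register(self, name, fn, enabled=False):
        self._tools[name] = [fn, enabled]

    def toggle(self, name, enabled):
        self._tools[name][1] = enabled

    def available(self):
        # Deterministic view: sorted names of enabled tools only.
        return sorted(name for name, (fn, on) in self._tools.items() if on)

registry = ToolRegistry()
registry.register("web_search", lambda q: None)               # off by default
registry.register("calculator", lambda e: eval(e), enabled=True)

print(registry.available())   # → ['calculator']
registry.toggle("web_search", True)
print(registry.available())   # → ['calculator', 'web_search']
```

Keeping the enabled set explicit (rather than letting the model see every registered tool) is what gives agentic runs the repeatability the release notes call out.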

The Production Gap: Continuous Evaluation and Optimization

One of the biggest barriers to deploying models in production isn’t building the model—it’s managing costs and preventing quality drift.

1. Cost‑Optimization with Model Compression

OpenShift AI 3.3 introduces guided workbenches for two open‑source tools that Red Hat uses to benchmark and compress models as part of its model‑validation program:

  • LLM Compressor – optimizes LLMs for low‑latency deployments (e.g., via quantization). Resources: workbench example, blog post, GitHub repo.
  • GuideLLM – evaluates LLM deployments in real‑world inference scenarios. Resources: workbench example, blog post, GitHub repo.

Result: You can benchmark a model, compress it (e.g., via quantization), and compare performance gains directly within your environment.
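To show what quantization buys you, here is a toy symmetric int8 quantizer over a handful of weights. LLM Compressor applies this kind of transformation at full model scale with far more sophistication; this sketch only illustrates the core idea and is not its API.

```python
# Symmetric int8 quantization: map floats onto integers in [-127, 127]
# using one shared scale, cutting storage from 32 bits to 8 per weight.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -0.54, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Precision loss is bounded by the scale (half a quantization step
# from rounding, plus float error).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                # → [2, -54, 127, -100]
print(max_err < scale)  # → True
```

The benchmarking half of the loop (GuideLLM) then tells you whether the smaller, faster model still meets your latency and quality targets.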

2. Experiment Tracking with MLflow

We are releasing a developer preview of MLflow integration. While compression and benchmarking address immediate performance concerns, MLflow provides the historical memory for your AI lifecycle:

  • Log GuideLLM results and application responses.
  • Track regressions and quality trends over time.
  • Ensure optimizations do not compromise accuracy.
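The regression-tracking idea above can be sketched as comparing each run against a baseline. The run names, metric values, and tolerance below are invented for illustration; in practice these metrics would be logged per run with MLflow (e.g., `mlflow.log_metric`) so the comparison has real history behind it.

```python
# Illustrative evaluation history: a baseline plus two compressed variants.
runs = [
    {"run": "baseline-fp16",  "accuracy": 0.861, "p95_latency_ms": 420},
    {"run": "int8-quantized", "accuracy": 0.855, "p95_latency_ms": 240},
    {"run": "int4-quantized", "accuracy": 0.790, "p95_latency_ms": 180},
]

def flag_regressions(runs, metric="accuracy", tolerance=0.01):
    """Flag runs whose metric drops more than `tolerance` below the baseline."""
    baseline = runs[0][metric]
    return [r["run"] for r in runs[1:] if baseline - r[metric] > tolerance]

# int8 trades a little accuracy for big latency gains; int4 goes too far.
print(flag_regressions(runs))  # → ['int4-quantized']
```

This is the "historical memory" role in miniature: without logged runs to compare against, an aggressive compression step can silently ship a quality regression.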

3. Visualizing the Feedback Loop

The MLflow dashboard now shows a direct correlation between compression experiments and inference latency, turning performance troubleshooting into a data‑driven process rather than an anecdotal one.

Takeaway: By combining model compression, benchmarking, and MLflow‑based experiment tracking, you gain a continuous, observable loop that keeps costs low while safeguarding model quality in production.

Try Red Hat OpenShift AI

The features in OpenShift AI 3.3 are designed to transform how you govern access to AI capabilities on the platform. By installing OpenShift AI 3.3 you can:

  • Experience AI Hub
  • Preview Gen AI Studio
  • Explore the new Optimization Workbenches

For more details, see the official press release.

You can also try OpenShift AI through the Red Hat product trial center. The trial provides 60‑day, no‑cost access to a fully managed environment where you can test these production‑grade tools.
