OpenAI upgrades its Responses API to support agent skills and a complete terminal shell

Published: February 10, 2026 at 05:25 PM EST
6 min read

Source: VentureBeat

Introduction

Until recently, building AI agents was a bit like training a long‑distance runner with a thirty‑second memory.
You could give your models tools and instructions, but after a few dozen interactions — several laps around the track, to extend the analogy — they would inevitably lose context and start hallucinating.

OpenAI’s latest updates to its Responses API (the interface that lets developers access multiple agentic tools such as web search and file search with a single call) signal that the era of the limited agent is waning.

The announcements include three major upgrades:

  1. Server‑side Compaction
  2. Hosted Shell Containers
  3. A new “Skills” standard for agents

Together, these give agents a permanent desk, a terminal, and a memory that doesn’t fade, helping them evolve into reliable, long‑term digital workers.

Technology: Overcoming “Context Amnesia”

The biggest technical hurdle for autonomous agents has always been the clutter of long‑running tasks. Every time an agent calls a tool or runs a script, the conversation history grows. Eventually the model hits its token limit, forcing developers to truncate the history—often deleting the very reasoning the agent needs to finish the job.

Server‑side Compaction

OpenAI’s answer is Server‑side Compaction. Unlike simple truncation, compaction lets agents run for hours or even days. Early data from e‑commerce platform Triple Whale suggests this is a breakthrough in stability: their agent Moby successfully navigated a session involving 5 million tokens and 150 tool calls without a drop in accuracy.

In practical terms, the model can summarize its own past actions into a compressed state, keeping essential context alive while clearing the noise. This transforms the model from a forgetful assistant into a persistent system process.
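The mechanism can be illustrated with a toy client-side sketch (hypothetical helper names; real compaction runs server-side inside the Responses API): once the history exceeds a token budget, older turns collapse into a single compressed-state message while recent turns stay verbatim.

```python
# Toy illustration of compaction. A real system would ask the model to
# summarize the old turns; here we just collapse them into one marker
# message so essential recent context survives in compressed form.

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~1 token per 4 characters."""
    return max(1, len(text) // 4)

def compact(history: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    """If the history exceeds `budget` tokens, fold everything except
    the `keep_recent` most recent turns into one compressed message."""
    total = sum(approx_tokens(m["content"]) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    old_tokens = sum(approx_tokens(m["content"]) for m in old)
    summary = f"[compacted: {len(old)} earlier turns, ~{old_tokens} tokens]"
    return [{"role": "system", "content": summary}] + recent

history = [{"role": "user", "content": "step " * 50} for _ in range(20)]
compacted = compact(history, budget=500)
print(len(compacted))  # → 5 (one summary message plus the 4 recent turns)
```

The point of the sketch is the trade-off: token usage drops sharply, but the agent keeps a compressed record of what it already did instead of losing it to truncation.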

Managed Cloud Sandboxes

The introduction of the Shell Tool moves OpenAI into the realm of managed compute. Developers can now opt for container_auto, which provisions an OpenAI‑hosted Debian 12 environment.

What the Hosted Shell Provides

  • Native execution environments

    • Python 3.11
    • Node.js 22
    • Java 17
    • Go 1.23
    • Ruby 3.1
  • Persistent storage via /mnt/data – agents can generate, save, and download artifacts.

  • Networking capabilities – agents can reach the internet to install libraries or interact with third‑party APIs.

The Hosted Shell and its persistent /mnt/data storage give agents a managed environment for complex data transformations in Python or Java, without requiring teams to build and maintain custom ETL (Extract, Transform, Load) middleware for every AI project. Data engineers get high‑performance processing with minimal bespoke infrastructure to manage. OpenAI’s message is clear: “Give us the instructions; we’ll provide the computer.”
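The pattern is easy to simulate locally: a scratch directory stands in for /mnt/data, and a helper runs commands inside it and reads the resulting artifacts back. This is an illustrative sketch, not the hosted shell’s actual interface (in production, container_auto provisions the Debian environment and persists /mnt/data server-side).

```python
# Local simulation of the hosted-shell workflow: execute a command in a
# sandbox directory, then retrieve the artifact it wrote.
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_sandbox(command: list[str], workdir: Path) -> str:
    """Run a command with the sandbox directory as cwd; return stdout."""
    result = subprocess.run(command, cwd=workdir, capture_output=True,
                            text=True, timeout=30, check=True)
    return result.stdout

sandbox = Path(tempfile.mkdtemp())  # stand-in for /mnt/data
run_in_sandbox([sys.executable, "-c",
                "open('report.csv', 'w').write('id,total\\n1,42\\n')"],
               sandbox)
artifact = (sandbox / "report.csv").read_text()
print(artifact.splitlines()[0])  # → id,total
```

The artifact survives between commands because the directory does, which is the same property /mnt/data gives agents across tool calls within a session.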

OpenAI’s Skills vs. Anthropic’s Skills

Both OpenAI and Anthropic have converged on a similar file structure—a SKILL.md manifest with YAML front‑matter—but their underlying strategies diverge.
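A minimal SKILL.md might look like the following, with a small parser splitting the YAML front-matter from the Markdown body. The example skill and its fields (name, description) are hypothetical, following the pattern described here; each vendor’s exact schema may differ.

```python
# Example SKILL.md: YAML front-matter between "---" fences, followed by
# a Markdown body of procedural instructions. The skill itself is
# invented for illustration.
SKILL_MD = """\
---
name: invoice-audit
description: Check invoices against company reimbursement policy.
---
# Invoice Audit
1. Extract line items from the attached invoice.
2. Flag any item above the per-diem limit.
"""

def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into front-matter fields and Markdown body."""
    _, front, body = text.split("---\n", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body

meta, body = parse_skill(SKILL_MD)
print(meta["name"])  # → invoice-audit
```

Because both vendors converged on this file shape, a parser like the one above is all an agent runtime needs to index skills regardless of where they were authored.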

OpenAI’s Approach

  • Programmable substrate optimized for developer velocity.
  • Bundles the shell, memory, and skills into the Responses API for a “turnkey” experience.
  • Enterprise impact: Glean reported a jump in tool accuracy from 73% to 85% using OpenAI’s Skills framework.

Anthropic’s Approach

  • Open standard (agentskills.io) designed for portability.
  • A skill built for Claude can be moved to VS Code, Cursor, or any platform that adopts the specification.

Real‑world example

The open‑source AI agent OpenClaw adopted the SKILL.md manifest, inheriting a wealth of procedural knowledge originally designed for Claude. This compatibility fueled a community‑driven “skills boom” on platforms like ClawHub, which now hosts over 3,000 community‑built extensions ranging from smart‑home integrations to complex enterprise workflow automations.

Because OpenClaw supports multiple models—including OpenAI’s GPT‑5 series and local Llama instances—developers can write a skill once and deploy it across a heterogeneous landscape of agents. For technical decision‑makers, this open standard is becoming the industry’s preferred way to externalize and share agentic knowledge, moving past proprietary prompts toward a shared, inspectable, and interoperable infrastructure.

Key Architectural Difference

State Management

  • OpenAI: Server‑side Compaction keeps a compressed active state for long‑running sessions.
  • Anthropic: Progressive Disclosure; the model initially sees only skill names and descriptions, and full details are loaded on demand.

Memory Impact

  • OpenAI: Compacting the entire session reduces token usage while preserving essential context.
  • Anthropic: Loading details only when needed prevents overwhelming the model’s working memory, enabling massive skill libraries (brand guidelines, legal checklists, code templates, etc.).
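Anthropic’s progressive-disclosure idea can be sketched as a registry that exposes only names and descriptions up front and loads a skill’s full instructions on demand. The class and method names below are hypothetical, not Anthropic’s actual API.

```python
# Sketch of progressive disclosure: the model's context initially holds
# only a compact index of skills; full bodies load when invoked.
class SkillRegistry:
    def __init__(self):
        self._skills = {}  # name -> (description, full instructions)

    def register(self, name: str, description: str, body: str) -> None:
        self._skills[name] = (description, body)

    def index(self) -> str:
        """What the model sees up front: names and one-line descriptions."""
        return "\n".join(f"{n}: {d}" for n, (d, _) in self._skills.items())

    def load(self, name: str) -> str:
        """Loaded on demand when the model selects a skill."""
        return self._skills[name][1]

reg = SkillRegistry()
reg.register("brand-voice", "Apply company tone guidelines.",
             "Full 2,000-word style guide would go here.")
reg.register("legal-check", "Screen copy for legal risk.",
             "Full legal checklist would go here.")
print(reg.index())
```

With hundreds of skills registered, only the short index competes for working memory; each full body costs tokens only in the turns that actually use it.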

Implications for Enterprise Technical Decision‑Makers

  • Scalability: Server‑side Compaction and hosted containers allow agents to operate over extended periods without manual context management.
  • Operational Simplicity: Managed shells eliminate the need for custom sandbox infrastructure, reducing DevOps overhead.
  • Portability & Ecosystem Growth: Anthropic’s open Skills standard encourages reusable, versioned assets that can be shared across models and platforms, fostering a vibrant community marketplace.
  • Strategic Choice: Organizations must decide whether they prioritize a tightly integrated, turnkey stack (OpenAI) or a portable, vendor‑agnostic skill ecosystem (Anthropic) based on their long‑term AI strategy.

Engineers – Rapid Deployment & Fine‑Tuning

  • Server‑side Compaction + Skills = massive productivity boost.
  • No need to build custom state‑management for every agent run; built‑in compaction handles multi‑hour tasks.
  • Skills act as “packaged IP”: fine‑tuned or specialized procedural knowledge can be modularised and reused across internal projects.

From “Chat Box” to Production‑Grade Workflow

  • OpenAI’s announcement ends the era of bespoke infrastructure.
  • Historically, orchestrating an agent required:
    1. Custom state‑management logic for long conversations.
    2. Secure, ephemeral sandboxes to execute code.
  • The focus now shifts to:
    • Which skills are authorized for which users?
    • How to audit artifacts produced in the hosted filesystem?

OpenAI supplies the engine and the chassis; the orchestrator now defines the rules of the road.

Security Operations (SecOps) Perspective

  • Giving an AI model a shell and network access is a high‑stakes evolution.
  • Domain Secrets and Org Allowlists provide defense‑in‑depth: agents can call APIs without exposing raw credentials in the model’s context.
  • As “Skills” simplify deployment, SecOps must watch for malicious skills that could introduce:
    • Prompt‑injection vulnerabilities.
    • Unauthorized data‑exfiltration paths.
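The allowlist idea above can be sketched as a policy check applied before any outbound request leaves the agent’s shell. This is a hypothetical policy layer for illustration, not the actual Domain Secrets or Org Allowlist implementation.

```python
# Sketch of an org allowlist: permit a request only if its host matches
# an approved domain (or a subdomain of one). Domains are invented.
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"api.github.com", "pypi.org"}

def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(is_allowed("https://pypi.org/simple/requests/"))  # → True
print(is_allowed("https://evil.example.com/exfil"))     # → False
```

Combined with secrets injected outside the model’s context, a check like this narrows the exfiltration paths a malicious skill could exploit.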

How Should Enterprises Decide?

  • Choose OpenAI for an integrated, high‑velocity environment for long‑running autonomous work.
  • Choose Anthropic for model‑agnostic portability and an open‑ecosystem standard.

Bottom Line

  • OpenAI is no longer just selling a brain (the model); it also provides the office (the container), the memory (compaction), and the training manual (skills).
  • The announcements signal AI’s migration from the chat box into system architecture, turning “prompt spaghetti” into maintainable, versioned, and scalable business workflows.