OpenAI upgrades its Responses API to support agent skills and a complete terminal shell

Published: February 10, 2026 at 05:25 PM EST
6 min read

Source: VentureBeat

Introduction

Until recently, building AI agents was a bit like training a long‑distance runner with a thirty‑second memory.
You could give your models tools and instructions, but after a few dozen interactions — several laps around the track, to extend the analogy — they would inevitably lose context and start hallucinating.

OpenAI’s latest updates to its Responses API (the interface that lets developers access multiple agentic tools such as web search and file search with a single call) signal that the era of the limited agent is waning.

The announcements include three major upgrades:

  1. Server‑side Compaction
  2. Hosted Shell Containers
  3. A new “Skills” standard for agents

Together, these give agents a permanent desk, a terminal, and a memory that doesn’t fade, helping them evolve into reliable, long‑term digital workers.

Technology: Overcoming “Context Amnesia”

The biggest technical hurdle for autonomous agents has always been the clutter of long‑running tasks. Every time an agent calls a tool or runs a script, the conversation history grows. Eventually the model hits its token limit, forcing developers to truncate the history—often deleting the very reasoning the agent needs to finish the job.

Server‑side Compaction

OpenAI’s answer is Server‑side Compaction. Unlike simple truncation, compaction lets agents run for hours or even days. Early data from e‑commerce platform Triple Whale suggests this is a breakthrough in stability: their agent Moby successfully navigated a session involving 5 million tokens and 150 tool calls without a drop in accuracy.

In practical terms, the model can summarize its own past actions into a compressed state, keeping essential context alive while clearing the noise. This transforms the model from a forgetful assistant into a persistent system process.
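The mechanism can be illustrated with a toy client-side sketch (hypothetical helper names; real compaction runs server-side inside the Responses API): once the history exceeds a token budget, older turns collapse into a single compressed-state message while recent turns stay verbatim.

```python
# Toy illustration of compaction. A real system would ask the model to
# summarize the old turns; here we just collapse them into one marker
# message so essential recent context survives in compressed form.

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~1 token per 4 characters."""
    return max(1, len(text) // 4)

def compact(history: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    """If the history exceeds `budget` tokens, fold everything except
    the `keep_recent` most recent turns into one compressed message."""
    total = sum(approx_tokens(m["content"]) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    old_tokens = sum(approx_tokens(m["content"]) for m in old)
    summary = f"[compacted: {len(old)} earlier turns, ~{old_tokens} tokens]"
    return [{"role": "system", "content": summary}] + recent

history = [{"role": "user", "content": "step " * 50} for _ in range(20)]
compacted = compact(history, budget=500)
print(len(compacted))  # → 5 (one summary message plus the 4 recent turns)
```

The point of the sketch is the trade-off: token usage drops sharply, but the agent keeps a compressed record of what it already did instead of losing it to truncation.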

Managed Cloud Sandboxes

The introduction of the Shell Tool moves OpenAI into the realm of managed compute. Developers can now opt for container_auto, which provisions an OpenAI‑hosted Debian 12 environment.

What the Hosted Shell Provides

  • Native execution environments

    • Python 3.11
    • Node.js 22
    • Java 17
    • Go 1.23
    • Ruby 3.1
  • Persistent storage via /mnt/data – agents can generate, save, and download artifacts.

  • Networking capabilities – agents can reach the internet to install libraries or interact with third‑party APIs.

The Hosted Shell and its persistent /mnt/data storage give agents a managed environment for complex data transformations in Python or Java, without requiring teams to build and maintain custom ETL (Extract, Transform, Load) middleware for every AI project. Data engineers get high‑performance processing with minimal bespoke infrastructure to manage. OpenAI’s message is clear: “Give us the instructions; we’ll provide the computer.”
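The pattern is easy to simulate locally: a scratch directory stands in for /mnt/data, and a helper runs commands inside it and reads the resulting artifacts back. This is an illustrative sketch, not the hosted shell’s actual interface (in production, container_auto provisions the Debian environment and persists /mnt/data server-side).

```python
# Local simulation of the hosted-shell workflow: execute a command in a
# sandbox directory, then retrieve the artifact it wrote.
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_sandbox(command: list[str], workdir: Path) -> str:
    """Run a command with the sandbox directory as cwd; return stdout."""
    result = subprocess.run(command, cwd=workdir, capture_output=True,
                            text=True, timeout=30, check=True)
    return result.stdout

sandbox = Path(tempfile.mkdtemp())  # stand-in for /mnt/data
run_in_sandbox([sys.executable, "-c",
                "open('report.csv', 'w').write('id,total\\n1,42\\n')"],
               sandbox)
artifact = (sandbox / "report.csv").read_text()
print(artifact.splitlines()[0])  # → id,total
```

The artifact survives between commands because the directory does, which is the same property /mnt/data gives agents across tool calls within a session.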

OpenAI’s Skills vs. Anthropic’s Skills

Both OpenAI and Anthropic have converged on a similar file structure—a SKILL.md manifest with YAML front‑matter—but their underlying strategies diverge.
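A minimal SKILL.md might look like the following, with a small parser splitting the YAML front-matter from the Markdown body. The example skill and its fields (name, description) are hypothetical, following the pattern described here; each vendor’s exact schema may differ.

```python
# Example SKILL.md: YAML front-matter between "---" fences, followed by
# a Markdown body of procedural instructions. The skill itself is
# invented for illustration.
SKILL_MD = """\
---
name: invoice-audit
description: Check invoices against company reimbursement policy.
---
# Invoice Audit
1. Extract line items from the attached invoice.
2. Flag any item above the per-diem limit.
"""

def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md file into front-matter fields and Markdown body."""
    _, front, body = text.split("---\n", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body

meta, body = parse_skill(SKILL_MD)
print(meta["name"])  # → invoice-audit
```

Because both vendors converged on this file shape, a parser like the one above is all an agent runtime needs to index skills regardless of where they were authored.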

OpenAI’s Approach

  • Programmable substrate optimized for developer velocity.
  • Bundles the shell, memory, and skills into the Responses API for a “turnkey” experience.
  • Enterprise impact: Glean reported a jump in tool accuracy from 73% to 85% using OpenAI’s Skills framework.

Anthropic’s Approach

  • Open standard (agentskills.io) designed for portability.
  • A skill built for Claude can be moved to VS Code, Cursor, or any platform that adopts the specification.

Real‑world example

The open‑source AI agent OpenClaw adopted the SKILL.md manifest, inheriting a wealth of procedural knowledge originally designed for Claude. This compatibility fueled a community‑driven “skills boom” on platforms like ClawHub, which now hosts over 3,000 community‑built extensions ranging from smart‑home integrations to complex enterprise workflow automations.

Because OpenClaw supports multiple models—including OpenAI’s GPT‑5 series and local Llama instances—developers can write a skill once and deploy it across a heterogeneous landscape of agents. For technical decision‑makers, this open standard is becoming the industry’s preferred way to externalize and share agentic knowledge, moving past proprietary prompts toward a shared, inspectable, and interoperable infrastructure.

Key Architectural Difference

State Management

  • OpenAI: Server‑side Compaction keeps a compressed active state for long‑running sessions.
  • Anthropic: Progressive Disclosure; the model initially sees only skill names and descriptions, and full details are loaded on demand.

Memory Impact

  • OpenAI: Compacting the entire session reduces token usage while preserving essential context.
  • Anthropic: Loading details only when needed prevents overwhelming the model’s working memory, enabling massive skill libraries (brand guidelines, legal checklists, code templates, etc.).
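Anthropic’s progressive-disclosure idea can be sketched as a registry that exposes only names and descriptions up front and loads a skill’s full instructions on demand. The class and method names below are hypothetical, not Anthropic’s actual API.

```python
# Sketch of progressive disclosure: the model's context initially holds
# only a compact index of skills; full bodies load when invoked.
class SkillRegistry:
    def __init__(self):
        self._skills = {}  # name -> (description, full instructions)

    def register(self, name: str, description: str, body: str) -> None:
        self._skills[name] = (description, body)

    def index(self) -> str:
        """What the model sees up front: names and one-line descriptions."""
        return "\n".join(f"{n}: {d}" for n, (d, _) in self._skills.items())

    def load(self, name: str) -> str:
        """Loaded on demand when the model selects a skill."""
        return self._skills[name][1]

reg = SkillRegistry()
reg.register("brand-voice", "Apply company tone guidelines.",
             "Full 2,000-word style guide would go here.")
reg.register("legal-check", "Screen copy for legal risk.",
             "Full legal checklist would go here.")
print(reg.index())
```

With hundreds of skills registered, only the short index competes for working memory; each full body costs tokens only in the turns that actually use it.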

Implications for Enterprise Technical Decision‑Makers

  • Scalability: Server‑side Compaction and hosted containers allow agents to operate over extended periods without manual context management.
  • Operational Simplicity: Managed shells eliminate the need for custom sandbox infrastructure, reducing DevOps overhead.
  • Portability & Ecosystem Growth: Anthropic’s open Skills standard encourages reusable, versioned assets that can be shared across models and platforms, fostering a vibrant community marketplace.
  • Strategic Choice: Organizations must decide whether they prioritize a tightly integrated, turnkey stack (OpenAI) or a portable, vendor‑agnostic skill ecosystem (Anthropic) based on their long‑term AI strategy.

Engineers – Rapid Deployment & Fine‑Tuning

  • Server‑side Compaction + Skills = massive productivity boost.
  • No need to build custom state‑management for every agent run; built‑in compaction handles multi‑hour tasks.
  • Skills act as “packaged IP”: fine‑tuned or specialized procedural knowledge can be modularised and reused across internal projects.

From “Chat Box” to Production‑Grade Workflow

  • OpenAI’s announcement ends the era of bespoke infrastructure.
  • Historically, orchestrating an agent required:
    1. Custom state‑management logic for long conversations.
    2. Secure, ephemeral sandboxes to execute code.
  • The focus now shifts to:
    • Which skills are authorized for which users?
    • How to audit artifacts produced in the hosted filesystem?

OpenAI supplies the engine and the chassis; the orchestrator now defines the rules of the road.

Security Operations (SecOps) Perspective

  • Giving an AI model a shell and network access is a high‑stakes evolution.
  • Domain Secrets and Org Allowlists provide defense‑in‑depth: agents can call APIs without exposing raw credentials in the model’s context.
  • As “Skills” simplify deployment, SecOps must watch for malicious skills that could introduce:
    • Prompt‑injection vulnerabilities.
    • Unauthorized data‑exfiltration paths.
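The allowlist idea above can be sketched as a policy check applied before any outbound request leaves the agent’s shell. This is a hypothetical policy layer for illustration, not the actual Domain Secrets or Org Allowlist implementation.

```python
# Sketch of an org allowlist: permit a request only if its host matches
# an approved domain (or a subdomain of one). Domains are invented.
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"api.github.com", "pypi.org"}

def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(is_allowed("https://pypi.org/simple/requests/"))  # → True
print(is_allowed("https://evil.example.com/exfil"))     # → False
```

Combined with secrets injected outside the model’s context, a check like this narrows the exfiltration paths a malicious skill could exploit.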

How Should Enterprises Decide?

  • Choose OpenAI for an integrated, high‑velocity environment for long‑running autonomous work.
  • Choose Anthropic for model‑agnostic portability and an open‑ecosystem standard.

Bottom Line

  • OpenAI is no longer just selling a brain (the model); it also provides the office (the container), the memory (compaction), and the training manual (skills).
  • The announcements signal AI’s migration from the chat box into system architecture, turning “prompt spaghetti” into maintainable, versioned, and scalable business workflows.