Kubernetes v1.35: New level of efficiency with in-place Pod restart
Source: Kubernetes Blog
Enabling the Feature
The functionality is available by enabling the RestartAllContainersOnContainerExits feature gate. This alpha feature extends the Container Restart Rules feature, which graduated to beta in Kubernetes 1.35.
# Example feature-gate configuration (kubelet configuration file)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  RestartAllContainersOnContainerExits: true
# On the API server, pass --feature-gates=RestartAllContainersOnContainerExits=true
The Problem
When a single‑container restart isn’t enough and recreating Pods is too costly, the existing restart mechanisms fall short.
- Kubernetes has long supported restartPolicy at the Pod level and, more recently, at the individual container level.
- These policies work well for isolated crashes, but many modern applications have complex inter‑container dependencies.
Typical Scenarios
- Init container prepares the environment (e.g., mounts a volume or generates a config file). If the main application container corrupts this environment, simply restarting that one container isn’t sufficient; the entire initialization process must run again.
- Watcher sidecar monitors system health. If it detects an unrecoverable but retriable error state, it must trigger a restart of the main application container from a clean slate.
- Resource-managing sidecar fails. Even if the sidecar restarts on its own, the main container may be stuck trying to access an outdated or broken connection.
In all these cases, the desired action is not to restart a single container, but all of them. Previously, the only way to achieve this was to delete the Pod and let a controller (e.g., a Job or ReplicaSet) create a new one—a slow, expensive process involving the scheduler, node‑resource allocation, and re‑initialization of networking and storage.
Impact on Large‑Scale AI/ML Workloads
- Scale: ≥ 1,000 Nodes with one Pod per Node.
- Requirement: When a failure occurs (e.g., a Node crash), all Pods in the fleet must be recreated to reset state before training can resume, even if most Pods were not directly affected.
- Cost: Deleting, creating, and scheduling thousands of Pods simultaneously creates a massive bottleneck. The estimated overhead can cost more than $100,000 per month in wasted resources.
Handling these failures traditionally requires a complex integration between the training framework and Kubernetes—often fragile and toilsome. The new feature provides a Kubernetes‑native solution, improving robustness and letting developers focus on core training logic.
Additional Benefit
Keeping Pods on their assigned Nodes enables further optimizations, such as node‑level caching tied to a specific Pod identity—something impossible when Pods are unnecessarily recreated on different Nodes.
Introducing the RestartAllContainers Action
Kubernetes v1.35 adds a new action to the container restart rules: RestartAllContainers.
When a container exits in a way that matches a rule with this action, the kubelet initiates a fast, in‑place restart of the Pod.
What Is Preserved During an In‑Place Restart?
- Pod UID, IP address, and network namespace
- Pod sandbox and any attached devices
- All volumes, including emptyDir and mounted volumes from PVCs
After terminating all running containers, the Pod’s startup sequence is re‑executed from the very beginning:
- All init containers run again in order.
- Sidecars and regular containers start thereafter, ensuring a completely fresh start in a known‑good environment.
Note: Ephemeral containers are terminated. All other containers—including those that previously succeeded or failed—are restarted, regardless of their individual restart policies.
Use Cases
1. Efficient Restarts for ML/Batch Jobs
- Problem: Rescheduling a worker Pod on failure is costly; on a 1,000-node training cluster, the overhead can waste more than $100,000 in compute resources per month.
- Solution: Use RestartAllContainers to enable a fast, hybrid recovery strategy:
  - Recreate only the “bad” Pods (e.g., those on unhealthy Nodes).
  - Trigger RestartAllContainers for the remaining healthy Pods.
- Result: Benchmarks show recovery overhead drops from minutes to a few seconds.
2. Watcher‑Sidecar‑Driven Reset
A watcher sidecar can monitor the main training process. If it encounters a specific, retriable error, the watcher exits with a designated code that triggers a fast reset of the worker Pod, allowing it to restart from the last checkpoint without involving the Job controller. This capability is now natively supported by Kubernetes.
Read more: Future development and JobSet features are described in KEP‑467 – JobSet in‑place restart.
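How the watcher decides to exit is left to the sidecar itself. The fragment below is a minimal sketch of one way to do it, not the author's implementation: it assumes the training process writes a marker file to a shared volume when it hits a retriable error, and it uses exit code 88 as the designated code. The busybox image, the /shared path, the marker file name, and the volume name are all hypothetical.

```yaml
# Fragment of a Pod spec: a sidecar entry under spec.initContainers.
# Assumes a volume named "shared" is defined elsewhere in the Pod.
initContainers:
- name: watcher-sidecar
  image: busybox:1.36            # placeholder image for illustration
  restartPolicy: Always          # marks this init container as a sidecar
  command:
  - sh
  - -c
  - |
    # Poll for a marker file (hypothetical convention) written by the
    # training process when it detects a retriable error.
    while true; do
      if [ -f /shared/retriable-error ]; then
        # Volumes survive an in-place restart, so remove the marker
        # before exiting or the next run would trigger again at once.
        rm -f /shared/retriable-error
        exit 88
      fi
      sleep 5
    done
  volumeMounts:
  - name: shared
    mountPath: /shared
  restartPolicyRules:
  - action: RestartAllContainers
    exitCodes:
      operator: In
      values: [88]               # designated exit code (arbitrary choice)
```

Because volumes are preserved across the restart, any signalling state kept on them has to be cleared explicitly, either by the watcher as above or by an init container.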
Example Pod Specification
apiVersion: v1
kind: Pod
metadata:
  name: ml-worker-pod
spec:
  restartPolicy: Never
  initContainers:
  # This init container will re-run on every in-place restart
  - name: setup-environment
    image: my-repo/setup-worker:1.0
  # restartPolicy: Always on an init container makes it a sidecar
  - name: watcher-sidecar
    image: my-repo/watcher:1.0
    restartPolicy: Always
    restartPolicyRules:
    - action: RestartAllContainers
      exitCodes:
        operator: In
        # A specific exit code from the watcher triggers a full pod restart
        values: [88]
  containers:
  - name: main-application
    image: my-repo/training-app:1.0
The snippet above declares a Pod that restarts all of its containers in place when the watcher-sidecar exits with code 88.
Bottom Line
RestartAllContainers gives Kubernetes users a lightweight, in-place pod reset mechanism that:
- Saves compute resources and money at scale.
- Reduces recovery time from minutes to seconds.
- Preserves critical pod identity (UID, IP, volumes, etc.).
- Enables richer sidecar-driven recovery patterns without extra controller logic.
This feature marks a significant step forward for building robust, efficient AI/ML platforms on Kubernetes.
1. Re‑running Init Containers for a Clean State
Imagine a scenario where an init container is responsible for fetching credentials or setting up a shared volume.
If the main application fails in a way that corrupts this shared state, you need the init container to rerun.
By configuring the main application to exit with a specific code upon detecting such a corruption, you can trigger the RestartAllContainers action, guaranteeing that the init container provides a clean setup before the application restarts.
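The clean-state scenario above might look like the following sketch. It assumes the application exits with code 77 when it detects corrupted shared state; that code, the Pod name, the volume name, and the init command are hypothetical, and the application image is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: clean-state-demo               # hypothetical name
spec:
  restartPolicy: Never
  volumes:
  - name: shared-state
    emptyDir: {}
  initContainers:
  - name: prepare-state
    image: busybox:1.36                # placeholder image for illustration
    # Re-runs on every in-place restart. The emptyDir is preserved, so the
    # old contents are wiped before the state is rebuilt.
    command: ['sh', '-c', 'rm -rf /state/* && echo ready > /state/config']
    volumeMounts:
    - name: shared-state
      mountPath: /state
  containers:
  - name: main-application
    image: my-repo/training-app:1.0    # placeholder image
    # A container-level restartPolicy must be set when rules are specified
    restartPolicy: Never
    restartPolicyRules:
    - action: RestartAllContainers
      exitCodes:
        operator: In
        values: [77]                   # assumed "state corrupted" exit code
    volumeMounts:
    - name: shared-state
      mountPath: /state
```

Any other exit code falls through to the container's own restartPolicy, so only the designated code causes the init container to run again.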
2. Handling a High Rate of Similar Task Executions
Some workloads are best represented as a Pod execution, where each task requires a clean environment (e.g., game‑session backends or queue‑item processors).
When the task rate is high, the full cycle of Pod creation, scheduling, and initialization becomes too expensive—especially for short‑lived tasks.
The ability to restart all containers from scratch gives a Kubernetes‑native way to handle this scenario without custom solutions or frameworks.
How to Use It
- Enable the feature gate
  - Set RestartAllContainersOnContainerExits on the API server and on each kubelet.
  - Requires Kubernetes v1.35+.
  - This alpha feature extends the ContainerRestartRules feature, which graduated to beta in v1.35 and is enabled by default.
- Add restartPolicyRules to any container (init, sidecar, or regular) and use the RestartAllContainers action.
- Best-practice checklist
  - Ensure all containers are re-entrant.
  - Verify external tooling can handle init containers re-running.
  - Remember that preStop hooks are not executed during a RestartAllContainers restart; containers must tolerate abrupt termination.
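For local experimentation, one option is a kind cluster with the gate switched on for every component. This is a sketch under assumptions: kind is not mentioned in the original post, and the node image tag below is a guess that should be replaced with an image built for Kubernetes v1.35 or later.

```yaml
# kind cluster configuration; the top-level featureGates map is applied
# to the cluster's components, including the API server and kubelets.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  RestartAllContainersOnContainerExits: true
nodes:
- role: control-plane
  image: kindest/node:v1.35.0   # assumed tag; use a v1.35+ node image
- role: worker
  image: kindest/node:v1.35.0   # assumed tag; use a v1.35+ node image
```

Saved to a file, this can be passed to `kind create cluster --config <file>`.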
Observing the Restart
- A new Pod condition AllContainersRestarting is added to the Pod’s status.
  - Becomes True when a restart is triggered.
  - Reverts to False once all containers have terminated and the Pod is ready to start its lifecycle anew.
- All containers restarted by this action will have their restart count incremented in the container status.
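As a rough illustration of what to look for, the relevant part of a Pod's status during a restart might resemble the fragment below. The values are made up for illustration rather than captured from a real cluster, and other fields the kubelet sets are omitted.

```yaml
# Illustrative fragment of the Pod status (not captured output)
status:
  conditions:
  - type: AllContainersRestarting
    status: "True"                              # "False" again once all containers have terminated
    lastTransitionTime: "2026-01-01T00:00:00Z"  # made-up timestamp
  containerStatuses:
  - name: main-application
    restartCount: 1                             # incremented by the in-place restart
```

The full status can be inspected with `kubectl get pod <pod-name> -o yaml`.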
Learn More
- Pod Lifecycle – official documentation.
- KEP‑5532 – Restart All Containers on Container Exits (detailed proposal).
- JobSet in‑place restart – discussion in JobSet issue #467.
We Want Your Feedback!
As an alpha feature, RestartAllContainers is ready for experimentation.
Your use‑cases and feedback are welcome. This feature is driven by the SIG Node community.
Get involved:
- Slack: #sig-node
- Mailing list: (link to SIG Node mailing list)