Kubernetes v1.35: Extended Toleration Operators to Support Numeric Comparisons (Alpha)
Source: Kubernetes Blog
Overview
Many production Kubernetes clusters blend on‑demand (higher‑SLA) and spot / preemptible (lower‑SLA) nodes to optimise costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt‑in with explicit thresholds such as “I can tolerate nodes with a failure probability up to 5 %”.
Today, Kubernetes taints and tolerations can match exact values or check for existence, but they cannot compare numeric thresholds. The usual work‑arounds are:
- Creating discrete taint categories
- Using external admission controllers
- Accepting sub‑optimal placement decisions
In Kubernetes v1.35 we introduce Extended Toleration Operators (alpha). This enhancement adds Gt (Greater‑Than) and Lt (Less‑Than) operators to spec.tolerations, enabling threshold‑based scheduling decisions that unlock new possibilities for SLA‑based placement, cost optimisation, and performance‑aware workload distribution.
The Evolution of Tolerations
Historically Kubernetes supported two primary toleration operators:
| Operator | Behaviour |
|---|---|
| Equal | Matches a taint if the key and value are exactly equal |
| Exists | Matches a taint if the key exists, regardless of value |
These work well for categorical scenarios but fall short for numeric comparisons. Starting with v1.35 we close this gap.
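To make the gap concrete, here is a minimal sketch of both existing operators (the `node-type` taint key is hypothetical):

```yaml
tolerations:
# Equal: matches only a taint whose value is exactly "spot"
- key: "node-type"        # hypothetical taint key
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
# Exists: matches any taint with this key, whatever its value
- key: "node-type"
  operator: "Exists"
  effect: "NoSchedule"
```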
Real‑world scenarios enabled by numeric operators
- SLA requirements – Schedule high‑availability workloads only on nodes with failure probability below a threshold.
- Cost optimisation – Allow cost‑sensitive batch jobs to run only on cheaper nodes whose cost‑per‑hour value is below a specific threshold.
- Performance guarantees – Ensure latency‑sensitive applications run only on nodes with disk IOPS or network bandwidth above minimum thresholds.
Without numeric comparison operators, cluster operators have had to resort to work‑arounds that don’t scale and lack flexibility.
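The first of those workarounds, discrete taint categories, typically looks like the sketch below (the `failure-risk` key and its bucket values are illustrative). Every new threshold means defining a new bucket and re‑tainting the fleet, which is why this approach scales poorly:

```yaml
# Node side (illustrative):
#   kubectl taint nodes spot-node-1 failure-risk=high:NoSchedule
tolerations:
- key: "failure-risk"     # coarse category instead of a numeric threshold
  operator: "Equal"
  value: "high"           # a pod must enumerate every bucket it accepts
  effect: "NoSchedule"
```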
Why extend tolerations instead of using NodeAffinity?
NodeAffinity already supports numeric comparison operators, so why add them to tolerations?
| Aspect | NodeAffinity | Taints & Tolerations |
|---|---|---|
| Policy orientation | Per‑pod; every workload must explicitly opt‑out of risky nodes. | Node‑side policy; nodes declare their risk level, and only pods with matching tolerations may land there. Safer default. |
| Eviction semantics | No eviction capability. | Supports NoExecute effect with tolerationSeconds, enabling graceful draining when a node’s SLA degrades or a spot instance receives a termination notice. |
| Operational ergonomics | Distributed across many pod specs. | Centralised, node‑side policy aligns with existing safety taints (e.g., disk-pressure, memory-pressure). |
Extending tolerations preserves the well‑understood safety model while enabling threshold‑based placement for SLA‑aware scheduling.
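For contrast, here is roughly what the same constraint looks like with node affinity, assuming nodes carry a `failure-probability` label rather than a taint. Every pod must carry this selector itself, and nothing evicts the pod if the label later changes:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "failure-probability"   # node label, not a taint
          operator: "Lt"               # node affinity already supports Gt/Lt
          values: ["5"]
```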
Introducing Gt and Lt operators
Kubernetes v1.35 adds two new operators for tolerations:
| Operator | Meaning |
|---|---|
| Gt (Greater‑Than) | The toleration matches if the taint’s numeric value is greater than the toleration’s value. |
| Lt (Less‑Than) | The toleration matches if the taint’s numeric value is less than the toleration’s value. |
When a pod tolerates a taint with Lt, it is saying “I can tolerate nodes where this metric is less than my threshold”. Conversely, with Gt it is saying “I can tolerate nodes where this metric is greater than my threshold” – in other words, “I tolerate nodes that are above my minimum requirements”.
These operators work with numeric taint values and are applicable to all taint effects: NoSchedule, NoExecute, and PreferNoSchedule.
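For example, the same threshold can be expressed as a soft preference with PreferNoSchedule (a sketch reusing the `failure-probability` key from the examples below):

```yaml
tolerations:
- key: "failure-probability"
  operator: "Lt"
  value: "5"
  effect: "PreferNoSchedule"  # soft preference instead of a hard rule
```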
Note
Numeric values for `Gt` and `Lt` must be positive 64‑bit integers without leading zeros. For example, `"100"` is valid, but `"0100"` (leading zero) and `"0"` (zero value) are not permitted.
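Concretely, the first toleration below passes validation, while the commented‑out variants would be rejected:

```yaml
tolerations:
- key: "failure-probability"
  operator: "Lt"
  value: "100"       # valid: positive integer, no leading zeros
  effect: "NoSchedule"
  # value: "0100"    # invalid: leading zero
  # value: "0"       # invalid: zero is not permitted
  # value: "-5"      # invalid: must be positive
```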
Use Cases & Examples
Example 1 – Spot instance protection with SLA thresholds
Many clusters mix on‑demand and spot/preemptible nodes to optimise costs. Spot nodes offer savings but have higher failure rates. The goal: most workloads avoid spot nodes by default, while specific workloads can opt‑in with clear SLA boundaries.
1️⃣ Taint spot nodes with their failure probability (e.g., 15 % annual failure rate)
```yaml
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "15"
    effect: "NoExecute"
```
2️⃣ Taint on‑demand nodes with a lower failure rate
```yaml
apiVersion: v1
kind: Node
metadata:
  name: ondemand-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "2"
    effect: "NoExecute"
```
3️⃣ Critical workload that requires a strict SLA (failure probability below 5 %)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-processor
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Lt"
    value: "5"
    effect: "NoExecute"
    tolerationSeconds: 30  # Grace period if the node’s SLA degrades
  containers:
  - name: app
    image: payment-app:v1
```
Result: The pod schedules only on nodes where failure-probability is less than 5 (i.e., ondemand-node-1 with value 2). The NoExecute effect with tolerationSeconds: 30 gives the pod 30 seconds to shut down gracefully if the node’s taint value rises to 5 or above and the toleration stops matching.
4️⃣ Fault‑tolerant batch job that explicitly opts‑in to spot instances
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Gt"
    value: "10"
    effect: "NoExecute"
    tolerationSeconds: 60
  containers:
  - name: worker
    image: batch-worker:v2
```
Result: This pod tolerates nodes with a failure probability greater than 10 (i.e., the spot nodes). If a spot node’s failure-probability taint later drops to 10 or below, the toleration no longer matches and the pod is evicted after the 60‑second grace period.
Example 2 – AI Workload Placement with GPU Tiers
1. Taint GPU nodes with their compute‑capability score
```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a100
spec:
  taints:
  - key: "gpu-compute-score"
    value: "1000"
    effect: "NoSchedule"
---
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-t4
spec:
  taints:
  - key: "gpu-compute-score"
    value: "500"
    effect: "NoSchedule"
```
2. Heavy‑training workload – requires high‑performance GPUs
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "800"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: ml-trainer:v1
    resources:
      limits:
        nvidia.com/gpu: 1
```
The pod will only schedule on nodes with a compute score > 800 (e.g., the A100 node), preventing placement on lower‑tier GPUs that would slow down training.
3. Inference workload – can use any available GPU
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-inference
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "400"
    effect: "NoSchedule"
  containers:
  - name: inference
    image: ml-inference:v1
    resources:
      limits:
        nvidia.com/gpu: 1
```
This pod will schedule on any node with a compute score > 400, covering both A100 and T4 nodes.
Example 3 – Cost‑Optimized Workload Placement
1. Taint nodes with their cost rating
```yaml
spec:
  taints:
  - key: "cost-per-hour"
    value: "50"
    effect: "NoSchedule"
```
2. Cost‑sensitive batch job – tolerates only inexpensive nodes
```yaml
tolerations:
- key: "cost-per-hour"
  operator: "Lt"
  value: "100"
  effect: "NoSchedule"
```
The job will schedule on nodes costing < $100 / hour, avoiding more expensive resources.
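By contrast, a pricier node would remain off‑limits to this job (the node’s taint value here is illustrative):

```yaml
spec:
  taints:
  - key: "cost-per-hour"
    value: "150"      # not less than 100, so the Lt toleration does not match
    effect: "NoSchedule"
```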
Example 4 – Performance‑Based Placement (Disk IOPS)
```yaml
tolerations:
- key: "disk-iops"
  operator: "Gt"
  value: "3000"
  effect: "NoSchedule"
```
The pod will only schedule on nodes where disk-iops exceeds 3000.
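The matching node‑side taint might look like this (the value is illustrative):

```yaml
spec:
  taints:
  - key: "disk-iops"
    value: "5000"     # 5000 > 3000, so the Gt toleration matches
    effect: "NoSchedule"
```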
How to Use This Feature
Extended Toleration Operators is an alpha feature in Kubernetes v1.35.
- Enable the feature gate on both the API server and the scheduler:

  ```
  --feature-gates=TaintTolerationComparisonOperators=true
  ```

- Taint your nodes with numeric values that represent the metrics relevant to your scheduling needs:

  ```shell
  kubectl taint nodes node-1 failure-probability=5:NoSchedule
  kubectl taint nodes node-2 disk-iops=5000:NoSchedule
  ```

- Use the new operators in your pod specifications:

  ```yaml
  spec:
    tolerations:
    - key: "failure-probability"
      operator: "Lt"
      value: "1"
      effect: "NoSchedule"
  ```
Note
- As an alpha feature, Extended Toleration Operators may change in future releases.
- Use with caution in production environments and test thoroughly in non‑production clusters first.
What’s Next?
This alpha release is just the beginning. As we gather community feedback, we plan to:
- Add support for CEL (Common Expression Language) expressions in tolerations and node affinity for even more flexible scheduling logic, including semantic‑version comparisons.
- Improve integration with cluster autoscaling for threshold‑aware capacity planning.
- Graduate the feature to beta and eventually GA with production‑ready stability.
We’re especially interested in hearing about your use cases!
Do you have scenarios where threshold‑based scheduling would solve problems?
Are there additional operators or capabilities you’d like to see?
Getting Involved
The feature is driven by the SIG Scheduling community. Join us to share ideas and feedback:
- Slack: `#sig-scheduling` on the Kubernetes Slack workspace
- Mailing list: kubernetes-sig-scheduling@googlegroups.com
For specific inquiries related to Extended Toleration Operators, please reach out to the SIG Scheduling community. We look forward to hearing from you!
Learn More
- Taints and Tolerations – fundamentals of node‑pod scheduling
- Numeric Comparison Operators – details on using the `Gt` and `Lt` operators
- KEP‑5471: Extended Toleration Operators for Threshold‑Based Placement