Kubernetes v1.35: Extended Toleration Operators to Support Numeric Comparisons (Alpha)
Source: Kubernetes Blog
Overview
Many production Kubernetes clusters blend on‑demand (higher‑SLA) and spot / preemptible (lower‑SLA) nodes to optimise costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt‑in with explicit thresholds such as “I can tolerate nodes with a failure probability up to 5 %”.
Today, Kubernetes taints and tolerations can match exact values or check for existence, but they cannot compare numeric thresholds. The usual work‑arounds are:
- Creating discrete taint categories
- Using external admission controllers
- Accepting sub‑optimal placement decisions
In Kubernetes v1.35 we introduce Extended Toleration Operators (alpha). This enhancement adds Gt (Greater‑Than) and Lt (Less‑Than) operators to spec.tolerations, enabling threshold‑based scheduling decisions that unlock new possibilities for SLA‑based placement, cost optimisation, and performance‑aware workload distribution.
The Evolution of Tolerations
Historically Kubernetes supported two primary toleration operators:
| Operator | Behaviour |
|---|---|
| Equal | Matches a taint if the key and value are exactly equal |
| Exists | Matches a taint if the key exists, regardless of value |
These work well for categorical scenarios but fall short for numeric comparisons. Starting with v1.35 we close this gap.
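To make the gap concrete, here is a minimal sketch of both existing operators (the `node-type` taint key is hypothetical):

```yaml
tolerations:
# Equal: matches only a taint whose value is exactly "spot"
- key: "node-type"        # hypothetical taint key
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
# Exists: matches any taint with this key, whatever its value
- key: "node-type"
  operator: "Exists"
  effect: "NoSchedule"
```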
Real‑world scenarios enabled by numeric operators
- SLA requirements – Schedule high‑availability workloads only on nodes with failure probability below a threshold.
- Cost optimisation – Allow cost‑sensitive batch jobs to run only on cheaper nodes whose cost‑per‑hour value is below a specific threshold.
- Performance guarantees – Ensure latency‑sensitive applications run only on nodes with disk IOPS or network bandwidth above minimum thresholds.
Without numeric comparison operators, cluster operators have had to resort to work‑arounds that don’t scale and lack flexibility.
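The first of those workarounds, discrete taint categories, typically looks like the sketch below (the `failure-risk` key and its bucket values are illustrative). Every new threshold means defining a new bucket and re‑tainting the fleet, which is why this approach scales poorly:

```yaml
# Node side (illustrative):
#   kubectl taint nodes spot-node-1 failure-risk=high:NoSchedule
tolerations:
- key: "failure-risk"     # coarse category instead of a numeric threshold
  operator: "Equal"
  value: "high"           # a pod must enumerate every bucket it accepts
  effect: "NoSchedule"
```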
Why extend tolerations instead of using NodeAffinity?
NodeAffinity already supports numeric comparison operators, so why add them to tolerations?
| Aspect | NodeAffinity | Taints & Tolerations |
|---|---|---|
| Policy orientation | Per‑pod; every workload must explicitly opt‑out of risky nodes. | Node‑side policy; nodes declare their risk level, and only pods with matching tolerations may land there. Safer default. |
| Eviction semantics | No eviction capability. | Supports NoExecute effect with tolerationSeconds, enabling graceful draining when a node’s SLA degrades or a spot instance receives a termination notice. |
| Operational ergonomics | Distributed across many pod specs. | Centralised, node‑side policy aligns with existing safety taints (e.g., disk-pressure, memory-pressure). |
Extending tolerations preserves the well‑understood safety model while enabling threshold‑based placement for SLA‑aware scheduling.
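For contrast, here is roughly what the same constraint looks like with node affinity, assuming nodes carry a `failure-probability` label rather than a taint. Every pod must carry this selector itself, and nothing evicts the pod if the label later changes:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "failure-probability"   # node label, not a taint
          operator: "Lt"               # node affinity already supports Gt/Lt
          values: ["5"]
```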
Introducing Gt and Lt operators
Kubernetes v1.35 adds two new operators for tolerations:
| Operator | Meaning |
|---|---|
| Gt (Greater‑Than) | The toleration matches if the taint’s numeric value is greater than the toleration’s value. |
| Lt (Less‑Than) | The toleration matches if the taint’s numeric value is less than the toleration’s value. |
When a pod tolerates a taint with Lt, it is saying “I can tolerate nodes where this metric is less than my threshold”. Conversely, with Gt it is saying “I can tolerate nodes where this metric is greater than my threshold” – in other words, “I tolerate nodes that are above my minimum requirements”.
These operators work with numeric taint values and are applicable to all taint effects: NoSchedule, NoExecute, and PreferNoSchedule.
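For example, the same threshold can be expressed as a soft preference with PreferNoSchedule (a sketch reusing the `failure-probability` key from the examples below):

```yaml
tolerations:
- key: "failure-probability"
  operator: "Lt"
  value: "5"
  effect: "PreferNoSchedule"  # soft preference instead of a hard rule
```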
Note
Numeric values for `Gt` and `Lt` must be positive 64‑bit integers without leading zeros. For example, `"100"` is valid, but `"0100"` (leading zero) and `"0"` (zero value) are not permitted.
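Concretely, the first toleration below passes validation, while the commented‑out variants would be rejected:

```yaml
tolerations:
- key: "failure-probability"
  operator: "Lt"
  value: "100"       # valid: positive integer, no leading zeros
  effect: "NoSchedule"
  # value: "0100"    # invalid: leading zero
  # value: "0"       # invalid: zero is not permitted
  # value: "-5"      # invalid: must be positive
```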
Use Cases & Examples
Example 1 – Spot instance protection with SLA thresholds
Many clusters mix on‑demand and spot/preemptible nodes to optimise costs. Spot nodes offer savings but have higher failure rates. The goal: most workloads avoid spot nodes by default, while specific workloads can opt‑in with clear SLA boundaries.
1️⃣ Taint spot nodes with their failure probability (e.g., 15 % annual failure rate)
```yaml
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "15"
    effect: "NoExecute"
```
2️⃣ Taint on‑demand nodes with a lower failure rate
```yaml
apiVersion: v1
kind: Node
metadata:
  name: ondemand-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "2"
    effect: "NoExecute"
```
3️⃣ Critical workload that requires a strict SLA (failure probability below 5 %)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-processor
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Lt"
    value: "5"
    effect: "NoExecute"
    tolerationSeconds: 30  # Grace period if the node’s SLA degrades
  containers:
  - name: app
    image: payment-app:v1
```
Result: The pod schedules only on nodes where failure-probability is less than 5 (i.e., ondemand-node-1 with value 2). The NoExecute effect with tolerationSeconds: 30 gives the pod 30 seconds to shut down gracefully if the node’s taint value rises to 5 or above and the toleration stops matching.
4️⃣ Fault‑tolerant batch job that explicitly opts‑in to spot instances
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Gt"
    value: "10"
    effect: "NoExecute"
    tolerationSeconds: 60
  containers:
  - name: worker
    image: batch-worker:v2
```
Result: This pod tolerates nodes with a failure probability greater than 10 (i.e., the spot nodes). If a spot node’s failure-probability taint later drops to 10 or below, the toleration no longer matches and the pod is evicted after the 60‑second grace period.
Example 2 – AI Workload Placement with GPU Tiers
1. Taint GPU nodes with their compute‑capability score
```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a100
spec:
  taints:
  - key: "gpu-compute-score"
    value: "1000"
    effect: "NoSchedule"
---
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-t4
spec:
  taints:
  - key: "gpu-compute-score"
    value: "500"
    effect: "NoSchedule"
```
2. Heavy‑training workload – requires high‑performance GPUs
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "800"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: ml-trainer:v1
    resources:
      limits:
        nvidia.com/gpu: 1
```
The pod will only schedule on nodes with a compute score > 800 (e.g., the A100 node), preventing placement on lower‑tier GPUs that would slow down training.
3. Inference workload – can use any available GPU
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-inference
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "400"
    effect: "NoSchedule"
  containers:
  - name: inference
    image: ml-inference:v1
    resources:
      limits:
        nvidia.com/gpu: 1
```
This pod will schedule on any node with a compute score > 400, covering both A100 and T4 nodes.
Example 3 – Cost‑Optimized Workload Placement
1. Taint nodes with their cost rating
```yaml
spec:
  taints:
  - key: "cost-per-hour"
    value: "50"
    effect: "NoSchedule"
```
2. Cost‑sensitive batch job – tolerates only inexpensive nodes
```yaml
tolerations:
- key: "cost-per-hour"
  operator: "Lt"
  value: "100"
  effect: "NoSchedule"
```
The job will schedule on nodes costing < $100 / hour, avoiding more expensive resources.
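By contrast, a pricier node would remain off‑limits to this job (the node’s taint value here is illustrative):

```yaml
spec:
  taints:
  - key: "cost-per-hour"
    value: "150"      # not less than 100, so the Lt toleration does not match
    effect: "NoSchedule"
```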
Example 4 – Performance‑Based Placement (Disk IOPS)
```yaml
tolerations:
- key: "disk-iops"
  operator: "Gt"
  value: "3000"
  effect: "NoSchedule"
```
The pod will only schedule on nodes where disk-iops exceeds 3000.
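The matching node‑side taint might look like this (the value is illustrative):

```yaml
spec:
  taints:
  - key: "disk-iops"
    value: "5000"     # 5000 > 3000, so the Gt toleration matches
    effect: "NoSchedule"
```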
How to Use This Feature
Extended Toleration Operators is an alpha feature in Kubernetes v1.35.
- Enable the feature gate on both the API server and the scheduler:

  ```
  --feature-gates=TaintTolerationComparisonOperators=true
  ```

- Taint your nodes with numeric values that represent the metrics relevant to your scheduling needs:

  ```shell
  kubectl taint nodes node-1 failure-probability=5:NoSchedule
  kubectl taint nodes node-2 disk-iops=5000:NoSchedule
  ```

- Use the new operators in your pod specifications:

  ```yaml
  spec:
    tolerations:
    - key: "failure-probability"
      operator: "Lt"
      value: "1"
      effect: "NoSchedule"
  ```
Note
- As an alpha feature, Extended Toleration Operators may change in future releases.
- Use with caution in production environments and test thoroughly in non‑production clusters first.
What’s Next?
This alpha release is just the beginning. As we gather community feedback, we plan to:
- Add support for CEL (Common Expression Language) expressions in tolerations and node affinity for even more flexible scheduling logic, including semantic‑version comparisons.
- Improve integration with cluster autoscaling for threshold‑aware capacity planning.
- Graduate the feature to beta and eventually GA with production‑ready stability.
We’re especially interested in hearing about your use cases!
Do you have scenarios where threshold‑based scheduling would solve problems?
Are there additional operators or capabilities you’d like to see?
Getting Involved
The feature is driven by the SIG Scheduling community. Join us to share ideas and feedback:
- Slack: `#sig-scheduling` on the Kubernetes Slack workspace
- Mailing list: kubernetes-sig-scheduling@googlegroups.com
For specific inquiries related to Extended Toleration Operators, please reach out to the SIG Scheduling community. We look forward to hearing from you!
Learn More
- Taints and Tolerations – fundamentals of node‑pod scheduling
- Numeric Comparison Operators – details on using the `Gt` and `Lt` operators
- KEP‑5471: Extended Toleration Operators for Threshold‑Based Placement