Introducing Node Readiness Controller
Source: Kubernetes Blog
Why the Node Readiness Controller?
The core Kubernetes node Ready condition signals that the kubelet is healthy and able to run pods, but it says nothing about platform-specific dependencies. That is often insufficient for clusters with sophisticated bootstrapping requirements: operators frequently struggle to ensure that specific DaemonSets or local services are healthy before a node enters the scheduling pool.
The Node Readiness Controller fills this gap by allowing operators to define custom scheduling gates tailored to specific node groups. This enables you to enforce distinct readiness requirements across heterogeneous clusters, for example:
- GPU‑equipped nodes only accept pods once specialized drivers are verified.
- General‑purpose nodes follow a standard path.
Primary Advantages
- Custom Readiness Definitions – Define what “ready” means for your platform.
- Automated Taint Management – The controller automatically applies or removes node taints based on condition status, preventing pods from landing on unready infrastructure.
- Declarative Node Bootstrapping – Manage multi‑step node initialization reliably, with clear observability into the bootstrapping process.
Core Concepts and Features
The controller centers on the NodeReadinessRule (NRR) API, which lets you declare custom readiness gates for your nodes.
Flexible Enforcement Modes
| Mode | Description |
|---|---|
| Continuous enforcement | Actively maintains the readiness guarantee throughout the node’s entire lifecycle. If a critical dependency (e.g., a device driver) fails later, the node is immediately tainted to prevent new scheduling. |
| Bootstrap‑only enforcement | Designed for one‑time initialization steps (e.g., pre‑pulling heavy images or hardware provisioning). Once conditions are met, the controller marks the bootstrap as complete and stops monitoring that specific rule for the node. |
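In rule manifests, the mode is selected with the spec.enforcementMode field, shown in the full example later in this post. A minimal sketch of the two settings (the "continuous" spelling is assumed here, mirroring the documented "bootstrap-only" value):

```yaml
spec:
  # Continuous: the readiness guarantee is maintained for the node's
  # whole lifecycle; if the condition regresses, the taint comes back.
  enforcementMode: "continuous"     # assumed spelling

  # Or, for one-time initialization steps that should stop being
  # monitored once satisfied:
  # enforcementMode: "bootstrap-only"
```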
Condition Reporting
The controller reacts to Node Conditions rather than performing health checks itself. This decoupled design allows seamless integration with existing tools and custom solutions:
- Node Problem Detector (NPD) – Use existing NPD setups and custom scripts to report node health.
- Readiness Condition Reporter – A lightweight agent provided by the project that can be deployed to periodically check local HTTP endpoints and patch node conditions accordingly.
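Because the controller consumes standard Node conditions, anything that can patch a node's status can act as a reporter. For illustration, using the condition type from the example later in this post, a reporter would write an entry like this into the node's status (the reason and message values are illustrative):

```yaml
# A condition entry as it appears in Node.status.conditions after a
# reporter (NPD, a custom script, or the Readiness Condition Reporter)
# patches it.
status:
  conditions:
  - type: cniplugin.example.net/NetworkReady
    status: "True"
    reason: CNIAgentHealthy                                  # illustrative
    message: Local CNI agent responded on its health endpoint # illustrative
    lastHeartbeatTime: "2025-11-20T10:05:00Z"
    lastTransitionTime: "2025-11-20T10:05:00Z"
```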
Operational Safety with Dry Run
Deploying new readiness rules across a fleet carries inherent risk. Dry‑run mode lets operators simulate the impact before enforcement:
- The controller logs the actions it would take.
- It updates the rule’s status to show which nodes would be affected, without applying any taints.
This enables safe validation prior to production rollout.
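The exact API surface for dry run is not shown here, so the following is only a sketch, assuming a boolean toggle on the rule's spec and a status stanza that surfaces affected nodes; the real field names come from the project's API reference:

```yaml
spec:
  dryRun: true            # hypothetical field: evaluate nodes and log
                          # intended actions, but apply no taints
status:
  matchingNodes: 12       # hypothetical status fields showing which
  nodesPendingTaint: 3    # nodes the rule would have tainted
```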
Example: CNI Bootstrapping
The following NodeReadinessRule ensures a node remains unschedulable until its CNI agent is functional. The controller monitors a custom cniplugin.example.net/NetworkReady condition and removes the readiness.k8s.io/acme.com/network-unavailable taint once the status is True.
```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness-rule
spec:
  conditions:
  - type: "cniplugin.example.net/NetworkReady"
    requiredStatus: "True"
  taint:
    key: "readiness.k8s.io/acme.com/network-unavailable"
    effect: "NoSchedule"
    value: "pending"
  enforcementMode: "bootstrap-only"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
```
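Until the condition reports True, the taint declared in the rule stays on matching nodes, so a gated worker's spec would contain roughly:

```yaml
# The taint the controller keeps on the node while the
# cniplugin.example.net/NetworkReady condition is False or missing.
spec:
  taints:
  - key: readiness.k8s.io/acme.com/network-unavailable
    value: pending
    effect: NoSchedule
```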
Getting Involved
The Node Readiness Controller is just getting started. Following our productive Unconference discussions at KubeCon NA 2025, we are eager to continue the conversation in person.
- Upcoming session: KubeCon + CloudNativeCon Europe 2026 – Addressing Non‑Deterministic Scheduling: Introducing the Node Readiness Controller
- GitHub:
- Slack: Join the conversation in #sig-node-readiness-controller
- Documentation: Getting Started (link to docs)