A tale of two flows: Metaflow and Kubeflow
Source: Hacker News
- Why Metaflow → Kubeflow
- Development
- Scaling
- Deployment
- Metaflow → Kubeflow in practice
- Feedback welcome!
Metaflow is a Python framework for building and operating ML/AI projects, originally developed and open‑sourced by Netflix in 2019. In many ways, Kubeflow and Metaflow are cousins: closely related in spirit, but designed with distinct goals and priorities.
- Metaflow – emerged from Netflix’s need to empower data scientists and ML/AI developers with developer‑friendly, Python‑native tooling so they could iterate quickly, compare modeling approaches, and ship the best solutions to production without heavy engineering or DevOps involvement. On the infrastructure side, Metaflow started with AWS‑native services (AWS Batch, Step Functions) and later added first‑class support for Kubernetes and other hyperscaler clouds.
- Kubeflow – began as a set of Kubernetes operators for distributed TensorFlow and Jupyter Notebook management. It has since evolved into a comprehensive cloud‑native AI ecosystem, offering tools such as Trainer, Katib, Spark Operator, Workspaces, Hub, KServe, and Pipelines for end‑to‑end ML workflows.
Over the years, Metaflow has delighted end users with its intuitive APIs, while Kubeflow has delivered tons of value to infrastructure teams through its robust platform components. This complementary nature motivated us to build a bridge between the two: you can now author projects in Metaflow and deploy them as Kubeflow Pipelines, side‑by‑side with your existing Kubeflow workloads.
In the most recent CNCF Technology Radar survey (October 2025), Metaflow received the highest positive scores in the “likelihood to recommend” and “usefulness” categories, reflecting its success in providing stable, productivity‑boosting APIs for ML/AI developers.
Metaflow spans the entire development lifecycle—from early experimentation to production deployment and ongoing operations. The core features below illustrate the breadth of its API surface, grouped by project stage.
Development
- Straightforward APIs for creating and composing workflows.
- Automated state transfer via artifacts, enabling you to build flows incrementally and resume them freely (see a recent article by Netflix).
- Interactive, real‑time visual outputs from tasks through cards – an ideal substrate for custom observability solutions, created quickly with AI copilots.
- Built‑in configuration management to balance code and configuration.
- Custom decorators for domain‑specific abstractions and project‑level policies.
Scaling
- Horizontal and vertical scaling – supports both task‑level and data‑level parallelism.
- Graceful failure handling – ensures jobs can recover or shut down cleanly when errors occur.
- Automatic dependency packaging – works with Conda, PyPI, and uv to bundle required libraries.
- Distributed‑computing paradigms – integrates with Ray, MPI, and Torch Distributed for large‑scale workloads.
- Checkpointing – provides reliable checkpoint/restart for long‑running tasks with consistent state management.
Deployment
- Namespaces – separate experimentation, production, and individual developers.
- CI/CD & GitOps – follow best‑practice branching strategies.
- Event‑triggered sub‑flows – compose large, reactive systems.
These features give a unified, user‑facing API for the capabilities required by real‑world ML/AI systems. Behind the scenes, Metaflow integrates with production‑grade infrastructure, acting as a user‑interface layer over platforms like Kubernetes—and now, Kubeflow.
Responsibility diagram
Key benefit:
The Metaflow–Kubeflow integration lets organizations keep their existing Kubernetes and Kubeflow infrastructure intact while upgrading the developer experience with higher‑level abstractions and additional functionality provided by Metaflow.
Currently, the integration supports deploying Metaflow flows as Kubeflow Pipelines. Once Metaflow tasks run on Kubernetes, you can schedule them through Kubeflow, gaining the best of both worlds.
You can still access other components, such as Katib and Trainer, from Metaflow tasks through their Python clients as usual.
Getting Started with Metaflow‑Kubeflow Integration
Because the integration requires no changes to your existing Kubeflow infrastructure, it is straightforward to get started.
1. Deploy Metaflow
| Environment | How to Deploy |
|---|---|
| Cloud | Deploy Metaflow in an existing cloud account (GCP, Azure, or AWS). |
| Local | Install the dev stack on your laptop with a single command: `metaflow-dev up` (see the Metaflow documentation for platform prerequisites). |
2. Install the Integration Extension
Once Metaflow and Kubeflow are running independently, install the extension that provides the integration:
pip install metaflow-kubeflow
3. Configure Metaflow
Point Metaflow at your Kubeflow Pipelines service. Add the line below to your Metaflow config or set it as an environment variable:
METAFLOW_KUBEFLOW_PIPELINES_URL = "http://my-kubeflow"
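The config-file route can be scripted with a small helper like the one below. It is a sketch under two assumptions not stated in the text: that the Metaflow config is a JSON file (commonly `~/.metaflowconfig/config.json`), and that a basic `http`/`https` check is a useful guard. The function name `set_pipelines_url` is hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

KEY = "METAFLOW_KUBEFLOW_PIPELINES_URL"


def set_pipelines_url(config_path: Path, url: str) -> dict:
    """Add or update the Kubeflow Pipelines URL in a JSON config file."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {url!r}")

    # Preserve any settings that are already in the file.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config[KEY] = url

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `set_pipelines_url(Path.home() / ".metaflowconfig" / "config.json", "http://my-kubeflow")` would update the assumed default location. The environment-variable route needs no file at all: export `METAFLOW_KUBEFLOW_PIPELINES_URL` in the shell before running the flow.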
4. Author and Test a Flow
- Run locally to verify everything works:
  python flow.py run
- Deploy as a Kubeflow pipeline once the local run succeeds:
  python flow.py kubeflow-pipelines create
Metaflow will automatically:
- Package all source code and dependencies.
- Compile the flow into a Kubeflow Pipelines YAML.
- Deploy it to Kubeflow, where it appears alongside your existing pipelines in the Kubeflow UI.
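In CI, the "run locally, then deploy" order from step 4 can be enforced with a short wrapper. This is a sketch, not part of the integration: the helper `run_then_deploy` and its injectable `runner` argument are hypothetical, and only the two commands themselves come from the steps above.

```python
import subprocess
import sys
from typing import Callable, Sequence

# The two commands from step 4, run with the current Python interpreter.
RUN_CMD = [sys.executable, "flow.py", "run"]
DEPLOY_CMD = [sys.executable, "flow.py", "kubeflow-pipelines", "create"]


def _default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_then_deploy(runner: Callable[[Sequence[str]], int] = _default_runner) -> bool:
    """Deploy only if the local run exits successfully.

    Returns True when both commands succeed, False otherwise.
    """
    if runner(RUN_CMD) != 0:
        return False  # local run failed, so skip deployment
    return runner(DEPLOY_CMD) == 0
```

Passing the runner as an argument keeps the gate logic testable without a Metaflow or Kubeflow installation; in normal use, call `run_then_deploy()` from the directory that contains `flow.py`.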
Demo Screencast
5. Current Limitations
The integration does not yet support every Metaflow feature. Notably, the following are unavailable:
- Conditional steps
- Recursive steps
Future releases may add convenience APIs for other Kubeflow components (e.g., KServe). In the meantime, you can implement them yourself as custom decorators on top of the Kubeflow SDK.
6. Learn More
- Announcement webinar – [Link to webinar]
- Documentation – [Metaflow‑Kubeflow docs]
- GitHub repository – [github.com/Netflix/metaflow-kubeflow]
About Metaflow & Kubeflow
Both are open‑source projects actively developed by multiple organizations. Metaflow is maintained by Netflix (with a dedicated team) and Outerbounds, which offers a managed Metaflow platform deployed in customers’ own cloud environments.
Join the Community
- Metaflow Slack – We welcome questions, feedback on the Kubeflow integration, and roadmap wishlist items.
We look forward to a fruitful collaboration between the Metaflow and Kubeflow communities!
