Modernizing Prometheus: Native Storage for Composite Types
Source: Prometheus Blog
Over the last year, the Prometheus community has been working hard on several interesting and ambitious changes that would previously have been seen as controversial or infeasible. While those efforts may have little visibility from the outside (e.g., it’s not an OpenClaw Prometheus plugin, sorry 🙃), Prometheus developers are organically steering Prometheus toward a coherent future. Piece by piece, we are unexpectedly getting closer to goals we never dreamed we would achieve as an open‑source project!
This post (hopefully!) kicks off a series of blog posts sharing a few ambitious shifts that might excite new and existing Prometheus users and developers. In this post, I’d love to focus on the idea of native storage for composite types, which tidies up many challenges that have piled up over time. Make sure to check the inline links for how you can adopt some of those changes early, or contribute!
CAUTION: Disclaimer: This post is intended as a fun overview from my own personal point of view as a Prometheus maintainer. Some of the mentioned changes have not (yet) been officially approved by the Prometheus Team, and some have not been proven in production.
NOTE: This post was written by humans; AI was used only for cosmetic and grammar fixes.
Classic Representation: Primitive Samples
As you might know, the Prometheus data model (so server, PromQL, protocols) supports gauges, counters, histograms, and summaries. OpenMetrics 1.0 extended this with gaugehistogram, info, and stateset types.
Impressively, for a long time Prometheus’ TSDB storage implementation had an explicitly clean and simple data model. The TSDB allowed the storage and retrieval of string‑labelled primitive samples containing only float64 values and int64 timestamps. It was completely metric‑type‑agnostic.
The metric types were instead implied on top of the TSDB, by humans and by best‑effort tooling such as PromQL. For simplicity, let’s call this way of storing types the classic model (or representation). In this model:
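To make this concrete, here is a minimal sketch of the classic sample model in Python (names and types are illustrative, not Prometheus’ actual Go internals): every stored sample is just string labels, a float64 value, and an int64 millisecond timestamp, regardless of the implied metric type.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """Illustrative classic TSDB sample: the only shape the classic
    storage model knows, regardless of the implied metric type."""
    labels: dict[str, str]   # e.g. {"__name__": "foo_total", "job": "a"}
    value: float             # float64 is the only supported value type
    timestamp_ms: int        # int64 milliseconds since the epoch

# A "counter" is just a sample whose name happens to end in _total.
s = Sample({"__name__": "foo_total", "job": "a"}, 17.0, 1700000000000)
```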
Primitive types
| Type | Description |
|---|---|
| gauge | “default” type with no special rules – just a float sample with labels. |
| counter | Should have a _total suffix in the name for humans to understand its semantics, e.g. foo_total 17.0. |
| info | Needs an _info suffix in the metric name and always has a value of 1. |
Composite types
This is where the fun begins. In the classic representation, composite metrics are represented as a set of primitive float samples.
Histogram – a group of counters with certain mandatory suffixes and le labels:
foo_bucket{le="0.0"} 0
foo_bucket{le="1e-05"} 0
foo_bucket{le="0.0001"} 5
foo_bucket{le="0.1"} 8
foo_bucket{le="1.0"} 10
foo_bucket{le="10.0"} 11
foo_bucket{le="100000.0"} 11
foo_bucket{le="1e+06"} 15
foo_bucket{le="1e+23"} 16
foo_bucket{le="1.1e+23"} 17
foo_bucket{le="+Inf"} 17
foo_count 17
foo_sum 324789.3
gaugehistogram, summary, and stateset types follow the same logic – a group of special gauges or counters that compose a single metric.
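As a toy illustration of how consumers must reconstruct meaning from this group of counters, here is a simplified histogram_quantile over classic cumulative le buckets (a sketch, not Prometheus’ exact algorithm, which handles more edge cases; the bucket data matches the foo example above):

```python
def classic_histogram_quantile(q, buckets):
    # buckets: cumulative (upper_bound, count) pairs sorted by bound,
    # ending with the mandatory +Inf bucket of the classic representation.
    total = buckets[-1][1]
    rank = q * total
    for i, (ub, count) in enumerate(buckets):
        if count >= rank:
            if ub == float("inf"):
                return buckets[i - 1][0]  # clamp to highest finite bound
            lower, prev = buckets[i - 1] if i > 0 else (0.0, 0.0)
            # linear interpolation inside the chosen bucket
            return lower + (ub - lower) * (rank - prev) / (count - prev)

foo = [(0.0, 0), (1e-05, 0), (0.0001, 5), (0.1, 8), (1.0, 10),
       (10.0, 11), (100000.0, 11), (1e+06, 15), (1e+23, 16),
       (1.1e+23, 17), (float("inf"), 17)]
```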
The classic model served the Prometheus project well. It significantly simplified the storage implementation, enabling Prometheus to be one of the most optimized, open‑source time‑series databases, with distributed versions based on the same data model available in projects like Cortex, Thanos, Mimir, etc.
Limitations of the Classic Model
| Category | Issue |
|---|---|
| Efficiency | Overhead for composite types because every new piece of data (e.g., a new bucket) takes precious index space (it’s a new unique series), whereas samples are far more compressible (rarely change, time‑oriented). |
| Functionality | Limits the shape and flexibility of stored data (unless we resort to JSON‑encoded labels, which have massive downsides). |
| Transactionality | Primitive pieces of composite types (separate counters) are processed independently. Write isolation works for scrapes, but breaks for remote‑write, OTLP, or long‑term distributed storage. A histogram may be partially sent, causing false‑positive or missed alerts. |
| Reliability | Consumers of the TSDB data must essentially guess the type semantics. Nothing stops a user from writing a foo_bucket gauge or a foo_total histogram. |
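The reliability point can be illustrated with the kind of suffix heuristic consumers are forced into (a hypothetical sketch; real tooling also consults scraped metadata, and nothing prevents a series from violating these naming conventions):

```python
def guess_metric_type(name: str, has_le_label: bool = False) -> str:
    """Best-effort guess of the classic-model type from the name alone."""
    if name.endswith("_bucket") and has_le_label:
        return "histogram-bucket"
    if name.endswith("_sum") or name.endswith("_count"):
        return "histogram/summary-component"
    if name.endswith("_total"):
        return "counter"
    if name.endswith("_info"):
        return "info"
    return "gauge"  # a misnamed counter lands here silently
```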
A Glimpse of Native Storage for Composite Types
The classic model was challenged by the introduction of native histograms. The TSDB was extended to store composite histogram samples in addition to plain floats. We call these native histograms because the TSDB can now natively store a full (sparse, exponential) histogram as an atomic, composite sample.
At that point, the common wisdom was to stop there: the new, more advanced histogram, generally meant to replace “classic” histograms, would use a composite sample, while the rest of the metric types would continue to use the classic model. Making the other composite types consistent with the new native model felt extremely disruptive to users, with too much work and risk.
A common counter‑argument was that users would eventually migrate their classic histograms naturally, and that summaries are less useful given the more powerful bucketing and lower cost of native histograms.
Unfortunately, the migration to native histograms was known to take time:
- PromQL changes – Slight syntax adjustments are required to query native histograms.
- Client changes – Applications must define new or edit existing metrics to use native histograms.
- Legacy software – Old software may remain in production indefinitely, never migrating.
Consequently, Prometheus cannot simply deprecate classic histograms; all downstream solutions must continue to support the classic model.
Native Histograms, NHCB, and the Path Toward a Fully Composite Sample Model
Background
Native histograms pushed the TSDB and its ecosystem into a new composite‑sample pattern. Some of those changes could be adapted to all composite types, and native histograms gave us a glimpse of the many benefits of native support.
“Would it be possible to add native counterparts of the existing composite metrics to replace them, ideally transparently?”
In 2024, for transactionality and efficiency, we introduced Native Histogram Custom Buckets (NHCB) – a concept that stores classic histograms with explicit buckets natively, re‑using the native‑histogram composite‑sample data structures.
- Efficiency – NHCB is at least 30 % more efficient than the classic representation while offering functional parity.
- Adoption challenges – expanding (converting NHCB → classic) is trivial, but combining (classic → NHCB) is often infeasible, for two practical reasons:
  - Converting on scrape is expensive.
  - Remote‑write “pushes” may split a histogram across shards or sequential messages, making combination impossible.

  This is why OpenTelemetry‑collector users see extra overhead on prometheusreceiver – the OpenTelemetry model follows the composite‑sample model strictly.
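The combining problem can be sketched as follows (hypothetical function and field names, not the actual NHCB schema); the point is that a partially delivered histogram cannot be assembled into an atomic composite sample:

```python
def combine_to_nhcb(series):
    """Sketch of classic -> NHCB combination. All _bucket/_sum/_count
    series of one histogram must arrive together; remote-write sharding
    can split them across messages, making this step fail."""
    buckets, total, count = {}, None, None
    for name, labels, value in series:
        if name.endswith("_bucket"):
            buckets[float(labels["le"])] = value
        elif name.endswith("_sum"):
            total = value
        elif name.endswith("_count"):
            count = value
    if count is None or total is None or float("inf") not in buckets:
        raise ValueError("partial histogram: cannot combine atomically")
    return {
        "custom_bounds": sorted(b for b in buckets if b != float("inf")),
        "counts": [buckets[b] for b in sorted(buckets)],  # +Inf sorts last
        "sum": total,
        "count": count,
    }

complete = [("foo_bucket", {"le": "1.0"}, 10),
            ("foo_bucket", {"le": "+Inf"}, 17),
            ("foo_sum", {}, 324789.3),
            ("foo_count", {}, 17)]
```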
Consumption Differences (PromQL)
Classic histogram example
foo_bucket{le="0.0"} 0
# …
foo_bucket{le="1.1e+23"} 17
foo_bucket{le="+Inf"} 17
foo_count 17
foo_sum 324789.3
NHCB representation
The metric name is now foo (no _bucket suffix).
# New syntax (native)
histogram_quantile(0.9, sum(foo{job="a"}))
# Old syntax (expanded)
histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
Consequences
- The “what you see is what you query” rule for text formats is violated (until OpenMetrics 2).
- Similar problems appear on other Prometheus outputs (federation, remote‑read, remote‑write).
Note: Prometheus client data model (SDKs) and the PrometheusProto scrape protocol already use the composite‑sample model!
Transparent Native Representation
Community Direction
The Prometheus community appears to be converging on two ideas:
- Move to a fully composite‑sample model on the storage layer – to reap all associated benefits.
- Allow users to switch (e.g., on scrape) from classic to native form without breaking the consumption layer – easing migration, avoiding dual‑mode protocol changes, and deprecating the classic model as quickly as possible.
Ongoing Efforts You Can Contribute To
| Initiative | Goal | Status |
|---|---|---|
| Native summary & stateset | Eliminate the classic model for all composite types. | Early discussion – contributions welcome. |
| OpenMetrics 2.0 | Consolidate and improve the pull protocol; move to composite values in text format, making parsing trivial for storages that support native composites. | Text format will still expand to classic on scrape by default (no breaking change). |
| Remote Write 2.0 | Transport histograms in native form (classic still supported). Future versions (e.g., 2.1) may add native summaries and stateset. | Stabilisation needed – contributions welcome. |
| Compatibility modes | Translate stored composite samples back to classic representation for consumption (PromQL, federation, remote‑read, etc.). | Prototype exists; edge cases remain. |
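A compatibility mode of the kind listed above might conceptually expand a native composite sample back into classic series like this (a sketch with illustrative field names, not the actual storage schema or prototype code):

```python
def expand_nhcb_to_classic(name, sample):
    """Expand an NHCB-like composite sample into the classic
    foo_bucket/foo_sum/foo_count series for legacy consumers."""
    out = []
    # The last cumulative count belongs to the implicit +Inf bucket.
    bounds = sample["bounds"] + [float("inf")]
    for le, cum in zip(bounds, sample["counts"]):
        le_str = "+Inf" if le == float("inf") else format(le, "g")
        out.append((name + "_bucket", {"le": le_str}, cum))
    out.append((name + "_sum", {}, sample["sum"]))
    out.append((name + "_count", {}, sample["count"]))
    return out

series = expand_nhcb_to_classic(
    "foo", {"bounds": [0.1, 1.0], "counts": [8, 10, 17],
            "sum": 324789.3, "count": 17})
```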
PromQL Compatibility Example
# New syntax – works directly on the NHCB "foo"
histogram_quantile(0.9, sum(foo{job="a"}))
# Old syntax – expands the NHCB to classic representation
histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
Alternatives (e.g., a special label or annotation) are also under discussion.
When fully implemented, pipelines can switch transparently to native form at any stage.
Summary
Moving Prometheus to a native composite‑type world is challenging and will take time:
- Performance characteristics shift from uniform, predictable sample sizes to sizes that depend on the metric type.
- Code architecture becomes more complex – maintaining multiple sample types has already proven difficult.
Nevertheless, the community is actively working on the necessary protocol updates, compatibility layers, and storage changes. Your contributions – whether to OpenMetrics 2.0, Remote Write 2.0, or the compatibility modes – are essential to making this transition smooth and sustainable.
Updates & Opportunities
Why This Matters
Recent work has uncovered a clean, viable path that brings clear benefits in:
- Functionality
- Transactionality
- Reliability
- Efficiency
These improvements are expected in the relatively near future – very exciting!
How to Get Involved
- Direct Message me on Slack.
- Post questions in the #prometheus‑dev Slack channel.
- Comment on related issues, create PRs, and review PRs (the most impactful work!).
Prometheus at KubeCon EU 2026 – Amsterdam
| When | What |
|---|---|
| Booth | Visit the Prometheus KubeCon booth |
| Wed Mar 25 2026, 16:00 | Contributing Workshop |
| Thu Mar 26 2026, 13:45 | Prometheus V3 – One Year In: OpenMetrics 2.0 and More! session |
Future Topics (Work‑in‑Progress)
No promises, but help is welcome! Below is a non‑exhaustive, random‑order list of areas we plan to cover in upcoming posts.
- Native start‑timestamp feature – cleanly unlocks native delta temporality without hacks (e.g., re‑using gauges, extra metric types, or label annotations like __temporality__).
- Optional schematization of Prometheus metrics – tackles stability problems in metric naming/shape, building on OpenTelemetry semantic conventions.
- Metadata storage improvements – enhances the OpenTelemetry Entities and resource‑attributes storage/consumption experience.
- Extended scrape/pull protocols – aligns Prometheus with the recent OpenMetrics ownership move.
- TSDB Parquet effort – a joint initiative from the three LTS project groups (Cortex, Thanos, Mimir) aimed at high‑cardinality use cases.
- PromQL extensions – experiments with pipes, variables, and new SQL‑transpilation ideas.
- Governance changes – ongoing updates to project governance.
See You in Open‑Source!
Feel free to reach out, contribute, or simply follow the journey. Your participation makes all of this possible.