Prometheus Architecture
Source: Dev.to
Introduction
This is the first in a series of articles focused on the architecture of the main components of a modern monitoring stack.
Initially, I planned to start by comparing the different variants and tools in the Prometheus ecosystem (such as Mimir, Thanos, and Cortex). However, it makes more sense to begin with Prometheus itself; after all, it is the foundation and origin of all these solutions.
Most likely, at some point in your IT journey you have heard of, seen, or used a metric exposed in Prometheus format for observability. Prometheus is an open‑source project, graduated by the Cloud Native Computing Foundation (CNCF)—the second project to achieve this status, right after Kubernetes.
It works extremely well in Kubernetes environments, but also adapts perfectly to clouds and container‑based environments in general.
Prometheus uses a pull‑based approach to collect metrics. Unlike systems where agents actively send data, Prometheus goes to the source and pulls the data.
Figure 1 – Prometheus data‑collection flow (pull vs. push)
[Insert diagram showing pull and push mechanisms]
Getting Started
The simplest way to get started is to run Prometheus as a container:
```bash
docker run -p 9090:9090 prom/prometheus:latest
```
To operate it you need a YAML configuration file where you define global parameters, scrape frequencies, and targets.
Basic startup configuration
```yaml
global:
  scrape_interval: 15s       # Scrape frequency
  evaluation_interval: 15s   # Rule evaluation frequency
  external_labels:
    cluster: 'demo-cluster'
    environment: 'dev'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'prometheus-server'
```
The instance label (and any others you add) allows you to filter and aggregate metrics later during queries, providing context to the data.
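For example, once the `instance` label is attached, queries can target it directly (the metric names below are illustrative):

```
# Only series from this instance
up{instance="prometheus-server"}

# Request rate broken down per instance
sum by (instance) (rate(http_requests_total[5m]))
```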
Typical Prometheus Deployment
A typical deployment consists of several elements working together:
| Component | Description |
|---|---|
| Prometheus Server | The brain of the operation, responsible for collection and storage. |
| Targets | Endpoints that your applications or servers expose with metrics. |
| Exporters | Agents that translate metrics from third‑party systems to Prometheus format. |
| Time‑Series Database (TSDB) | Internal database optimized for time‑series data. |
| PromQL | Powerful query language for analyzing data. |
| Pushgateway | Auxiliary component for handling short‑lived jobs. |
| Alertmanager | Manages, groups, and routes alert notifications. |
| Client Libraries | Libraries for instrumenting custom code directly in applications. |
Figure 2 – Complete Prometheus architecture with all core components
[Insert diagram showing all core components and their interactions]
Component Details
1. Prometheus Server
The server is the central component. It performs three main functions:
- Scraping – Periodically connects to configured targets via HTTP to fetch metrics.
- Storage – Writes collected data to its local TSDB.
- Evaluation & Querying – Evaluates alert rules and responds to queries (e.g., from Grafana) via the PromQL API.
In practice, it is the heart of the operation, ensuring the continuous flow of data from source to storage.
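Conceptually, a scrape is just an HTTP GET to a target's metrics endpoint followed by parsing the plain-text exposition format. A minimal sketch of that parsing step in Python (the regex and helper names are illustrative, not Prometheus internals):

```python
import re

# Matches sample lines such as: metric_name{label="value"} 123.45
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')

def parse_exposition(text):
    """Parse Prometheus text exposition format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip HELP/TYPE comments and blanks
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.groups()
            samples.append((name, labels or '', float(value)))
    return samples

payload = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1547
node_memory_MemAvailable_bytes 4294967296
"""
print(parse_exposition(payload))
```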
2. Targets
Targets are the sources of your metrics. They can be virtually anything: a Linux server, a Java application, an API endpoint, or Kubernetes pods. By default, Prometheus fetches metrics from the /metrics HTTP endpoint of the target, although this path is configurable.
```yaml
scrape_configs:
  - job_name: 'node-metrics'
    static_configs:
      - targets: ['instance-dev:9100']
        labels:
          instance: 'instance-dev'
```
Common target ports
| Port | Typical Exporter / Service |
|---|---|
| 9100 | node_exporter – OS metrics |
| 8080 | Custom application metrics |
| 8081 | cAdvisor – Docker container metrics |
3. Exporters
Not all software natively exposes metrics in Prometheus format (e.g., MySQL, Redis). Exporters are small binaries that act as translators: they collect metrics from the original system (using its native APIs) and convert them to the plain‑text format that Prometheus understands, exposing them on an HTTP endpoint.
Popular exporters (official and community-maintained)
- `node_exporter` – Hardware & OS metrics (CPU, memory, disk).
- `blackbox_exporter` – Probing of external endpoints via HTTP, DNS, TCP, ICMP.
- `mysqld_exporter` / `postgres_exporter` / `redis_exporter` – Database-specific metrics.
You can find the complete list in the official documentation.
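An exporter's job boils down to reading values from the native system and rendering them in the exposition format. A toy sketch in Python (the `collect_stats` function stands in for whatever native API the real system offers; all names here are hypothetical):

```python
def collect_stats():
    # Placeholder for a native API call (e.g., a MySQL SHOW STATUS query)
    return {'connections_active': 12, 'queries_total': 8531}

def to_exposition(prefix, stats, labels=''):
    """Render a stats dict as Prometheus text exposition lines."""
    lines = []
    for key, value in sorted(stats.items()):
        lines.append(f'{prefix}_{key}{labels} {float(value)}')
    return '\n'.join(lines)

print(to_exposition('myapp', collect_stats(), '{instance="instance-dev"}'))
```

A real exporter would serve this text over HTTP so Prometheus can scrape it like any other target.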
Example metric output
From node_exporter (system)

```
node_cpu_seconds_total{instance="instance-dev",cpu="0",mode="idle"} 145893.45
node_memory_MemAvailable_bytes{instance="instance-dev"} 4294967296
```

From cAdvisor (containers)

```
container_cpu_usage_seconds_total{instance="instance-dev",name="my-app",image="nginx:latest"} 234.56
```

Custom application metrics

```
http_requests_total{instance="instance-dev",method="GET",status="200"} 1547
http_request_duration_seconds{instance="instance-dev",endpoint="/api/users"} 0.234
```
4. Time‑Series Database (TSDB)
The data that Prometheus collects is, by definition, time series: numerical values that change over time, always associated with a timestamp. To store this efficiently, Prometheus uses its own TSDB, optimized for high‑write throughput and fast reads.
Prometheus stores metrics on disk in structures called blocks.
Figure 3 – Prometheus TSDB lifecycle and storage flow
[Insert diagram showing in‑memory buffering, block creation, compression, and retention]
Key characteristics
- Append‑only design → high write performance.
- Recent data is kept in memory for fast access; periodically flushed to disk.
- Each block contains compressed samples, an index for fast lookups, and metadata.
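On disk, this typically looks like one directory per block under Prometheus's data path (layout shown for illustration; exact contents vary by version):

```
data/
└── 01HV3X.../        # one block (ULID-named directory)
    ├── chunks/       # compressed sample data
    │   └── 000001
    ├── index         # inverted index for label lookups
    ├── meta.json     # block time range and stats
    └── tombstones    # pending deletions
```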
Retention – By default, Prometheus keeps data locally for 15 days. Older blocks are deleted to free up space. While Prometheus was not originally designed as a long‑term storage solution, the retention period can be adjusted, or remote storage integrations (e.g., Thanos, Cortex) can be used for longer retention.
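The retention period is controlled with a startup flag, for example (the value shown is illustrative):

```
docker run -p 9090:9090 prom/prometheus:latest \
  --storage.tsdb.retention.time=30d
```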
Stay tuned for the next article where we’ll explore remote‑storage solutions and scaling strategies for Prometheus‑based monitoring stacks.
PromQL (Prometheus Query Language)
PromQL is the integrated functional query language for retrieving and analyzing data. It’s through PromQL that you create dashboards in Grafana or define alerts.
The language allows you to select, filter, aggregate, and perform complex mathematical operations on time‑series data.
Simple metric selection (current value)

```
http_requests_total
```

Filtering by labels

```
http_requests_total{instance="instance-dev", status="200"}
```

Requests per second rate (average over the last 5 minutes)

```
rate(http_requests_total[5m])
```

Global sum of request rate, aggregated by instance

```
sum(rate(http_requests_total[5m])) by (instance)
```

Available memory percentage calculation

```
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
```

CPU usage (everything that's not idle)

```
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
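To build intuition for what `rate()` does: it takes a counter's increase over the window and divides it by the elapsed time. A hand-rolled approximation in Python (the real `rate()` also handles counter resets and extrapolates at the window edges):

```python
def simple_rate(samples):
    """Approximate rate() over (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    increase = v1 - v0  # assumes no counter reset within the window
    return increase / (t1 - t0)

# Counter went from 100 to 400 over a 5-minute (300 s) window
print(simple_rate([(0, 100), (150, 250), (300, 400)]))  # → 1.0 requests/s
```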
PromQL also includes functions and modifiers for:
- Percentiles: `histogram_quantile`
- Linear predictions: `predict_linear`
- Temporal comparisons: `offset`
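For example, the 95th-percentile request latency over the last 5 minutes (assuming a histogram metric that exposes `_bucket` series, such as the one shown earlier):

```
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```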
Pushgateway (for short-lived jobs)
Prometheus uses a pull model, but short‑lived jobs (e.g., a batch backup script) may finish before Prometheus can scrape them.
The Pushgateway acts as an intermediate cache: the job pushes its metrics to the gateway, and Prometheus scrapes the gateway at its regular interval.
Example: Send a metric indicating how long a job took
```bash
echo "job_duration_seconds 45.2" | curl --data-binary @- \
  http://pushgateway:9091/metrics/job/batch-job/instance/worker-1
```
Note: The Pushgateway is intended for very specific use cases. It should not be used to turn Prometheus into a push-based system. The pull model remains preferable because it:
- Allows Prometheus to control scrape load.
- Makes inactive targets easy to detect (via the `up` metric).
- Simplifies service discovery.
Alerting Architecture
It’s common to confuse the responsibilities of the components involved in alerting.
| Component | Responsibility |
|---|---|
| Prometheus Server | Detects problems (evaluates PromQL rules) and fires alert states. |
| Alertmanager | Receives firing alerts and decides what to do with them (group, inhibit, silence, route). |
Alertmanager Features
- Grouping – Combine many similar alerts into a single notification.
- Inhibition – Suppress less important alerts when a critical one is active.
- Silencing – Mute alerts during planned maintenance windows.
- Routing – Send alerts to different channels (e.g., PagerDuty, Slack).
Example: Prometheus detection rule
```yaml
groups:
  - name: example
    rules:
      - alert: HighCPUUsage
        expr: rate(node_cpu_seconds_total{mode!="idle"}[5m]) > 0.8
        for: 5m  # Condition must hold for 5 minutes
        labels:
          severity: warning
```
Example: Alertmanager routing configuration
```yaml
route:
  # Default route
  receiver: 'team-slack'
  routes:
    # Specific route for critical alerts
    - match:
        severity: critical
      receiver: 'pagerduty-oncall'

receivers:
  - name: 'team-slack'
    slack_configs:
      - channel: '#alerts-general'
  - name: 'pagerduty-oncall'
    pagerduty_configs:
      - service_key: '...'
```
Instrumenting Your Own Applications
Besides using exporters for ready‑made systems, the best practice is to instrument your own applications so they natively expose business and performance metrics.
Prometheus offers official client libraries for Go, Java/Scala, Python, and Ruby, plus community‑maintained libraries for .NET, Node.js, Rust, etc. With just a few lines of code, your application can serve a /metrics endpoint.
Python example
```python
from prometheus_client import Counter, Histogram, start_http_server
import time

# 1. Define the metrics
requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests received',
    ['method', 'endpoint', 'status']  # Labels for dimensionality
)

request_duration = Histogram(
    'http_request_duration_seconds',
    'Histogram of request duration',
    ['endpoint']
)

# 2. Use in application code (e.g., using decorators)
@request_duration.labels(endpoint='/api/users').time()
def handle_user_request():
    # Application logic...
    time.sleep(0.1)
    # Increment the counter at the end
    requests_total.labels(method='GET', endpoint='/api/users', status='200').inc()

if __name__ == '__main__':
    # 3. Start an HTTP server to expose the metrics
    start_http_server(8000)
    print("Metrics server running on port 8000...")
    # Main application loop...
```
The client libraries handle thread‑safety, correct data formatting, and other complexities for you.
Why Prometheus Became the Market Standard
| Strength | Description |
|---|---|
| Pull Model | Facilitates flow control, debugging, and failure detection in targets. |
| Multidimensional Data | Labels enable incredibly flexible analyses. |
| PromQL | Query language designed specifically for monitoring data. |
| Operational Simplicity | Single static binary, easy to deploy, no complex external dependencies. |
| Service Discovery | Native, dynamic integration with Kubernetes, AWS, Azure, etc. |
| Open Ecosystem | Hundreds of exporters for almost any technology. |
Limitations of Prometheus
- Single‑Node Architecture – Not designed for native horizontal scaling; overload requires manual sharding across multiple servers.
- Local & Ephemeral Storage – Data lives on the server’s local disk; if the server dies and the disk is lost, the data is gone. There is no native data replication.
- Long‑term Retention – Storing data for very long periods can become costly or impractical without external solutions (e.g., remote write to Thanos, Cortex, or Mimir).
Understanding these trade‑offs helps you decide when Prometheus is the right tool and when you might need complementary systems for high‑availability or long‑term storage.
Overview
- Long-Term Retention: Storing years of historical data in Prometheus is not efficient.
- Fragmented Global View: When you have multiple Kubernetes clusters, each with its own Prometheus instance, you lack a native, unified view of metrics across all clusters.
These architectural limitations motivated the creation of tools that “embrace and extend” Prometheus, such as Thanos, Cortex, and Mimir.
What’s Next?
In upcoming articles I’ll explore how to overcome the limitations above by diving into:
- Thanos – Adds long‑term storage (via object storage) and provides a unified global view for Prometheus.
- Cortex – The original solution for multi‑tenant, horizontally scalable Prometheus.
- Mimir – The evolution of Cortex, focused on massive scale and operational simplicity.
Tags
#Prometheus #Monitoring #Observability #CNCF #DevOps #SRE