Observability Practices: Implementing Real-World Monitoring With Python and Prometheus
Source: Dev.to
Introduction
Modern applications don’t just need to run — they need to be understood. When something goes wrong in production, teams must be able to detect issues, diagnose the root cause, and monitor the system’s behavior in real time. This is where observability becomes essential.
What Is Observability?
Observability is the ability to understand the internal state of a system based on the data it produces. It is built around three core pillars:
1. Metrics
Numeric values that reflect system state.
Examples: request latency, CPU usage, memory consumption.
2. Logs
Detailed event records generated by applications and systems.
Examples: authentication messages, errors, warnings.
3. Traces
End‑to‑end tracking of requests across services.
Useful in microservices and distributed systems.
Together, these help answer:
- What is happening?
- Why is it happening?
- Where is it failing?
Why Observability Matters
- Detect issues earlier
- Reduce downtime
- Improve performance
- Understand user impact
- Monitor applications at scale
- Make data‑driven decisions
Without observability, debugging becomes slow, reactive, and inconsistent.
Real-World Example: Observability With Python + Prometheus
1. Install Dependencies
```shell
pip install fastapi uvicorn prometheus-client
```
2. Python API With Prometheus Metrics
```python
from fastapi import FastAPI
from fastapi.responses import Response
from prometheus_client import Counter, Histogram, generate_latest
import time
import random

app = FastAPI()

# Counter: monotonically increasing total of requests served.
REQUEST_COUNT = Counter("api_requests_total", "Total number of API requests received")
# Histogram: distribution of request durations, in seconds.
REQUEST_LATENCY = Histogram("api_request_latency_seconds", "API request latency")

@app.get("/")
def home():
    REQUEST_COUNT.inc()
    with REQUEST_LATENCY.time():  # records elapsed time into the histogram
        time.sleep(random.uniform(0.1, 0.5))  # simulate variable work
    return {"message": "API is running successfully"}

@app.get("/metrics")
def metrics():
    # Prometheus scrapes this endpoint; generate_latest() emits the text exposition format.
    return Response(generate_latest(), media_type="text/plain")
```
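To preview exactly what Prometheus will scrape, the exposition format can be inspected without starting the server. This is a minimal sketch (not from the original article) that uses a separate `CollectorRegistry` so it does not collide with the metrics registered above:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this demo independent of the global default registry.
registry = CollectorRegistry()
requests_total = Counter(
    "api_requests_total", "Total number of API requests received", registry=registry
)
latency = Histogram(
    "api_request_latency_seconds", "API request latency", registry=registry
)

requests_total.inc()
latency.observe(0.3)  # record one 300 ms request

# generate_latest() returns the plain-text format Prometheus parses:
# counter samples, histogram buckets, _sum, and _count lines.
print(generate_latest(registry).decode())
```

The output includes lines such as `api_requests_total 1.0` and the per-bucket `api_request_latency_seconds_bucket{le="..."}` series that Prometheus stores.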
Metrics exposed:

| Metric | Description |
|---|---|
| `api_requests_total` | Counts all incoming requests |
| `api_request_latency_seconds` | Measures request duration (seconds) |
3. Prometheus Configuration
Create `prometheus.yml`:

```yaml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "python-api"
    static_configs:
      - targets: ["localhost:8000"]
```
Prometheus will scrape the `/metrics` endpoint on `localhost:8000` every 5 seconds.
4. Run Prometheus
```shell
./prometheus --config.file=prometheus.yml
```
Open the Prometheus UI and query metrics such as:

- `api_requests_total`
- `rate(api_requests_total[1m])`
- `api_request_latency_seconds_bucket`
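The histogram buckets also make latency percentiles queryable. For example, a common PromQL pattern (an illustrative addition, not from the original article) for the 95th percentile over a 5-minute window:

```promql
histogram_quantile(0.95, rate(api_request_latency_seconds_bucket[5m]))
```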
5. Optional: Grafana Dashboard
Grafana can visualize your Prometheus metrics. Typical graphs include:
- Request rate
- CPU and memory usage
- Error percentage
- Latency percentiles (p95, p99)
Observability Best Practices
- ✔ Instrument every major endpoint – expose metrics for performance‑critical APIs.
- ✔ Standardize metric names – avoid random or unstructured naming.
- ✔ Include labels (tags) – e.g., `status_code`, `endpoint`, `method` for richer context.
- ✔ Use alerts – e.g., “95th percentile latency exceeds 500 ms for 3 minutes.”
- ✔ Visualize everything – dashboards make patterns obvious.
- ✔ Combine logs, metrics, and traces – observability works best when all three pillars are present.
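The labeling practice above can be sketched with `prometheus_client`. This is a minimal example; the endpoint and status values are illustrative:

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()

# Declare the label names once; every increment must supply all of them.
REQUESTS = Counter(
    "api_requests_total",
    "Total API requests",
    ["method", "endpoint", "status_code"],
    registry=registry,
)

# Each distinct label combination becomes its own time series.
REQUESTS.labels(method="GET", endpoint="/", status_code="200").inc()
REQUESTS.labels(method="GET", endpoint="/", status_code="500").inc()

print(generate_latest(registry).decode())
```

In PromQL, these labels enable queries such as the error rate per endpoint, e.g. by filtering on `status_code="500"`.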
Conclusion
Observability allows teams to deeply understand how their systems behave. Using Prometheus + FastAPI, you can expose useful metrics that support:
- Faster debugging
- Better performance insights
- Safer deployments
- Scalable system monitoring
The example can be expanded with tracing (OpenTelemetry), log pipelines (ELK Stack), or full‑stack observability platforms such as AWS CloudWatch, Datadog, or Azure Monitor.
References
- Prometheus Documentation
- Grafana Documentation
- FastAPI
- OpenTelemetry