What do you actually check in the first 15 minutes after deploy?
Source: Dev.to
Background
CI passed.
The deploy finished.
Nothing is obviously broken.
And yet, for a few minutes after release, production still feels uncertain. I think this is one of the most awkward parts of shipping software.
A deployment can be technically successful:
- build passes
- tests pass
- pipeline passes
- container starts
- health checks look fine
But real runtime problems can still surface only after actual traffic hits the system. That creates a weird gap between deploy success and runtime confidence.
In many smaller teams, the first few minutes after a deploy look something like this:
- Open logs
- Check recent exceptions
- Watch for error spikes
- Compare current noise with what “normal” felt like before
- Decide whether to ignore, investigate, or roll back
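The "ignore, investigate, or roll back" decision in that checklist can be sketched as a tiny triage function. This is a hypothetical illustration, not Relivio's implementation: the log signatures, thresholds, and decision labels are all assumptions.

```python
from collections import Counter

def triage(baseline_errors, recent_errors, spike_factor=3.0, min_count=5):
    """Compare post-deploy error signatures to a pre-deploy baseline window
    and return 'ignore', 'investigate', or 'rollback'."""
    base = Counter(baseline_errors)
    now = Counter(recent_errors)

    # Signatures that never appeared before the deploy
    new_sigs = [sig for sig in now if sig not in base]
    # Existing signatures whose volume spiked well past the baseline
    worsened = [
        sig for sig in now
        if sig in base and now[sig] >= max(min_count, spike_factor * base[sig])
    ]

    if new_sigs and sum(now[s] for s in new_sigs) >= min_count:
        return "rollback"     # brand-new failure mode at real volume
    if worsened or new_sigs:
        return "investigate"  # existing noise got worse, or low-volume new errors
    return "ignore"           # error profile matches the pre-deploy baseline

# Simulated exception signatures from before and after a deploy
before = ["TimeoutError:/checkout"] * 2 + ["ValueError:/search"]
after = ["TimeoutError:/checkout"] * 2 + ["KeyError:/checkout"] * 6

print(triage(before, after))  # → rollback
```

Real systems would pull these signatures from structured logs or an error tracker, and the thresholds would need tuning per service, but the shape of the judgment is the same: compare against baseline, then decide.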
We have plenty of tools for detection (exceptions, timeouts, retries, latency spikes, failed external API calls, degraded endpoints), but detection is not the same as judgment. The real post‑deploy question is usually:
Did this deploy actually make things worse?
And then:
Does it need attention right now?
That second layer still feels surprisingly manual.
If you have mature release control, canary rollouts, feature flags, and a strong observability setup, the uncertainty window is probably much smaller. But many teams do not have all of that, and even when they do, someone still has to interpret what production is actually saying after a release.
The key is not just “can we collect signals?” but:
- Which signals matter most right after deploy?
- How do you compare them against normal behavior?
- How do you tell noise from regression?
- What gives you enough confidence to say “this deploy is fine”?
- What makes you stop and investigate immediately?
When I think about the first 10–15 minutes after a deploy, I usually care less about giant dashboards and more about a small number of judgment signals:
- Did new runtime exceptions appear?
- Did existing exception patterns get worse?
- Are failures concentrated on one service or API path?
- Is the error pattern meaningfully different from recent baseline behavior?
- Does this look transient, or does it look deploy‑related?
That feels like a different problem from general monitoring; it’s closer to post‑deploy runtime diagnosis.
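One of the judgment signals above, "are failures concentrated on one service or API path?", is simple enough to sketch directly. The endpoint names and the 60% threshold here are illustrative assumptions, not part of any real tool:

```python
from collections import Counter

def concentration(error_paths):
    """Fraction of errors hitting the single worst path (0.0 to 1.0)."""
    if not error_paths:
        return 0.0
    counts = Counter(error_paths)
    return counts.most_common(1)[0][1] / len(error_paths)

errors = ["/checkout", "/checkout", "/checkout", "/search", "/checkout"]
ratio = concentration(errors)
print(f"worst-path share: {ratio:.0%}")  # → worst-path share: 80%
if ratio > 0.6:
    print("concentrated failure: likely deploy-related, inspect the change touching that path")
```

A spread-out error profile tends to point at infrastructure or a transient blip; a tight concentration on one path right after a deploy tends to point at the code you just shipped.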
My Approach
This line of thinking led me to start building Relivio. The idea is narrow: not a full observability platform, just a focused way to answer “Is this deploy safe, or does it need attention?”
- Minimal FastAPI demo:
- Main project site:
Discussion Questions
- What do you actually check in the first 10–15 minutes after a deploy?
- Do you rely mostly on logs, alerts, dashboards, release views, or something else?
- What signal makes you think “this deploy is probably bad”?
- What signal makes you confident enough to leave it alone?
- If you already have a strong workflow for this, what does it look like?
- If you do not, what part still feels manual or annoying?
I’m especially interested in answers from small teams and side projects, because that is where this still feels the most human and least automated. If you think this is already solved well by your current stack, I’d like to hear that too. And if you think the problem isn’t painful enough to deserve a dedicated tool, I’d genuinely like to know why.