I built an AI tool for incident investigation (looking for honest feedback)
Source: Dev.to
Introduction
Hey everyone 👋
Over the past couple of weeks, I’ve been building a side project called Opsrift.
It started with a simple frustration: postmortems, handovers, and incident documentation take way too much time, and most of that work is repetitive. While building it, I realized the real problem isn't just writing postmortems; it's understanding what actually happened during an incident.
What Opsrift does right now
The platform focuses on incident workflows, primarily for SRE, support, or operations teams. Current features include:
Postmortem generator
Takes incident data and generates structured postmortems in seconds.
Handover generator
Useful for shift‑based teams — turns messy updates into clean handovers.
Runbook generator
Creates structured runbooks from incident patterns or from your own input.
Incident Investigator (main focus)
- Pulls data from tools like Jira, PagerDuty, and Opsgenie
- Correlates it with deployments from GitHub
- Attempts to reconstruct what actually happened (timeline, possible causes, etc.)
The goal is to reduce the time spent jumping between tools during investigations.
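To make the correlation step more concrete, here is a minimal Python sketch of one way it could work. It is not Opsrift's actual implementation: the `Event` type, the `build_timeline` and `suspect_deploys` helpers, the two-hour window, and the sample events are all hypothetical. It merges events from different tools into one sorted timeline, then flags deploys that landed shortly before the incident started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One hypothetical normalized event pulled from an external tool."""
    timestamp: datetime
    source: str   # e.g. "pagerduty", "jira", "github" (illustrative labels)
    kind: str     # e.g. "alert", "ticket", "deploy" (illustrative labels)
    summary: str


def build_timeline(*event_lists):
    """Merge events from several tools into one chronologically sorted list."""
    merged = [event for events in event_lists for event in events]
    return sorted(merged, key=lambda event: event.timestamp)


def suspect_deploys(timeline, incident_start, window=timedelta(hours=2)):
    """Return deploys that landed within `window` before the incident began.

    Closeness in time is only a hint that a deploy is related, not proof.
    """
    return [
        event
        for event in timeline
        if event.kind == "deploy"
        and incident_start - window <= event.timestamp <= incident_start
    ]


# Hypothetical sample data for illustration only.
alerts = [
    Event(datetime(2024, 5, 1, 14, 5), "pagerduty", "alert",
          "5xx rate above threshold on checkout-api"),
]
deploys = [
    Event(datetime(2024, 5, 1, 13, 50), "github", "deploy", "checkout-api v1.8.2"),
    Event(datetime(2024, 5, 1, 9, 0), "github", "deploy", "search-api v3.1.0"),
]

timeline = build_timeline(alerts, deploys)
for event in suspect_deploys(timeline, incident_start=alerts[0].timestamp):
    print(event.timestamp.isoformat(), event.summary)
# prints: 2024-05-01T13:50:00 checkout-api v1.8.2
```

A deploy landing right before an alert is a lead worth checking, not a root cause; a real investigator would also need to match deploys to the affected service and weigh other signals.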
Status page
Basic external communication for incidents.
Integrations
Current integrations (still early; some are rough):
- Jira
- PagerDuty
- Opsgenie
- GitHub
- Slack
- Confluence
What it’s NOT (yet)
- Not a replacement for your incident‑management tools
- Not perfect at root‑cause analysis
- Not “production‑grade” in every edge case
Right now it's closer to an AI layer on top of your existing tools that speeds up investigation and documentation.
Known issues
- GitHub login ❌ (bugged)
- Slack login ❌ (bugged)
You can still use:
- Google login
- Email/password signup
Fixes are in progress.
What I’m trying to figure out
I’d really appreciate help validating a few things:
- Does the Incident Investigator actually help, or is it just “nice to have”?
- Are the outputs accurate enough to be trusted?
- Would you use something like this in real workflows?
- What’s missing for it to be genuinely useful?
Where I want to take this
Long-term, I'd like to move beyond generating outputs and toward:
- Detecting patterns across incidents
- Identifying unstable services
- Highlighting teams with high escalation rates
- Correlating deployments with incidents automatically
In short: turning incident data into actionable insights.
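As a rough illustration of the first few ideas, here is a minimal Python sketch that flags services with many incidents and computes an escalation rate per team. The record fields (`service`, `team`, `escalated`), the `min_incidents` threshold, the helper names, and the sample data are hypothetical, not how Opsrift stores or analyzes incidents.

```python
from collections import Counter

# Hypothetical incident records; the field names are illustrative assumptions.
incidents = [
    {"service": "checkout-api", "team": "payments", "escalated": True},
    {"service": "checkout-api", "team": "payments", "escalated": False},
    {"service": "checkout-api", "team": "payments", "escalated": True},
    {"service": "search-api", "team": "discovery", "escalated": False},
]


def unstable_services(incidents, min_incidents=3):
    """Services with at least `min_incidents` incidents, most frequent first."""
    counts = Counter(incident["service"] for incident in incidents)
    return [service for service, n in counts.most_common() if n >= min_incidents]


def escalation_rates(incidents):
    """Fraction of each team's incidents that were escalated."""
    totals = Counter(incident["team"] for incident in incidents)
    escalated = Counter(
        incident["team"] for incident in incidents if incident["escalated"]
    )
    # Counter returns 0 for teams that never escalated, so no KeyError here.
    return {team: escalated[team] / total for team, total in totals.items()}


print(unstable_services(incidents))  # prints: ['checkout-api']
print(escalation_rates(incidents))   # prints: {'payments': 0.6666666666666666, 'discovery': 0.0}
```

Raw counts like these are only a starting point; comparing services or teams fairly would likely require normalizing for traffic, on-call load, or alert volume.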
If you want to try it
👉
No pressure — even quick feedback is super helpful.
Final note
I’ve worked in NOC/SOC and incident‑heavy environments, so this is a “scratch my own itch” project. I’m aware tools like this can become:
- Too generic
- Inaccurate
- Just another dashboard nobody uses
I’d rather get honest feedback early, even if it’s:
“this doesn’t solve anything for me”
That’s useful.
Thanks in advance 🙌