# How I built an AI tool that diagnoses CI/CD pipeline failures in seconds
Source: Dev.to
## What it does
When a CI/CD pipeline fails, PipelineIQ automatically:
- Captures the error logs
- Sends them to Claude AI for analysis
- Delivers a Slack alert with the exact root cause and fix steps — within seconds
### Example Slack alert

> 🔴 **Pipeline Failure:** Stripe API connection timeout blocking payment webhooks
>
> **AI Diagnosis:** The deployment is failing because the application cannot establish a connection to Stripe's API within the 30‑second timeout limit. This is preventing payment webhook processing.
>
> **Recommended Fix:** Check `STRIPE_SECRET_KEY` and `STRIPE_PUBLISHABLE_KEY` in production environment variables. Test connectivity to api.stripe.com from your deployment environment. Increase the API timeout from 30s to 60s.
No log diving. No guessing. Specific, actionable steps.
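An alert like the one above can be assembled with Slack's Block Kit. Here is a minimal sketch of a builder for that payload; `build_alert_blocks` is an illustrative helper (not PipelineIQ's actual code), and the `insight` keys mirror the JSON fields Claude returns later in the post:

```python
def build_alert_blocks(insight: dict, run: dict) -> list[dict]:
    """Build Slack Block Kit blocks for a pipeline-failure alert."""
    return [
        {
            "type": "header",
            "text": {
                "type": "plain_text",
                "text": f"🔴 Pipeline Failure: {insight['title']}",
            },
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*AI Diagnosis:* {insight['diagnosis']}",
            },
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*Recommended Fix:* {insight['recommendation']}",
            },
        },
        {
            # Context line pinpointing the failing run
            "type": "context",
            "elements": [
                {
                    "type": "mrkdwn",
                    "text": f"{run['repo_full_name']} · {run['branch']} · {run['commit_sha'][:7]}",
                }
            ],
        },
    ]
```

The resulting list would be sent to Slack as `{"blocks": [...]}` via an incoming webhook or `chat.postMessage`.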
## The stack
- FastAPI – Python backend with async support
- Supabase – PostgreSQL database with Row Level Security
- Anthropic Claude API – AI diagnosis engine
- Slack API – Rich block‑based alerts
- Railway – Production deployment
- GitHub Actions – Integration via one workflow step
## How the integration works
Add a single step to any existing GitHub Actions workflow:
```yaml
- name: Notify PipelineIQ
  if: always()
  run: |
    curl -X POST $PIPELINEIQ_URL/api/v1/pipelines/runs \
      -H "X-PipelineIQ-Key: $PIPELINEIQ_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "repo_full_name": "${{ github.repository }}",
        "branch": "${{ github.ref_name }}",
        "commit_sha": "${{ github.sha }}",
        "commit_message": "${{ github.event.head_commit.message }}",
        "workflow_name": "${{ github.workflow }}",
        "status": "${{ job.status }}",
        "started_at": "${{ github.event.head_commit.timestamp }}"
      }'
```
Every run—success or failure—is stored. Failures automatically trigger AI diagnosis.
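On the receiving end, the endpoint mainly needs to validate the payload and decide whether to kick off a diagnosis. A stdlib-only sketch of that decision logic, assuming the field names from the workflow payload above (`validate_run` and `should_diagnose` are illustrative names, not PipelineIQ's actual API):

```python
# Fields the webhook payload must carry before we store the run
REQUIRED_FIELDS = {
    "repo_full_name", "branch", "commit_sha",
    "workflow_name", "status",
}


def validate_run(payload: dict) -> list[str]:
    """Return the names of any required fields missing from the payload."""
    return sorted(REQUIRED_FIELDS - payload.keys())


def should_diagnose(payload: dict) -> bool:
    """Only failed runs trigger the AI diagnosis background task."""
    # GitHub Actions reports job.status as "success", "failure", or "cancelled"
    return payload.get("status") == "failure"
```

In the FastAPI route, a run that passes validation and fails `should_diagnose` checks would be stored and, if failed, handed to `BackgroundTasks.add_task(...)` so the webhook can return immediately.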
## The AI diagnosis engine
A FastAPI background task calls Claude as soon as a failed run is stored:

```python
async def run_ai_diagnosis(run: dict, org_id: str, supabase: Client):
    # Ask Claude for a structured diagnosis of the failed run
    insight = await diagnose_from_run(run)
    if not insight:
        return

    # Persist the insight, scoped to the org for Row Level Security
    supabase.table("insights").insert({
        "org_id": org_id,
        "severity": insight.get("severity"),
        "title": insight.get("title"),
        "diagnosis": insight.get("diagnosis"),
        "recommendation": insight.get("recommendation"),
        "estimated_time_save_minutes": insight.get("estimated_time_save_minutes"),
        "confidence": insight.get("confidence"),
    }).execute()

    # Push the rich Slack alert to the team
    await send_pipeline_alert(insight, run)
```
Claude returns structured JSON with severity, diagnosis, recommendation, confidence score, and estimated time saved. The whole process runs in under 5 seconds.
## What’s next
- Web dashboard with pipeline health across all repos
- DORA metrics (deployment frequency, change failure rate, recovery time)
- Environment drift detection
- Industry benchmarks — how does your team compare?
## Try it free
PipelineIQ is in free beta. I’m looking for engineering teams to try it and give honest feedback on what’s missing.
- Website: https://pipelineiq.dev
- API docs: https://pipelineiq-production-3496.up.railway.app/docs
Happy to answer questions in the comments—especially from DevOps engineers who deal with pipeline failures daily.