How I built an AI tool that diagnoses CI/CD pipeline failures in seconds

Published: February 27, 2026 at 11:03 PM EST
2 min read
Source: Dev.to

What it does

When a CI/CD pipeline fails, PipelineIQ automatically:

  • Captures the error logs
  • Sends them to Claude AI for analysis
  • Delivers a Slack alert with the exact root cause and fix steps — within seconds

Example Slack alert

🔴 Pipeline Failure: Stripe API connection timeout blocking payment webhooks

AI Diagnosis: The deployment is failing because the application cannot establish a connection to Stripe's API within the 30‑second timeout limit. This is preventing payment webhook processing.

Recommended Fix: Check STRIPE_SECRET_KEY and STRIPE_PUBLISHABLE_KEY in production environment variables. Test connectivity to api.stripe.com from your deployment environment. Increase API timeout from 30s to 60s.

No log diving. No guessing. Specific, actionable steps.
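The alert above maps naturally onto Slack's Block Kit, which the stack below uses for rich alerts. A minimal sketch of how such a payload could be assembled — `build_alert_blocks` is an illustrative helper and the field names mirror the insight structure shown later, not PipelineIQ's actual code:

```python
# Sketch: turn an AI insight into Slack Block Kit blocks.
# Hypothetical helper -- PipelineIQ's real alert code may differ.

def build_alert_blocks(insight: dict) -> list[dict]:
    """Build header + section blocks for a failure alert."""
    return [
        {
            "type": "header",
            "text": {
                "type": "plain_text",
                "text": f"🔴 Pipeline Failure: {insight['title']}",
            },
        },
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*AI Diagnosis:* {insight['diagnosis']}"},
        },
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Recommended Fix:* {insight['recommendation']}"},
        },
    ]

blocks = build_alert_blocks({
    "title": "Stripe API connection timeout blocking payment webhooks",
    "diagnosis": "Cannot reach Stripe's API within the 30-second timeout.",
    "recommendation": "Verify STRIPE_SECRET_KEY and raise the timeout to 60s.",
})
```

The resulting `blocks` list is what you would pass to Slack's `chat.postMessage` as the `blocks` argument.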

The stack

  • FastAPI – Python backend with async support
  • Supabase – PostgreSQL database with Row Level Security
  • Anthropic Claude API – AI diagnosis engine
  • Slack API – Rich block‑based alerts
  • Railway – Production deployment
  • GitHub Actions – Integration via one workflow step

How the integration works

Add a single step to any existing GitHub Actions workflow:

- name: Notify PipelineIQ
  if: always()
  run: |
    curl -X POST $PIPELINEIQ_URL/api/v1/pipelines/runs \
      -H "X-PipelineIQ-Key: $PIPELINEIQ_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "repo_full_name": "${{ github.repository }}",
        "branch": "${{ github.ref_name }}",
        "commit_sha": "${{ github.sha }}",
        "commit_message": "${{ github.event.head_commit.message }}",
        "workflow_name": "${{ github.workflow }}",
        "status": "${{ job.status }}",
        "started_at": "${{ github.event.head_commit.timestamp }}"
      }'

Every run—success or failure—is stored. Failures automatically trigger AI diagnosis.
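On the receiving side, the endpoint only needs to persist the run and kick off diagnosis when the status is a failure. A sketch of that gating logic, independent of the FastAPI plumbing — the field names match the workflow payload above, but `validate_run` and `should_diagnose` are hypothetical helpers:

```python
# Sketch: validate an incoming run payload and decide whether it should
# trigger AI diagnosis. Helper names are assumptions, not PipelineIQ's API.

REQUIRED_FIELDS = {"repo_full_name", "branch", "commit_sha", "workflow_name", "status"}

def validate_run(payload: dict) -> dict:
    """Raise if required fields are missing; return the payload unchanged."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload

def should_diagnose(run: dict) -> bool:
    """Only failed runs trigger the Claude diagnosis background task."""
    return run.get("status", "").lower() == "failure"

run = validate_run({
    "repo_full_name": "acme/api",
    "branch": "main",
    "commit_sha": "abc123",
    "workflow_name": "ci",
    "status": "failure",
})
```

Note that GitHub's `job.status` is one of `success`, `failure`, or `cancelled`, which is why the check compares against the lowercase string.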

The AI diagnosis engine

A FastAPI background task fires Claude when a failure is stored:

from supabase import Client

async def run_ai_diagnosis(run: dict, org_id: str, supabase: Client):
    # Ask Claude for a structured diagnosis of the failed run.
    insight = await diagnose_from_run(run)
    if not insight:
        return  # no usable diagnosis; nothing to store or alert on

    # Persist the insight, scoped to the owning org for Row Level Security.
    supabase.table("insights").insert({
        "org_id": org_id,
        "severity": insight.get("severity"),
        "title": insight.get("title"),
        "diagnosis": insight.get("diagnosis"),
        "recommendation": insight.get("recommendation"),
        "estimated_time_save_minutes": insight.get("estimated_time_save_minutes"),
        "confidence": insight.get("confidence"),
    }).execute()

    # Push the rich Slack alert.
    await send_pipeline_alert(insight, run)

Claude returns structured JSON with severity, diagnosis, recommendation, confidence score, and estimated time saved. The whole process runs in under 5 seconds.

What’s next

  • Web dashboard with pipeline health across all repos
  • DORA metrics (deployment frequency, change failure rate, recovery time)
  • Environment drift detection
  • Industry benchmarks — how does your team compare?

Try it free

PipelineIQ is in free beta. I’m looking for engineering teams to try it and give honest feedback on what’s missing.

Happy to answer questions in the comments—especially from DevOps engineers who deal with pipeline failures daily.
