I Built a Multi-Agent Job Search System with Claude Code — 631 Evaluations, 12 Modes

Published: March 17, 2026 at 06:07 AM EDT

4 min read

Source: Dev.to

What I Built

A multi‑agent system with 12 operational modes, each a Claude Code skill file with its own context and rules. It is not a single script but an agent that reasons about the problem domain.

Key architectural choice: modes over one long prompt.

career-ops/
├── modes/
│   ├── _shared.md          # North Star archetypes, proof points
│   ├── auto-pipeline.md    # Full pipeline: JD → eval → PDF → tracker
│   ├── oferta.md           # Single‑offer evaluation (A‑F)
│   ├── batch.md            # Parallel processing with workers
│   ├── pdf.md              # ATS‑optimized CV per offer
│   ├── scan.md             # Portal discovery
│   ├── apply.md            # Playwright form‑filling
│   └── ... (12 total)
├── reports/                # 631 evaluation files
├── output/                 # Generated PDFs
├── applications.md         # Central tracker
└── scan-history.tsv        # 680 deduplicated URLs

Why modes? Each mode loads only the context it needs: auto‑pipeline skips contact rules, and apply skips scoring logic. Less irrelevant context means better decisions from the LLM.
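
A minimal sketch of that idea, assuming each mode declares the context sections it needs. The section names, the mode-to-section mapping, and the `buildContext` helper are hypothetical and only illustrate the selection; the real skill files may be organized differently.

```javascript
// Hypothetical context sections; in the real system this content lives in skill files.
const SECTIONS = {
  archetypes: "North Star archetypes and proof points",
  scoring: "10-dimension scoring rules",
  contactRules: "Contact rules",
  formFilling: "Form-filling rules",
};

// Assumed mapping from mode to the sections it loads.
const MODE_SECTIONS = {
  "auto-pipeline": ["archetypes", "scoring"], // skips contact rules
  apply: ["archetypes", "formFilling"],       // skips scoring logic
};

// Build the prompt context for one mode from only the sections it declares.
function buildContext(mode) {
  const wanted = MODE_SECTIONS[mode];
  if (!wanted) throw new Error(`Unknown mode: ${mode}`);
  return wanted.map((name) => SECTIONS[name]).join("\n\n");
}

console.log(buildContext("apply"));
```

Keeping the mapping explicit makes it easy to see, per mode, what the LLM never has to read.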

The 10‑Dimension Scoring

Every offer runs through a weighted evaluation framework.

Dimension	What It Measures	Weight
Role Match	Alignment with CV proof points	Gate‑pass
Skills Alignment	Tech stack overlap	Gate‑pass
Seniority	Stretch level	High
Compensation	Market rate vs target	High
Geographic	Remote/hybrid feasibility	Medium
Company Stage	Startup/growth/enterprise fit	Medium
Product‑Market Fit	Problem domain resonance	Medium
Growth Trajectory	Career ladder visibility	Medium
Interview Likelihood	Callback probability	High
Timeline	Hiring urgency	Low

Role Match and Skills Alignment are gate‑pass dimensions: if an offer fails them, the final score drops no matter how well it does on the other dimensions. 74% of evaluated offers scored below 4.0.
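
One way to sketch gate‑pass scoring, under stated assumptions: scores run from 1 to 5, High/Medium/Low map to weights 3/2/1, a gate fails below 3, and a failed gate caps the final score at 2.0. None of these numbers come from the article, and leaving the gate dimensions out of the weighted average is also an assumption; they only show how a gate can override a weighted average.

```javascript
// Assumed numeric weights for the High / Medium / Low labels in the table.
const WEIGHTS = { high: 3, medium: 2, low: 1 };

const DIMENSIONS = [
  { name: "roleMatch", gate: true },
  { name: "skillsAlignment", gate: true },
  { name: "seniority", weight: "high" },
  { name: "compensation", weight: "high" },
  { name: "geographic", weight: "medium" },
  { name: "companyStage", weight: "medium" },
  { name: "productMarketFit", weight: "medium" },
  { name: "growthTrajectory", weight: "medium" },
  { name: "interviewLikelihood", weight: "high" },
  { name: "timeline", weight: "low" },
];

const GATE_MIN = 3;   // assumed: a gate dimension fails below this score
const GATE_CAP = 2.0; // assumed: maximum final score once a gate has failed

// scores: object mapping each dimension name to a number from 1 to 5.
function scoreOffer(scores) {
  let weightedSum = 0;
  let weightTotal = 0;
  let gateFailed = false;
  for (const dim of DIMENSIONS) {
    const value = scores[dim.name];
    if (typeof value !== "number") throw new Error(`Missing score: ${dim.name}`);
    if (dim.gate) {
      // Gate dimensions do not enter the average; they only pass or fail.
      if (value < GATE_MIN) gateFailed = true;
      continue;
    }
    weightedSum += value * WEIGHTS[dim.weight];
    weightTotal += WEIGHTS[dim.weight];
  }
  const average = weightedSum / weightTotal;
  return { score: gateFailed ? Math.min(average, GATE_CAP) : average, gateFailed };
}

// A perfect offer on every dimension except Role Match is still capped.
const mismatched = {
  roleMatch: 2, skillsAlignment: 5, seniority: 5, compensation: 5, geographic: 5,
  companyStage: 5, productMarketFit: 5, growthTrajectory: 5, interviewLikelihood: 5, timeline: 5,
};
console.log(scoreOffer(mismatched)); // { score: 2, gateFailed: true }
```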

The Pipeline

auto‑pipeline is the flagship mode. A URL goes in, and the pipeline runs these steps:

Extract JD – Playwright navigates to the URL, extracts structured content.
Evaluate 10D – Claude reads JD + CV + portfolio, generates scoring.
Generate report – Markdown with six blocks: summary, CV match, level, compensation, personalization, interview probability.
Generate PDF – HTML template + keyword injection + Puppeteer render.
Register tracker – TSV auto‑merge via a Node.js script.
Dedup – Checks 680 URLs in scan-history.tsv. Zero re‑evaluations.
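
The dedup step can be sketched as a set lookup on normalized URLs. This assumes scan-history.tsv keeps the URL in its first column and that tracking parameters and trailing slashes should not count as differences; the article states neither, and the helper names are hypothetical.

```javascript
// Normalize a URL so trivially different links compare equal.
function normalizeUrl(raw) {
  const url = new URL(raw.trim()); // the URL parser lowercases the host
  url.hash = "";
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_")) url.searchParams.delete(key);
  }
  url.pathname = url.pathname.replace(/\/+$/, "") || "/";
  return url.toString();
}

// Build the set of already-seen URLs from TSV text (URL assumed in column 1).
function loadSeen(tsvText) {
  const seen = new Set();
  for (const line of tsvText.split("\n")) {
    const first = line.split("\t")[0].trim();
    if (!first) continue;
    try {
      seen.add(normalizeUrl(first));
    } catch {
      // Skip a header row or a malformed entry.
    }
  }
  return seen;
}

// True when the URL has not been evaluated before.
function isNew(seen, rawUrl) {
  return !seen.has(normalizeUrl(rawUrl));
}

const history = "url\tdate\nhttps://example.com/jobs/123/\t2026-01-10\n";
const seenUrls = loadSeen(history);
console.log(isNew(seenUrls, "https://EXAMPLE.com/jobs/123?utm_source=li")); // false
console.log(isNew(seenUrls, "https://example.com/jobs/456"));               // true
```

Running this check before extraction, rather than after evaluation, is what keeps a repeat URL from costing any LLM calls.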

Batch Processing

For high volume, batch mode launches a conductor that orchestrates parallel workers.

# conductor spawns N workers, each an independent Claude Code process
./batch-runner.sh --input batch/batch-input.tsv --workers 4

# Each worker:
# 1. Claims a URL from the queue (lock file prevents doubles)
# 2. Runs auto-pipeline
# 3. Writes result to batch-state.tsv
# 4. Picks next URL

122 URLs processed in parallel.
Fault‑tolerant: a worker failure never blocks the rest.
Resumable: reads state and skips completed items.
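
A rough sketch of how a worker might claim work, assuming one lock file per URL in a shared directory and a set of completed URLs read from batch-state.tsv. The file layout and function names are hypothetical; the only library behavior relied on is that `fs.openSync` with the `"wx"` flag fails with `EEXIST` when the file already exists, which makes the claim atomic.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Try to claim a URL by atomically creating its lock file.
// Returns true if this worker won the claim, false if another worker holds it.
function tryClaim(lockDir, url) {
  const name = crypto.createHash("sha1").update(url).digest("hex") + ".lock";
  try {
    const fd = fs.openSync(path.join(lockDir, name), "wx"); // fails if the file exists
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === "EEXIST") return false;
    throw err;
  }
}

// Return the next URL that is neither completed nor claimed, or null when done.
function nextUrl(queue, completed, lockDir) {
  for (const url of queue) {
    if (completed.has(url)) continue; // resumability: skip finished items
    if (tryClaim(lockDir, url)) return url;
  }
  return null;
}
```

A worker that crashes mid-item leaves its lock behind in this sketch, so a real runner would also need a policy for stale locks, such as clearing locks for items that never reached the state file.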

The AI Resume Builder

A generic PDF loses. Career‑Ops generates a different ATS‑optimized CV for each offer:

Extract 15‑20 keywords from the JD.
Detect language (e.g., English JD → English CV).
Detect region (US → Letter, Europe → A4).
Detect archetype (6 predefined: AI Platform, Agentic, PM, SA, FDE, Transformation).
Select top 3‑4 projects by relevance.
Reorder bullets – most relevant experience moves up.
Render PDF – Puppeteer, self‑hosted fonts, single‑column ATS‑safe.

Same CV, six different framings. All real – keywords are reformulated, never fabricated.
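
Two of the steps above, page format by region and bullet reordering, can be sketched as pure functions. The relevance measure here is a plain count of JD keywords found in each bullet, which is an assumption; the article does not say how relevance is computed. Falling back to A4 outside the US is also an assumption.

```javascript
// US → Letter and Europe → A4 come from the article; A4 elsewhere is assumed.
function paperFormat(region) {
  return region === "US" ? "Letter" : "A4";
}

// Count how many keywords appear in the text (case-insensitive substring match).
function countMatches(text, keywords) {
  const lower = text.toLowerCase();
  return keywords.filter((kw) => lower.includes(kw.toLowerCase())).length;
}

// Move the bullets that mention the most JD keywords to the top.
// Ties keep their original order; bullet text is never changed.
function reorderBullets(bullets, keywords) {
  return bullets
    .map((text, index) => ({ text, index, hits: countMatches(text, keywords) }))
    .sort((a, b) => b.hits - a.hits || a.index - b.index)
    .map((entry) => entry.text);
}

const bullets = [
  "Managed vendor contracts",
  "Built LLM agents with evaluation pipelines",
  "Shipped RAG search",
];
console.log(reorderBullets(bullets, ["LLM", "agents", "RAG"]));
console.log(paperFormat("US")); // Letter
```

Substring matching is crude (a short keyword can match inside an unrelated word), so a real implementation would likely tokenize first. The point of the sketch is that reordering only changes emphasis: nothing is added to the CV.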

Results

Two months in production (real numbers, not demos):

631 reports generated
68 applications sent
354 PDFs generated
680 URLs deduplicated
0 re‑evaluations

What I Learned

Automate analysis, not decisions. Career‑Ops evaluated 631 offers; I decide which ones get my time. Human‑in‑the‑loop is a design feature, not a limitation.
Modes beat a long prompt. Twelve focused modes outperform a single 10k‑token system prompt. My early attempt with one massive prompt produced poor‑quality output.
Deduplication is more valuable than scoring. Tracking 680 deduplicated URLs meant zero re‑evaluations – boring infrastructure with the highest ROI.
A CV is an argument, not a document. Tailoring proof points and framing to the archetype converts far better than a one‑size‑fits‑all PDF.
The system is the portfolio. Building a multi‑agent job‑search system is direct proof of competence for multi‑agent roles.

Stack

Claude Code – LLM agent: reasoning, evaluation, content generation
Playwright – Browser automation: portal scanning and form‑filling
Puppeteer – PDF rendering from HTML templates
Node.js – Utility scripts: merge‑tracker, cv‑sync‑check
tmux – Parallel sessions: conductor + workers in batch

Full case study

https://santifer.io/career-ops-system
