I Built a Multi-Agent Job Search System with Claude Code — 631 Evaluations, 12 Modes

Published: March 17, 2026 at 06:07 AM EDT
4 min read
Source: Dev.to

What I Built

A multi-agent system with 12 operational modes, each a Claude Code skill file with its own context and rules. It is not a single script but an agent that reasons about the problem domain.

Key architectural choice: modes over one long prompt.

career-ops/
├── modes/
│   ├── _shared.md          # North Star archetypes, proof points
│   ├── auto-pipeline.md    # Full pipeline: JD → eval → PDF → tracker
│   ├── oferta.md           # Single‑offer evaluation (A‑F)
│   ├── batch.md            # Parallel processing with workers
│   ├── pdf.md              # ATS‑optimized CV per offer
│   ├── scan.md             # Portal discovery
│   ├── apply.md            # Playwright form‑filling
│   └── ... (12 total)
├── reports/                # 631 evaluation files
├── output/                 # Generated PDFs
├── applications.md         # Central tracker
└── scan-history.tsv        # 680 deduplicated URLs

Why modes? Each mode loads only the context it needs: auto-pipeline skips the contact rules, and apply skips the scoring logic. Less context means better decisions from the LLM.
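Claude Code loads skill files on its own, so what follows only illustrates the idea of per-mode context; it is not how Career-Ops is wired. It is a minimal sketch, assuming a mode's context is the shared file plus that mode's own file. `build_mode_prompt` is a hypothetical helper, and always including `_shared.md` is an assumption.

```python
from pathlib import Path

def build_mode_prompt(mode: str, modes_dir: Path = Path("modes")) -> str:
    """Assemble one mode's context: shared profile + that mode's rules only.

    Hypothetical helper. Assumption: every mode also loads _shared.md.
    """
    shared = (modes_dir / "_shared.md").read_text(encoding="utf-8")
    mode_file = modes_dir / f"{mode}.md"
    if not mode_file.exists():
        # List real modes (files not starting with "_") to make typos obvious.
        available = sorted(
            p.stem for p in modes_dir.glob("*.md") if not p.stem.startswith("_")
        )
        raise ValueError(f"unknown mode {mode!r}; available: {available}")
    # Other modes' files are never read, so their rules never reach the model.
    return shared + "\n\n" + mode_file.read_text(encoding="utf-8")
```

With this layout, loading `apply` never pulls in `oferta.md`, so evaluation rules stay out of the form-filling context.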

The 10‑Dimension Scoring

Every offer runs through a weighted evaluation framework.

| Dimension | What It Measures | Weight |
|---|---|---|
| Role Match | Alignment with CV proof points | Gate-pass |
| Skills Alignment | Tech stack overlap | Gate-pass |
| Seniority | Stretch level | High |
| Compensation | Market rate vs. target | High |
| Geographic | Remote/hybrid feasibility | Medium |
| Company Stage | Startup/growth/enterprise fit | Medium |
| Product-Market Fit | Problem domain resonance | Medium |
| Growth Trajectory | Career ladder visibility | Medium |
| Interview Likelihood | Callback probability | High |
| Timeline | Hiring urgency | Low |

Role Match and Skills Alignment are gate-pass dimensions: if either one fails, the final score drops regardless of the other dimensions. 74% of evaluated offers scored below 4.0.

The Pipeline

auto-pipeline is the flagship mode. A URL goes in and passes through six steps:

  1. Extract JD – Playwright navigates to the URL and extracts structured content.
  2. Evaluate 10D – Claude reads the JD, CV, and portfolio and scores the offer on the 10 dimensions.
  3. Generate report – Markdown with six blocks: summary, CV match, level, compensation, personalization, interview probability.
  4. Generate PDF – HTML template + keyword injection + Puppeteer render.
  5. Register tracker – TSV auto-merge into the tracker via a Node.js script.
  6. Dedup – checks the URL against the 680 entries in scan-history.tsv, so no offer is evaluated twice.
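Deduplication only works if two spellings of the same posting compare equal. A minimal sketch, assuming scan-history.tsv keeps the URL in its first column and that `utm_*` parameters and fragments are noise; `normalize_url`, `load_seen`, and `should_evaluate` are hypothetical names. Since "zero re-evaluations" implies the check happens before any model call, the sketch is meant to run ahead of step 2.

```python
import csv
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Canonical form so trivial variants of one posting compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Assumption: utm_* parameters are tracking noise. Other parameters are
    # kept (sorted) because some portals put the job ID in the query string.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if not k.lower().startswith("utm_")
    ))
    # Scheme and host are case-insensitive; dropping the fragment is an assumption.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))

def load_seen(history: Path) -> set[str]:
    """Read scan-history.tsv, assuming the URL is the first column."""
    if not history.exists():
        return set()
    with history.open(newline="", encoding="utf-8") as f:
        return {normalize_url(row[0]) for row in csv.reader(f, delimiter="\t") if row}

def should_evaluate(url: str, seen: set[str]) -> bool:
    """True only the first time a normalized URL is seen; records it in `seen`."""
    key = normalize_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Dropping the whole query string would be simpler, but it could merge distinct jobs on portals that identify the posting by a query parameter.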

Batch Processing

For high volume, batch mode launches a conductor that orchestrates parallel workers.

# conductor spawns N workers, each an independent Claude Code process
./batch-runner.sh --input batch/batch-input.tsv --workers 4

# Each worker:
# 1. Claims a URL from the queue (lock file prevents doubles)
# 2. Runs auto-pipeline
# 3. Writes result to batch-state.tsv
# 4. Picks next URL

  • 122 URLs processed in parallel.
  • Fault-tolerant: a failed worker never blocks the others.
  • Resumable: on restart, the runner reads batch-state.tsv and skips completed items.
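The lock-file claim and the resume step can be sketched in Python, assuming one lock file per queue item and a tab-separated state file whose first column is the item ID. `claim`, `completed_ids`, and `worker` are hypothetical names, not batch-runner.sh internals. `os.open` with `O_CREAT | O_EXCL` fails if the file already exists, so only one worker can win a given item.

```python
import os
from collections.abc import Callable
from pathlib import Path

def claim(item_id: str, lock_dir: Path) -> bool:
    """Atomically claim a queue item; False if another worker already has it."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    try:
        fd = os.open(lock_dir / f"{item_id}.lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner, useful for debugging
    return True

def completed_ids(state_file: Path) -> set[str]:
    """Resume support: IDs already recorded in the state TSV (first column)."""
    if not state_file.exists():
        return set()
    with state_file.open(encoding="utf-8") as f:
        return {line.split("\t", 1)[0] for line in f if line.strip()}

def worker(queue: list[tuple[str, str]], lock_dir: Path, state_file: Path,
           process: Callable[[str], str]) -> int:
    """Process unclaimed, unfinished items; returns how many this call handled."""
    done = completed_ids(state_file)
    handled = 0
    for item_id, url in queue:
        if item_id in done or not claim(item_id, lock_dir):
            continue  # finished in an earlier run, or claimed by another worker
        try:
            status = process(url)
        except Exception as exc:  # one failing URL must not stop this worker
            status = f"error: {exc}"
        # Failed items are recorded too, so a rerun skips them unless filtered.
        with state_file.open("a", encoding="utf-8") as f:
            f.write(f"{item_id}\t{url}\t{status}\n")
        handled += 1
    return handled
```

One caveat of this design: a worker process that is killed mid-item leaves its lock behind, so that item is neither done nor claimable. A real runner would need a rule for stale locks, for example expiring locks older than some timeout; the article does not say how Career-Ops handles this.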

The AI Resume Builder

A generic PDF loses. Career‑Ops generates a different ATS‑optimized CV for each offer:

  • Extract 15‑20 keywords from the JD.
  • Detect language (e.g., English JD → English CV).
  • Detect region (US → Letter, Europe → A4).
  • Detect archetype (6 predefined: AI Platform, Agentic, PM, SA, FDE, Transformation).
  • Select top 3‑4 projects by relevance.
  • Reorder bullets – most relevant experience moves up.
  • Render PDF – Puppeteer, self‑hosted fonts, single‑column ATS‑safe.

Same CV, six different framings. All real – keywords are reformulated, never fabricated.
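The selection and reordering steps reduce to ranking by keyword overlap. A minimal sketch, assuming the 15-20 keywords have already been extracted from the JD (in Career-Ops that is Claude's job) and that matching is case-insensitive on whole words. `Project`, `keyword_hits`, `tailor`, and `paper_size` are hypothetical names. The sketch only selects and reorders existing bullets; any rewording is left to the model.

```python
import re
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    bullets: list[str]

def paper_size(region: str) -> str:
    # Article's rule: US -> Letter, Europe -> A4.
    # Defaulting every other region to A4 is an assumption.
    return "Letter" if region.strip().upper() == "US" else "A4"

def keyword_hits(text: str, keywords: set[str]) -> int:
    """Count how many JD keywords appear in the text as whole words."""
    return sum(
        1 for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE)
    )

def tailor(projects: list[Project], keywords: set[str], top_n: int = 4) -> list[Project]:
    """Keep the top_n most relevant projects and move matching bullets up."""
    def relevance(p: Project) -> int:
        return keyword_hits(" ".join(p.bullets), keywords)

    ranked = sorted(projects, key=relevance, reverse=True)[:top_n]
    # Build new Project objects so the caller's master CV data is untouched.
    return [
        Project(p.name, sorted(p.bullets, key=lambda b: keyword_hits(b, keywords), reverse=True))
        for p in ranked
    ]
```

Python's `sorted` stays stable with `reverse=True`, so bullets of equal relevance keep their original order.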

Results

Two months in production (real numbers, not demos):

  • 631 reports generated
  • 68 applications sent
  • 354 PDFs generated
  • 680 URLs deduplicated
  • 0 re‑evaluations

What I Learned

  • Automate analysis, not decisions. Career-Ops has evaluated 631 offers; I decide which ones get my time. Human-in-the-loop is a design feature, not a limitation.
  • Modes beat a long prompt. Twelve focused modes outperform a single 10k-token system prompt; my early attempt with one massive prompt produced poor-quality results.
  • Deduplication is more valuable than scoring. 680 deduplicated URLs saved 680 unnecessary evaluations – boring infrastructure with the highest ROI.
  • A CV is an argument, not a document. Tailoring proof points and framing to the archetype converts far better than a one‑size‑fits‑all PDF.
  • The system is the portfolio. Building a multi‑agent job‑search system is a direct proof of competence for multi‑agent roles.

Stack

  • Claude Code – LLM agent: reasoning, evaluation, content generation
  • Playwright – Browser automation: portal scanning and form‑filling
  • Puppeteer – PDF rendering from HTML templates
  • Node.js – Utility scripts: merge‑tracker, cv‑sync‑check
  • tmux – Parallel sessions: conductor + workers in batch

Full case study

https://santifer.io/career-ops-system
