# How I structured logs around Hindsight

Published: March 23, 2026 at 01:24 PM EDT

Source: Dev.to

“Why did it reject a perfect resume?”

I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.

## What I actually built

This project is a job‑matching agent that reads resumes, scores candidates, and ranks them against job descriptions. Nothing fancy on the surface:

resume → extract features → score → return top candidates

The interesting part is that the scoring logic isn’t fixed.

I wired it up with Hindsight (see its GitHub repository) so the agent could learn from feedback such as:

- “This candidate should have been ranked higher”
- “This profile is irrelevant despite keyword match”

Instead of retraining a model, I let the agent adapt its behavior by replaying past decisions and corrections.

## How the system is structured

At a high level, the code splits into three parts:

1. Resume ingestion and parsing
2. Scoring pipeline
3. Hindsight‑backed memory and feedback loop

### Scoring pipeline (simplified)

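```python
def score_candidate(resume, job_description):
    # Static part: extracted features combined with fixed weights.
    features = extract_features(resume, job_description)
    base_score = weighted_score(features)
    # Dynamic part: corrections replayed from past feedback.
    adjustments = hindsight_adjustments(resume, job_description)
    return base_score + adjustments
```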

The key is that hindsight_adjustments isn’t static; it’s derived from past feedback stored and replayed through Hindsight.

### Feedback events

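```python
from datetime import datetime, timezone

event = {
    "resume_id": resume.id,
    "job_id": job.id,
    "original_score": score,
    "feedback": "should_rank_higher",
    "timestamp": datetime.now(timezone.utc),  # timezone-aware, so events sort reliably
}
```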

These events are indexed and replayed later when similar candidates appear.
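
To make the index-and-replay step concrete, here is a minimal sketch in plain Python. It does not use Hindsight's actual API: `FeedbackEvent`, `FeedbackStore`, the `skills` field, and the Jaccard similarity over skill sets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FeedbackEvent:
    """One recruiter correction (hypothetical shape, mirroring the dict above)."""
    resume_id: str
    job_id: str
    original_score: float
    feedback: str        # e.g. "should_rank_higher"
    skills: frozenset    # assumed feature used for similarity lookups
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def jaccard(a, b):
    """Overlap between two skill sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class FeedbackStore:
    """In-memory stand-in for the memory layer; not Hindsight's real interface."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def retrieve_similar(self, skills, job_id, min_similarity):
        """Return (similarity, event) pairs for the same job, best match first."""
        matches = []
        for event in self._events:
            if event.job_id != job_id:
                continue
            sim = jaccard(frozenset(skills), event.skills)
            if sim >= min_similarity:
                matches.append((sim, event))
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return matches


store = FeedbackStore()
store.record(FeedbackEvent("r1", "j1", 0.62, "should_rank_higher",
                           frozenset({"python", "sql", "airflow"})))
similar = store.retrieve_similar({"python", "sql", "dbt"}, "j1", min_similarity=0.4)
```

The `min_similarity` argument is the knob that matters most here: it decides how far a single piece of feedback is allowed to spread.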

If you’ve read the Hindsight documentation, this is basically using event replay as a lightweight learning layer instead of retraining.

## The bug that made this interesting

Everything seemed fine until I noticed something weird:

A strong candidate—clean experience, perfect keyword match—was consistently ranked low.

My first guesses:

- Parsing bug?
- Feature‑extraction issue?
- Bad weights?

Nope. It was Hindsight.

### What actually happened

1. A recruiter had earlier marked a similar resume as “not relevant”.
2. That feedback got stored and replayed.
3. The similarity match was too broad, so the new candidate inherited a negative adjustment.

The score dropped silently, and no logs screamed “this is wrong.” It just looked like the system “decided” differently.
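
The failure mode is easy to reproduce in a toy setting. The sketch below assumes a hypothetical replay rule in which every retrieved "not relevant" event subtracts a fixed penalty; the scores, the penalty, and the thresholds are made up for illustration and are not taken from the project.

```python
PENALTY = 0.3  # hypothetical penalty per replayed "not relevant" event


def replay_adjustment(similarities, min_similarity):
    """Subtract a fixed penalty for every past negative event that clears the threshold."""
    return -PENALTY * sum(1 for s in similarities if s >= min_similarity)


base_score = 0.9               # strong candidate before replay
past_negative_events = [0.35]  # similarity to one resume marked "not relevant"

# A loose threshold lets the unrelated event through; a strict one ignores it.
loose = base_score + replay_adjustment(past_negative_events, min_similarity=0.3)
strict = base_score + replay_adjustment(past_negative_events, min_similarity=0.8)
```

With the loose threshold the candidate drops from 0.9 to roughly 0.6 because of a single loosely related event, and nothing in the output says why.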

## Debugging the feedback loop

I added explicit logging to see how Hindsight was influencing decisions:

```python
def hindsight_adjustments(resume, job):
    events = hindsight.retrieve_similar(resume, job)
    for e in events:
        # Surface every past event that influences this score.
        print("Replaying event:", e)
    return aggregate_adjustments(events)
```
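
A `print` call works for a first look, but one structured record per replayed event is easier to search later. The following is a sketch using Python's standard `logging` and `json` modules; the logger name, the field names, and the `(similarity, feedback, delta)` tuple shape are assumptions, not Hindsight's output format.

```python
import json
import logging

logger = logging.getLogger("hindsight.replay")


def log_adjustments(resume_id, job_id, base_score, replayed):
    """Log each replayed event's contribution and return the final score.

    `replayed` is a list of (similarity, feedback, delta) tuples; this shape
    is hypothetical and stands in for whatever the memory layer returns.
    """
    total = 0.0
    for similarity, feedback, delta in replayed:
        total += delta
        logger.info(json.dumps({
            "event": "hindsight_replay",
            "resume_id": resume_id,
            "job_id": job_id,
            "similarity": similarity,
            "feedback": feedback,
            "delta": delta,
        }))
    final_score = base_score + total
    # One summary line per decision makes silent score drops easy to spot.
    logger.info(json.dumps({
        "event": "score_summary",
        "resume_id": resume_id,
        "job_id": job_id,
        "base_score": base_score,
        "adjustment": total,
        "final_score": final_score,
        "events_replayed": len(replayed),
    }))
    return final_score


logging.basicConfig(level=logging.INFO, format="%(message)s")
final = log_adjustments("r42", "j1", 0.9, [(0.35, "not_relevant", -0.25)])
```

Logging the base score and the adjustment separately is the design choice that matters: it shows at a glance whether the model or the replayed feedback moved a candidate.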

That’s when it clicked: the system wasn’t wrong; it was too eager to generalize. The feedback loop had effectively created a soft rule based on a single data point:

> “Candidates like this are bad”

## Fixing it without killing learning

I didn’t want to remove Hindsight—it’s the whole point. Instead, I constrained it.

### 1. Tightened similarity matching

`if similarity_score` […]

1. Feedback arrives:

   > “This candidate looks good on paper but lacks real project depth.”

2. Stored as an event.  

3. Later, a similar resume arrives (same keywords, similar experience).  

4. System retrieves past feedback, applies a **small negative adjustment**, and slightly lowers the rank.  

5. After multiple similar feedback events, the agent implicitly learns:  

   > “Keyword match isn’t enough—depth matters.”  

No retraining. Just accumulated corrections.
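
The steps above amount to summing small, similarity-weighted nudges. Here is a minimal sketch, assuming a hypothetical per-event step size and a cap on the total adjustment; neither value, nor the `lacks_depth` and `not_relevant` labels, comes from the original project.

```python
STEP = 0.05  # hypothetical size of one correction
CAP = 0.2    # hypothetical limit on the total adjustment, in either direction

# Direction of each feedback label; only "should_rank_higher" appears in the
# original event example, the other labels are illustrative.
DIRECTION = {"should_rank_higher": 1.0, "lacks_depth": -1.0, "not_relevant": -1.0}


def accumulated_adjustment(events):
    """Sum similarity-weighted nudges from (similarity, feedback) pairs, then clamp."""
    total = sum(STEP * sim * DIRECTION.get(feedback, 0.0) for sim, feedback in events)
    return max(-CAP, min(CAP, total))


one_event = accumulated_adjustment([(0.9, "lacks_depth")])
many_events = accumulated_adjustment([(0.9, "lacks_depth")] * 10)
```

A single event moves the score only slightly, while ten consistent events hit the cap, so repeated feedback counts for more than one loud signal but can never swamp the base score.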

---

## What I learned  

- **Feedback loops are brittle by default** – one bad signal can poison future decisions if you don’t gate it.  
- **Similarity is everything** – loose retrieval = noisy learning. Tightening similarity improved behavior more than any model tweak.  
- **Logging matters more than modeling** – I didn’t change the scoring model much; I just made Hindsight visible.  
- **Local context beats global memory** – scoping feedback to `job_role + skill_cluster` made the system far more stable.  
- **“Learning” is just controlled bias accumulation** – Hindsight doesn’t magically learn; it accumulates past decisions. Your job is to control how that bias spreads.
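
The gating and scoping ideas in this list can be sketched together. This is an illustration under stated assumptions, not the project's code: `ScopedFeedback`, the Jaccard similarity, and the `MIN_SIMILARITY` and `MIN_EVENTS` values are hypothetical, and only the `job_role + skill_cluster` scope comes from the post.

```python
from collections import defaultdict

MIN_SIMILARITY = 0.8  # hypothetical similarity threshold
MIN_EVENTS = 3        # hypothetical: a signal must repeat before it counts


def jaccard(a, b):
    """Overlap between two skill sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ScopedFeedback:
    """Feedback bucketed by (job_role, skill_cluster) so it cannot leak across contexts."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def record(self, job_role, skill_cluster, skills, delta):
        self._buckets[(job_role, skill_cluster)].append((frozenset(skills), delta))

    def adjustment(self, job_role, skill_cluster, skills):
        skills = frozenset(skills)
        # .get avoids creating empty buckets on lookup.
        bucket = self._buckets.get((job_role, skill_cluster), [])
        deltas = [d for past, d in bucket if jaccard(skills, past) >= MIN_SIMILARITY]
        if len(deltas) < MIN_EVENTS:
            return 0.0  # not enough evidence yet: treat the feedback as noise
        return sum(deltas) / len(deltas)


memory = ScopedFeedback()
candidate = {"python", "sql", "airflow", "dbt"}
memory.record("data_engineer", "pipelines", candidate, -0.1)
single = memory.adjustment("data_engineer", "pipelines", candidate)    # one event: ignored
memory.record("data_engineer", "pipelines", candidate, -0.1)
memory.record("data_engineer", "pipelines", candidate, -0.1)
repeated = memory.adjustment("data_engineer", "pipelines", candidate)  # three events: applied
other_role = memory.adjustment("ml_engineer", "pipelines", candidate)  # different scope: ignored
```

Requiring several matching events before any adjustment applies is what stops one bad data point from becoming a rule.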

---

## Would I do this again?  

**Yes—but with guardrails from day one.**  

Hindsight is powerful, but it will happily amplify your mistakes if you let it.

Treat it like:

- **A suggestion system (not ground truth)**  
- **A contextual memory (not global truth)**  

and it becomes a practical way to make agents adapt without retraining, minus the surprise side effects.

Otherwise, you’ll end up debugging why your system rejected a perfect resume—and realizing it was your own feedback loop all along.