How I structured logs around Hindsight

Published: March 23, 2026 at 01:24 PM EDT
4 min read
Source: Dev.to

“Why did it reject a perfect resume?”

I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.



What I actually built

This project is a job‑matching agent that reads resumes, scores candidates, and ranks them against job descriptions. Nothing fancy on the surface:

resume → extract features → score → return top candidates

The interesting part is that the scoring logic isn’t fixed.

I wired it up with Hindsight (from the Hindsight GitHub repository) so the agent could learn from feedback such as:

  • “This candidate should have been ranked higher”
  • “This profile is irrelevant despite keyword match”

Instead of retraining a model, I let the agent adapt its behavior by replaying past decisions and corrections.


How the system is structured

At a high level, the code splits into three parts:

  1. Resume ingestion + parsing
  2. Scoring pipeline
  3. Hindsight‑backed memory + feedback loop

Scoring pipeline (simplified)

def score_candidate(resume, job_description):
    features = extract_features(resume, job_description)
    base_score = weighted_score(features)
    adjustments = hindsight_adjustments(resume, job_description)
    return base_score + adjustments

The key is that hindsight_adjustments isn’t static; it’s derived from past feedback stored and replayed through Hindsight.
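To make the shape of that adjustment step concrete, here is a minimal sketch of how replayed feedback could be turned into a score delta. The label names, the delta values, and the `score_with_adjustments` helper are assumptions for illustration; the post does not show the real aggregation, and nothing here is Hindsight's API.

```python
# Hypothetical mapping from feedback labels to score deltas.
# The labels and magnitudes are illustrative, not taken from the project.
FEEDBACK_DELTAS = {
    "should_rank_higher": 0.05,
    "not_relevant": -0.05,
}


def aggregate_adjustments(events):
    """Sum the delta of every replayed event; unknown labels contribute 0."""
    return sum(FEEDBACK_DELTAS.get(e["feedback"], 0.0) for e in events)


def score_with_adjustments(base_score, events):
    """Combine a base score with the adjustment derived from replayed events."""
    return base_score + aggregate_adjustments(events)
```

With no replayed events the adjustment is zero, so the base score passes through unchanged.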

Feedback events

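from datetime import datetime, timezone

event = {
    "resume_id": resume.id,
    "job_id": job.id,
    "original_score": score,            # score at decision time
    "feedback": "should_rank_higher",   # the reviewer's correction
    "timestamp": datetime.now(timezone.utc).isoformat(),
}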

These events are indexed and replayed later when similar candidates appear.

If you’ve read the Hindsight documentation, this is basically using event replay as a lightweight learning layer instead of retraining.
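As a rough illustration of "indexed and replayed when similar candidates appear", the sketch below keeps events in memory per job and retrieves them by Jaccard similarity over skill sets. Everything in it is an assumption: the `FeedbackEvent` fields, the `EventStore` class, the `find_similar` method, and the choice of Jaccard similarity are stand-ins, not how Hindsight stores or retrieves memories.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    resume_id: str
    job_id: str
    feedback: str
    skills: frozenset  # hypothetical feature used for similarity


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class EventStore:
    """In-memory stand-in for a feedback memory, indexed by job_id."""

    def __init__(self):
        self._by_job = {}  # job_id -> list of FeedbackEvent

    def add(self, event):
        self._by_job.setdefault(event.job_id, []).append(event)

    def find_similar(self, skills, job_id, min_similarity=0.5):
        """Return (similarity, event) pairs for this job, most similar first."""
        skills = frozenset(skills)
        matches = []
        for event in self._by_job.get(job_id, []):
            sim = jaccard(skills, event.skills)
            if sim >= min_similarity:
                matches.append((sim, event))
        return sorted(matches, key=lambda pair: pair[0], reverse=True)
```

Returning the similarity alongside each event is a deliberate choice: it makes it possible to see later how close a match actually was.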


The bug that made this interesting

Everything seemed fine until I noticed something weird:

A strong candidate—clean experience, perfect keyword match—was consistently ranked low.

My first guesses:

  • parsing bug?
  • feature‑extraction issue?
  • bad weights?

Nope. It was Hindsight.

What actually happened

  1. A recruiter had earlier marked a similar resume as “not relevant”.
  2. That feedback got stored and replayed.
  3. The similarity match was too broad, so the new candidate inherited a negative adjustment.

The score dropped silently, and no logs screamed “this is wrong.” It just looked like the system “decided” differently.


Debugging the feedback loop

I added explicit logging to see how Hindsight was influencing decisions:

def hindsight_adjustments(resume, job):
    events = hindsight.retrieve_similar(resume, job)
    for e in events:
        print("Replaying event:", e)
    return aggregate_adjustments(events)

That’s when it clicked:

  • The system wasn’t wrong; it was too eager to generalize.

  • The feedback loop had effectively created a soft rule:

    “Candidates like this are bad”

    …based on a single data point.
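The `print` call above is enough to spot the problem once. For ongoing visibility, one option is to emit a structured record per replayed event plus a summary per decision, using the standard `logging` and `json` modules. This is a sketch under assumptions: the logger name, the field names, and the `(event, similarity, delta)` tuple shape are hypothetical and not part of the original code.

```python
import json
import logging

logger = logging.getLogger("hindsight_replay")  # hypothetical logger name


def log_replay(resume_id, job_id, base_score, replayed):
    """Log one JSON record per replayed event and a summary; return the final score.

    `replayed` is a list of (event, similarity, delta) tuples, where `event`
    is a dict with at least "resume_id" and "feedback" keys.
    """
    total = 0.0
    for event, similarity, delta in replayed:
        total += delta
        logger.info(json.dumps({
            "type": "replay",
            "resume_id": resume_id,
            "job_id": job_id,
            "source_resume_id": event["resume_id"],
            "feedback": event["feedback"],
            "similarity": round(similarity, 3),
            "delta": delta,
        }))
    final_score = base_score + total
    logger.info(json.dumps({
        "type": "summary",
        "resume_id": resume_id,
        "job_id": job_id,
        "base_score": base_score,
        "adjustment": total,
        "final_score": final_score,
        "events_replayed": len(replayed),
    }))
    return final_score
```

With records like these, a low-ranked strong candidate can be traced to the specific past event, and the similarity score, that pulled its score down.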


Fixing it without killing learning

I didn’t want to remove Hindsight—it’s the whole point. Instead, I constrained it.

1. Tightened similarity matching

if similarity_score …

1. Feedback comes in: “This candidate looks good on paper but lacks real project depth.”  

2. Stored as an event.  

3. Later, a similar resume arrives (same keywords, similar experience).  

4. System retrieves past feedback, applies a **small negative adjustment**, and slightly lowers the rank.  

5. After multiple similar feedback events, the agent implicitly learns:  

   > “Keyword match isn’t enough—depth matters.”  

No retraining. Just accumulated corrections.
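One way to keep each correction small while still letting repeated feedback add up is to clamp every delta and then clamp the total. The function below is a sketch of that idea only; the name `bounded_adjustment` and both cap values are assumptions, not numbers from the project.

```python
def bounded_adjustment(deltas, per_event_cap=0.02, total_cap=0.1):
    """Combine per-event deltas conservatively.

    - Each delta is clamped to +/- per_event_cap, so one event is only a nudge.
    - The sum is clamped to +/- total_cap, so replay can never override
      the base score outright.
    """
    clamped = [max(-per_event_cap, min(per_event_cap, d)) for d in deltas]
    total = sum(clamped)
    return max(-total_cap, min(total_cap, total))
```

A single harsh correction then moves the score by at most `per_event_cap`, and it takes several consistent corrections to reach `total_cap`.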

---

## What I learned  

- **Feedback loops are brittle by default** – one bad signal can poison future decisions if you don’t gate it.  
- **Similarity is everything** – loose retrieval = noisy learning. Tightening similarity improved behavior more than any model tweak.  
- **Logging matters more than modeling** – I didn’t change the scoring model much; I just made Hindsight visible.  
- **Local context beats global memory** – scoping feedback to `job_role + skill_cluster` made the system far more stable.  
- **“Learning” is just controlled bias accumulation** – Hindsight doesn’t magically learn; it accumulates past decisions. Your job is to control how that bias spreads.
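The similarity and scoping points can be combined in one small structure: feedback is stored under a `(job_role, skill_cluster)` key and only applies to candidates that are both in the same scope and above a similarity threshold. This is a sketch under assumptions: the `ScopedFeedback` class, the feature vectors, the cosine measure, and the 0.8 threshold are illustrative and not taken from the project or from Hindsight.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ScopedFeedback:
    """Feedback deltas stored per (job_role, skill_cluster) scope."""

    def __init__(self, min_similarity=0.8):
        self.min_similarity = min_similarity
        self._events = defaultdict(list)  # scope key -> [(features, delta)]

    def add(self, job_role, skill_cluster, features, delta):
        self._events[(job_role, skill_cluster)].append((features, delta))

    def applicable_deltas(self, job_role, skill_cluster, features):
        """Deltas from the same scope whose stored features are similar enough."""
        scoped = self._events.get((job_role, skill_cluster), [])
        return [
            delta
            for stored, delta in scoped
            if cosine(stored, features) >= self.min_similarity
        ]
```

Feedback recorded for one role never leaks into another, and within a scope a loose match is dropped instead of being applied at reduced strength.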

---

## Would I do this again?  

**Yes—but with guardrails from day one.**  

Hindsight is powerful, but it will happily amplify your mistakes if you let it.

Treat it like:

- **A suggestion system (not ground truth)**  
- **A contextual memory (not global truth)**  

and it becomes a practical way to make agents adapt without retraining, minus the surprise side effects.

Otherwise, you’ll end up debugging why your system rejected a perfect resume—and realizing it was your own feedback loop all along.