How I structured logs around Hindsight
Source: Dev.to
“Why did it reject a perfect resume?”
I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.
## What I actually built
This project is a job‑matching agent that reads resumes, scores candidates, and ranks them against job descriptions. Nothing fancy on the surface:
    resume → extract features → score → return top candidates

The interesting part is that the scoring logic isn’t fixed.
I wired it up with the Hindsight GitHub repository so the agent could learn from feedback—things like:
- “This candidate should have been ranked higher”
- “This profile is irrelevant despite keyword match”
Instead of retraining a model, I let the agent adapt its behavior by replaying past decisions and corrections.
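The resume → extract features → score → return top candidates flow can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `rank_candidates` and `keyword_overlap_score` are hypothetical names, and plain keyword overlap stands in for whatever features the real agent extracts.

```python
from typing import Callable


def keyword_overlap_score(resume: str, job_description: str) -> float:
    """Hypothetical stand-in scorer: fraction of job terms found in the resume."""
    job_terms = set(job_description.lower().split())
    resume_terms = set(resume.lower().split())
    if not job_terms:
        return 0.0
    return len(job_terms & resume_terms) / len(job_terms)


def rank_candidates(
    resumes: list[str],
    job_description: str,
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every resume against the job and return the top_k, best first."""
    scored = [(resume, score(resume, job_description)) for resume in resumes]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


top = rank_candidates(
    ["python sql", "java", "python sql aws"],
    "python sql aws",
    keyword_overlap_score,
    top_k=2,
)
```

Passing the scorer in as a parameter keeps the ranking step unchanged when the scoring logic itself starts to shift with feedback.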
## How the system is structured
At a high level, the code splits into three parts:
- Resume ingestion + parsing
- Scoring pipeline
- Hindsight‑backed memory + feedback loop
### Scoring pipeline (simplified)

    def score_candidate(resume, job_description):
        features = extract_features(resume, job_description)
        base_score = weighted_score(features)
        adjustments = hindsight_adjustments(resume, job_description)
        return base_score + adjustments

The key is that hindsight_adjustments isn’t static; it’s derived from past feedback stored and replayed through Hindsight.
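For the static half of that sum, here is one way a `weighted_score` helper could look. The feature names and weights below are assumptions chosen for illustration; the post does not say which features or weights the real pipeline uses.

```python
# Hypothetical feature weights; the real pipeline's features are not specified.
WEIGHTS = {"keyword_overlap": 0.6, "years_experience": 0.4}


def weighted_score(features: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Linear combination of normalized features (each expected in [0, 1])."""
    # Missing features count as 0.0 rather than raising.
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


base = weighted_score({"keyword_overlap": 1.0, "years_experience": 0.5})
```

Because this part is fixed, any change in a candidate's final score over time has to come from the adjustment term.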
### Feedback events

    event = {
        "resume_id": resume.id,
        "job_id": job.id,
        "original_score": score,
        "feedback": "should_rank_higher",
        "timestamp": now()
    }

These events are indexed and replayed later when similar candidates appear.
If you’ve read the Hindsight documentation, this is basically using event replay as a lightweight learning layer instead of retraining.
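To make the store-and-replay idea concrete, here is a small in-memory sketch. It does not use Hindsight's API at all: `FeedbackEvent` and `FeedbackLog` are hypothetical names, and indexing by `job_id` is an assumption standing in for whatever indexing the real system performs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeedbackEvent:
    """Mirrors the fields of the event dictionary shown above."""
    resume_id: str
    job_id: str
    original_score: float
    feedback: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FeedbackLog:
    """Hypothetical append-only store, indexed by job_id for later replay."""

    def __init__(self) -> None:
        self._by_job: dict[str, list[FeedbackEvent]] = {}

    def record(self, event: FeedbackEvent) -> None:
        self._by_job.setdefault(event.job_id, []).append(event)

    def replay(self, job_id: str) -> list[FeedbackEvent]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._by_job.get(job_id, []))


log = FeedbackLog()
log.record(FeedbackEvent("r1", "j1", 0.42, "should_rank_higher"))
log.record(FeedbackEvent("r2", "j1", 0.77, "not_relevant"))
replayed = log.replay("j1")
```

Keeping `original_score` on every event is what later makes it possible to ask how far a replayed correction moved a decision.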
## The bug that made this interesting
Everything seemed fine until I noticed something weird:
A strong candidate—clean experience, perfect keyword match—was consistently ranked low.
My first guesses:
- parsing bug?
- feature‑extraction issue?
- bad weights?
Nope. It was Hindsight.
### What actually happened
- A recruiter had earlier marked a similar resume as “not relevant”.
- That feedback got stored and replayed.
- The similarity match was too broad, so the new candidate inherited a negative adjustment.
The score dropped silently, and no logs screamed “this is wrong.” It just looked like the system “decided” differently.
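The failure mode is easy to reproduce in miniature. The sketch below assumes similarity is Jaccard overlap between skill sets, which is an illustrative choice and not necessarily what Hindsight computes; `inherited_adjustment` and the threshold values are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inherited_adjustment(
    candidate_skills: set[str],
    past_events: list[tuple[set[str], float]],
    threshold: float,
) -> float:
    """Sum the adjustments of every past event deemed 'similar enough'."""
    total = 0.0
    for skills, adjustment in past_events:
        if jaccard(candidate_skills, skills) >= threshold:
            total += adjustment
    return total


# One recruiter marked a two-skill resume as not relevant.
past = [({"python", "sql"}, -0.3)]
strong_candidate = {"python", "sql", "aws", "docker"}  # similarity to it: 2/4 = 0.5

loose = inherited_adjustment(strong_candidate, past, threshold=0.4)
tight = inherited_adjustment(strong_candidate, past, threshold=0.8)
```

With the loose threshold the strong candidate silently inherits the full penalty; with the tight one it inherits nothing.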
## Debugging the feedback loop
I added explicit logging to see how Hindsight was influencing decisions:

    def hindsight_adjustments(resume, job):
        events = hindsight.retrieve_similar(resume, job)
        for e in events:
            print("Replaying event:", e)
        return aggregate_adjustments(events)

That’s when it clicked:
The system wasn’t wrong; it was too eager to generalize.
The feedback loop had effectively created a soft rule:
“Candidates like this are bad”
…based on a single data point.
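A slightly more structured take on the same idea uses Python's standard `logging` module, so each replayed event records where it came from and how much it moved the score. The event fields (`source_resume_id`, `similarity`, `adjustment`) and the function name are assumptions made for this sketch; they are not Hindsight's data model.

```python
import logging

logger = logging.getLogger("hindsight_replay")


def aggregate_with_audit(resume_id: str, events: list[dict]) -> float:
    """Sum replayed adjustments, logging each event's individual contribution."""
    total = 0.0
    for event in events:
        total += event["adjustment"]
        logger.info(
            "replay resume=%s source=%s similarity=%.2f adjustment=%+.2f",
            resume_id,
            event["source_resume_id"],
            event["similarity"],
            event["adjustment"],
        )
    # A score moved by exactly one past event is the pattern behind the bug above.
    if len(events) == 1 and total != 0:
        logger.warning(
            "resume=%s adjusted by %+.2f on a single feedback event", resume_id, total
        )
    return total


adjustment = aggregate_with_audit(
    "r2",
    [{"source_resume_id": "r1", "similarity": 0.5, "adjustment": -0.3}],
)
```

The warning on single-event adjustments is the log line that would have "screamed" in the scenario described earlier.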
## Fixing it without killing learning
I didn’t want to remove Hindsight—it’s the whole point. Instead, I constrained it.
### 1. Tightened similarity matching

    if similarity_score …

1. A recruiter leaves feedback: “This candidate looks good on paper but lacks real project depth.”
2. Stored as an event.
3. Later, a similar resume arrives (same keywords, similar experience).
4. System retrieves past feedback, applies a **small negative adjustment**, and slightly lowers the rank.
5. After multiple similar feedback events, the agent implicitly learns:
> “Keyword match isn’t enough—depth matters.”
No retraining. Just accumulated corrections.
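The "small adjustment that accumulates" behaviour in the steps above might look like the sketch below. The step size, the cap, and the `should_rank_lower` label are assumptions for illustration; only `should_rank_higher` appears in the original event example.

```python
STEP = 0.05       # hypothetical nudge contributed by each feedback event
MAX_SHIFT = 0.20  # hypothetical cap on the total shift in either direction

# Direction of each feedback label; unknown labels contribute nothing.
SIGNS = {"should_rank_higher": 1, "should_rank_lower": -1}


def accumulated_adjustment(feedback_labels: list[str]) -> float:
    """Turn a list of replayed feedback labels into one bounded score shift."""
    raw = sum(SIGNS.get(label, 0) for label in feedback_labels) * STEP
    return max(-MAX_SHIFT, min(MAX_SHIFT, raw))


one_event = accumulated_adjustment(["should_rank_lower"])
many_events = accumulated_adjustment(["should_rank_lower"] * 6)
```

A single event only nudges the score, and repeated agreement moves it further, but the cap keeps accumulated feedback from overwhelming the base score.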
---
## What I learned
- **Feedback loops are brittle by default** – one bad signal can poison future decisions if you don’t gate it.
- **Similarity is everything** – loose retrieval = noisy learning. Tightening similarity improved behavior more than any model tweak.
- **Logging matters more than modeling** – I didn’t change the scoring model much; I just made Hindsight visible.
- **Local context beats global memory** – scoping feedback to `job_role + skill_cluster` made the system far more stable.
- **“Learning” is just controlled bias accumulation** – Hindsight doesn’t magically learn; it accumulates past decisions. Your job is to control how that bias spreads.
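Two of these lessons, gating and local scoping, can be combined in one small structure. This is a sketch under assumptions: `ScopedFeedback` is a hypothetical class, the `(job_role, skill_cluster)` key follows the scoping described above, and the `min_events` gate and averaging rule are illustrative choices.

```python
from collections import defaultdict


class ScopedFeedback:
    """Hypothetical store that keeps feedback local to (job_role, skill_cluster)."""

    def __init__(self) -> None:
        self._events: dict[tuple[str, str], list[float]] = defaultdict(list)

    def record(self, job_role: str, skill_cluster: str, adjustment: float) -> None:
        self._events[(job_role, skill_cluster)].append(adjustment)

    def adjustment_for(self, job_role: str, skill_cluster: str, min_events: int = 2) -> float:
        # .get avoids creating empty entries for scopes that were never recorded.
        events = self._events.get((job_role, skill_cluster), [])
        if len(events) < min_events:
            return 0.0  # gate: not enough evidence to act on yet
        return sum(events) / len(events)


memory = ScopedFeedback()
memory.record("backend", "python", -0.05)
after_one = memory.adjustment_for("backend", "python")
memory.record("backend", "python", -0.15)
after_two = memory.adjustment_for("backend", "python")
other_scope = memory.adjustment_for("data", "python")
```

Feedback recorded for one role and skill cluster never leaks into another, and a single event in any scope has no effect until a second one agrees with it.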
---
## Would I do this again?
**Yes—but with guardrails from day one.**
Hindsight is powerful, but it will happily amplify your mistakes if you let it.
Treat it like:
- **A suggestion system (not ground truth)**
- **A contextual memory (not global truth)**
and you’ll reap the benefits without the surprise side‑effects: Hindsight becomes a practical way to make agents adapt without retraining.
Otherwise, you’ll end up debugging why your system rejected a perfect resume and realizing it was your own feedback loop all along.