Agentic AI and Search Agents
Source: Dev.to
Agentic AI – Reflections on Two Papers
Notes from my three‑day deep‑dive into the MDPI review and the arXiv pre‑print 2508.05668. I used NotebookLM to generate a quick summary before reading; I’ll comment on that at the end.
1. Why “Agentic” AI Is Different
In our AI class we’ve spent a lot of time talking about agents – the sense‑think‑act loop: how they perceive, form beliefs, and take actions.
Standard LLMs (the “bones” we discussed in class) are reactive:
Input → Output → Done.
Agentic AI goes further:
- The system can pursue a goal over multiple steps without constant human guidance.
- It sets sub‑goals, selects tools, executes actions, evaluates results, and adjusts its plan.
- All of this happens with minimal human involvement.
The idea sounds simple, but the actual implementation is far more complex – that’s what both papers explore.
2. The MDPI Review (Big‑Picture Overview)
The MDPI article is a broad review that covers:
- Definitions – what counts as agentic AI, and its scope.
- Frameworks & Architectures – architectural models.
- Applications & Challenges – practical considerations.
2.1 Architectural Models that Stood Out
| Model | Core Idea | When It Shines |
|---|---|---|
| ReAct | Cycle: Reason → Act → Observe → Reason again. | Simple, fast tasks; mirrors the introductory sense‑think‑act cycle we covered. |
| Hierarchical / Supervisor | A top‑level agent breaks a problem into sub‑tasks and delegates to specialized sub‑agents. | E.g., “write a market‑research report”: the supervisor hands web search to one agent, data analysis to another, writing to a third. This maps directly onto the multi‑agent systems we discussed. |
| BDI (Belief‑Desire‑Intention) | Tracks an agent’s beliefs (what it thinks is true), desires (what it wants), and intentions (what it is currently committed to). | Provides a clear trace for responsibility and transparency, key themes in AI ethics. |
| Neuro‑Symbolic (Concentrated Neuro‑Symbolic Design) | Combines neural networks (good at perception, pattern matching) with symbolic reasoning (good at structured, traceable logic). | Argues that both are needed; echoes our semester‑long debate on whether deep learning alone suffices. |
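To make the ReAct row concrete, here is a minimal sketch of the Reason → Act → Observe cycle. It is not taken from either paper: the `TOOLS` registry, the `scripted_policy` stand-in for the LLM, and the `finish` action name are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; a real agent would call a search API, a calculator, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "no result"),
}

Step = Tuple[str, str, str]  # (thought, action taken, observation)


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM (an assumption): returns (thought, action, argument)."""
    if not history:
        return ("I should look this up.", "lookup", question)
    last_observation = history[-1][2]
    return ("I have what I need.", "finish", last_observation)


def react(question: str, policy=scripted_policy, max_steps: int = 5) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, history)  # Reason
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)                  # Act
        history.append((thought, f"{action}({arg})", observation))  # Observe
    return "gave up"  # step budget exhausted without a final answer


print(react("capital of France"))
```

The step budget (`max_steps`) matters: without it, a policy that never chooses `finish` would loop forever.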
3. The arXiv Paper (Deep Search Agents)
The second paper (arXiv 2508.05668) zooms in on a specific type of agentic system: deep search agents, i.e., agents dedicated to information retrieval.
3.1 What Is a “Search” Agent?
- Not just a simple Google‑style search.
- Controls the entire retrieval pipeline: web crawling, private databases, internal memory, etc.
- Decides what to search for, reads the results, re‑plans based on what it finds, and repeats until the query is fully answered.
3.2 Three Search Structures
| Structure | Description | Typical Use |
|---|---|---|
| Parallel Search | Splits the original query into multiple sub‑queries and runs them in parallel. | Maximizes breadth and coverage. |
| Sequential / Iterative Search | Runs a single loop: search → read → reflect → decide next search. | Mirrors how a person looks for answers step by step. |
| Hybrid (Tree/Graph‑Based) Search | Explores multiple paths, can abandon unproductive branches, and re‑optimizes the overall strategy mid‑task. | The most complex; aligns with the backtracking concepts seen in classic AI search algorithms. |
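The first two rows of the table can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `CORPUS` is a made-up in-memory stand-in for a real search backend, and `next_query` is a hypothetical callback playing the "reflect and decide the next query" role. The hybrid structure is left out to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Toy corpus standing in for a real search backend (assumption for illustration).
CORPUS: Dict[str, str] = {
    "author of Dune": "Frank Herbert",
    "birthplace of Frank Herbert": "Tacoma",
}


def search(query: str) -> str:
    """Return the stored answer, or an empty string when nothing is found."""
    return CORPUS.get(query, "")


def parallel_search(sub_queries: List[str]) -> Dict[str, str]:
    """Run independent sub-queries concurrently; pool.map keeps input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(search, sub_queries))
    return dict(zip(sub_queries, results))


def sequential_search(
    first_query: str,
    next_query: Callable[[str], Optional[str]],
    max_hops: int = 3,
) -> List[str]:
    """search -> read -> reflect -> decide next search, until nothing new turns up."""
    findings: List[str] = []
    query: Optional[str] = first_query
    for _ in range(max_hops):
        result = search(query)          # search and read
        if not result:
            break                       # dead end: stop searching
        findings.append(result)
        query = next_query(result)      # reflect, then decide the next query
        if query is None:
            break
    return findings


print(parallel_search(["author of Dune", "birthplace of Frank Herbert"]))
print(sequential_search("author of Dune", lambda r: f"birthplace of {r}"))
```

The difference shows up in the second call: the follow-up query can only be written after the first result has been read, which is why that kind of multi-hop question cannot simply be fanned out in parallel.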
4. Training & Optimization (The Technical Core)
The arXiv paper devotes a substantial section to methodology – the part I had to read twice.
- Supervised Fine‑Tuning (SFT) – the initial training stage: present the model with input‑output pairs to teach basic behavior.
- Reinforcement Learning from Human Feedback (RLHF) – refine the model using reward signals derived from human preferences.
- Curriculum Design – structure training data from simple to complex tasks to improve learning efficiency.
- Evaluation Metrics – assess performance with a combination of automatic metrics (e.g., BLEU, ROUGE) and human evaluations (e.g., helpfulness, safety).
5. Reflections on the NotebookLM Summary
- The AI‑generated summary was decent but missed the key distinction between MDPI’s broad review and the arXiv paper’s focused deep‑search approach.
- It also glossed over the architectural nuances (e.g., BDI vs. hierarchical models) that were crucial for my understanding.
- Going forward, I’ll keep the summary as a quick‑look tool but will rely on a second, more detailed pass for nuanced topics.
Bottom Line
Both papers reinforce that agentic AI is more than just a smarter LLM; it’s about autonomous goal‑driven behavior, modular architectures, and transparent reasoning. The MDPI review provides the big picture, while the arXiv pre‑print shows a concrete implementation (deep‑search agents) that ties directly to the information‑retrieval concepts we’ve studied.
End of notes.
Teaching Models Good Reasoning and Search Loops
The paper explains how agents learn what “doing it right” looks like through reinforcement learning, using a technique called RLVR (Reinforcement Learning with Verifiable Rewards).
Reward signals are grounded in whether the agent’s outputs are actually correct and verifiable, rather than merely plausible.
The reward functions are multi‑objective, balancing several factors:
- Answer correctness
- Retrieval effectiveness
- Quality of evidence
- Penalties for redundant queries or trajectories that run longer than necessary
The last point is intriguing because it trains the system to be efficient, a property not typically emphasized in standard RL setups.
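A multi-objective reward of this kind can be sketched as a weighted sum with penalty terms. Everything numeric here is an assumption chosen for illustration (the weights, the per-query and per-step costs, and the "free" allowances), and the `Episode` fields are hypothetical names; the notes above only establish which factors are balanced, not how.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    answer_correct: bool     # verifiable: did the final answer match the reference?
    retrieval_score: float   # in [0, 1], how well retrieval covered what was needed
    evidence_score: float    # in [0, 1], quality of the cited evidence
    num_queries: int         # search calls issued
    num_steps: int           # total steps in the trajectory


def reward(
    ep: Episode,
    w_answer: float = 1.0,      # assumed weights, not from the paper
    w_retrieval: float = 0.3,
    w_evidence: float = 0.3,
    query_cost: float = 0.05,   # assumed penalty per query beyond the allowance
    step_cost: float = 0.01,    # assumed penalty per step beyond the allowance
    free_queries: int = 2,
    free_steps: int = 5,
) -> float:
    r = w_answer * float(ep.answer_correct)
    r += w_retrieval * ep.retrieval_score
    r += w_evidence * ep.evidence_score
    # Efficiency penalties: only the excess over the allowance is charged.
    r -= query_cost * max(0, ep.num_queries - free_queries)
    r -= step_cost * max(0, ep.num_steps - free_steps)
    return r


concise = Episode(True, 1.0, 1.0, num_queries=2, num_steps=5)
wasteful = Episode(True, 1.0, 1.0, num_queries=6, num_steps=15)
print(reward(concise) > reward(wasteful))
```

Both episodes reach the same correct answer with the same evidence, yet the wasteful one scores lower, which is the pressure toward efficiency described above.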
Scaling Up Test‑Time Search
The authors also discuss test‑time search: giving the model an additional compute budget at inference time (i.e., when it is actually being used) rather than only during training.
- The argument is that extra thinking time at the moment of use can improve reasoning quality.
- This feels counter‑intuitive compared to the usual “bigger training = better model” assumption, but the paper makes a reasonable case.
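One simple way to spend extra compute at inference time is best-of-N sampling: generate several candidates and keep the one a scorer likes most. The sketch below shows that general idea only; it is not claimed to be the paper's method. The noisy number generator and the scorer that knows the target are toy assumptions; in practice the scorer would be a verifier or reward model.

```python
import random
from typing import Callable


def best_of_n(
    generate: Callable[[random.Random], float],
    score: Callable[[float], float],
    n: int,
    seed: int = 0,
) -> float:
    """Draw n candidates and return the highest-scoring one."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)


# Toy setup (assumption): the "model" makes noisy guesses around the right answer.
TARGET = 42.0


def noisy_guess(rng: random.Random) -> float:
    return rng.gauss(TARGET, 10.0)


def closeness(x: float) -> float:
    return -abs(x - TARGET)  # higher is better


one_shot = best_of_n(noisy_guess, closeness, n=1)
sixteen = best_of_n(noisy_guess, closeness, n=16)
print(closeness(sixteen) >= closeness(one_shot))
```

With the same seed, the single candidate from `n=1` is also the first of the sixteen, so the larger budget can never do worse here. The model itself is unchanged; only the amount of work done at inference time differs.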
Benchmarks
The experiments use three benchmark suites:
| Benchmark | Reported Score |
|---|---|
| FRAMES | > 94 % |
| GAIA | > 94 % |
| HotpotQA | > 94 % |
Note: Although the scores are impressive, the paper honestly notes that this performance does not always transfer to messier real‑world environments.
Applications
The paper lists many domains where these agents could be applied:
- Healthcare
- Finance
- Legal research
- Automated reporting
The list is long, so I skimmed it, but one item stood out:
Deep Research
An agent runs an extended, autonomous research session across multiple sources and produces a full report at the end.
- I have already seen this capability in a couple of AI tools.
- Understanding the underlying mechanisms (iterative search loops, tool selection, reward shaping during training) makes it feel less like a black box.
There is also a notion that agents use the search process to improve their own internal capabilities: navigating memory, selecting the right tools, and retrieving knowledge to reason better in the future. This recursive idea is intriguing, though I don’t fully grasp it yet.
Challenges and Limitations
Both papers devote substantial space to what is still broken, and this section is important to read.
The Brittleness Problem
- Agents perform well in controlled settings but degrade when unanticipated situations arise.
- This is a real deployment issue: a system that works 94 % of the time in the lab may fail badly when the data differ even slightly from the training distribution.
Responsibility
- Who is accountable when an autonomous agent makes a bad decision (e.g., a wrong medical recommendation)?
- The papers raise the question but offer no definitive answer, reflecting the broader AI‑ethics gap: technical progress outpaces governance.
Security
- Adversarial attacks, data poisoning, and unverified tool use become serious vulnerabilities as agents gain more autonomy.
- These are not merely academic concerns.
NotebookLM Experience — Honest Reflection
- I loaded both papers into NotebookLM and asked it to generate a summary before reading.
- The summary was accurate at a high level: it captured the main concepts and named the architectures.
Issues Encountered
Over‑generalization
- The summary treated both papers as a single unified document about agentic AI, even though they differ substantially.
- The MDPI paper provides a broad taxonomic review.
- The arXiv paper offers a focused, specialized examination of a specific subtype of system.
- This distinction matters for understanding each paper’s contribution.
Equal weighting of content
- Important technical sections (e.g., the RLVR approach, test‑time scaling, multi‑objective reward functions) were reduced to a couple of sentences.
- Those sections are crucial for grasping why the systems behave as they do, and the summary omitted that depth.
Takeaway
Using the LLM summary as a starting point was useful for orientation: it gave me a preview of the themes before diving into the dense text. However, it was insufficient for discerning which parts were central versus background; I still needed to return to the original papers for proper judgment.
Final Thoughts
What stuck with me from both papers is the gap between “AI that answers questions” and “AI that pursues goals.”
The gap is larger than it sounds, spanning technical, conceptual, and responsibility/safety dimensions.
Key Takeaways
The course material gave me the vocabulary to engage with these papers:
- Agent architectures
- Search structures
- Multi‑agent collaboration
- Ethics of autonomous systems
This made reading feel less like decoding and more like connecting known concepts.
Open Challenges
- Responsibility remains the hardest problem to solve.
- The purely technical challenges seem solvable with further research.
- Governance challenges appear to lag behind technical progress.
That’s the essence of my reflection.
Probably enough. It’s late.
Mention: @raqeeb_26