Agentic AI and Search Agents
Source: Dev.to
Agentic AI – Reflections on Two Papers
(Notes from my three‑day deep‑dive into the MDPI review and the arXiv pre‑print 2508.05668. I used NotebookLM to generate a quick summary before reading; I’ll comment on that at the end.)
1. Why “Agentic” AI Is Different
In our AI class we’ve spent a lot of time talking about agents – the sense‑think‑act loop, how they perceive, form opinions, and take actions.
Standard LLMs (the “bones” we discussed in class) are reactive:
- Input → Output → Done.

Agentic AI goes further:
- The system can pursue a goal over multiple steps without constant human guidance.
- It sets sub‑goals, selects tools, executes actions, evaluates results, and adjusts its plan.
- All of this happens with minimal human involvement.
The idea sounds simple, but the actual implementation is far more complex – that’s what both papers explore.
2. The MDPI Review (Big‑Picture Overview)
The MDPI article is a broad review that covers:
- Definitions – terminology and scope.
- Frameworks & Architectures – architectural models.
- Applications & Challenges – practical considerations.
2.1 Architectural Models that Stood Out
| Model | Core Idea | When It Shines |
|---|---|---|
| ReAct | Cycle: Reason → Act → Observe → Reason again. | Simple, fast tasks; mirrors the introductory sense‑think‑act cycle we covered. |
| Hierarchical / Supervisor | A top‑level agent breaks a problem into sub‑tasks and delegates to specialized sub‑agents. | E.g., “write a research report”: the supervisor hands web‑search to one agent, data analysis to another, writing to a third. This maps directly onto the multi‑agent systems we discussed. |
| BDI (Belief‑Desire‑Intention) | Tracks an agent’s beliefs (what it thinks is true), desires (what it wants), and intentions (what it is currently committed to). | Provides a clear trace for responsibility and transparency, key themes in AI ethics. |
| Neuro‑Symbolic (integrated neuro‑symbolic design) | Combines neural networks (good at perception, pattern matching) with symbolic reasoning (good at structured, traceable logic). | Argues that both are needed; echoes our semester‑long debate on whether deep learning alone suffices. |
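The ReAct cycle from the table is easy to picture as a loop. Here is a minimal sketch; `llm_reason` and `run_tool` are hypothetical toy stand-ins for a model call and a tool executor, not any paper's actual API.

```python
# Minimal ReAct-style loop sketch. `llm_reason` and `run_tool` are
# hypothetical toy stand-ins, not a specific framework's API.

def llm_reason(goal, observations):
    """Toy 'model': decide the next action from what we've seen so far."""
    if any("42" in obs for obs in observations):
        return ("finish", "The answer is 42.")
    return ("search", goal)

def run_tool(action, arg):
    """Toy tool executor: a real agent would call a search API here."""
    return f"result for '{arg}': 42"

def react(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = llm_reason(goal, observations)  # Reason
        if action == "finish":
            return arg
        observations.append(run_tool(action, arg))    # Act -> Observe
    return "gave up"

print(react("life, the universe, and everything"))
# -> The answer is 42.
```

The point of the sketch is only the shape of the cycle: reason, act, observe, and reason again with the new observation in context.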
3. The arXiv Paper (Deep Search Agents)
The second paper (arXiv 2508.05668) zooms in on a specific type of agentic system: deep search agents—agents dedicated to information retrieval.
3.1 What Is a “Search” Agent?
- Not just a simple Google‑style search.
- Controls the entire retrieval pipeline: web crawling, private databases, internal memory, etc.
- Decides what to search for, reads the results, re‑plans based on what it finds, and repeats until the query is fully answered.
3.2 Three Search Structures
| Structure | Description | Typical Use |
|---|---|---|
| Parallel Search | Splits the original query into multiple sub‑queries and runs them in parallel. | Maximizes breadth and coverage. |
| Sequential / Iterative Search | Runs a single loop: search → read → reflect → decide next search. | Mirrors how I personally probe answers step‑by‑step. |
| Hybrid (Tree/Graph‑Based) Search | Explores multiple paths, can abort unproductive branches, and re‑optimizes the overall strategy mid‑task. | The most complex; aligns with the backtracking concepts we saw in AI search algorithms. |
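The first two structures can be contrasted in a few lines of toy code. Everything here (`decompose`, `search`, `reflect`) is a hypothetical stand-in used only to show the control flow, not the paper's implementation.

```python
# Toy contrast of parallel vs. sequential/iterative search.
# `decompose`, `search`, and `reflect` are hypothetical stand-ins.

def decompose(query):
    """Split one query into sub-queries (parallel structure)."""
    return [f"{query} (aspect {i})" for i in range(3)]

def search(q):
    return f"docs for: {q}"

def reflect(notes):
    """Toy reflection: re-plan, and stop after two rounds of evidence."""
    return "follow-up question", len(notes) >= 2

# Parallel: fan sub-queries out at once -> breadth and coverage.
def parallel_search(query):
    return [search(q) for q in decompose(query)]

# Sequential/iterative: search -> read -> reflect -> decide next search.
def iterative_search(query, max_rounds=5):
    notes = []
    for _ in range(max_rounds):
        notes.append(search(query))        # search + read
        query, done = reflect(notes)       # reflect, re-plan
        if done:
            break
    return notes
```

The hybrid structure would combine both: keep several such loops alive as branches of a tree and prune the unproductive ones.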
4. Training & Optimization (The Technical Core)
The arXiv paper devotes a substantial section to methodology – the part I had to read twice.
- Supervised Fine‑Tuning (SFT) – the initial training phase: present the model with input‑output pairs to teach basic behavior.
- (The original notes cut off here; the paper continues with reinforcement learning, curriculum design, and evaluation metrics.)
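To make the SFT idea concrete, an input-output pair for an agent might look something like the sketch below. The field names and prompt template are illustrative assumptions, not the paper's actual data format.

```python
# Sketch of an SFT dataset entry: input-output pairs that demonstrate
# the desired agent behavior. Field names and the "### Instruction /
# ### Response" template are illustrative assumptions.

sft_examples = [
    {"input": "What is the capital of France?",
     "output": "SEARCH('capital of France') -> read result -> ANSWER('Paris')"},
]

def to_training_text(example):
    """Flatten one pair into the text the model is fine-tuned on."""
    return (f"### Instruction\n{example['input']}\n"
            f"### Response\n{example['output']}")
```

The key point is that the "output" demonstrates not just an answer but a behavior trace (search, read, answer) for the model to imitate.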
5. Reflections on the NotebookLM Summary
- The AI‑generated summary was decent but missed the key distinction between the MDPI’s broad review and the arXiv paper’s focused deep‑search approach.
- It also glossed over the architectural nuances (e.g., BDI vs. Hierarchical models) that were crucial for my understanding.
- Going forward, I’ll keep the summary as a quick‑look tool but will rely on a second, more detailed pass for nuanced topics.
Bottom Line
Both papers reinforce that agentic AI is more than just a smarter LLM; it’s about autonomous, goal‑driven behavior, modular architectures, and transparent reasoning. The MDPI review gives us the big picture, while the arXiv pre‑print shows a concrete implementation (deep search agents) that ties directly to the information‑retrieval concepts we’ve studied.
End of notes.
Modeling Good Reasoning and Search Loops
The paper describes how agents learn what “doing it right” looks like by grounding learning in a technique called RLVR (reinforcement learning with verifiable rewards).
- The agent receives reward signals grounded in whether its outputs are actually correct and verifiable, rather than merely plausible.
- The reward functions are multi‑objective, balancing several factors:
  - Answer correctness
  - Effectiveness of the retrieval
  - Quality of the supporting evidence
  - Penalties for redundant searches or answers that are longer than necessary
The last point is intriguing because it trains the system to be concise, which isn’t something I initially anticipated.
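A multi-objective reward like the one described could be sketched as a weighted sum. The component names and all weights below are my own illustrative assumptions, not values from the paper.

```python
# Sketch of a multi-objective reward in the RLVR spirit. All weights
# and thresholds are illustrative assumptions, not the paper's values.

def reward(answer_correct: bool, docs_retrieved: int, docs_relevant: int,
           evidence_cited: bool, num_searches: int, answer_tokens: int) -> float:
    r = 0.0
    r += 1.0 if answer_correct else 0.0                  # verifiable correctness
    if docs_retrieved:
        r += 0.3 * (docs_relevant / docs_retrieved)      # retrieval effectiveness
    r += 0.2 if evidence_cited else 0.0                  # quality of evidence
    r -= 0.05 * max(0, num_searches - 3)                 # penalty: redundant searches
    r -= 0.01 * max(0, (answer_tokens - 200) / 100)      # penalty: verbosity

    return r

# Same correct, well-grounded answer, but the concise run scores higher:
concise = reward(True, 5, 4, True, num_searches=2, answer_tokens=150)
verbose = reward(True, 5, 4, True, num_searches=8, answer_tokens=900)
```

The penalty terms are what push the agent toward conciseness: two otherwise identical answers differ in reward only through search count and length.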
Scaling Up Test‑Time Search
The authors also propose test‑time scaling for search: giving the model additional computational resources at inference time (when it’s actually being used) rather than only during training.
- The argument is that extra thinking time at the moment of use can improve reasoning quality.
- This feels counter‑intuitive compared to the usual “bigger training = better model” assumption, but the paper makes a reasonable case.
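One common form of test-time scaling is best-of-n sampling, sketched below; I'm using it as a stand-in for the general idea, and `generate` and `score` are hypothetical toys, so the paper's actual mechanism may well differ.

```python
# Sketch of best-of-n sampling, one simple form of test-time scaling.
# `generate` and `score` are hypothetical stand-ins for a model call
# and a verifier; the paper's actual mechanism may differ.
import random

def generate(prompt, seed):
    """Toy 'model': each seed yields a different candidate answer."""
    random.seed(seed)
    return f"{prompt} -> answer v{random.randint(1, 10)}"

def score(candidate):
    """Toy verifier: prefer the highest-numbered draft."""
    return int(candidate.rsplit("v", 1)[1])

def best_of_n(prompt, n):
    # More inference-time compute (larger n) = more candidates sampled,
    # so the verifier gets more chances to find a good answer.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)
```

Spending more compute here means raising `n`: the training run is unchanged, but the deployed system samples and verifies more candidates per query.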
Benchmarks used include FRAMES, GAIA, and HotpotQA.
- Controlled benchmark settings report scores over 94 %, which sounds great, but the paper honestly notes that this performance does not always transfer to messier real‑world environments.
Applications
The paper lists many domains where these agents could be applied:
- Healthcare
- Finance
- Legal research
- Automated reporting
The list is long, so I skimmed it, but one item stood out:
Deep Research
An agent runs an extended, autonomous research session across multiple sources and produces a full report at the end.
- I have already seen this capability in a couple of AI tools.
- Understanding the underlying mechanisms (iterative search loops, tool selection, reward shaping during training) makes it feel less like a black box.
There is also a notion that agents use the search process to improve their own internal capabilities—navigating memory, selecting the right tools, and retrieving knowledge to reason better in the future. This recursive idea is intriguing, though I don’t fully grasp it yet.
Challenges and Limitations
Both papers devote substantial space to what is still broken, and this section is important to read.
The Brittleness Problem
- Agents perform well in controlled settings but degrade when unanticipated situations arise.
- This is a real deployment issue: a system that works 94 % of the time in the lab may fail unpredictably when the data differ slightly from the training distribution.
Responsibility
- Who is accountable when an autonomous agent makes a bad decision (e.g., a wrong medical recommendation)?
- The paper raises the question but offers no definitive answer, reflecting the broader AI‑ethics gap: technical progress outpaces governance.
Security
- Adversarial attacks, data poisoning, and unverifiable tool use become serious vulnerabilities as agents gain more autonomy.
- These are not merely academic concerns.
NotebookLM Experience — Honest Reflection
- I loaded both papers into NotebookLM and asked it to generate a summary before reading.
- The summary was accurate at a high level: it captured the main concepts and named the frameworks.
However, two issues emerged:
1. Over‑generalization: The summary treated both papers as a single unified document about agentic AI, even though they differ substantially.
   - The MDPI paper provides a broad taxonomic review.
   - The arXiv paper offers a focused, specialized examination of a specific subtype of system.
   - This distinction matters for understanding each paper’s contribution.
2. Equal weighting of content: Important technical sections (e.g., the RLVR approach, test‑time scaling, multi‑objective reward functions) were reduced to a couple of sentences.
   - Those sections are crucial for grasping why the systems behave as they do, and the summary omitted that depth.
Takeaway: Using the LLM summary as a starting point was useful for exposure—it gave me a preview of the themes before diving into the dense text. But it was insufficient for discerning which parts were central versus background; I still needed to return to the original papers for proper judgment.
Final Thoughts
What stuck with me from both papers is the gap between “AI that answers questions” and “AI that pursues goals.”
- The gap is larger than it sounds, spanning technical, conceptual, and responsibility/safety dimensions.
The course material equipped me with the vocabulary to engage with these papers:
- Agent architectures
- Search structures
- Multi‑agent collaboration
- Ethics of autonomous systems
This made reading feel less like decoding and more like connecting known concepts.
- Responsibility remains the hardest problem to solve.
- The purely technical challenges seem solvable with further research.
- Governance challenges appear to lag behind technical progress.
That’s the essence of my reflection.
Probably enough. It’s late.
Mention: @raqeeb_26