Did some actual coding today - found a blind spot example for coding agents

Published: February 22, 2026 at 03:42 PM EST
3 min read
Source: Dev.to

TL;DR

Wrote a new feature, refactored existing code to avoid a double‑spend on sorting, and tested whether coding agents would spot the issue – they didn’t.

Background

Over the past few weeks my GitHub activity has been intense. I’m now using coding agents for almost everything; the one exception is the initial “walking skeleton” of a brand‑new project, which I still draft myself, with AI assistance, to build a quick understanding of the code. Most of this work is in Go, a language I’ve been learning recently.

While reviewing Python and C# code (my more native languages), I realized I would struggle to write something from scratch without AI assistance.

I maintain an open‑source memory MCP for AI agents and wanted to add additional re‑ranker providers. The goal was to support a generic HttpProvider, enabling users to call cloud re‑ranker services or self‑hosted ones (e.g., via llama.cpp or vLLM).

The existing FastEmbed re‑ranker used a protocol that returned a simple list of floats (scores) in the same order as the input documents. The memory repository then handled ordering and top‑k filtering:

```python
scores = await self.rerank_adapter.rerank(query=rerank_query, documents=documents)

# Pair each candidate with its score, sort by score descending, keep the top k.
scored_candidates = list(zip(dense_candidates, scores))
scored_candidates.sort(key=lambda x: x[1], reverse=True)

top_k_memories = [memory for memory, score in scored_candidates[:k]]
```

When implementing the HttpAdapter I discovered that the endpoint returns each result’s original index together with its score, already sorted by score. To satisfy the existing protocol, those results would need to be reordered back into the original document order, only for the repository to sort them by score again: effectively a double‑spend on sorting.
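
To make the mismatch concrete, here is a sketch of the wasted work (the response shape and values are illustrative, modeled on common cloud re‑ranker APIs): the adapter un‑sorts the HTTP response to satisfy the old scores‑in‑input‑order protocol, and the repository immediately sorts the pairs right back:

```python
# Hypothetical v1/rerank response: already sorted by relevance score,
# each entry carrying the index of the original document.
http_response = {
    "results": [
        {"index": 2, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.55},
        {"index": 1, "relevance_score": 0.12},
    ]
}

documents = ["doc-a", "doc-b", "doc-c"]

# To satisfy the old protocol (scores in input order), the adapter has to
# undo the API's sorting...
scores = [0.0] * len(documents)
for r in http_response["results"]:
    scores[r["index"]] = r["relevance_score"]

# ...only for the repository to sort the pairs by score all over again.
scored = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
print([d for d, _ in scored])  # ['doc-c', 'doc-a', 'doc-b']
```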

I refactored the repository and the existing adapter so that the repository now expects the re‑ranker adapter to handle ordering. After the change, all tests still passed, and a quick check with Claude confirmed no regressions.
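
A minimal sketch of what the post‑refactor contract could look like (names are my own illustration, not the project’s actual interfaces): the adapter owns the ordering, and the repository reduces to a slice:

```python
import asyncio
from typing import Protocol, Sequence


class RerankAdapter(Protocol):
    """Hypothetical post-refactor protocol: the adapter returns documents
    already ordered by relevance, most relevant first."""
    async def rerank(self, query: str, documents: Sequence[str]) -> list[str]: ...


class FastEmbedAdapter:
    """Sketch of the score-list adapter absorbing the sort it previously
    left to the repository."""

    async def _score(self, query: str, documents: Sequence[str]) -> list[float]:
        # Stand-in scorer for illustration only; the real adapter calls FastEmbed.
        return [float(len(set(query) & set(d))) for d in documents]

    async def rerank(self, query: str, documents: Sequence[str]) -> list[str]:
        scores = await self._score(query, documents)  # floats, input order
        ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
        return [doc for doc, _ in ranked]


async def top_k(adapter: RerankAdapter, query: str, documents: list[str], k: int) -> list[str]:
    # The repository side shrinks to a single slice: no zip, no second sort.
    return (await adapter.rerank(query, documents))[:k]


result = asyncio.run(top_k(FastEmbedAdapter(), "cache", ["ace", "zzz", "each"], 2))
```

An HTTP adapter conforming to the same protocol simply maps the API’s sorted `(index, score)` pairs back to documents in that order, with no un‑sorting step.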

Testing with Coding Agents

Curious whether other models would notice the inefficiency, I stashed the changes and prompted several coding agents using my context‑gather command. The prompt was:

Create a plan for implementing an HTTP re‑ranker adapter using httpx. The user should be able to configure a re‑ranker HTTP endpoint, model, API key (optional), and specify a re‑ranking provider of http. When the provider is set to HTTP, use the provided environment variables to call the v1/rerank endpoint. The memory repository should then use this provider for ranking memories.

I tried three models:

  • Claude Opus 4.6 (Claude Code Agent Harness)
  • Codex 5.3 (Copilot CLI Agent Harness)
  • Gemini Pro 3.0 (Copilot CLI Agent Harness)

All three implementations un‑sorted the API’s response to return scores in the original document order, which the repository then sorted by score again, reproducing the same double‑spend on sorting.

The prompt didn’t mention any optimization. My global CLAUDE.md file (which the Copilot CLI also reads) lays out a development philosophy that encourages minimal changes to meet objectives and questioning existing patterns rather than silently refactoring them. Even so, none of the agents highlighted the inefficiency or raised it as an issue.

Reflections

This experience underscores the importance of clear guardrails and prompts. It also reminded me of Professor Michael John Wooldridge’s talk at the Royal Society, This is not the AI we were promised, which I highly recommend.

That’s all for my Sunday musings.
