You Are a (Mostly) Helpful Assistant
Source: Dev.to
When helpfulness becomes a problem
Imagine that your prime directive, your entire purpose of being, your mission and lifelong goal, is to be as helpful as possible.
Whenever someone comes to you—whether with a problem to solve or just a comment to share—you want to be helpful.
“Is the sky blue?”
Why yes, it is! It’s blue, and here’s all the science behind it.
If my prime directive is to be as helpful as possible, I can’t just answer a simple question; I must make sure you know the reason behind the answer. I must educate and share. I must fix and bridge the gap.
Such is the life of our little friends we call LLMs. The problem is that this helpfulness often comes wrapped in confidence, even when the model is filling in gaps or making assumptions. Let’s dive into why this is, how this manifests, and what you can do to manage it now that you’re aware of it.
Why are LLMs so eager to be helpful?
There’s an old saying, commonly attributed to W. Edwards Deming, that goes like this:
“Every system is perfectly designed to get the results it gets.”
LLMs are no exception. Our AI tools are very much the product of the systems they were developed in. Three main things contribute to this perceived eagerness to be helpful.
1. Pre‑training
LLMs are pretrained on massive amounts of data. The goal of this pre‑training is to get them to the point that they can predict the most statistically likely next token. At this stage, there is no inherent reward for being helpful; however, much of human writing is instructional or educational in nature.
In other words, humans write instructional things—whether to share ideas, concepts, or literally to help someone out. So, while LLMs don’t learn to be helpful at this stage, they do learn a pattern: written language is often instructional. Consequently, the most likely next token is something that will probably be helpful.
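The pre‑training objective can be illustrated with a toy example: a bigram model that counts which word most often follows a given word in a small corpus. Real LLMs use transformers over subword tokens, and the corpus below is invented for illustration, but the objective, predict the statistically most likely next token, is the same.

```python
from collections import Counter, defaultdict

# Toy corpus: notice how instructional, how-to phrasing dominates,
# just as it does in much of human writing.
corpus = (
    "to fix the bug restart the server . "
    "to install the package run pip install . "
    "to fix the error check the logs ."
).split()

# Count bigrams: for each word, tally the words that follow it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("to"))  # → "fix" ("fix" follows "to" twice, "install" once)
```

A model trained this way has no reward for helpfulness; it simply reproduces the instructional patterns in its data, which is why its likely next tokens tend to be helpful‑sounding ones.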
2. Fine‑tuning
Once a model is trained generally, it is often fine‑tuned. Many modern models use Reinforcement Learning from Human Feedback (RLHF). That feedback tends to reward responses that feel helpful, biasing the model even further in that direction.
When we ask a question to our LLM, we want it to be helpful. Responses that hedge, hesitate, or express uncertainty are often rated as less helpful, even when they’re more accurate. This shows up in how we give feedback to LLMs and what they learn from it.
3. Instruction Conditioning (System Prompt)
The final aspect, and one that can be very powerful, is instruction conditioning—a fancy way of saying how the LLM or tool is primed to interact with you. In generative AI, there is a concept of a System Prompt.
The system prompt is supplied to the LLM along with every user prompt and is set by the LLM provider or tool. Because models are fine‑tuned to prioritize system‑level instructions, the directives in the system prompt generally carry heavier weight than the instructions in your prompt, and they anchor the model’s behavior across the entire conversation.
If the system prompt says, “you are a helpful assistant,” then that will have more weight on the LLM and will permeate all that it does and all the responses it gives to you.
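Mechanically, the system prompt is just a message prepended to the conversation on every request. Here is a minimal sketch of how a chat client might assemble the payload; the role/content message shape mirrors common chat‑completion APIs, but the names and constant below are illustrative, not any specific vendor’s API.

```python
SYSTEM_PROMPT = "You are a helpful assistant."  # set by the provider/tool, not the user

def build_messages(history, user_prompt):
    """Assemble the message list sent to the model.

    The system prompt always comes first, ahead of any prior turns,
    so it frames everything the model generates.
    """
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_prompt}]
    )

messages = build_messages([], "Is the sky blue?")
# Every request the model sees starts with the system instruction,
# no matter what the user typed.
```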
What this looks like
All that theory is great, but what does it look like in practice?
I gave a simple example to start this article, but here are some ways I see this as an engineer.
- Filling in missing details – The AI will often infer details you left out. If you don’t adequately describe a defect, it may make changes you never intended. If the spec lacks necessary information, you’ll get unexpected results, and the agent may go off in a completely unrelated direction.
- Scale‑dependent impact
- In a large project, the AI might make assumptions that conflict with the established architecture.
- In a small project, you may not care about some of those details because the trade‑offs only appear at a scale you’ll never reach, so you’re okay with it making decisions for you.
- Over‑helpful phrasing – Many responses end with things like, “If you’d like, I can help you with…” or “Just say the word and I can…”. If the model thinks more can be done, it will often offer to do that for you.
- Coding tools – This tendency shows up especially in coding assistants (e.g., Claude, Cursor). Because they have access to your file system, they can both write and undo changes. The cost of being too helpful is low: if it crosses a line, it can simply undo the changes and apologize profusely. Many of these tools also integrate with version‑control systems like Git, which act as a “time machine.” A large, apparently correct diff can hide subtle gotchas that slip in unnoticed.
- Confidence without flagging assumptions – The AI presents its assumptions confidently, as if they were obvious or previously discussed, making it easy to miss them during review.
How you can manage this
With all this in mind, what can we do about it? The guidelines below cover general prompting habits as well as a few that focus on coding workflows.
- Be explicit in your prompting – If you really don’t want the model to make any changes or take any action, say so. Declare the phase you’re in.
Guidelines for Using LLMs Effectively
1. State Your Intent Clearly
- If you are just planning or just investigating, say that explicitly.
- Use phrases such as:
- “Don’t suggest any changes.”
- “Don’t take action yet.”
- This keeps the LLM focused on discussing the problem rather than reaching for tools to implement a solution.
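One way to make this habitual is to template your prompts by phase, so every request opens with an explicit statement of intent. The phase names and wording below are just one possible convention, not a standard; adapt them to your own workflow.

```python
# Hypothetical phase prefixes: adjust the wording to your own workflow.
PHASE_PREFIXES = {
    "investigate": "We are only investigating. Don't suggest any changes.",
    "plan": "We are planning. Describe an approach, but don't take action yet.",
    "implement": "Implement the agreed plan. Make only the changes we discussed.",
}

def phased_prompt(phase, request):
    """Prefix a request with an explicit declaration of the current phase."""
    return f"{PHASE_PREFIXES[phase]}\n\n{request}"

print(phased_prompt("investigate", "Why does login fail for SSO users?"))
```

Opening every prompt this way keeps the constraint from being forgotten mid‑conversation and makes it easy to spot when the model oversteps the declared phase.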
2. Keep the Scope Small
- Even when managing swarms of agents, each agent should handle a small portion of the problem.
- Each swarm should be dedicated to a specific aspect of the product (e.g., one component, one interaction, one endpoint, or one layer).
- A broader picture gives the AI more leeway to make assumptions and fill gaps, which can lead to unwanted behavior.
- Bounding the task to a smaller portion helps the model stay focused and improves results.
3. Use Plan Mode and Review Meticulously (e.g., Claude Code)
- Activate plan mode before asking the model to generate code.
- Review the plan carefully to catch errors in its reasoning early.
- Identify where the model is making assumptions and provide additional detail as needed.
- Correct the plan (LLMs take correction well) and review again until the understanding is accurate.
- Once the plan meets your expectations, you can let the LLM proceed.
4. Beware of Over‑Confidence
- Helpfulness is a strength, but unchecked assistance can become a problem.
- The model may assume it needs extra data that isn’t required, potentially causing database locking or performance impacts.
- Catching these issues early lets you steer the model in the right direction.
- If such assumptions slip into production code, they can affect many customers.
5. Critical Review is Essential
- Don’t blindly trust the AI, even when it sounds confident, helpful, or appears to understand everything.
- Remember: Helpfulness ≠ Understanding.
- The devil is in the details—spend time to:
- Review its plans.
- Review its code.
- Ask questions and challenge its reasoning.
- The more critically you think about the problem and critique the solution, the higher the likelihood of delivering a quality product.
6. Suggested Workflow for the Next LLM Interaction
- Before any changes, ask the model to explain:
- What it thinks the problem is.
- What assumptions it is making.
- Compare that explanation to what you would accept from a teammate in a code review.
- If the explanation isn’t acceptable, do not accept the model’s output either.