Effectively Managing AI Agents for Testing
Source: Dev.to
Large language models and AI agents have already transformed many fields and are changing our lives in fundamental ways. In the testing domain, AI agents offer a clear path to immediate improvements in process and quality, and ultimately to reliable, performant, secure, and compliant software. For background, check out Demystifying Agentic Test Automation: What It Means for QA Teams.
But it’s not obvious how to take advantage of these capabilities. While AI agents are not fully predictable, they can be managed reliably via robust control mechanisms. Let’s see how.
What does it mean to manage AI agents in QA?
There are several important aspects to managing AI agents, both in general and specifically in the testing domain.
Configuration and guardrails
- Set agent autonomy levels and boundaries.
- Define test objectives through prompts and constraints.
- Specify which areas require human approval.
These steps ensure the system operates within controlled parameters while meeting goals.
Example: You may allow your agentic AI system to write test code and generate tracking/reporting artifacts, but not modify production code.
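
A minimal sketch of how such guardrails can be made explicit, assuming a hypothetical `AgentGuardrails` policy class and made-up action names that the agent runtime consults before acting:

```java
import java.util.Map;

/** Hypothetical guardrail policy: which agent actions run freely,
 *  which need human approval, and which are always refused. */
public class AgentGuardrails {

    enum Decision { ALLOW, REQUIRE_APPROVAL, DENY }

    // Explicit policy table; anything not listed is denied by default.
    private static final Map<String, Decision> POLICY = Map.of(
            "generate_test_code",     Decision.ALLOW,
            "update_test_report",     Decision.ALLOW,
            "delete_test",            Decision.REQUIRE_APPROVAL,
            "modify_ci_pipeline",     Decision.REQUIRE_APPROVAL,
            "modify_production_code", Decision.DENY,
            "modify_database_schema", Decision.DENY
    );

    public static Decision check(String action) {
        return POLICY.getOrDefault(action, Decision.DENY);
    }

    public static void main(String[] args) {
        System.out.println(check("generate_test_code"));     // ALLOW
        System.out.println(check("modify_production_code")); // DENY
    }
}
```

Anything not explicitly allowed falls through to DENY, which keeps new agent capabilities opt-in rather than opt-out.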
Model selection and updates
- Decide when to upgrade models and when to stay on a stable version.
- Test model changes before rolling them out to production to avoid unexpected issues.
If testing quality and coverage deteriorate after a model upgrade, it signals that your AI agents are overly tuned to a specific model version.
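
One lightweight way to reduce that risk is to pin the model identifier your agents use and route only staging runs to a candidate upgrade until it proves itself. A sketch with a hypothetical `ModelConfig` class and made-up model names:

```java
/** Hypothetical model configuration: production agents stay on a pinned,
 *  known-good model while staging trials the candidate upgrade. */
public class ModelConfig {

    // Pinned model identifier used by production agents (placeholder name).
    static final String PRODUCTION_MODEL = "vendor-model-2024-06";

    // Candidate upgrade, enabled only in staging (placeholder name).
    static final String CANDIDATE_MODEL = "vendor-model-2025-01";

    static String modelFor(String environment) {
        // Only staging runs the candidate; every other environment stays pinned.
        return "staging".equals(environment) ? CANDIDATE_MODEL : PRODUCTION_MODEL;
    }

    public static void main(String[] args) {
        System.out.println(modelFor("staging"));    // candidate model
        System.out.println(modelFor("production")); // pinned model
    }
}
```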
Oversight and validation
- Implement quality gates and verification protocols.
- Monitor for flaky tests and low‑value coverage.
- Manage the budget for execution costs.
These practices help maintain test reliability and cost‑effectiveness throughout the development lifecycle.
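
For the cost side in particular, a simple budget guard can stop scheduling agent-driven runs once spend approaches a limit. A minimal sketch with a hypothetical `ExecutionBudget` class and illustrative numbers:

```java
/** Hypothetical budget guard: refuse to schedule further agent-driven
 *  test runs once the monthly execution spend would exceed the limit. */
public class ExecutionBudget {

    private final double monthlyLimitUsd;
    private double spentUsd = 0.0;

    ExecutionBudget(double monthlyLimitUsd) {
        this.monthlyLimitUsd = monthlyLimitUsd;
    }

    /** Record the cost of a completed run. */
    void record(double runCostUsd) {
        spentUsd += runCostUsd;
    }

    /** True only if the estimated run still fits within the budget. */
    boolean canSchedule(double estimatedRunCostUsd) {
        return spentUsd + estimatedRunCostUsd <= monthlyLimitUsd;
    }

    public static void main(String[] args) {
        ExecutionBudget budget = new ExecutionBudget(500.0);
        budget.record(480.0);
        System.out.println(budget.canSchedule(30.0)); // false: 480 + 30 > 500
    }
}
```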
How to manage AI agents in practice
With those aspects in mind, let’s look at concrete steps for controlling your agents.
1. Configure the agent system prompt
“You may generate API and UI test code for the /checkout and /payment flows.
You may create test reports and update tracking dashboards.
You must NOT modify production code or database schemas.
All destructive operations (deleting tests, changing CI/CD pipelines) require human approval.”
2. Tool integration
Connect your agent to your test‑management platform via APIs so it can read existing test cases and understand coverage gaps.
- Options include an MCP server such as the Tosca MCP server or custom tools you build (Model Context Protocol docs).
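
If you build a custom tool rather than use an existing MCP server, it can start as a read-only wrapper around the test-management API. A sketch, assuming a hypothetical endpoint, token, and `TestManagementTool` class (this is not the Tosca or MCP API, just the general shape of such a tool):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical read-only "tool" exposing the test-management platform to an
 *  agent. Endpoint, auth header, and method name are illustrative only. */
public class TestManagementTool {

    private static final String BASE_URL = "https://testmgmt.example.com/api/v1";
    private final HttpClient http = HttpClient.newHttpClient();
    private final String apiToken;

    TestManagementTool(String apiToken) {
        this.apiToken = apiToken;
    }

    /** Tool the agent can invoke: list existing test cases for a feature area. */
    String listTestCases(String featureArea) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/test-cases?area=" + featureArea))
                .header("Authorization", "Bearer " + apiToken)
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON the agent uses to reason about coverage gaps
    }
}
```

Keeping the tool read-only mirrors the guardrails above: the agent can see existing coverage but cannot alter it without a separate, approval-gated tool.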
3. Progressive rollout
| Week | Activity |
|---|---|
| Week 1 | Agent generates tests in suggestion mode only. |
| Weeks 2‑3 | Agent executes tests in an isolated staging environment. |
| Week 4+ | Agent runs tests in pre‑production with human review of failures. |
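
These phases can also be enforced in code rather than by convention, so the agent runtime simply refuses actions outside the current phase. A minimal sketch with a hypothetical `RolloutPolicy` class:

```java
/** Hypothetical rollout policy mapping each phase to what the agent may do. */
public class RolloutPolicy {

    enum Phase { SUGGESTION_ONLY, STAGING_EXECUTION, PRE_PRODUCTION }

    /** Execution starts in staging; week 1 is suggestions only. */
    static boolean mayExecuteTests(Phase phase) {
        return phase != Phase.SUGGESTION_ONLY;
    }

    /** Only the final phase touches pre-production. */
    static boolean mayRunInPreProduction(Phase phase) {
        return phase == Phase.PRE_PRODUCTION;
    }

    public static void main(String[] args) {
        System.out.println(mayExecuteTests(Phase.SUGGESTION_ONLY));          // false
        System.out.println(mayRunInPreProduction(Phase.STAGING_EXECUTION));  // false
    }
}
```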
4. Set up quality gates
Add CI/CD pipeline checks so that agent‑generated tests only gate deployments once they meet a minimum pass rate (e.g., 80%).
Monitor the false‑positive rate weekly and tune prompts accordingly.
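
The gate calculation itself is trivial; the value is in wiring it into the pipeline and tracking the numbers over time. A sketch of the pass-rate check, assuming a hypothetical `QualityGate` helper:

```java
/** Hypothetical CI gate: agent-generated tests only gate a deployment
 *  once their pass rate clears the configured threshold. */
public class QualityGate {

    static final double MIN_PASS_RATE = 0.80; // e.g., 80%

    /** True when the run meets the minimum pass rate. */
    static boolean passes(int passed, int total) {
        if (total == 0) {
            return false; // no runs, no evidence, no gate approval
        }
        return (double) passed / total >= MIN_PASS_RATE;
    }

    public static void main(String[] args) {
        System.out.println(passes(41, 50)); // 82% -> true
        System.out.println(passes(38, 50)); // 76% -> false
    }
}
```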
How to control AI agents: prompts, tools, and feedback loops
Learning to control your agents is critical. AI agents, contrary to common belief, don’t possess true intelligence or agency. They are best understood as the combination of:
- System prompt (the “agent” definition)
- State / memory
- A set of tools
All the intelligence resides in the large language model (LLM), which receives the system prompt, tools, and user prompt as context, decides which tools to invoke, and iterates until producing a final answer.
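
That loop is worth internalizing, because every control lever below hooks into one of its parts. A minimal sketch of the iteration, with hypothetical `LlmClient`, `Step`, and `Tool` types standing in for whichever SDK you actually use:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Minimal sketch of the agent loop: the LLM sees the system prompt and
 *  conversation, picks a tool (or answers), and the result is fed back. */
public class AgentLoop {

    /** Either a tool call to perform or a final answer. */
    record Step(String toolName, String toolArgs, String finalAnswer) {
        boolean isFinal() { return finalAnswer != null; }
    }

    interface LlmClient {
        Step next(String systemPrompt, List<String> conversation);
    }

    interface Tool {
        String run(String args);
    }

    static String run(LlmClient llm, String systemPrompt, String userPrompt,
                      Map<String, Tool> tools) {
        List<String> conversation = new ArrayList<>();
        conversation.add("user: " + userPrompt);

        while (true) {
            Step step = llm.next(systemPrompt, conversation);  // LLM decides what to do
            if (step.isFinal()) {
                return step.finalAnswer();                     // done iterating
            }
            String result = tools.get(step.toolName()).run(step.toolArgs());
            conversation.add("tool(" + step.toolName() + "): " + result); // feed result back
        }
    }
}
```

All of the “intelligence” sits behind `LlmClient`; the agent itself is just this loop plus its prompt, state, and tools.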
There are three primary levers for control:
Prompt engineering
- Write clear test objectives and acceptance criteria.
- Build a library of proven prompts.
- Iterate based on the agent’s outputs.
Well‑crafted prompts guide the agent to deliver precise, useful results.
Tool integration
- Use an MCP server or similar to connect agents to source control, design documents, and CI/CD pipelines.
- Leverage platforms such as Applitools (visual AI testing), Katalon Studio (codeless automation), or Selenium with AI extensions to provide varying levels of autonomy and control.
Performance monitoring & feedback loops
- Track metrics: coverage, bug detection, false positives, maintenance time.
- Implement real‑time monitoring and alerting.
- Reconfigure or retrain the agent when metrics drift.
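
A drift check can be as simple as comparing this week's numbers against a baseline and alerting when they cross a tolerance. A sketch with a hypothetical `MetricsMonitor` class and illustrative thresholds:

```java
/** Hypothetical drift check over the metrics listed above. */
public class MetricsMonitor {

    record Metrics(double coverage, double falsePositiveRate, double maintenanceHours) {}

    /** Thresholds are illustrative; calibrate them against your own history. */
    static boolean hasDrifted(Metrics baseline, Metrics current) {
        boolean coverageDrop     = current.coverage() < baseline.coverage() - 0.05;
        boolean fpSpike          = current.falsePositiveRate() > baseline.falsePositiveRate() * 1.5;
        boolean maintenanceSpike = current.maintenanceHours() > baseline.maintenanceHours() * 1.5;
        return coverageDrop || fpSpike || maintenanceSpike;
    }

    public static void main(String[] args) {
        Metrics baseline = new Metrics(0.78, 0.04, 6.0);
        Metrics thisWeek = new Metrics(0.70, 0.09, 7.0);
        System.out.println(hasDrifted(baseline, thisWeek)); // true -> time to reconfigure
    }
}
```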
How to migrate from traditional testing to agentic AI testing
Transitioning from traditional testing to agentic AI testing will differ for each organization, depending on its existing processes, toolchain, and maturity level. The migration typically follows these phases:
- Assessment – Identify current testing gaps and define success criteria for AI‑augmented testing.
- Pilot – Run a small‑scale proof‑of‑concept using the “suggestion mode” described above.
- Incremental rollout – Expand autonomy and coverage gradually, applying the progressive rollout pattern.
- Governance – Institutionalize prompts, guardrails, and quality gates as part of the standard testing workflow.
- Continuous improvement – Use monitoring data to refine prompts, update models, and evolve tool integrations.
By treating AI agents as controllable components—prompted, tool‑enabled, and continuously monitored—you can harness their power while keeping risk, cost, and quality firmly under control.
Organizational considerations
- Existing automation: Run agentic AI tests alongside legacy scripts, adopting gradual migration strategies and increasing autonomy as confidence grows.
- Manual‑testing‑heavy teams: Start with low‑risk regression suites, codify tribal knowledge into repeatable tests, and shift QA focus from script maintenance to overseeing autonomous agents.
Practical integration typically involves hybrid testing methods that combine manual, AI‑assisted, and fully agentic approaches. Platforms like Tricentis Tosca and qTest enable unified management across these methods.
Example: Evolving a Traditional Test to an Agentic Approach
Traditional Selenium test (manual scripting)
```java
// Brittle: the locator is hard-coded, so any change to the "username" id breaks the test.
driver.findElement(By.id("username")).sendKeys("test@example.com");
```
Problems: The test breaks when the UI changes (e.g., the ID becomes a class‑based selector). Manual updates are required for every locator change, and dynamic UI changes (common with A/B testing) create significant maintenance overhead and error risk.
AI‑Assisted Testing (e.g., Tricentis Tosca or mabl)
- QA records user actions via a visual test builder.
- Self‑healing locators adapt to minor UI changes automatically (illustrated in the sketch below).
- Human intervention is still required for test design and assertion logic.
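
The self-healing idea can be approximated even in plain Selenium by trying a chain of candidate locators instead of a single brittle ID. A simplified sketch; commercial tools do far more (visual matching, attribute similarity), and the class and locators here are illustrative:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import java.util.List;

/** Illustration only: try each candidate locator until one matches. */
public class ResilientLocator {

    static WebElement find(WebDriver driver, List<By> candidates) {
        for (By locator : candidates) {
            try {
                return driver.findElement(locator);  // first match wins
            } catch (NoSuchElementException ignored) {
                // fall through to the next candidate
            }
        }
        throw new NoSuchElementException("No candidate locator matched");
    }

    static void fillUsername(WebDriver driver) {
        WebElement username = find(driver, List.of(
                By.id("username"),                         // original locator
                By.cssSelector("input[name='username']"),  // fallback by name
                By.cssSelector("[data-testid='username']") // fallback by test id
        ));
        username.sendKeys("test@example.com");
    }
}
```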
Fully Agentic AI Testing
- QA provides high‑level intent: “Test the login flow with valid and invalid credentials, edge cases, and security scenarios.”
- The agent autonomously discovers UI elements, generates test cases, and creates assertions.
- It self‑adapts to UI changes and refactors test logic without human input.
- It learns from failures and adjusts test strategies in real time.
Key insight: Traditional testing requires constant developer/QA time. Agentic AI shifts effort from execution and maintenance to strategic oversight and prompt engineering.
Common challenges in agentic AI testing
- Calibrating trust – Incremental rollout is essential. Introduce AI agents gradually, validate behavior on constrained scopes, and expand responsibilities as confidence grows based on real outcomes and monitored performance.
- Flaky tests – These undermine trust and inflate maintenance costs. Effective strategies (see the tagging sketch after this list) include:
  - Isolating unstable scenarios.
  - Tagging and quarantining flaky tests.
  - Using AI agents to surface failure patterns so teams can harden those areas.
- Cost control & coverage deduplication – Continuously prune redundant scenarios and ensure additional coverage adds incremental value rather than merely increasing execution spend.
- Maintaining clear accountability – Even when agents generate, execute, and update tests, QA must own all agent outputs and remain the ultimate decision‑maker on what ships. Establish review workflows, audit trails, and sign‑off checkpoints so autonomous activity is always paired with human oversight.
  AI can assist: multi‑agent review (e.g., three review agents debate findings, reach agreement, then present a distilled report to a human for final sanity checks).
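
For the flaky-test strategy above, tagging and quarantining can be as simple as a JUnit 5 tag that a separate, non-gating CI job filters on. A minimal sketch with made-up test names:

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

/** Quarantined tests still run, but in a CI job that is excluded
 *  from the deployment gate until the scenario is hardened. */
class CheckoutFlowTest {

    @Test
    void validPaymentCompletesOrder() {
        // stable test: stays in the gating suite
    }

    @Test
    @Tag("quarantine") // flagged as flaky; excluded from the deployment gate
    void thirdPartyPaymentWidgetLoads() {
        // unstable scenario isolated here until the root cause is fixed
    }
}
```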
End‑game vision – Integrate agentic AI testing directly into the software development lifecycle, where agents don’t just test code written by humans but operate as part of a multi‑agent AI development team:
- AI engineers write code.
- AI testers develop and execute tests for that code.
- Both iterate until the task is complete.
Conclusion
Key takeaways for moving to agentic AI
- Start with low autonomy and constrained scope.
- Use MCP/tools to give agents context without granting production write access.
- Track failure rate, coverage, and costs.
- Expand autonomy only after reliability is proven.
Managing agentic AI for testing shifts focus from traditional script maintenance to strategic oversight. QA teams must embrace a new mindset: they guide, monitor, and refine autonomous agents rather than manually updating scripts.
Getting started
- Begin small—validate AI agents on low‑risk scenarios.
- Gradually expand scope as confidence builds.
- Follow best practices and leverage robust tools.
By doing so, QA teams can transform their testing processes, accelerate releases, and improve overall software quality.
Have a really great day!
