Agentic AI: Schema-Validated Tool Execution and Deterministic Caching
Source: Dev.to
Overview
Agentic AI systems do not fail because models cannot reason. They fail because tool execution is unmanaged.
Once agents are allowed to plan, retry, self‑criticize, or collaborate, tool calls multiply rapidly. Without strict controls, this leads to infrastructure failures, unpredictable cost growth, and non‑deterministic behavior.
This article explains how to engineer the tool‑execution layer of an agentic AI system using two explicit and independent mechanisms:
- Contract‑driven tool execution
- Deterministic tool‑result caching
Each mechanism solves a different class of production failures and must be implemented separately.
Real Production Scenario
Context
You are building an Incident Analysis Agent for SRE teams.
What the agent does
- Fetch logs for a service
- Analyze error patterns
- Re‑fetch logs if confidence is low
- Allow a second agent (critic) to validate findings
Tool characteristics
Tool name: fetch_service_logs
Backend: Elasticsearch / Loki / Splunk
Latency: 300–800 ms
- Rate‑limited
- Expensive per execution
This is a common real‑world agent workload.
Part I: Contract‑Driven Tool Execution in Agentic AI Systems
The problem without contracts
When LLMs emit tool arguments directly, the runtime receives inputs like:
{"service": "auth", "window": "24 hours"}
{"service": "Auth Service", "window": "yesterday"}
{"service": ["auth"], "window": 24}
{"service": "", "window": "24h"}
Why this happens
- LLMs reason in natural language
- LLMs paraphrase arguments
- LLMs are not type‑safe systems
What breaks in production
- Invalid Elasticsearch queries
- Full index scans
- Query‑builder crashes
- Silent data corruption
- Retry loops amplify failures
Relying on the model to always produce valid input is not system design.
What contract‑driven tool execution means
Contract‑driven execution means:
- The runtime owns the tool interface
- The model must conform to that interface
- Invalid input never reaches infrastructure
This is the same boundary enforcement used in production APIs.
Step 1: Define a strict tool contract
```python
from pydantic import BaseModel, Field, field_validator
import re
from typing import List


class FetchServiceLogsInput(BaseModel):
    service: str = Field(
        ...,
        description="Kubernetes service name, lowercase, no spaces"
    )
    window: str = Field(
        ...,
        description="Time window format: 5m, 1h, 24h"
    )

    @field_validator("service")
    @classmethod
    def validate_service(cls, value: str) -> str:
        if not value:
            raise ValueError("service cannot be empty")
        if not re.fullmatch(r"[a-z0-9\-]+", value):
            raise ValueError("service must be lowercase alphanumeric with dashes")
        return value

    @field_validator("window")
    @classmethod
    def validate_window(cls, value: str) -> str:
        if not re.fullmatch(r"\d+(m|h)", value):
            raise ValueError("window must be like 5m, 1h, 24h")
        return value


class FetchServiceLogsOutput(BaseModel):
    logs: List[str]
```
What these validations prevent
| Invalid input | Prevented issue |
|---|---|
| Empty service | Full log scan |
| Mixed case or spaces | Query mismatch |
| Natural‑language time | Ambiguous queries |
| Lists or numbers | Query‑builder crashes |
Nothing reaches infrastructure unless it passes this gate.
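A quick demonstration of the gate, using the malformed inputs from earlier:
```python
from pydantic import ValidationError

# Passes the contract
FetchServiceLogsInput(service="auth-service", window="24h")

# Rejected before any query builder runs: bad casing, natural-language window
try:
    FetchServiceLogsInput(service="Auth Service", window="yesterday")
except ValidationError as exc:
    print(exc.error_count(), "validation errors")  # → 2 validation errors
```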
Step 2: Implement the actual tool
```python
def fetch_service_logs(service: str, window: str) -> list[str]:
    # Stub standing in for the real Elasticsearch / Loki / Splunk query
    print(f"QUERY logs for service={service}, window={window}")
    return [
        f"[ERROR] timeout detected in {service}",
        f"[WARN] retry triggered in {service}",
    ]
```
Step 3: Runtime‑owned tool registry
```python
TOOLS = {
    "fetch_service_logs": {
        "version": "v1",
        "input_model": FetchServiceLogsInput,
        "output_model": FetchServiceLogsOutput,
        "handler": fetch_service_logs,
        "cache_ttl": 3600,  # seconds
    }
}
```
The agent cannot invent tools, bypass schemas, or change versions.
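How the model learns the interface is provider-specific, but one common pattern (sketched here, not prescribed by the registry above) is to expose only the generated JSON Schema and keep handlers server-side:
```python
def tool_specs() -> list[dict]:
    """Hypothetical helper: build the tool descriptions the LLM sees.

    Only names, versions, and JSON Schemas cross the boundary;
    handlers never leave the runtime.
    """
    return [
        {
            "name": name,
            "version": spec["version"],
            "parameters": spec["input_model"].model_json_schema(),
        }
        for name, spec in TOOLS.items()
    ]
```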
Step 4: Contract‑driven execution boundary
```python
def execute_tool_contract(tool_name: str, raw_args: dict):
    tool = TOOLS[tool_name]

    # Validate input against the contract; raises ValidationError on bad input
    args = tool["input_model"](**raw_args)

    # Call the handler with a clean dict
    raw_result = tool["handler"](**args.model_dump())

    # Wrap the result in the output model (the field name is specific to this tool)
    return tool["output_model"](logs=raw_result)
```
Execution flow for contract enforcement
```
Agent emits tool call
          ↓
Raw arguments (untrusted)
          ↓
Schema validation
   ┌───────────────┐
   │    Invalid    │ → reject and re‑plan
   └───────────────┘
          ↓
        Valid
          ↓
Tool executes
          ↓
Infrastructure queried safely
```
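The reject branch deserves code of its own. A minimal sketch (the wrapper name and error shape here are illustrative, not from the registry above):
```python
from pydantic import ValidationError

def execute_or_replan(tool_name: str, raw_args: dict) -> dict:
    """Illustrative wrapper: turn contract violations into planner feedback."""
    try:
        result = execute_tool_contract(tool_name, raw_args)
        return {"ok": True, "result": result}
    except ValidationError as exc:
        # Structured errors go back to the agent so it can correct
        # its arguments; infrastructure is never touched.
        return {"ok": False, "errors": exc.errors()}
```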
Part II: Deterministic Caching in Agentic AI Systems
The problem after contracts are added
Even with perfect validation, agents repeat work:
```python
execute_tool_contract(
    "fetch_service_logs",
    {"service": "auth-service", "window": "24h"}
)

execute_tool_contract(
    "fetch_service_logs",
    {"window": "24h", "service": "auth-service"}
)
```
Same intent, same backend, executed twice.
Why naive caching fails
{"service": "auth-service", "window": "24h"}
{"window": "24h", "service": "auth-service"}
Different strings → different cache keys, even though they are semantically identical.
Agentic systems require semantic equivalence, not raw string equality.
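Canonicalisation makes this mechanical; a two-line check shows that sorting keys collapses both orderings into one string:
```python
import json

a = {"service": "auth-service", "window": "24h"}
b = {"window": "24h", "service": "auth-service"}

# Key order disappears under canonical serialisation
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```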
Infrastructure required for deterministic caching
- Canonicalisation – Convert incoming arguments to a deterministic, ordered representation (e.g., sorted JSON).
- Hash‑based cache key – Compute a stable hash (SHA‑256) of the canonicalised payload together with the tool version.
- Result storage – Persist the output model (or a serialized form) together with the hash and a TTL.
- Cache lookup wrapper – Before invoking the handler, check the cache; on a hit, return the stored result; on a miss, execute and store.
Step 1: Minimal in‑memory implementation
A minimal sketch of all four pieces, using an in‑process dict as the cache:
```python
import json, hashlib, time
from collections import defaultdict

# Simple in‑memory cache for illustration
_CACHE = defaultdict(dict)  # {tool_name: {hash: (timestamp, result)}}


def _canonicalise(args: dict) -> str:
    """Return a deterministic JSON string with sorted keys."""
    return json.dumps(args, sort_keys=True, separators=(",", ":"))


def _hash_payload(tool_name: str, payload: str) -> str:
    return hashlib.sha256(f"{tool_name}:{payload}".encode()).hexdigest()


def execute_with_cache(tool_name: str, raw_args: dict):
    tool = TOOLS[tool_name]

    # 1️⃣ Validate input
    args = tool["input_model"](**raw_args)

    # 2️⃣ Canonicalise & hash
    payload = _canonicalise(args.model_dump())
    key = _hash_payload(tool_name, payload)

    # 3️⃣ Cache lookup: honour the tool's TTL, fall through if stale
    entry = _CACHE[tool_name].get(key)
    if entry:
        ts, cached_result = entry
        if time.time() - ts < tool["cache_ttl"]:
            return cached_result

    # 4️⃣ Execute and store
    raw_result = tool["handler"](**args.model_dump())
    validated = tool["output_model"](logs=raw_result)
    _CACHE[tool_name][key] = (time.time(), validated)
    return validated
```
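Under this sketch, argument order no longer matters; the reordered call from earlier resolves to the same key and never touches the backend:
```python
# First call misses and queries the backend once
r1 = execute_with_cache("fetch_service_logs",
                        {"service": "auth-service", "window": "24h"})

# Reordered arguments canonicalise to the same key: cache hit
r2 = execute_with_cache("fetch_service_logs",
                        {"window": "24h", "service": "auth-service"})

assert r1 is r2  # same stored object, one backend execution
```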
Example canonical form
```
fetch_service_logs|auth-service|24h|v1
```
Conceptually, the cache key identifies the tool, its arguments, and its version. The implementations here encode that tuple as sorted JSON rather than a pipe‑delimited string, but any deterministic, collision‑free encoding works.
Step 2: Cache setup (Redis example)
```python
import redis
import hashlib
import json

redis_client = redis.Redis(host="localhost", port=6379)


def cache_key(canonical: str) -> str:
    return hashlib.sha256(canonical.encode()).hexdigest()
```
Step 3: Cached tool execution
```python
def execute_tool_cached(tool_name: str, raw_args: dict):
    tool = TOOLS[tool_name]
    args = tool["input_model"](**raw_args)

    # Canonical form: tool name + version + sorted args
    canonical = json.dumps(
        {
            "tool": tool_name,
            "version": tool["version"],
            "args": args.model_dump(),
        },
        sort_keys=True,
        separators=(",", ":")
    )
    key = cache_key(canonical)

    cached = redis_client.get(key)
    if cached:
        print("CACHE HIT — skipping infra call")
        return tool["output_model"](**json.loads(cached))

    print("CACHE MISS — executing tool")
    raw_result = tool["handler"](**args.model_dump())
    validated = tool["output_model"](logs=raw_result)

    # TTL comes from the registry, so expiry policy lives with the tool
    redis_client.setex(
        key,
        tool["cache_ttl"],
        validated.model_dump_json()
    )
    return validated
```
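Assuming a local Redis instance is running, the duplicate pair from the start of Part II now costs one backend query:
```python
execute_tool_cached("fetch_service_logs",
                    {"service": "auth-service", "window": "24h"})
# CACHE MISS — executing tool

execute_tool_cached("fetch_service_logs",
                    {"window": "24h", "service": "auth-service"})
# CACHE HIT — skipping infra call
```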
Execution flow for deterministic caching
```
Validated tool request
          ↓
   Canonicalization
          ↓
    Hash generation
          ↓
     Redis lookup
   ┌───────────────┐
   │   Cache HIT   │ → return cached result
   └───────────────┘
          ↓
      Cache MISS
          ↓
Execute expensive tool
          ↓
    Validate output
          ↓
Store result with TTL
          ↓
     Return result
```
Separation of responsibilities
| Problem | Solved by |
|---|---|
| Invalid input | Contract‑driven execution |
| Infrastructure crashes | Contract‑driven execution |
| Duplicate execution | Deterministic caching |
| Cost explosion | Deterministic caching |
Final Takeaway
Agentic AI systems become production‑ready when tool execution is engineered like backend infrastructure, not treated as an LLM side effect.
- Contracts make execution safe.
- Caching makes execution scalable.
Skipping either guarantees failure.