Minimal .NET LLM Observability: Reproduce Timeouts and Triage in 15 Minutes
Source: Dev.to
If your LLM endpoint times out, dashboards alone rarely help. What you need is a fast path from symptom to cause.
This post shows a small .NET lab where you can force a controlled 504 and debug it with a repeatable metrics → trace → logs workflow. The stack is ASP.NET Core, Blazor, .NET Aspire, Ollama, and OpenTelemetry, and the goal is practical: reduce time‑to‑diagnosis before you ship.
Here’s the core idea: observability is not dashboards. It is time‑to‑diagnosis.
I built this because I have already lost too much time staring at logs without a reliable way to correlate logs, traces, and metrics. For this post, an “LLM workload” means an endpoint where tail latency and failures often come from a model call plus prompt or tool changes, not just your HTTP handler.
This post is repo‑first and uses the companion repository directly:
Repo: https://github.com/ovnecron/minimal-llm-observability
- It includes a Blazor UI to trigger healthy, delay, timeout, and real model‑call scenarios.
The Stack in One Minute
| Component | Description |
|---|---|
| ASP.NET Core API | A small request surface that I can instrument end‑to‑end without noise. |
| Blazor Web UI | One‑click healthy, delay, timeout, and real model‑call scenarios. |
| .NET Aspire AppHost | Local orchestration plus the Aspire Dashboard for fast pivoting. |
| Ollama (ollama/ollama:0.16.3) | Real local model‑call behavior without cloud token cost. |
| OpenTelemetry | Logs tell me what, traces tell me where, metrics tell me how often. |
The point is simple: one local environment where I can trigger failure and observe it end‑to‑end without guessing.
Why LLM Timeouts Feel Different
- Prompt changes are deployments: the code may stay the same, but latency and failure modes can change.
- Model and runtime changes can shift tail latency.
- Tool or dependency calls amplify variance — one slow call can become a timeout.
Minimum Correlation Fields
To keep triage fast, I want a few fields to exist everywhere:
| Field | Purpose |
|---|---|
| run_id | Follow one request lifecycle |
| trace_id | Follow execution across spans and services |
| prompt_version | Tie behavior to prompt changes |
| tool_version | Tie failures to integration changes |
How Correlation Should Look
POST /ask → trace_id in the trace span → run_id + trace_id in logs → timeout metric increases
Naming convention I use
- snake_case in logs and JSON: `run_id`, `trace_id`, `prompt_version`, `tool_version`
- camelCase in C# variables: `runId`, `traceId`, `promptVersion`, `toolVersion`
Example log line

```
timeout during /ask run_id=9f0f2f3a6fdd4f5f9e9a1f4d8f6c6f3e trace_id=4c4f3b2e86d4d6a6b1f69a0d9d0d9f0a prompt_version=v1 tool_version=local-llm-v1
```
If one link in that chain is missing, triage slows down immediately.
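To make that chain concrete, here is a small sketch (Python, standing in for whatever log tooling you have) that parses a flat `key=value` log line like the one above into its correlation fields:

```python
import re

def parse_correlation_fields(log_line: str) -> dict:
    """Extract key=value correlation fields from a flat log line."""
    return dict(re.findall(r"(\w+)=([\w.:-]+)", log_line))

line = ("timeout during /ask run_id=9f0f2f3a6fdd4f5f9e9a1f4d8f6c6f3e "
        "trace_id=4c4f3b2e86d4d6a6b1f69a0d9d0d9f0a "
        "prompt_version=v1 tool_version=local-llm-v1")

fields = parse_correlation_fields(line)
print(fields["trace_id"])  # the pivot key between traces and logs
```

If this parse fails on one of the four fields, that is exactly the "missing link" that slows triage down.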
What the Debugging Flow Looks Like
In practice, the drill looks like this:
- Click Simulated Timeout (504) in the Web UI.
- Open Aspire Metrics and confirm `llm_timeouts_total` increased.
- Jump to Traces and open the failing `llm.run`.
- Copy the `trace_id`, then pivot to logs and filter by `trace_id` or `run_id`.
- Check whether the failure lines up with a specific `prompt_version` or `tool_version`.
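The log-pivot step in that drill is logically just a filter on the correlation ids. A minimal sketch (Python, with hypothetical in-memory log records; in the lab the filtering happens in the Aspire Dashboard log view):

```python
# Hypothetical log records standing in for the Aspire Dashboard's log store.
logs = [
    {"trace_id": "trace-a", "run_id": "run-1", "msg": "timeout during /ask"},
    {"trace_id": "trace-b", "run_id": "run-2", "msg": "healthy run completed"},
]

def pivot(records, *, trace_id=None, run_id=None):
    """Return only the log records matching the given correlation id(s)."""
    return [r for r in records
            if (trace_id is None or r["trace_id"] == trace_id)
            and (run_id is None or r["run_id"] == run_id)]

matching = pivot(logs, trace_id="trace-a")
print(matching[0]["msg"])  # -> timeout during /ask
```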
That is the whole point of the lab: move from a timeout symptom to a likely cause in a few deliberate steps instead of guessing.
Prerequisites
- Docker Desktop or Docker Engine installed and running
- The .NET SDK version specified in the repo's `global.json` installed
- The Aspire workload (if required by your setup): `dotnet workload install aspire`
- Local ports available (or adjust launch settings): `18888`, `18889`, `11434`
- If you use the stable API port appendix, you also need `17100` free
Step 1 — Clone and Run the Repository
```shell
git clone https://github.com/ovnecron/minimal-llm-observability.git
cd minimal-llm-observability
dotnet run --project LLMObservabilityLab.AppHost/LLMObservabilityLab.AppHost.csproj
```
Open the Aspire Dashboard URL printed in the terminal. If you see an auth prompt, use the one‑time URL from the terminal.
Fixed local HTTP launch settings
- Aspire Dashboard: `http://localhost:18888`
- OTLP endpoint (Aspire Dashboard): `http://localhost:18889`
- Web UI (`LLMObservabilityLab.Web`): open it from the Aspire Dashboard resource list
Unsecured local transport is already enabled in the AppHost launch profile with `ASPIRE_ALLOW_UNSECURED_TRANSPORT=true`.
If you already run Ollama locally on 11434, stop it or change the container port mapping in AppHost.
If Real Ollama Call returns “model not found”, pull the default model in the running container:
```shell
docker exec -it "$(docker ps --filter "name=local-llm" --format "{{.Names}}" | head -n 1)" \
  ollama pull llama3.2:1b
```
Step 2 — Trigger Scenarios in the Web UI
- Open the Aspire Dashboard → Resources → click the `web-ui` endpoint.
- The root page in `LLMObservabilityLab.Web` gives you one‑click actions:
  - Healthy Run
  - Simulate Delay
  - Real Ollama Call
  - Simulated Timeout (504)
- Each run shows:
  - `run_id`
  - `trace_id`
  - status
  - elapsed time
- The Web UI also includes `/drill` with a fixed 15‑minute triage checklist.
Step 3 — Generate a Healthy Baseline (Optional)
Click Healthy Run about 20 times in the Web UI. This gives you a quick baseline in the following metrics:
- `llm_runs_total`
- `llm_success_total`
You can then compare subsequent failure runs against this baseline.
Step 4 — Force a Timeout and Triage It
- Click Simulated Timeout (504) in the Web UI.
- Immediately open the Aspire Dashboard.
The button returns a controlled 504 so you can exercise the observability pipeline on demand.
My triage loop (target: ~15 minutes)
| Phase | Action |
|---|---|
| Spot | Check llm_timeouts_total in Metrics |
| Drill | Open the failing llm.run trace |
| Pivot | Filter logs by trace_id and run_id |
| Inspect | Compare prompt_version and tool_version |
| Mitigate | Apply the smallest safe fix first |
| Verify | Re‑run the timeout scenario and confirm recovery |
Simple flow to follow
- Metrics → check `llm_latency_ms` for the spike
- Traces → filter `scenario=simulate_timeout` → open the failing `llm.run`
Minimal Signals I Use to Make Fast Decisions
Directly emitted by this repo
- `llm_runs_total`
- `llm_success_total`
- `llm_timeouts_total`
- `llm_errors_total`
- `llm_latency_ms`
Derived metric
`task_success_rate = llm_success_total / llm_runs_total * 100`
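As a quick sanity check, the derived metric is just a ratio of two counters. A minimal sketch (Python, with illustrative numbers rather than values from the repo):

```python
def task_success_rate(success_total: int, runs_total: int) -> float:
    """task_success_rate = llm_success_total / llm_runs_total * 100."""
    if runs_total == 0:
        return 0.0  # avoid division by zero before any runs are recorded
    return success_total / runs_total * 100

# e.g. 18 successes out of 20 runs from the healthy baseline step
print(task_success_rate(18, 20))  # -> 90.0
```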
Starter Alert Heuristics
(These are seeds — tune them to your baseline.)
- `task_success_rate` drops > 5 pp in 30 minutes
- A latency percentile (derived from `llm_latency_ms`) rises > 30 % over baseline
- Tool‑version‑scoped success (runs tagged with `tool_version`) falls < 90 %
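Those seeds could be encoded roughly like this (a sketch only; the parameter names, windowing, and thresholds are assumptions you would tune against your own baseline):

```python
def triggered_alerts(
    success_rate_now: float,       # task_success_rate in the current window
    success_rate_baseline: float,  # task_success_rate 30 minutes ago
    p95_latency_ms: float,         # a percentile derived from llm_latency_ms
    p95_baseline_ms: float,
    tool_scoped_success_rate: float,  # success rate for runs tagged with tool_version
) -> list:
    """Return which starter heuristics fired for the current window."""
    reasons = []
    if success_rate_baseline - success_rate_now > 5:     # drop > 5 pp
        reasons.append("success_rate_drop")
    if p95_latency_ms > p95_baseline_ms * 1.30:          # > 30 % over baseline
        reasons.append("latency_regression")
    if tool_scoped_success_rate < 90:                    # tool-scoped success < 90 %
        reasons.append("tool_version_success_low")
    return reasons

print(triggered_alerts(88.0, 95.0, 2600, 1800, 85.0))
# -> ['success_rate_drop', 'latency_regression', 'tool_version_success_low']
```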
Troubleshooting
| Symptom | Remedy |
|---|---|
| Port 11434 already in use | Stop the local Ollama instance or change the AppHost port mapping |
| No traces or metrics | Verify the Aspire Dashboard is running and the OTLP endpoint is reachable |
| Model not found | Run ollama pull … inside the container |
| CLI or API calls fail | Copy the exact API endpoint from the Aspire Dashboard (llm‑api → Endpoints) |
Verified vs. Opinion
Observability advice often mixes hard facts with personal workflow.
Verified (reproducible in this repo)
- Scenarios (healthy, delay, timeout, real call) are triggered from the Web UI.
- The correlation chain exists: metric counters → `llm.run` traces → logs with `run_id` and `trace_id`.
Opinion (works for me, but tune as needed)
- The “15‑minute” target loop.
- The alert thresholds above (starter seeds, not universal truth).
- The exact four correlation fields (add more if your system needs them).
Final Thoughts
The goal isn’t perfect dashboards; it’s shrinking time‑to‑diagnosis.
If you can’t pivot from a timeout to the exact trace and log lines, you’re still guessing.
I used this lab to find a workflow that works for me, and I hope it helps you build an observability pipeline that works for you.
If you run into an issue, open a GitHub issue and I’ll be happy to help.