Faster fixes, less context sharing: how Grafana Assistant learns your infrastructure before you even ask
Source: Grafana Blog
When an unexpected alert fires, many engineers now turn to an AI assistant first. You ask why your checkout service is slow, but without context about your environment the assistant can’t give a useful answer quickly. You end up explaining your data sources, services, connections, labels, and metrics, and every conversation starts from scratch, eating into valuable troubleshooting time.
If you’re using Grafana Assistant, the agentic observability assistant, you can skip that context‑gathering step. Assistant studies your infrastructure ahead of time and builds a persistent knowledge base, so by the time you ask your first question it already knows what’s running, how it’s connected, and where to look.
Building a knowledge base to jump‑start incident response
Assistant automatically builds and maintains a knowledge base about your environment. It knows what services you run, how they connect, which metrics and labels matter, where the logs live, and how things are deployed. Think of it as giving the assistant a map of your world before it starts answering questions.
As a result, conversations become faster and more accurate. When you ask about a service, the assistant doesn’t need to fumble through data‑source discovery—it already knows, for example, that your payment system talks to three downstream services, that its latency metrics live in a specific Prometheus data source, and that its logs are structured JSON in Loki.
During an incident, speed matters. Pre‑loaded context can shave valuable minutes off response time, especially for teams where not everyone has a full picture of the infrastructure. A developer can ask about upstream dependencies and receive accurate answers even without having previously examined those systems.
How does it work?
Assistant runs this “infrastructure memory” in the background with zero configuration. A swarm of AI agents performs the heavy lifting:
- Data source discovery – identifies all connected Prometheus, Loki, and Tempo data sources in your Grafana Cloud stack.
- Metrics scans – queries Prometheus data sources in parallel to find services, deployments, and infrastructure components.
- Enrichments via logs and traces – correlates Loki and Tempo data with their corresponding metrics, adding context about log formats, trace structures, and service dependencies.
- Structured knowledge generation – for each discovered service group, agents produce documentation covering five areas:
  - What the service is
  - Its key metrics and labels
  - How it’s deployed
  - What it depends on
  - How its logs are structured
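To make the metrics-scan step more concrete, here is a minimal sketch of how series labels could be grouped into service groups. It is not how Assistant is implemented: the sample series, the `group_services` helper, and the choice of `namespace` and `job` as identifying labels are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical label sets, shaped like the per-series labels a Prometheus
# query might return. Label names and values are assumptions.
series = [
    {"__name__": "http_request_duration_seconds_bucket", "namespace": "shop", "job": "checkout"},
    {"__name__": "http_requests_total", "namespace": "shop", "job": "checkout"},
    {"__name__": "http_requests_total", "namespace": "shop", "job": "payments"},
    {"__name__": "process_cpu_seconds_total", "namespace": "infra", "job": "node-exporter"},
]

def group_services(series, keys=("namespace", "job")):
    """Group metric names by the label values that identify a service."""
    groups = defaultdict(set)
    for labels in series:
        # Missing labels fall back to an empty string so every series gets a key.
        group_key = tuple(labels.get(k, "") for k in keys)
        groups[group_key].add(labels["__name__"])
    # Sort keys and metric names for stable, readable output.
    return {key: sorted(names) for key, names in sorted(groups.items())}

service_groups = group_services(series)
for key, metrics in service_groups.items():
    print(key, metrics)
```

Each resulting group is a candidate service that later steps could enrich with log and trace context.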
The knowledge is stored as searchable chunks in a vector database, enabling millisecond‑fast retrieval through semantic search. The whole process refreshes automatically on a weekly cadence, keeping the assistant’s understanding current as your environment evolves.
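The retrieval idea can be illustrated with a toy example. The post does not say which embedding model or vector database Assistant uses, so this sketch swaps in a bag-of-words vector and an in-memory list as stand-ins; `embed`, `cosine`, and `search` are hypothetical helpers, and the chunk texts are made up.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up knowledge chunks; a real system would store many more.
chunks = [
    "checkout service latency metrics live in the prometheus data source",
    "payments service logs are structured json in loki",
    "checkout service depends on payments inventory and shipping",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(search("where are the payments logs"))
```

A learned embedding model would match on meaning rather than shared words, which is what makes the search semantic; the ranking-by-similarity step stays the same.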
What does the assistant actually learn?
For every service group it discovers, Assistant captures five categories of knowledge:
| Category | Details |
|---|---|
| Identity and purpose | Service name, function, namespace, cluster, and technology stack |
| Key metrics | Metric names and labels specific to the service, including golden signals (latency, error rate, traffic, saturation) |
| Deployment topology | Kubernetes resources, replica counts, scaling configurations, container details |
| Dependencies | Upstream/downstream service connections, database/cache relationships, message‑queue interactions, external integrations |
| Log structure | Available log labels/values, detected formats (JSON, logfmt, unstructured), common patterns, extracted field names |
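One way to picture a single knowledge entry is as a record with one field per table row. The sketch below is an assumption about shape only: the `ServiceKnowledge` class, its field names, and the sample values are hypothetical and do not reflect Assistant’s internal schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class ServiceKnowledge:
    # Illustrative field names that mirror the five categories in the table.
    identity: dict
    key_metrics: list
    deployment: dict
    dependencies: list
    log_structure: dict

    def to_chunks(self):
        """Render one text chunk per category, ready for indexing."""
        name = self.identity.get("name", "unknown")
        return [f"{name} | {category}: {value}" for category, value in asdict(self).items()]

# Made-up example values for a checkout service.
checkout = ServiceKnowledge(
    identity={"name": "checkout", "namespace": "shop"},
    key_metrics=["http_request_duration_seconds", "http_requests_total"],
    deployment={"kind": "Deployment", "replicas": 3},
    dependencies=["payments", "inventory"],
    log_structure={"format": "json", "fields": ["level", "trace_id"]},
)

chunks = checkout.to_chunks()
for chunk in chunks:
    print(chunk)
```

Splitting the record into one chunk per category keeps each searchable unit small and focused on a single kind of question.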
This context distinguishes a generic assistant from one that provides answers tailored to your environment.
You don’t have to do anything
- No configuration, enablement, or maintenance steps are required.
- It runs automatically for all Grafana Cloud customers who use Assistant.
- Your existing telemetry (Prometheus, Loki, Tempo) is the input; the assistant reads what’s already there and builds its understanding.
You can review what the assistant has learned in the Assistant settings and browse the discovered service groups, or trigger a manual scan to refresh the knowledge base ahead of the next automatic cycle. Assistant respects your organization’s access controls—each memory is linked to the data sources used to generate it, so users only see knowledge derived from data sources they have permission to access.
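The access rule described above can be sketched as a simple filter. The post does not specify whether a user needs access to every data source behind a memory or just one of them; this sketch assumes the stricter reading (all of them). `Memory`, `visible_memories`, and the data source names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Memory:
    text: str
    source_ids: frozenset  # data sources used to generate this memory

def visible_memories(memories, allowed_sources):
    """Keep only memories whose every source the user may read (assumed rule)."""
    allowed = set(allowed_sources)
    return [m for m in memories if m.source_ids <= allowed]

# Made-up memories and data source names.
memories = [
    Memory("checkout latency metrics live in prom-main", frozenset({"prom-main"})),
    Memory("checkout logs correlate with traces", frozenset({"loki-prod", "tempo-prod"})),
]

# This user can read metrics and logs, but not traces.
visible = visible_memories(memories, {"prom-main", "loki-prod"})
for memory in visible:
    print(memory.text)
```

Under this assumption, a memory built from both logs and traces stays hidden from a user who lacks access to the trace data source.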
A foundation for smarter conversations
The feature works best when you don’t notice it. You ask a question and receive a precise answer that references the right metrics, labels, and data sources. You don’t have to wonder whether the assistant truly understands your environment, because it has already mapped it.
This is a step toward an assistant that genuinely understands the infrastructure it helps you observe and eventually knows your system well enough to ask the right questions on its own.