Faster fixes, less context sharing: how Grafana Assistant learns your infrastructure before you even ask

Published: April 30, 2026 at 01:59 PM EDT
4 min read

Source: Grafana Blog

When an unexpected alert fires, many engineers' first move is to ask their AI assistant for help. You ask why your checkout service is slow, but without guidance the assistant can't offer meaningful insight quickly. You end up explaining your data sources, services, connections, labels, and metrics, and because every conversation starts from scratch, that explanation eats into valuable troubleshooting time.

If you’re using Grafana Assistant, the agentic observability assistant, you can skip that context‑gathering step. Assistant studies your infrastructure ahead of time and builds a persistent knowledge base, so by the time you ask your first question it already knows what’s running, how it’s connected, and where to look.

Fostering a knowledge base to jump‑start incident response

Assistant automatically builds and maintains a knowledge base about your environment. It knows what services you run, how they connect, which metrics and labels matter, where the logs live, and how things are deployed. Think of it as giving the assistant a map of your world before it starts answering questions.

As a result, conversations become faster and more accurate. When you ask about a service, the assistant doesn’t need to fumble through data‑source discovery—it already knows, for example, that your payment system talks to three downstream services, that its latency metrics live in a specific Prometheus data source, and that its logs are structured JSON in Loki.

During an incident, speed matters. Pre‑loaded context can shave valuable minutes off response time, especially for teams where not everyone has a full picture of the infrastructure. A developer can ask about upstream dependencies and receive accurate answers even without having previously examined those systems.
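Answers about upstream and downstream dependencies can be derived from trace data. The Python sketch below is a minimal illustration, not Assistant's implementation. It assumes spans arrive as simplified (span ID, parent span ID, service) records, and the `Span` type, helper functions, and service names are all hypothetical. Wherever a child span belongs to a different service than its parent, the sketch records a caller-to-callee edge, so "what does payment call?" and "what calls payment?" become simple graph lookups.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified span shape; real trace data carries many more fields.
    span_id: str
    parent_id: Optional[str]
    service: str


def build_dependency_graph(spans: list[Span]) -> dict[str, set[str]]:
    """Map each service to the set of services it calls directly (its downstreams)."""
    by_id = {s.span_id: s for s in spans}
    downstream: dict[str, set[str]] = defaultdict(set)
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        # A cross-service parent/child pair means the parent's service called this one.
        if parent is not None and parent.service != span.service:
            downstream[parent.service].add(span.service)
    return dict(downstream)


def upstream_of(graph: dict[str, set[str]], service: str) -> set[str]:
    """Services that call `service` directly."""
    return {caller for caller, callees in graph.items() if service in callees}


# Hypothetical trace: checkout calls payment, which calls three downstream services.
spans = [
    Span("1", None, "checkout"),
    Span("2", "1", "payment"),
    Span("3", "2", "fraud-check"),
    Span("4", "2", "ledger"),
    Span("5", "2", "notifications"),
    Span("6", "2", "payment"),  # internal span in the same service: no edge
]
graph = build_dependency_graph(spans)
print(sorted(graph["payment"]))
print(sorted(upstream_of(graph, "payment")))
```

Because the graph is precomputed, answering a dependency question at incident time is a dictionary lookup rather than a fresh trace search.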

How does it work?

Assistant runs this “infrastructure memory” in the background with zero configuration. A swarm of AI agents performs the heavy lifting:

  1. Data source discovery – identifies all connected Prometheus, Loki, and Tempo data sources in your Grafana Cloud stack.
  2. Metrics scans – queries Prometheus data sources in parallel to find services, deployments, and infrastructure components.
  3. Enrichment via logs and traces – correlates Loki and Tempo data with their corresponding metrics, adding context about log formats, trace structures, and service dependencies.
  4. Structured knowledge generation – for each discovered service group, agents produce documentation covering five areas:
    • What the service is
    • Its key metrics and labels
    • How it’s deployed
    • What it depends on
    • How its logs are structured
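Steps 1 and 2 can be pictured with a small Python sketch. It is an assumption-laden illustration rather than how Assistant's agents actually work: the data sources and series are hypothetical in-memory stand-ins (a real scan would query each data source's API), and grouping series by `namespace` and `service` labels is an assumed convention. Discovery filters to the supported data source types, and the Prometheus sources are scanned concurrently with a thread pool before the results are merged into service groups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins; a real scan would query each data source's HTTP API.
DATA_SOURCES = [
    {"uid": "prom-a", "type": "prometheus"},
    {"uid": "prom-b", "type": "prometheus"},
    {"uid": "loki-1", "type": "loki"},
    {"uid": "tempo-1", "type": "tempo"},
    {"uid": "pg-1", "type": "postgres"},
]
SERIES = {
    "prom-a": [
        {"__name__": "http_requests_total", "namespace": "shop", "service": "checkout"},
        {"__name__": "http_request_duration_seconds_bucket", "namespace": "shop", "service": "checkout"},
    ],
    "prom-b": [
        {"__name__": "http_requests_total", "namespace": "shop", "service": "payment"},
        {"__name__": "process_cpu_seconds_total", "namespace": "infra"},  # no service label
    ],
}
SUPPORTED_TYPES = {"prometheus", "loki", "tempo"}


def discover(sources):
    """Step 1: keep only the data source types the scan understands."""
    return [s for s in sources if s["type"] in SUPPORTED_TYPES]


def scan_metrics(uid):
    """Step 2, for one source: (group key, metric name, source uid) per service-labelled series."""
    found = []
    for labels in SERIES.get(uid, []):
        if "service" in labels:  # assumed grouping label
            key = (labels.get("namespace", ""), labels["service"])
            found.append((key, labels["__name__"], uid))
    return found


def build_service_groups(sources):
    prom_uids = [s["uid"] for s in discover(sources) if s["type"] == "prometheus"]
    groups = defaultdict(lambda: {"metrics": set(), "sources": set()})
    # Scan every Prometheus source concurrently, then merge the results by service group.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for results in pool.map(scan_metrics, prom_uids):
            for key, metric, uid in results:
                groups[key]["metrics"].add(metric)
                groups[key]["sources"].add(uid)
    return dict(groups)


groups = build_service_groups(DATA_SOURCES)
for (namespace, service), info in sorted(groups.items()):
    print(namespace, service, sorted(info["metrics"]), sorted(info["sources"]))
```

Recording which source each metric came from is useful later, because it lets every piece of generated knowledge be traced back to the data sources that produced it.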

The knowledge is stored as searchable chunks in a vector database, enabling millisecond‑fast retrieval through semantic search. The whole process refreshes automatically on a weekly cadence, keeping the assistant’s understanding current as your environment evolves.
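Semantic search over stored chunks boils down to "embed the query, then return the chunks whose vectors are most similar." The Python sketch below shows that idea under loud assumptions: the `embed` function is a toy hashed bag-of-words standing in for a learned embedding model, the 1024-dimension size is arbitrary, and the brute-force scan stands in for the approximate-nearest-neighbor index a real vector database would use. The chunk texts are hypothetical.

```python
import hashlib
import math
import re

DIM = 1024  # assumed embedding size for this toy example


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag of words, L2-normalised. A real system would use a learned model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        # Stable hash; Python's built-in hash() is randomised per process.
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ChunkIndex:
    """Brute-force cosine-similarity search over stored text chunks."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.chunks.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


index = ChunkIndex()
index.add("payment: latency metric http_request_duration_seconds in Prometheus data source prom-b")
index.add("payment: logs are JSON in Loki with labels app and level")
index.add("checkout: deployed as a Kubernetes Deployment with 3 replicas")
print(index.search("where are payment logs", k=1))
```

Embedding each chunk once at write time is what keeps lookups cheap: at question time only the query needs embedding.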

What does the assistant actually learn?

For every service group it discovers, Assistant captures five categories of knowledge:

| Category | Details |
|---|---|
| Identity and purpose | Service name, function, namespace, cluster, and technology stack |
| Key metrics | Metric names and labels specific to the service, including golden signals (latency, error rate, traffic, saturation) |
| Deployment topology | Kubernetes resources, replica counts, scaling configurations, container details |
| Dependencies | Upstream/downstream service connections, database/cache relationships, message‑queue interactions, external integrations |
| Log structure | Available log labels/values, detected formats (JSON, logfmt, unstructured), common patterns, extracted field names |
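The five categories map naturally onto a per-service record that can be split into one searchable chunk per category. The Python dataclass below is a hypothetical shape for such a record, not Assistant's actual schema; the field names, the `to_chunks` helper, and the example values are all illustrative. Each chunk keeps the service name and the data source UIDs it was built from.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceKnowledge:
    """Hypothetical record mirroring the five knowledge categories."""
    name: str
    identity: str            # identity and purpose
    key_metrics: list[str]   # metric names, including golden signals
    deployment: str          # deployment topology
    dependencies: list[str]  # upstream/downstream connections
    log_structure: str       # labels, format, extracted fields
    source_uids: set[str] = field(default_factory=set)  # data sources used to build it

    def to_chunks(self) -> list[dict]:
        """One searchable chunk per category, tagged with its service and sources."""
        sections = {
            "identity": self.identity,
            "metrics": ", ".join(self.key_metrics),
            "deployment": self.deployment,
            "dependencies": ", ".join(self.dependencies),
            "logs": self.log_structure,
        }
        return [
            {
                "service": self.name,
                "category": category,
                "text": f"{self.name} {category}: {text}",
                "sources": sorted(self.source_uids),
            }
            for category, text in sections.items()
        ]


payment = ServiceKnowledge(
    name="payment",
    identity="Handles card payments in namespace shop",
    key_metrics=["http_requests_total", "http_request_duration_seconds"],
    deployment="Kubernetes Deployment, 3 replicas",
    dependencies=["fraud-check", "ledger", "notifications"],
    log_structure="JSON in Loki; fields level, msg, order_id",
    source_uids={"prom-b", "loki-1"},
)
for chunk in payment.to_chunks():
    print(chunk["text"])
```

Splitting by category keeps each chunk focused, so a question about logs can retrieve the log-structure chunk without pulling in deployment details.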

This context distinguishes a generic assistant from one that provides answers tailored to your environment.
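For the log-structure category, distinguishing JSON, logfmt, and unstructured logs can be approximated with a simple heuristic over sample lines. The Python sketch below is an assumption, not Assistant's detection logic. A line counts as JSON if it parses to an object, as logfmt if it consists entirely of `key=value` pairs (with optional quoted values), and as unstructured otherwise; the majority format across samples wins, and field names are collected from lines of that format. The sample lines are hypothetical.

```python
import json
import re
from collections import Counter

# key=value or key="quoted value" pairs, the building blocks of logfmt.
LOGFMT_PAIR = re.compile(r'(\w[\w.\-]*)=("(?:[^"\\]|\\.)*"|\S*)')


def classify_line(line: str) -> tuple[str, set[str]]:
    """Return (format, field names) for a single log line."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            obj = json.loads(stripped)
            if isinstance(obj, dict):
                return "json", set(obj)
        except json.JSONDecodeError:
            pass
    pairs = LOGFMT_PAIR.findall(stripped)
    # Treat the line as logfmt only if nothing but key=value pairs remains.
    if pairs and LOGFMT_PAIR.sub("", stripped).strip() == "":
        return "logfmt", {key for key, _ in pairs}
    return "unstructured", set()


def detect_format(lines: list[str]) -> tuple[str, list[str]]:
    """Majority format across sample lines, plus field names seen in lines of that format."""
    results = [classify_line(line) for line in lines if line.strip()]
    if not results:
        return "unknown", []
    fmt = Counter(f for f, _ in results).most_common(1)[0][0]
    fields = set().union(*(names for f, names in results if f == fmt))
    return fmt, sorted(fields)


samples = [
    '{"level":"error","msg":"card declined","order_id":"A17"}',
    '{"level":"info","msg":"payment ok","order_id":"A18","latency_ms":41}',
    'level=warn msg="retrying ledger write" attempt=2',
]
print(detect_format(samples))
```

Sampling a handful of lines per stream is usually enough for a heuristic like this; the extracted field names are what make later queries precise.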

You don’t have to do anything

  • No configuration, enablement, or maintenance steps are required.
  • It runs automatically for all Grafana Cloud customers who use Assistant.
  • Your existing telemetry (Prometheus, Loki, Tempo) is the input; the assistant reads what’s already there and builds its understanding.

In the Assistant settings, you can browse the discovered service groups to review what the assistant has learned, or trigger a manual scan to refresh the knowledge base before the next automatic cycle. Assistant also respects your organization's access controls: each memory is linked to the data sources used to generate it, so users only see knowledge derived from data sources they have permission to access.
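Linking each memory to its source data sources turns access control into a set comparison. The Python sketch below assumes a rule the text does not spell out: a memory derived from several data sources is visible only if the user can read all of them, and memories with no recorded source are hidden. The `Memory` type, `visible_memories` function, and UIDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    text: str
    source_uids: frozenset[str]  # data sources the memory was derived from


def visible_memories(memories: list[Memory], permitted_uids: set[str]) -> list[Memory]:
    """Keep memories whose every source the user can read (assumed all-sources rule)."""
    allowed = set(permitted_uids)
    # An empty source set would trivially pass the subset check, so hide those explicitly.
    return [m for m in memories if m.source_uids and m.source_uids <= allowed]


memories = [
    Memory("payment latency lives in prom-b", frozenset({"prom-b"})),
    Memory("payment logs are JSON", frozenset({"loki-1"})),
    Memory("payment calls ledger", frozenset({"tempo-1", "prom-b"})),
]
viewer = {"prom-b", "loki-1"}  # this user cannot read tempo-1
print([m.text for m in visible_memories(memories, viewer)])
```

Filtering at retrieval time means one shared knowledge base can serve every user while each person only sees what their permissions allow.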

A foundation for smarter conversations

The feature works best when you don't notice it. You ask a question, receive a precise answer that references the right metrics, labels, and data sources, and never have to wonder whether the assistant truly understands your environment, because it has already mapped it.

This is a step toward an assistant that genuinely understands the infrastructure it helps you observe and eventually knows your system well enough to ask the right questions on its own.
