GPU Flight — System Architecture

Published: March 16, 2026 at 01:37 AM EDT

6 min read

Source: Dev.to

System Diagram

[Figure: GPU Flight component diagram]

In a Nutshell

Component	Role
gpufl-client	Runs on the GPU machine, hooks into CUDA activity, and writes structured logs.
gpufl-agent	Watches the logs and forwards events to the backend via HTTP or Kafka.
gpufl-backend	Receives events, stores them, and provides APIs for querying.
gpufl-front	React UI for browsing sessions and inspecting GPU behavior.

The Client

The client is implemented as a C++ library; a gpufl Python package is also available. To use the C++ library:

Link the library into your application.
Call gpufl::init() at program start – GPU Flight then automatically collects GPU‑related events.

What the client can capture

Kernel launches – name, grid/block dimensions, register count, shared memory, occupancy, timing.
Memory copies – direction, size, duration, throughput.
SASS metrics – warp‑level per‑instruction counters (when the SassMetrics engine is enabled).
PC samples – warp stall‑reason distribution (when the PcSampling engine is enabled).
System metrics – periodic NVML snapshots (GPU utilization, VRAM usage, temperature, power draw, fan speed).
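
Throughput for a memory copy follows directly from its size and duration. A minimal sketch of that arithmetic, assuming the size is recorded in bytes and the duration in nanoseconds (the units and field names GPU Flight actually logs are not stated here, and the function name is hypothetical):

```python
def copy_throughput_gb_per_s(size_bytes: int, duration_ns: int) -> float:
    """Return the throughput of one memory copy in GB/s (1 GB = 1e9 bytes)."""
    if duration_ns <= 0:
        raise ValueError("duration must be positive")
    # One byte per nanosecond equals 1e9 bytes per second, i.e. 1 GB/s,
    # so the ratio needs no further scaling.
    return size_bytes / duration_ns
```

For example, a 256 MiB copy that takes 20 ms works out to roughly 13.4 GB/s.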

Log files

The client writes three JSONL files (log rotation is automatic):

.device.log   # kernel launches, memory copies, timing
.scope.log    # GFL_SCOPE blocks with optional SASS/PC data
.system.log   # periodic NVML snapshots

The agent tracks file offsets, so events are never re‑sent across rotations.
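
Offset tracking over a JSONL file can be sketched as follows. This is an illustration of the general technique, not the agent's actual code: the function name is hypothetical, and it assumes one JSON object per line with a trailing newline once the line is fully written.

```python
import json
from typing import Any


def read_new_events(path: str, offset: int) -> tuple[list[dict[str, Any]], int]:
    """Parse complete JSONL lines starting at byte `offset`.

    Returns the parsed events and the byte offset to resume from next time.
    """
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line: the writer is mid-append, retry later
            offset += len(raw)
            if raw.strip():
                events.append(json.loads(raw))
    return events, offset
```

Because the offset only advances past lines that end in a newline, a half-written record is picked up in full on the next call instead of being parsed twice or skipped.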

The Agent

The agent is a separate process that tails the log files and forwards events to the backend.
Because it is decoupled from the client, it can run anywhere that can access the log folder (same machine, NFS share, separate host, etc.).

┌────────────────────────────────┐
│         GPU Machine            │
│                                │
│  gpufl-client → .log files     │
│                    │           │
│               gpufl-agent      │   ← simplest: agent on the same machine
└────────────────────┼───────────┘
                     │  (or NFS / shared volume)
                     ▼
                gpufl-agent           ← or: agent on a separate machine

Why the split?

Keeping profiling logic isolated makes the instrumentation library easier to maintain and portable across environments. Switching the transport between HTTP and Kafka, or adding S3 archival, never requires recompiling the client.

Agent responsibilities

Tail all three log streams concurrently.
Persist file positions via a cursor file (survives restarts).
Follow log rotation automatically.
Publish events via HTTP or Kafka.
Optionally archive compressed raw logs to S3‑compatible storage.

The raw .log files contain the full original event stream. Archiving them enables long‑term retention, auditability, and re‑processing with updated analysis logic.
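
The cursor-file and rotation items in the list above could look roughly like this. It is a sketch under stated assumptions: the cursor is stored as JSON mapping each log path to an inode and byte offset, and a file counts as rotated when its inode changes or it is shorter than the saved offset. The agent's real cursor format and rotation logic are not described here, and all names are hypothetical.

```python
import json
import os


def load_cursor(cursor_path: str) -> dict:
    """Load saved positions; a missing cursor file means start from scratch."""
    try:
        with open(cursor_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cursor(cursor_path: str, cursor: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a torn cursor."""
    tmp = cursor_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(cursor, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, cursor_path)  # rename over the old cursor in one step


def resume_offset(cursor: dict, log_path: str) -> int:
    """Return where to resume reading, or 0 if the file looks new or rotated."""
    st = os.stat(log_path)
    saved = cursor.get(log_path)
    if saved is None or saved["inode"] != st.st_ino or saved["offset"] > st.st_size:
        return 0  # new, replaced, or truncated file: read from the beginning
    return saved["offset"]
```

This sketch does not drain the unread tail of a file that was renamed away during rotation; a production tailer would also need to handle that case.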

Configuration examples

# HTTP — simplest, good for dev or small deployments
gpufl-agent \
    --folder=/var/log/gpuflight \
    --type=http \
    --url=http://backend:8080/api/v1/events/

# Kafka — recommended for production
gpufl-agent \
    --folder=/var/log/gpuflight \
    --type=kafka \
    --brokers=kafka:9092

# Kafka + S3 archival
gpufl-agent \
    --folder=/var/log/gpuflight \
    --type=kafka \
    --brokers=kafka:9092 \
    --archiver-endpoint=https://s3.amazonaws.com \
    --archiver-bucket=gpu-traces \
    --archiver-access-key=KEY \
    --archiver-secret-key=SECRET

HTTP vs. Kafka

When many GPU machines send data simultaneously, publishing directly to the backend via HTTP couples the producer rate to the backend’s processing speed. If the backend slows down, restarts, or goes down temporarily, events can back up or be lost.

GPU node 1 ──┐
GPU node 2 ──┼── gpufl-agent ──► Kafka ──► gpufl-backend
GPU node 3 ──┘

Benefits of a message queue (Kafka):

Buffering – holds events while the backend restarts or lags.
Burst handling – absorbs spiky event rates typical of GPU workloads.
Multiple consumers – other services can read the same topic (alerting, analytics, archival, etc.).

For a single machine or development setup, HTTP remains simpler and perfectly adequate. The agent supports both modes: the --type flag selects the transport, with --url or --brokers pointing at the destination.
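
The buffering benefit can be illustrated with a toy simulation. This is a deliberately simplified model, not a description of how the agent or Kafka behaves: it produces one event per tick, treats the backend as either up or down at each tick, and uses an in-memory deque as a stand-in for a Kafka topic.

```python
from collections import deque


def run(ticks: int, backend_up: list[bool], buffered: bool) -> list[int]:
    """Simulate one event per tick and return the events the backend received.

    backend_up[i] says whether the backend accepts events at tick i.
    With buffered=False, an event sent while the backend is down is lost.
    With buffered=True, events wait in a queue until the backend is back.
    """
    queue: deque[int] = deque()
    delivered: list[int] = []
    for tick in range(ticks):
        queue.append(tick)
        if backend_up[tick]:
            delivered.extend(queue)  # backend drains everything that is pending
            queue.clear()
        elif not buffered:
            queue.clear()  # no buffer: the event is dropped
    return delivered
```

With the backend down at ticks 1 and 2, the unbuffered run delivers only events 0 and 3, while the buffered run delivers all four once the backend returns.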

The Backend

Implemented as a Spring Boot REST service with two main responsibilities:

Ingestion – POST /api/v1/events/{eventType} where eventType is device, scope, or system.
In a Kafka deployment, a consumer bridge reads from the topic and calls the same endpoint internally.
Query API – provides:
- Sessions list
- System metrics over time
- Kernel events per session
- Profile samples per scope

The frontend consumes only this public API; there is no hidden internal interface.

Event types

Type	Contents
device	Kernel launches, memory copies, precise timing (CUPTI).
scope	GFL_SCOPE timing blocks, optionally enriched with SASS metrics and PC samples.
system	Periodic NVML snapshots (GPU util %, VRAM, temperature, power draw, fan speed).

Authentication

All backend endpoints are protected with Bearer tokens.
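
Putting the ingestion route and the Bearer token together, a request to the backend might be built as below. Only the path /api/v1/events/{eventType}, the three event types, and the use of Bearer tokens come from this post; the JSON body shape, the Content-Type header, and the function name are assumptions for illustration.

```python
import json
import urllib.request

EVENT_TYPES = {"device", "scope", "system"}


def build_ingest_request(
    base_url: str, event_type: str, event: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/events/{eventType}."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/events/{event_type}",
        data=json.dumps(event).encode("utf-8"),  # assumed: one JSON event per request
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to send it. Building the request separately keeps the validation and header logic easy to test without a running backend.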

The Frontend

A React application that visualises:

Sessions and their timelines
Per‑kernel performance details
System‑level metrics (GPU utilization, temperature, etc.)
Scope‑level profiling data (SASS counters, PC samples)

It talks exclusively to the backend’s REST API, making the UI completely decoupled from the data‑collection pipeline.

Recap

Client → writes structured JSONL logs.
Agent → tails logs, forwards events (HTTP/Kafka), optionally archives raw logs.
Backend → ingests events, stores them, serves query API (protected by Bearer tokens).
Frontend → React UI built on top of the public API.

This modular design lets you swap transport mechanisms, scale ingestion independently, and keep profiling logic lightweight and portable.

GPU Flight Overview

Ingestion – Direct HTTP ingestion from the agent; API keys for programmatic access from the frontend or external tooling.
Data retention – Configurable cleanup policy keeps disk usage bounded. Old sessions are pruned automatically once they exceed the configured retention window.
Frontend – A React + TypeScript single‑page application (SPA) with a dark theme, connected to the backend REST API.
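
The retention rule above amounts to comparing each session's age against a cutoff. A minimal sketch, assuming each session has a timezone-aware end time and that the retention window is a simple duration (how the backend actually stores sessions and configures retention is not specified here; the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone


def sessions_to_prune(sessions, retention, now=None):
    """Return the ids of sessions that ended before the retention cutoff.

    `sessions` maps a session id to its timezone-aware end time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return sorted(sid for sid, ended_at in sessions.items() if ended_at < cutoff)
```

With a 30-day window, a session that ended 31 days ago is selected for pruning and one that ended 2 days ago is kept.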

Main Dashboard

The main view is a three‑tab dashboard scoped to a single profiling session:

┌────────────────────────────────────────────────────┐
│  Session: my_training_run  [host: gpu-node-01]     │
├──────────────┬─────────────┬───────────────────────┤
│   Kernels    │   Scopes    │       System           │
└──────────────┴─────────────┴───────────────────────┘

Kernels tab

Timeline of kernel launches.
Clicking a kernel opens the Inspector, showing:
- Occupancy, register count, shared memory, grid & block dimensions.
- The CPU call stack that triggered the launch.

Scopes tab

Hierarchical tree of GFL_SCOPE blocks with timing breakdowns.
For sessions that use the SASS Metrics engine, each scope also displays the divergence data described in the previous post.

System tab

Time‑series charts for:
- GPU utilization %
- VRAM usage
- Temperature
- Power draw
- Fan speed

Session Selector

The selector lets you browse by host and application name, making it easy to compare runs across machines or over time.

Next Post: Local Setup Guide

A Docker Compose file will be provided that brings up the full stack—client example, agent, backend, and frontend—on a single machine. This will let you try GPU Flight without any manual configuration.

What’s coming next?

Cloud demo – A live deployment is in progress. Once available, a link will be posted so you can explore real GPU profiling data in the UI without setting up anything yourself.

Open‑Source

GPU Flight is open source.
