Go profiling using pprof
Source: Dev.to
What is pprof?
pprof is Go’s built‑in profiling tool that allows you to collect and analyze runtime data from your application, such as CPU usage, memory allocations, goroutines, and blocking operations.
In simple terms, pprof answers questions like:
- Why is my application slow?
- Where is CPU time being spent?
- What is allocating so much memory?
- Are goroutines leaking?
Spoiler: pprof does not magically optimize your code. It just tells you where you messed up.
How pprof works
pprof works in two steps:
- Collect – gather metrics from your application at runtime.
- Analyze – examine the collected metrics using go tool pprof.
At runtime, Go samples execution and records metrics. These samples are aggregated into profiles that can be visualized and explored with pprof commands. Under the hood, Go performs statistical profiling: the program is periodically interrupted and the runtime records what was running at that exact moment. Over time these snapshots form an accurate picture of resource usage.
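For one-off programs such as CLIs, you can drive this sampling directly with the standard runtime/pprof package instead of an HTTP server. A minimal sketch (the file name cpu.prof and the burn workload are just placeholders):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

// burn does some CPU-bound work so the sampler has something to record.
func burn() int {
	n := 0
	for i := 0; i < 50_000_000; i++ {
		n += i % 7
	}
	return n
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// While profiling is active, the runtime periodically interrupts
	// the program and writes the running call stack into f.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	fmt.Println(burn())
}
```

You can then analyze the written file offline with go tool pprof cpu.prof.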
A simple way to enable profiling in your app is to expose the built‑in HTTP server:

This exposes profiling endpoints under /debug/pprof/, such as:
- /debug/pprof/profile – CPU
- /debug/pprof/heap – Memory
- …and others.
Collecting profiles
When you hit a pprof endpoint like /debug/pprof/profile, the response is a binary protobuf (usually gzip‑compressed) containing raw sampled data, stack traces, counters, and timestamps. It is not a human‑readable report.
pprof is the tool that decodes this raw data, aggregates it, and presents it in a readable form.
Analyzing profiles
First, make sure you have a pprof binary. go tool pprof already ships with the Go toolchain; optionally, install the standalone pprof CLI to get the newest features:
go install github.com/google/pprof@latest
You should see the binary installed in your $GOPATH/bin.
Next, with your web server running, fetch a CPU profile (10 seconds):
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
If everything is set up correctly, you’ll enter an interactive pprof shell where you can run commands such as top, list, and web (type help inside the shell for the full list of commands).
The Big Bad Boy
Assume we have an endpoint that consumes a lot of CPU, e.g. /work.
Trigger some load:
curl http://localhost:8080/work
You can repeat this or use a load generator like hey or ab.
Collect a 10‑second CPU profile from the running server:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
Inside the pprof shell, display the top functions consuming CPU:
(pprof) top
Typical output:
Showing nodes accounting for 7470ms, 92.45% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
flat flat% sum% cum cum%
2030ms 25.12% 25.12% 2030ms 25.12% math.archExp
1570ms 19.43% 44.55% 1570ms 19.43% math.IsInf (inline)
1500ms 18.56% 63.12% 2830ms 35.02% math.log
560ms 6.93% 70.05% 5800ms 71.78% math.pow
430ms 5.32% 75.37% 1050ms 13.00% math.sin
330ms 4.08% 79.46% 330ms 4.08% runtime.pthread_cond_signal
320ms 3.96% 83.42% 1560ms 19.31% math.frexp
290ms 3.59% 87.00% 290ms 3.59% math.Float64frombits (inline)
220ms 2.72% 89.73% 220ms 2.72% math.IsNaN (inline)
220ms 2.72% 92.45% 230ms 2.85% math.normalize (inline)
To rank by cumulative cost instead (cum includes time spent in a function's callees, while flat counts only time in the function body itself), run:
(pprof) top -cum
Result (excerpt):
Showing nodes accounting for 2170ms, 26.86% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
flat flat% sum% cum cum%
110ms 1.36% 1.36% 7750ms 95.92% main.heavyComputation
0 0% 1.36% 7750ms 95.92% main.main.func1
0 0% 1.36% 7750ms 95.92% net/http.(*ServeMux).ServeHTTP
0 0% 1.36% 7750ms 95.92% net/http.(*conn).serve
0 0% 1.36% 7750ms 95.92% net/http.HandlerFunc.ServeHTTP
0 0% 1.36% 7750ms 95.92% net/http.serverHandler.ServeHTTP
0 0% 8.29% 2830ms 35.02% math.Log (inline)
The main.heavyComputation function is responsible for the majority of CPU usage – our “Big Bad Boy.”
Profiling other aspects
You can use the same approach for memory, block, mutex, and goroutine profiles.
Memory (heap)
go tool pprof http://localhost:6060/debug/pprof/heap
This will let you explore allocations, identify leaks, and see which functions are responsible for the most memory usage. Similar commands (top, list, web, etc.) apply to the heap profile as they do to CPU profiles.