Profiling Go with pprof

Published: 4 months ago (December 15, 2025, 10:07 GMT+8)

5 min read

Original: Dev.to

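What is pprof?

pprof is Go's built-in profiling tool. It collects and analyzes runtime data from an application, such as CPU usage, memory allocations, goroutines, and blocking operations.

In short, pprof answers questions like these:

Why is my application slow?
Where is the CPU time going?
Which code allocates a lot of memory?
Are goroutines leaking?

Spoiler: pprof does not magically optimize your code. It only tells you where things are going wrong.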

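How pprof works

pprof works in two steps:

Collect – gather metrics from the application at runtime.
Analyze – inspect the collected metrics with go tool pprof.

At runtime, Go samples the running program and records metrics. The samples are aggregated into a profile that can be visualized and explored with pprof commands. Under the hood this is statistical sampling: the program is interrupted at regular intervals (about 100 times per second for the CPU profiler by default), and the runtime records which code was executing at that moment. Over time, these snapshots add up to a statistically representative picture of resource usage, not an exact trace of every call.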
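The collect step can also be done in-process, without any HTTP endpoint. The following is a minimal sketch using the standard runtime/pprof package; the helper names captureCPUProfile and busyWork are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"math"
	"runtime/pprof"
)

// captureCPUProfile runs work while the CPU profiler is sampling and returns
// the raw profile bytes. The name and signature are illustrative assumptions.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	// StartCPUProfile turns on periodic sampling of the running program.
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile stops sampling and flushes the aggregated profile to buf.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

// busyWork is a stand-in CPU-bound workload so the sampler has something to observe.
func busyWork() float64 {
	x := 0.0
	for i := 1; i <= 20_000_000; i++ {
		x += math.Sqrt(float64(i))
	}
	return x
}

func main() {
	data, err := captureCPUProfile(func() { busyWork() })
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("captured %d bytes of profile data\n", len(data))
}
```

In a real program the bytes would typically be written to a file and opened later with go tool pprof.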

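The simplest way to enable profiling in an application is to expose the built-in HTTP endpoints by importing net/http/pprof and starting an HTTP server: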

main.go

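This exposes profiling endpoints under /debug/pprof/, for example:

/debug/pprof/profile – CPU
/debug/pprof/heap – memory
…and other endpoints.

Collecting a profile

When you request a pprof endpoint such as /debug/pprof/profile, the response is a binary protobuf (usually gzip-compressed) containing raw sample data, stack traces, counters, and timestamps. It is not a human-readable report.

pprof is the tool that decodes this raw data, aggregates it, and presents it in a readable form.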
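To see that the raw profile really is compressed binary data, here is a small sketch that takes a heap profile in-process and checks it. It uses the heap profile instead of /debug/pprof/profile only so that no server is needed; the helper names rawHeapProfile, isGzip, and gunzip are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"runtime/pprof"
)

// rawHeapProfile returns the heap profile in its binary (non-debug) form.
func rawHeapProfile() ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// isGzip reports whether data starts with the gzip magic bytes 0x1f 0x8b.
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

// gunzip decompresses data and returns the protobuf payload underneath.
func gunzip(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	raw, err := rawHeapProfile()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("gzip-compressed: %v, %d bytes\n", isGzip(raw), len(raw))

	payload, err := gunzip(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes of protobuf after decompression\n", len(payload))
}
```

The decompressed payload is still binary protobuf, which is why it needs go tool pprof to become readable.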

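Analyzing a profile

First, install the pprof CLI. This step is optional if you only use go tool pprof, which ships with the Go toolchain; it is needed only for the standalone pprof binary:

go install github.com/google/pprof@latest

You should then see the binary in $GOPATH/bin.

Next, while your web server is running, capture a CPU profile (10 seconds):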

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"

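If everything is configured correctly, you will land in the interactive pprof shell, where you can run commands such as top, list, and web (type help in the shell for the full list of commands).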

The bad guy

Suppose we have an endpoint that burns a lot of CPU, for example /work.
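A sketch of what such an endpoint could look like. Everything here is an assumption made for illustration: the body of heavyComputation, the iteration count, and the two ports (8080 for the application, 6060 for profiling). The function name is borrowed from the profile output shown further down, but this is not necessarily the code that produced it.

```go
package main

import (
	"fmt"
	"log"
	"math"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
)

// heavyComputation is a hypothetical CPU-bound function: it spends its time
// in math.Pow, math.Sin, and math.Log.
func heavyComputation(n int) float64 {
	sum := 0.0
	for i := 1; i <= n; i++ {
		x := float64(i)
		sum += math.Pow(x, 1.5) + math.Sin(x) + math.Log(x)
	}
	return sum
}

func main() {
	// Profiling endpoints on their own port (assumed 6060), served by DefaultServeMux.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// Application traffic on a separate mux and port (assumed 8080).
	mux := http.NewServeMux()
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "result: %f\n", heavyComputation(10_000_000))
	})
	log.Fatal(http.ListenAndServe("localhost:8080", mux))
}
```

Keeping the profiling listener on a separate port means the application mux never exposes /debug/pprof/ to regular clients.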

Generate some load:

curl http://localhost:8080/work

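You can repeat this request, or use a load generator such as hey or ab.

Collect a 10-second CPU profile from the running server:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"

In the pprof shell, show the functions that consume the most CPU: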

(pprof) top

Typical output:

Showing nodes accounting for 7470ms, 92.45% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
    2030ms 25.12% 25.12%     2030ms 25.12%  math.archExp
    1570ms 19.43% 44.55%     1570ms 19.43%  math.IsInf (inline)
    1500ms 18.56% 63.12%     2830ms 35.02%  math.log
     560ms  6.93% 70.05%     5800ms 71.78%  math.pow
     430ms  5.32% 75.37%     1050ms 13.00%  math.sin
     330ms  4.08% 79.46%      330ms  4.08%  runtime.pthread_cond_signal
     320ms  3.96% 83.42%     1560ms 19.31%  math.frexp
     290ms  3.59% 87.00%      290ms  3.59%  math.Float64frombits (inline)
     220ms  2.72% 89.73%      220ms  2.72%  math.IsNaN (inline)
     220ms  2.72% 92.45%      230ms  2.85%  math.normalize (inline)

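Here flat is the time spent in a function's own code, while cum (cumulative) also includes the time spent in the functions it calls. To sort by cumulative time, run: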

(pprof) top -cum

Result (excerpt):

Showing nodes accounting for 2170ms, 26.86% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
     110ms  1.36%  1.36%     7750ms 95.92%  main.heavyComputation
         0     0%  1.36%     7750ms 95.92%  main.main.func1
         0     0%  1.36%     7750ms 95.92%  net/http.(*ServeMux).ServeHTTP
         0     0%  1.36%     7750ms 95.92%  net/http.(*conn).serve
         0     0%  1.36%     7750ms 95.92%  net/http.HandlerFunc.ServeHTTP
         0     0%  1.36%     7750ms 95.92%  net/http.serverHandler.ServeHTTP
         0     0%  8.29%     2830ms 35.02%  math.Log (inline)

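main.heavyComputation accounts for most of the CPU time: 95.92% cumulative, even though its own flat time is only 110ms. Nearly all of that time goes to the math functions it calls. This is our bad guy.

Profiling other dimensions

The same approach works for memory, block, mutex, and goroutine profiles.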

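Memory (heap)

go tool pprof http://localhost:6060/debug/pprof/heap

This helps you inspect allocations, track down leaks, and find the functions that use the most memory. The commands you use on a heap profile (top, list, web, and so on) are exactly the same as for a CPU profile.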
