GPU Compute Shaders in Pure Go: gogpu/gg v0.15.0
Source: Dev.to
Two days ago we shipped gogpu/gg v0.14.0 with alpha masks and a fluent PathBuilder. Yes—two days, we’re moving fast.
Looking at performance profiles, we saw a problem:
The CPU was the bottleneck.
Our GPU could render millions of pixels in milliseconds, but the CPU spent a lot of time tessellating paths. This is the classic 2‑D graphics problem: CPU tessellation doesn’t scale.
So we moved the entire rasterisation pipeline to GPU compute shaders.
Today, gogpu/gg v0.15.0 is here: 2 280 lines of WGSL compute shaders, vello‑style pipeline, dramatic speed‑ups for complex scenes. All in Pure Go.
The Performance Challenge
// Drawing 10 000 circles
ctx := gg.NewContext(800, 600)
for i := 0; i [...]

[...]) {
    let curve = curves[id.x];
    // Adaptive subdivision based on curvature
    let segments = subdivide_bezier(curve);
    // Write to global buffer (thousands in parallel!)
    for (var i = 0u; i [...]

[...]) {
    let segment = segments[id.x];
    let bounds = segment_bounds(segment);
    // Find overlapping tiles
    for (var y = tile_min.y; y [...]

[...]) {
    let tile_id = (pixel.y / TILE_SIZE) * tile_width + (pixel.x / TILE_SIZE);
    var coverage = 0.0;
    for (var i = 0u; i [...] (color.rgb, saturate(coverage));
}
Perfect anti‑aliasing at any scale—no jaggies, no MSAA overhead.
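To make the coarse (binning) stage concrete, here is a minimal CPU-side sketch in Go of the same idea: take a segment's bounding box, find the tiles it overlaps, and compute each tile's index with the `tile_id` arithmetic shown above. The `Segment` type, the `tileSize` of 16, and the function names are assumptions made for illustration; none of them come from the gogpu/gg source, and the real shader runs one invocation per segment instead of a loop.

```go
package main

import (
	"fmt"
	"math"
)

// tileSize is an assumed tile edge length in pixels (not stated in the post).
const tileSize = 16

// Segment is a hypothetical line segment in pixel coordinates.
type Segment struct{ X0, Y0, X1, Y1 float64 }

// tileID mirrors the shader's index arithmetic:
// (pixel.y / TILE_SIZE) * tile_width + (pixel.x / TILE_SIZE).
func tileID(px, py, tilesX int) int {
	return (py/tileSize)*tilesX + px/tileSize
}

// clamp limits v to the inclusive range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// overlappingTiles returns the IDs of every tile touched by the segment's
// bounding box on a canvas of widthPx x heightPx pixels. This is conservative:
// a diagonal segment may not actually cross every tile in its box.
func overlappingTiles(s Segment, widthPx, heightPx int) []int {
	tilesX := (widthPx + tileSize - 1) / tileSize
	tilesY := (heightPx + tileSize - 1) / tileSize

	minX, maxX := math.Min(s.X0, s.X1), math.Max(s.X0, s.X1)
	minY, maxY := math.Min(s.Y0, s.Y1), math.Max(s.Y0, s.Y1)

	// A segment entirely off-canvas touches no tiles.
	if maxX < 0 || maxY < 0 || minX >= float64(widthPx) || minY >= float64(heightPx) {
		return nil
	}

	tx0 := clamp(int(math.Floor(minX/tileSize)), 0, tilesX-1)
	tx1 := clamp(int(math.Floor(maxX/tileSize)), 0, tilesX-1)
	ty0 := clamp(int(math.Floor(minY/tileSize)), 0, tilesY-1)
	ty1 := clamp(int(math.Floor(maxY/tileSize)), 0, tilesY-1)

	var ids []int
	for ty := ty0; ty <= ty1; ty++ {
		for tx := tx0; tx <= tx1; tx++ {
			ids = append(ids, tileID(tx*tileSize, ty*tileSize, tilesX))
		}
	}
	return ids
}

func main() {
	// A 64x64 canvas has 4x4 tiles; this segment spans tile columns 0..2, rows 0..1.
	fmt.Println(overlappingTiles(Segment{10, 10, 40, 20}, 64, 64)) // [0 1 2 4 5 6]
}
```

On the GPU, the inner loops would append the segment to each tile's list, which is where atomic counters come in.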
Expected Performance Gains
| Workload | Expected Behaviour |
|----------|---------------------|
| Simple paths ([...] | [...] |

if h.segmentCount >= h.segmentThreshold {
	// GPU path: dispatch compute shaders
	h.gpu.Rasterize(coarse, segments, backdrop, scene.FillNonZero)
} else {
	// CPU path: software rasterization
	h.cpu.RasterizeSegments(segments, backdrop)
}
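The selection logic can be exercised in isolation. The sketch below assumes the GPU path is taken at or above the threshold and the CPU path below it. The `Rasterizer` interface, the stub backends, and the threshold value of 100 are hypothetical stand-ins; they are not the gogpu/gg types or defaults.

```go
package main

import "fmt"

// Rasterizer is a hypothetical interface standing in for both backends.
type Rasterizer interface {
	Rasterize(segmentCount int) string
}

// cpuBackend and gpuBackend are stubs that only report which path ran.
type cpuBackend struct{}

func (cpuBackend) Rasterize(n int) string { return fmt.Sprintf("cpu:%d", n) }

type gpuBackend struct{}

func (gpuBackend) Rasterize(n int) string { return fmt.Sprintf("gpu:%d", n) }

// Hybrid picks a backend per call based on the number of segments.
type Hybrid struct {
	cpu, gpu         Rasterizer
	segmentThreshold int // assumed tunable; the real default is not given in the post
}

// Rasterize sends large workloads to the GPU and small ones to the CPU.
func (h Hybrid) Rasterize(segmentCount int) string {
	if segmentCount >= h.segmentThreshold {
		return h.gpu.Rasterize(segmentCount)
	}
	return h.cpu.Rasterize(segmentCount)
}

func main() {
	h := Hybrid{cpu: cpuBackend{}, gpu: gpuBackend{}, segmentThreshold: 100}
	fmt.Println(h.Rasterize(10))   // cpu:10
	fmt.Println(h.Rasterize(5000)) // gpu:5000
}
```

Keeping the threshold as a field makes it easy to tune per device, since the crossover point depends on the hardware.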
Why? Small paths ([...]
- Only `atomic<u32>` and `atomic<i32>` are supported.
- Specific memory orders are required.
- Buffer layout matters.
// This works
@group(0) @binding(1) var<storage, read_write> counts: array<atomic<u32>>;
atomicAdd(&counts[i], 1u);
// This doesn't
var<storage, read_write> counts: array<u32>; // Not atomic!
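The same constraint exists on the CPU, which makes for a handy mental model. In this Go sketch, several goroutines bump shared per-tile counters, and `atomic.AddUint32` from the standard `sync/atomic` package plays the role of `atomicAdd`. The function name `countPerTile` and the worker layout are illustrative assumptions, not code from gogpu/gg.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countPerTile increments counts[tile] once for every entry in tileIDs,
// splitting the work across `workers` goroutines. A plain counts[t]++ here
// would be a data race; the atomic add makes concurrent increments safe.
func countPerTile(tileIDs []int, numTiles, workers int) []uint32 {
	counts := make([]uint32, numTiles)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(start int) {
			defer wg.Done()
			// Strided loop: this worker handles start, start+workers, ...
			for i := start; i < len(tileIDs); i += workers {
				atomic.AddUint32(&counts[tileIDs[i]], 1)
			}
		}(w)
	}
	wg.Wait() // all increments are visible after Wait returns
	return counts
}

func main() {
	// 10 000 segments spread evenly over 4 tiles.
	ids := make([]int, 10000)
	for i := range ids {
		ids[i] = i % 4
	}
	fmt.Println(countPerTile(ids, 4, 8)) // [2500 2500 2500 2500]
}
```

Swapping the atomic add for a plain increment and running with `go run -race` shows the race that the non-atomic WGSL declaration would allow.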
Debugging was tricky – WGSL validation errors are… cryptic.
What We Shipped
Statistics
- 2 280 LOC WGSL shaders (8 shader files)
- ~20 K LOC Go in backend/wgpu/
- 74 % test coverage overall
- 0 linter issues
Shader Files
backend/wgpu/shaders/
├── flatten.wgsl # 589 LOC — Bezier curve flattening
├── coarse.wgsl # 335 LOC — Tile binning with atomics
├── fine.wgsl # 290 LOC — Per‑pixel coverage
├── blend.wgsl # 424 LOC — 29 blend modes on GPU
├── composite.wgsl # 235 LOC — Layer compositing
├── strip.wgsl # 155 LOC — Sparse strip rendering
├── blit.wgsl # 43 LOC — Final output blit
└── msdf_text.wgsl # 209 LOC — MSDF text rendering
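For readers new to flattening (the job of `flatten.wgsl` in the list above), here is a rough CPU-side illustration in Go. It is a simplified stand-in, not the shader's algorithm: it handles only quadratic Beziers and picks a uniform segment count from the standard error bound err ≤ |p0 − 2·p1 + p2| / (4·n²), so curvier inputs get more segments. The `Point` type, function names, and tolerance are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a hypothetical 2-D point type.
type Point struct{ X, Y float64 }

// segmentsFor returns how many line segments a uniformly subdivided quadratic
// Bezier needs to stay within tol (> 0) of the true curve, using the bound
// err <= |p0 - 2*p1 + p2| / (4 * n^2).
func segmentsFor(p0, p1, p2 Point, tol float64) int {
	ddx := p0.X - 2*p1.X + p2.X
	ddy := p0.Y - 2*p1.Y + p2.Y
	n := int(math.Ceil(math.Sqrt(math.Hypot(ddx, ddy) / (4 * tol))))
	if n < 1 {
		n = 1 // a straight "curve" still needs one segment
	}
	return n
}

// flattenQuad evaluates the curve at n+1 evenly spaced parameter values and
// returns the resulting polyline, endpoints included.
func flattenQuad(p0, p1, p2 Point, tol float64) []Point {
	n := segmentsFor(p0, p1, p2, tol)
	pts := make([]Point, 0, n+1)
	for i := 0; i <= n; i++ {
		t := float64(i) / float64(n)
		u := 1 - t
		pts = append(pts, Point{
			X: u*u*p0.X + 2*u*t*p1.X + t*t*p2.X,
			Y: u*u*p0.Y + 2*u*t*p1.Y + t*t*p2.Y,
		})
	}
	return pts
}

func main() {
	curved := flattenQuad(Point{0, 0}, Point{50, 100}, Point{100, 0}, 0.25)
	straight := flattenQuad(Point{0, 0}, Point{5, 0}, Point{10, 0}, 0.25)
	fmt.Println(len(curved)-1, len(straight)-1) // 15 1
}
```

Each curve is processed independently of the others, which is what makes this step a natural fit for one GPU invocation per curve.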
Go Implementation
backend/wgpu/
├── gpu_flatten.go # 809 LOC — Flatten pipeline
├── gpu_coarse.go # 698 LOC — Coarse rasterization
├── gpu_fine.go # 752 LOC — Fine rasterization
├── sparse_strips_gpu.go # 837 LOC — Hybrid CPU/GPU selection
├── renderer.go # 822 LOC — Main renderer
├── pipeline.go # 369 LOC — Pipeline orchestration
├── memory.go # 413 LOC — GPU memory management
└── ... (40+ files total)
Try It Yourself
Installation
go get github.com/gogpu/gg@v0.15.0
Quick Example
package main

import "github.com/gogpu/gg"

func main() {
	ctx := gg.NewContext(512, 512)
	ctx.ClearWithColor(gg.White)
	// 1 000 circles: the GPU backend handles complex scenes efficiently
	ctx.SetColor(gg.Hex("#e74c3c"))
	for i := 0; i [...]
}
- Release: v0.15.0
- GoGPU Organization:
- Discussions: Join the conversation
From CPU bottleneck to GPU parallelism. From sequential tessellation to massively parallel compute shaders.
This is what Pure Go can do.
go get github.com/gogpu/gg@v0.15.0
⭐ Star the repo if you find it useful!
Part of the GoGPU Journey series
- GoGPU: A Pure Go Graphics Library for GPU Programming
- From Idea to 100K Lines in Two Weeks
- Building a Shader Compiler in Pure Go
- Introducing gogpu/gg v0.14.0
- GPU Compute Shaders in Pure Go ← You are here