Allocating on the Stack

Published: February 26, 2026 at 07:00 PM EST
7 min read
Source: Go Blog


We’re always looking for ways to make Go programs faster. In the last two releases we have concentrated on mitigating a particular source of slowness: heap allocations.

Each time a Go program allocates memory from the heap, a fairly large chunk of code must run to satisfy that allocation. In addition, heap allocations add load to the garbage collector. Even with recent enhancements like Green Tea, the garbage collector still incurs substantial overhead.

So we’ve been working on ways to do more allocations on the stack instead of the heap. Stack allocations are considerably cheaper to perform (sometimes completely free). Moreover, they place no load on the garbage collector, because stack allocations are reclaimed automatically together with the stack frame itself. Stack allocations also enable prompt reuse of memory, which is very cache‑friendly.


Stack allocation of constant‑sized slices

Consider the task of building a slice of tasks to process:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

What happens at runtime?

  1. First iteration – there is no backing store for tasks, so append must allocate one. Because it doesn’t know how big the slice will eventually be, it can’t be too aggressive. Currently it allocates a backing store of size 1.
  2. Second iteration – the backing store exists but is full. append allocates a new backing store of size 2. The old store (size 1) becomes garbage.
  3. Third iteration – the backing store of size 2 is full. append allocates a new backing store of size 4. The old store (size 2) becomes garbage.
  4. Fourth iteration – the backing store of size 4 has only three items, so append can place the new item in the existing store and bump the slice length. No allocator call for this iteration.
  5. Fifth iteration – the backing store of size 4 is full again; append allocates a new store of size 8.

…and so on. We generally double the size of the allocation each time it fills up, so most subsequent appends avoid allocation. However, there is a fair amount of overhead in the “startup” phase when the slice is small. During this phase we spend a lot of time in the allocator and produce a bunch of garbage, which seems wasteful—especially if the slice never grows large.
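The doubling behavior described above is easy to observe directly. The following sketch (capGrowth is a hypothetical helper, not from the original post) records each new capacity that append chooses; the exact capacities can vary across Go versions and element types:

```go
package main

import "fmt"

// capGrowth appends n elements one at a time and records each new
// backing-store capacity that append picks. Every recorded entry
// corresponds to one allocation in the "startup" phase.
func capGrowth(n int) []int {
	var s []int
	var caps []int
	prev := -1
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != prev {
			caps = append(caps, cap(s))
			prev = cap(s)
		}
	}
	return caps
}

func main() {
	// e.g. [1 2 4 8 16] under the current growth policy for small slices
	fmt.Println(capGrowth(10))
}
```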

A common optimization

If this code is a hot part of your program, you might start the slice at a larger size to avoid those early allocations:

func process2(c chan task) {
    tasks := make([]task, 0, 10) // probably at most 10 tasks
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

  • This optimization is never incorrect; the program still runs correctly.
  • If the guess is too small, you get allocations from append as before.
  • If the guess is too large, you waste a bit of memory.

When the guess is good, there is only one allocation site: the make call allocates a slice backing store of the correct size, and append never needs to reallocate.

The surprising result

Benchmarking this code with 10 elements in the channel shows zero heap allocations. The compiler decides to allocate the backing store on the stack because it knows the exact size (10 × sizeof(task)). The store lives in the stack frame of process2 instead of on the heap¹. This works as long as the backing store does not escape to the heap inside processAll.
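One way to check this kind of result yourself is testing.AllocsPerRun. In this sketch (the trivial task type and processAll body are stand‑ins, assumed not to retain the slice), the channel itself accounts for roughly one heap allocation per run, while the slice backing store should add none when the compiler keeps it on the stack:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int }

// processAll consumes the tasks without retaining the slice,
// so the backing store does not escape.
func processAll(tasks []task) {
	for range tasks {
	}
}

func process2(c chan task) {
	tasks := make([]task, 0, 10) // constant capacity: eligible for the stack
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

func main() {
	allocs := testing.AllocsPerRun(1000, func() {
		c := make(chan task, 10) // the channel itself is a heap allocation
		for i := 0; i < 10; i++ {
			c <- task{id: i}
		}
		close(c)
		process2(c)
	})
	// expect ~1: the channel; the slice backing store adds none
	fmt.Println("allocations per run:", allocs)
}
```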


Stack allocation of variable‑sized slices

Hard‑coding a size guess is a bit rigid. We can let the caller provide an estimate:

func process3(c chan task, lengthGuess int) {
    tasks := make([]task, 0, lengthGuess)
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

The caller picks a good size for the slice, which may vary depending on where the function is called.

Unfortunately, in Go 1.24 the non‑constant size of the backing store prevents the compiler from allocating it on the stack. The slice ends up on the heap, turning our “zero‑allocation” code into a one‑allocation version. It’s still better than the many intermediate allocations, but not ideal.


Go 1.25: automatic small‑slice stack allocation

In Go 1.25 the compiler performs the transformation for us. For certain slice allocation sites, the compiler:

  1. Allocates a small (currently 32‑byte) slice backing store on the stack.
  2. Uses that backing store for the result of make if the requested size fits.
  3. Falls back to a normal heap allocation when the size exceeds the threshold.

Thus we can write the simple version (process3) and still get zero heap allocations when lengthGuess is small enough to fit into 32 bytes.
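A rough way to see the threshold in action (assuming a Go 1.25 or later toolchain; earlier versions should report one allocation in both cases) is to build a byte slice from a non‑constant length and compare allocation counts just at and above 32 bytes:

```go
package main

import (
	"fmt"
	"testing"
)

// build creates a slice with a non-constant capacity and fills it.
//
//go:noinline  // prevent inlining from turning n into a constant
func build(n int) int {
	s := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, byte(i))
	}
	return len(s)
}

var sink int // keeps the result observable so build isn't optimized away

func main() {
	small := testing.AllocsPerRun(1000, func() { sink = build(32) })
	large := testing.AllocsPerRun(1000, func() { sink = build(64) })
	fmt.Println("32 bytes:", small) // 0 when the stack fallback applies
	fmt.Println("64 bytes:", large) // exceeds the threshold: heap allocation
}
```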

Example of a manual workaround (pre‑Go 1.25)

Before Go 1.25 you could write:

func process4(c chan task, lengthGuess int) {
    var tasks []task
    if lengthGuess <= 10 {
        // Constant capacity, so the backing store is eligible for the stack.
        tasks = make([]task, 0, 10)
    } else {
        tasks = make([]task, 0, lengthGuess)
    }
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

But do you really want to write all that additional code? It seems error‑prone. Maybe the compiler can do this transformation for us?
So far every slice we have built stays inside the function that builds it. Suppose instead we have an extract function that, like process, collects tasks from a channel, but returns the slice to its caller rather than passing it to processAll. The returned backing store escapes, so it cannot simply live in the stack frame. In Go 1.26, the compiler handles this case too!

Compiler‑generated transformation for escaping slices

Conceptually, the compiler rewrites such an extract function into:

func extract3(c chan task) []task {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    tasks = runtime.move2heap(tasks)
    return tasks
}

  • runtime.move2heap is a special compiler‑runtime function that is the identity for slices already on the heap.
  • For slices on the stack, it allocates a new slice on the heap, copies the stack‑allocated slice into it, and returns the heap copy.

This ensures that for our original extract code:

  • If the number of items fits in the small stack‑allocated buffer, we perform exactly one allocation of exactly the right size.
  • If the number of items exceeds that capacity, we fall back to the normal doubling‑allocation strategy once the stack buffer overflows.

The optimization in Go 1.26 is actually better than the hand‑optimized version because it does not require the extra allocation + copy that the hand‑optimized code always does at the end. The extra copy is performed only when we have operated exclusively on a stack‑backed slice up to the return point. The cost of that copy is almost completely offset by the copies we no longer need in the startup phase (in fact, the new scheme may copy at most one more element than the old scheme).
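For comparison, the hand‑optimized version mentioned above might look like the following sketch (extract2 and the buffer size are illustrative, not from the original post): it builds into a small local buffer and always pays an allocation plus copy at the end, so the buffer itself never escapes.

```go
package main

import "fmt"

type task struct{ id int }

// extract2 builds the result in a small local buffer, then always
// copies it to a heap-allocated slice before returning. The copy keeps
// buf from escaping, so it can stay in the stack frame.
func extract2(c chan task) []task {
	var buf [8]task // illustrative buffer size
	tasks := buf[:0]
	for t := range c {
		tasks = append(tasks, t) // spills to a heap store past 8 tasks
	}
	out := make([]task, len(tasks))
	copy(out, tasks)
	return out
}

func main() {
	c := make(chan task, 5)
	for i := 0; i < 5; i++ {
		c <- task{id: i}
	}
	close(c)
	fmt.Println(len(extract2(c))) // prints 5
}
```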


Wrapping Up

  • Hand optimization can still be beneficial, especially if you have a good estimate of the slice size ahead of time.
  • Hopefully the compiler now catches many of the simple cases for you, letting you focus on the remaining ones that really matter.

There are a lot of details the compiler must get right to make these optimizations safe. If you think one of them is causing correctness or (negative) performance issues for you, you can turn them off with:

go build -gcflags=all=-d=variablemakehash=n

If disabling the optimizations helps, please file an issue so we can investigate.


Footnotes

  1. Go stacks do not have any alloca‑style mechanism for dynamically‑sized stack frames. All Go stack frames are constant‑sized.
