Pointers and Tuning and Loops! Oh My!

Published: (June 18, 2026 at 05:34 PM EDT)
4 min read
Source: Dev.to

Introduction

While all code should be efficient, code for library-like components, especially code involving loops, should be as efficient as possible because it is often widely used. In my article A Simple Dynamic Array for C, I included the source code for a function to clean up a dynamic array:

    void array_cleanup( array_t *array, array_free_fn_t free_fn ) {
      if ( array == NULL )
        return;
      if ( free_fn != NULL ) {
        char *element = array->elements;
        for ( size_t i = 0; i < array->len; ++i ) {
          (*free_fn)( element );
          element += array->esize;
        }
      }
      free( array->elements );
      array_init( array, array->esize );
    }

While this code is correct and good enough for pedagogical purposes, it’s not optimal. Before I tell you why, see if you can figure it out. Hint: it has to do with the use of both the array pointer and the function call in the loop.

(For those who might not get the reference in this article’s title, it’s a play on a line from The Wizard of Oz. The original line was “Lions and tigers and bears! Oh, my!”)

In C (and languages inspired by C, including C++, C#, Go, and Java), for loop conditions are reevaluated on every loop iteration. For example, consider:

    for ( size_t i = 0; i < array->len; ++i ) {
      (*free_fn)( element );
      element += array->esize;
    }

That gets compiled by clang into this x86-64 assembly:

    .LBB0_4:
      mov   rdi, r15
      call  r14                         ; (*free_fn)( element )
      add   r15, qword ptr [rbx + 8]    ; element += array->esize
      inc   r12                         ; ++i
      cmp   r12, qword ptr [rbx + 16]   ; i < array->len
      jb    .LBB0_4

The things to notice are the two qword ptr operands, which mean the code has to dereference array (that is, read from memory) twice on every iteration. That’s slow.

The problem is that the compiler can’t know for sure that the array object pointed to by array isn’t modified by the function pointed to by free_fn. For all the compiler knows, the function has access to a global copy of the pointer and could modify the array. Compilers have to play it safe.

If there were no function call in the loop, then the compiler could safely hoist both len and esize outside the loop as if the code were:

    size_t const esize = array->esize;
    size_t const len = array->len;
    for ( size_t i = 0; i < len; ++i ) {
      (*free_fn)( element );
      element += esize;
    }

One way to tell the compiler that the array object isn’t modified behind its back is to declare the array parameter with restrict, which asserts that the object is accessed only through that pointer. With that change, the loop’s assembly ends with:

      ...                               ; ... array->esize
      dec   r13                         ; --i
      jne   .LBB0_4                     ; i > 0

The compiler has eliminated the array->len dereference and is now counting down to zero. But why didn’t the compiler optimize away the remaining dereference (qword ptr) for esize?

First, similar to inline, restrict can be ignored by the compiler. Second, every CPU has only finite resources, most notably registers. In this case, the compiler presumably decided that overall performance would still be better even if esize were not put into a register. If it were put into a register, then the compiler would have to ensure that its value is the same after free_fn is called as before, and presumably that was more costly than a dereference.

As I noted in my restrict article, restrict isn’t yet part of standard C++; however, many compilers support restrict as an extension.

Of course, you can forget about restrict and just use local variables to cache the values, as was done initially:

    size_t const esize = array->esize;
    size_t const len = array->len;
    for ( size_t i = 0; i < len; ++i ) {
      (*free_fn)( element );
      element += esize;
    }

Then, despite the function call in the loop, the assembly becomes:

    .LBB0_4:
      mov   rdi, rbx
      call  r14
      add   rbx, r13
      dec   r15
      jne   .LBB0_4

with no dereferences.

If you want to optimize code using pointers, functions, and loops, consider using either restrict or explicit caching. But you should always check the generated assembly to ensure your tuning is having the desired effect.

If you’ve never heard of the Compiler Explorer, also known as godbolt.org, it’s an extremely useful in-browser tool for compiling C or C++ code with virtually any compiler you’ve ever heard of, at any optimization level, and showing you a mapping from source lines to their corresponding assembly lines for most any CPU architecture. (Why is it called godbolt? Because its creator is Matt Godbolt.)
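For anyone who wants to try the explicit-caching variant, for example by pasting it into Compiler Explorer, here is a minimal self-contained sketch. The array_t layout, the array_free_fn_t signature, and the body of array_init shown here are assumptions made only so the code compiles on its own; they include just the members the code above uses (elements, esize, len) and may differ from the definitions in the dynamic array article. The helpers free_string and demo_cleanup are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Assumed minimal layout: only the members used by array_cleanup().
typedef struct {
  void  *elements;  // contiguous storage for the elements
  size_t esize;     // size of one element in bytes
  size_t len;       // number of elements in use
} array_t;

// Assumed signature for the per-element free function.
typedef void (*array_free_fn_t)( void *element );

// Assumed behavior: reset to an empty array with the given element size.
static void array_init( array_t *array, size_t esize ) {
  array->elements = NULL;
  array->esize = esize;
  array->len = 0;
}

void array_cleanup( array_t *array, array_free_fn_t free_fn ) {
  if ( array == NULL )
    return;
  if ( free_fn != NULL ) {
    char *element = array->elements;
    // Cache both members in locals so the loop doesn't have to
    // reload them from memory after every call through free_fn.
    size_t const esize = array->esize;
    size_t const len = array->len;
    for ( size_t i = 0; i < len; ++i ) {
      (*free_fn)( element );
      element += esize;
    }
  }
  free( array->elements );
  array_init( array, array->esize );
}

// Hypothetical helpers for a quick check: an array of heap-allocated
// strings whose free function also counts how often it is called.
static size_t free_count;

static void free_string( void *element ) {
  free( *(char**)element );   // each element is itself a char*
  ++free_count;
}

size_t demo_cleanup( void ) {
  enum { N = 3 };
  char **strings = malloc( N * sizeof *strings );
  if ( strings == NULL )
    return 0;
  for ( size_t i = 0; i < N; ++i )
    strings[i] = malloc( 8 );  // free(NULL) is harmless if this fails

  array_t array = { strings, sizeof *strings, N };
  free_count = 0;
  array_cleanup( &array, &free_string );

  // The array is left empty but keeps its element size.
  assert( array.elements == NULL );
  assert( array.len == 0 );
  assert( array.esize == sizeof( char* ) );
  return free_count;           // one call per element
}
```

Note that caching is only equivalent to the original loop when free_fn really does leave the array object alone, which is the same promise restrict makes to the compiler.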
