Using Arenas for Easier Object Management

Published: (February 18, 2026 at 09:47 PM EST)
7 min read
Source: Dev.to

Source: Dev.to

Introduction

When I started updating Cdecl, one of the first things I did was, as I then described, to use an abstract syntax tree (AST) data structure to represent a declaration.

During construction of the AST, placeholder nodes sometimes need to be created until the final type is able to be determined. For example, consider the declaration:

int a[2][3];

When parsing, upon encountering the first [, we know it’s an array 2 of “something” of int, but we don’t yet know either what the “something” is or whether it will turn out to be nothing. It’s not until the second [ that we know it’s an array 2 of array 3 of int. (Had the [3] not been there, then it would have been just array 2 of int.)

Once the final type is known, the placeholder node can be replaced by a new node of the correct type.

You might think all that needs to be done is to call free on the placeholder node. You could, but then you’d have to ensure that there are no other pointers pointing to it, not only in the tree, but in local variables in stack frames of functions on the call stack lest you get a dangling pointer. Simpler would be to use arenas.


Arenas

An arena typically is a large chunk of dynamically allocated memory from which actual objects of interest are subsequently allocated using a custom allocator. There are many ways to implement arenas depending upon the answers to the following questions:

  • Will all objects that will be allocated from it be of the same type (or at least size) or different types (or sizes)?
  • Will the maximum number of objects that will be allocated from it be known in advance?
  • If not, will the arena need to grow as needed?
  • Will individual objects need to be deallocated from the arena?
  • Does it need to be thread‑safe? (If your particular program doesn’t use threads, then the answer is “no.”)

The only thing all arena implementations have in common is that freeing the arena frees all objects in it — which is one of the main reasons for using an arena in the first place.


An Arena for Cdecl

In the case of Cdecl, the answers to the previous questions are:

  • Yes. (All objects are AST nodes.)
  • No. (We won’t know the maximum number of objects.)
  • Yes. (The arena will need to grow as needed.)
  • No. (Individual objects will not need deallocation.)
  • No. (It does not need to be thread‑safe.)

Given those answers, a simple singly‑linked list is sufficient:

typedef struct slist slist;
struct slist {
    slist *next;
    void  *data;
};

void slist_push_front(slist **phead, void *data) {
    slist *const new = malloc(sizeof *new);
    assert(new != NULL);
    *new = (slist){ .next = *phead, .data = data };
    *phead = new;
}

Putting AST nodes into the arena

c_ast_t c_ast_new(/* params */, slist **parena) {
    c_ast_t *const ast = malloc(sizeof *ast);
    /* ... */
    slist_push_front(parena, ast);
    return ast;
}

Freeing all AST nodes at once

void slist_cleanup(slist **phead,
                   void (*free_fn)(void *)) {
    slist *node = *phead, *next;
    for ( ; node != NULL; node = next ) {
        next = node->next;
        if (free_fn != NULL)
            (*free_fn)(node->data);
        free(node);
    }
    *phead = NULL;
}

As arenas go, a linked list is pretty simple—it might even be a stretch to call it an arena at all. In the case of Cdecl, it’s good enough—don’t over‑engineer things.


More Complex Arenas

A slightly more complex arena can be built on top of the linked list, where each node is a block of objects. For example, the first block could hold a maximum of, say, 8 objects. If you run out of space, you allocate more blocks. Each block can either be the same size or double the previous size up to a maximum size:

typedef struct arena arena;
struct arena {
    slist  *blocks;
    size_t  block_size_min, block_size_max;
    size_t  obj_size;
    size_t  avail;
};

void arena_init(arena *a,
                size_t block_size_min,
                size_t block_size_max,
                size_t obj_size) {
    *a = (arena){
        .block_size_min = block_size_min,
        .block_size_max = block_size_max,
        .obj_size       = obj_size
    };
}

Block definition

typedef struct arena_block arena_block;
struct arena_block {
    size_t size;
    alignas(max_align_t) char data[];
};

The flexible array member (data[]) provides storage for the objects.

Allocator

void *arena_alloc(arena *a) {
    if (a->avail == 0) {
        arena_block *block = arena_node2block(a->blocks);
        size_t const block_size = block == NULL ?
            a->block_size_min :
            block_size block_size_max / 2 ?
                block_size block_size_max;
        block = malloc(sizeof(arena_block) + block_size * a->obj_size);
        assert(block != NULL);
        a->avail = block->size = block_size;
        slist_push_front(&a->blocks, block->data);
    }
    return &a->blocks->data[--a->avail * a->obj_size];
}

If a->avail is 0, a new block must be allocated:

  • Ignoring arena_node2block for now, if block is NULL there are no existing blocks, so we start with a->block_size_min.
  • Otherwise we double the previous block size (capped at a->block_size_max).

The newly allocated block is pushed onto the list of blocks, and the allocator returns a pointer to the next free slot.


Allocation Logic

  • If there are no blocks, block_size will be block_size_min.
  • Otherwise, if doubling block_size won’t exceed block_size_max, then double it.
  • Otherwise, use block_size_max.

Then allocate memory for a block and block_size objects and push data — not block — onto the head of the linked list of blocks. This makes it simpler to calculate the address of any object in a block as it is done on the return line.

The size is “hidden” in memory before the object pointed to by slist’s data. This trick is commonly used in allocators; it’s typically how malloc stores the size of a chunk of memory so that it can later be recovered.

Getting the Block Size

Given an slist* for a block, you can obtain the block’s size with arena_node2block:

static inline arena_block* arena_node2block( slist *node ) {
    return node == NULL ? NULL :
           (arena_block*)(node->data - offsetof(arena_block, data));
}

The expression node->data, being an array, “decays” into a pointer to its first element. In memory, size (as mentioned) is stored before that. You might think it’s just sizeof(size_t) bytes before node->data — and it could be. The problem is that there may be padding between size and data. To subtract the correct number of bytes that includes any padding, the standard macro offsetof can be used. In this case, it returns the offset of data within the structure, which includes any padding.

Deallocator Implementation

void arena_cleanup( arena *a, void (*free_fn)(void*) ) {
    for ( slist *node = a->blocks, *next_node;
          node != NULL; node = next_node ) {
        arena_block *const block = arena_node2block( node );
        if ( free_fn != NULL ) {
            for ( size_t i = a->avail; i size; ++i )
                (*free_fn)( &block->data[ i * a->obj_size ] );
            a->avail = 0;
        }
        free( block );
        next_node = node->next;
        free( node );
    }
    a->blocks = NULL;
}

The code is fairly straightforward. The only thing to remember is that the first block may not be full, so you should call free_fn only for the actual objects. Hence, we start iterating at a->avail rather than 0; then set a->avail to 0 for the remaining blocks that are all full.

Conclusion

Using arenas for object management can simplify memory management when an entire collection of objects is deallocated together. Arenas can also be more performant because they require fewer calls to malloc and free. There are many possible arena implementations beyond the two presented here—keep arenas in mind for your next project.

0 views
Back to Blog

Related posts

Read more »

Apex B. OpenClaw, Local Embeddings.

Local Embeddings para Private Memory Search Por default, el memory search de OpenClaw envía texto a un embedding API externo típicamente Anthropic u OpenAI par...