使用 Arenas 实现更简易的对象管理

发布: 2个月前 (2026年2月19日 GMT+8 10:47)

8 分钟阅读

原文: Dev.to

Source: Dev.to

介绍

当我开始更新 Cdecl 时，我做的第一件事之一（正如我随后所描述的）是使用 抽象语法树（AST）数据结构来表示声明。

在构建 AST 的过程中，有时需要创建占位节点，直到能够确定最终类型。例如，考虑下面的声明：

int a[2][3];

在解析时，遇到第一个 [ 时，我们知道它是 “某种” int 的 数组 2，但还不知道“某种”到底是什么，也不知道它最终是否会变成空。直到第二个 [，我们才知道它是 数组 2 的数组 3 int。（如果没有 [3]，则仅是 数组 2 int。）

一旦确定了最终类型，就可以用正确类型的新节点替换占位节点。

你可能会认为只需要对占位节点调用 free 即可。可以这么做，但那样你必须确保没有其他指针指向它——不仅是树中的指针，还包括调用栈上函数的局部变量指针，否则会出现悬空指针。更简单的做法是使用 arena。

竞技场

一个 arena 通常是一个大型的动态分配内存块，实际感兴趣的对象随后使用自定义分配器从中分配。实现 arena 的方式有很多，取决于以下问题的答案：

将要从中分配的所有对象是否都是相同类型（或至少相同大小），还是不同类型（或不同大小）？
是否已知将从中分配的对象的最大数量？
如果不知道，arena 是否需要在需要时增长？
是否需要从 arena 中单独释放对象？
是否需要线程安全？（如果你的程序不使用线程，那么答案就是 “否”。）

所有 arena 实现唯一的共同点是 释放 arena 会同时释放其中的所有对象——这也是使用 arena 的主要原因之一。

Cdecl 的 Arena

在 Cdecl 的情况下，前面问题的答案是：

是。（所有对象都是 AST 节点。）
否。（我们无法知道对象的最大数量。）
是。（需要时 arena 会增长。）
否。（单个对象不需要单独释放。）
否。（它不需要是线程安全的。）

根据这些答案，一个简单的单向链表就足够了：

typedef struct slist slist;
struct slist {
    slist *next;
    void  *data;
};

void slist_push_front(slist **phead, void *data) {
    slist *const new = malloc(sizeof *new);
    assert(new != NULL);
    *new = (slist){ .next = *phead, .data = data };
    *phead = new;
}

将 AST 节点放入 arena

c_ast_t c_ast_new(/* params */, slist **parena) {
    c_ast_t *const ast = malloc(sizeof *ast);
    /* ... */
    slist_push_front(parena, ast);
    return ast;
}

一次性释放所有 AST 节点

void slist_cleanup(slist **phead,
                   void (*free_fn)(void *)) {
    slist *node = *phead, *next;
    for ( ; node != NULL; node = next ) {
        next = node->next;
        if (free_fn != NULL)
            (*free_fn)(node->data);
        free(node);
    }
    *phead = NULL;
}

作为 arena，链表相当简单——甚至可以说它不算是一个 arena。对于 Cdecl 来说，这已经足够——不要把事情弄得过于复杂。

更复杂的 Arena

在链表之上可以构建稍微复杂一点的 arena，其中每个节点是一个块（block）对象。例如，第一个块可以容纳最多 8 个对象。如果空间用完，就分配更多块。每个块可以大小相同，也可以是前一个块大小的两倍，直至达到最大块大小：

typedef struct arena arena;
struct arena {
    slist  *blocks;
    size_t  block_size_min, block_size_max;
    size_t  obj_size;
    size_t  avail;
};

void arena_init(arena *a,
                size_t block_size_min,
                size_t block_size_max,
                size_t obj_size) {
    *a = (arena){
        .block_size_min = block_size_min,
        .block_size_max = block_size_max,
        .obj_size       = obj_size
    };
}

块的定义

typedef struct arena_block arena_block;
struct arena_block {
    size_t size;
    alignas(max_align_t) char data[];
};

灵活数组成员 (data[]) 为对象提供存储空间。

分配器

void *arena_alloc(arena *a) {
    if (a->avail == 0) {
        arena_block *block = arena_node2block(a->blocks);
        size_t const block_size = block == NULL ?
            a->block_size_min :
            block_size block_size_max / 2 ?
                block_size block_size_max;
        block = malloc(sizeof(arena_block) + block_size * a->obj_size);
        assert(block != NULL);
        a->avail = block->size = block_size;
        slist_push_front(&a->blocks, block->data);
    }
    return &a->blocks->data[--a->avail * a->obj_size];
}

如果 a->avail 为 0，则必须分配一个新块：

暂时忽略 arena_node2block，如果 block 为 NULL，说明没有现有块，因此从 a->block_size_min 开始。
否则将前一个块的大小翻倍（上限为 a->block_size_max）。

新分配的块会被压入块列表，分配器返回指向下一个空闲槽位的指针。

Allocation Logic

If there are no blocks, block_size will be block_size_min.
Otherwise, if doubling block_size won’t exceed block_size_max, then double it.
Otherwise, use block_size_max.

Then allocate memory for a block and block_size objects and push data — not block — onto the head of the linked list of blocks. This makes it simpler to calculate the address of any object in a block as it is done on the return line.

The size is “hidden” in memory before the object pointed to by slist’s data. This trick is commonly used in allocators; it’s typically how malloc stores the size of a chunk of memory so that it can later be recovered.

Getting the Block Size

Given an slist* for a block, you can obtain the block’s size with arena_node2block:

static inline arena_block* arena_node2block( slist *node ) {
    return node == NULL ? NULL :
           (arena_block*)(node->data - offsetof(arena_block, data));
}

The expression node->data, being an array, “decays” into a pointer to its first element. In memory, size (as mentioned) is stored before that. You might think it’s just sizeof(size_t) bytes before node->data — and it could be. The problem is that there may be padding between size and data. To subtract the correct number of bytes that includes any padding, the standard macro offsetof can be used. In this case, it returns the offset of data within the structure, which includes any padding.

Deallocator Implementation

void arena_cleanup( arena *a, void (*free_fn)(void*) ) {
    for ( slist *node = a->blocks, *next_node;
          node != NULL; node = next_node ) {
        arena_block *const block = arena_node2block( node );
        if ( free_fn != NULL ) {
            for ( size_t i = a->avail; i size; ++i )
                (*free_fn)( &block->data[ i * a->obj_size ] );
            a->avail = 0;
        }
        free( block );
        next_node = node->next;
        free( node );
    }
    a->blocks = NULL;
}

The code is fairly straightforward. The only thing to remember is that the first block may not be full, so you should call free_fn only for the actual objects. Hence, we start iterating at a->avail rather than 0; then set a->avail to 0 for the remaining blocks that are all full.

结论

在对象管理中使用 arena 可以在一次性释放整个对象集合时简化内存管理。arena 还可能更高效，因为它们需要更少的 malloc 和 free 调用。除了这里展示的两种实现之外，还有许多可能的 arena 实现——在你的下一个项目中请记得考虑 arena。

使用 Arenas 实现更简易的对象管理

介绍

竞技场

Cdecl 的 Arena

将 AST 节点放入 arena

一次性释放所有 AST 节点

更复杂的 Arena

块的定义

分配器

Allocation Logic

Getting the Block Size

Deallocator Implementation

结论

相关文章

GSOC 轨迹：从第一行代码到最终提交

趁热打铁

用于构建自主 AI 队友的 Python SDK

使用 hooks 定制 Gemini CLI 以适配你的工作流