Multi-dimensional Arrays & Row-major Order: A Deep Dive
Source: Dev.to
Multi‑Dimensional Arrays and Row‑Major Ordering
Conceptual View
int matrix[3][4];
Think of it as a 2‑D grid:
[0][0] [0][1] [0][2] [0][3]
[1][0] [1][1] [1][2] [1][3]
[2][0] [2][1] [2][2] [2][3]
Physical View
Memory is linear (one‑dimensional). The CPU can only address memory sequentially, so the compiler maps a multi‑dimensional array onto a contiguous block of memory.
Row‑major order stores elements row by row; the right‑most index changes fastest.
| Memory Address | Value | Logical Position |
|---|---|---|
| 0x1000 | matrix[0][0] | |
| 0x1004 | matrix[0][1] | Row 0 contiguously |
| 0x1008 | matrix[0][2] | |
| 0x100C | matrix[0][3] | |
| 0x1010 | matrix[1][0] | Row 1 |
| 0x1014 | matrix[1][1] | |
| 0x1018 | matrix[1][2] | |
| 0x101C | matrix[1][3] | |
| 0x1020 | matrix[2][0] | Row 2 |
| 0x1024 | matrix[2][1] | |
| 0x1028 | matrix[2][2] | |
| 0x102C | matrix[2][3] |
Address Calculation
For an array type array[ROWS][COLS]:
address = base_address + (i * COLS + j) * sizeof(type)
i * COLS– skipiwhole rows.+ j– movejelements within the current row.* sizeof(type)– scale to bytes.
Example – int matrix[3][4] at 0x1000, element matrix[2][3]:
address = 0x1000 + (2 * 4 + 3) * 4
= 0x1000 + 11 * 4
= 0x102C
Three‑Dimensional Arrays
For type array[D1][D2][D3]:
address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
i * D2 * D3– skipiwhole “planes”.j * D3– skipjrows inside the current plane.k– position inside the row.
General N‑Dimensional Formula
For array[i1][i2]...[iN] with dimensions [D1][D2]...[DN]:
offset = i1 * (D2*D3*...*DN) +
i2 * (D3*D4*...*DN) +
...
i(N‑1) * DN +
iN
address = base + offset * sizeof(type)
Compiler Translation Example
int arr[3][4][5];
int x = arr[1][2][3]; // becomes
int x = *(arr + (1*4*5 + 2*5 + 3));
Decay to Pointers
| Expression | Type | Meaning |
|---|---|---|
arr | int (*)[4] | pointer to an array of 4 ints |
arr[i] | int * | pointer to the first element of row i |
arr[i][j] | int | actual integer value |
arr + 1→ advances by one whole row (4 ints = 16 bytes).*(arr + 1) + 2→ points toarr[1][2].
Assembly (x86‑64) for arr[i][j]
; Assume:
; RAX = base address of arr
; RBX = i
; RCX = j
; COLS = compile‑time constant
mov rdx, rbx ; rdx = i
imul rdx, COLS ; rdx = i * COLS
add rdx, rcx ; rdx = i * COLS + j
shl rdx, 2 ; rdx = (i*COLS + j) * 4 ; sizeof(int)=4
add rdx, rax ; rdx = base + offset
mov eax, [rdx] ; load arr[i][j]
Modern compilers often collapse the arithmetic with LEA:
lea rdx, [rbx + rbx*4] ; rdx = i * 5 (if COLS = 5)
lea rdx, [rdx*4 + rax] ; combine scaling and base
mov eax, [rdx + rcx*4] ; load arr[i][j]
Row‑Major vs. Column‑Major Traversal
Row‑major (optimal in C):
int main(void) {
int arr[2][3] = {
{1, 2, 3},
{4, 5, 6}
};
printf("Base address: %p\n", (void *)arr);
for (int i = 0; i < 2; ++i) {
for (int j = 0; j < 3; ++j) {
printf("arr[%d][%d] = %d at address %p\n",
i, j, arr[i][j], (void *)&arr[i][j]);
}
}
/* Treat as a flat 1‑D array */
int *flat = (int *)arr;
printf("\nAs flat array:\n");
for (int k = 0; k < 6; ++k) {
printf("flat[%d] = %d at address %p\n",
k, flat[k], (void *)&flat[k]);
}
return 0;
}
Sample output
Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54
As flat array:
flat[0] = 1 at address 0x7ffc8b3e1a40
flat[1] = 2 at address 0x7ffc8b3e1a44
flat[2] = 3 at address 0x7ffc8b3e1a48
flat[3] = 4 at address 0x7ffc8b3e1a4c
flat[4] = 5 at address 0x7ffc8b3e1a50
flat[5] = 6 at address 0x7ffc8b3e1a54
Notice the addresses increase by sizeof(int) (4 bytes) sequentially, confirming row‑major layout.
One‑Dimensional Access Shortcut
int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));
/* Access arr[i][j] as a flat array */
int value = arr[i * cols + j];
Dynamic 2‑D Array Allocation
/* Separate rows – each row is its own allocation */
int **arr = malloc(rows * sizeof(int *));
for (int i = 0; i < rows; i++) {
arr[i] = malloc(cols * sizeof(int));
}
/* Access naturally: arr[i][j] */
/* Contiguous block – one allocation for the whole matrix */
int **arr = malloc(rows * sizeof(int *));
arr[0] = malloc(rows * cols * sizeof(int)); /* one big chunk */
for (int i = 1; i < rows; i++) {
arr[i] = arr[0] + i * cols; /* point to the start of each row */
}
/* Cleanup */
free(arr[0]); /* free the big data block */
free(arr); /* free the array of row pointers */
Memory Layout & Addressing
-
Multi‑dimensional arrays are linearized in memory.
-
Row‑major order (the default in C) means the rightmost index varies fastest.
-
Address of
arr[i][j]is computed asbase_address + (i * COLS + j) * sizeof(element_type)
Cache, TLB, and Access Patterns
- Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access maximises cache‑hit rates.
- Cache line example (64 bytes):
int= 4 bytes → 16ints per line.- Accessing
arr[0][0]bringsarr[0][0] … arr[0][15]into cache.
- The TLB caches virtual‑to‑physical address translations. Strided or random accesses can cause TLB thrashing.
- CPUs prefetch upcoming cache lines when they detect sequential patterns. Traversing a matrix in row‑major order (inner loop over columns) lets the hardware prefetch efficiently.
Key Takeaways
- Linearisation: Multi‑dimensional arrays are stored as a single contiguous block.
- Row‑major order: Rightmost index changes fastest.
- Address formula:
base + (i * COLS + j) * sizeof(type). - Traversal matters: Iterate in the storage order for optimal performance.
- True arrays vs. pointer‑to‑pointer: Different layouts → different trade‑offs (contiguity vs. flexible row sizes).
- Compiler magic:
arr[i][j]is automatically translated to pointer arithmetic. - Cache locality: Sequential accesses can be 10–100× faster than random accesses.
- Understanding these low‑level details helps you:
- Write faster code,
- Debug memory‑related bugs,
- Interface cleanly with hardware or other languages.