Multi-dimensional Arrays & Row-major Order: A Deep Dive

Published: January 12, 2026 at 12:54 AM EST
5 min read
Source: Dev.to

Multi‑Dimensional Arrays and Row‑Major Ordering

Conceptual View

int matrix[3][4];

Think of it as a 2‑D grid:

[0][0]  [0][1]  [0][2]  [0][3]
[1][0]  [1][1]  [1][2]  [1][3]
[2][0]  [2][1]  [2][2]  [2][3]

Physical View

Memory is linear (one-dimensional): every byte has a single numeric address. The compiler therefore maps a multi-dimensional array onto a contiguous block of memory.

Row‑major order stores elements row by row; the right‑most index changes fastest.

Memory Address   Element        Logical Position
0x1000           matrix[0][0]   row 0 (stored contiguously)
0x1004           matrix[0][1]
0x1008           matrix[0][2]
0x100C           matrix[0][3]
0x1010           matrix[1][0]   row 1
0x1014           matrix[1][1]
0x1018           matrix[1][2]
0x101C           matrix[1][3]
0x1020           matrix[2][0]   row 2
0x1024           matrix[2][1]
0x1028           matrix[2][2]
0x102C           matrix[2][3]

Address Calculation

For an array type array[ROWS][COLS]:

address = base_address + (i * COLS + j) * sizeof(type)
  • i * COLS – skip i whole rows.
  • + j – move j elements within the current row.
  • * sizeof(type) – scale to bytes.

Example: int matrix[3][4] at base address 0x1000, element matrix[2][3]:

address = 0x1000 + (2 * 4 + 3) * 4
        = 0x1000 + 11 * 4
        = 0x102C

Three‑Dimensional Arrays

For type array[D1][D2][D3]:

address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
  • i * D2 * D3 – skip i whole “planes”.
  • j * D3 – skip j rows inside the current plane.
  • k – position inside the row.

General N‑Dimensional Formula

For array[i1][i2]...[iN] with dimensions [D1][D2]...[DN]:

offset = i1 * (D2*D3*...*DN) +
         i2 * (D3*D4*...*DN) +
         ...
         i(N‑1) * DN +
         iN
address = base + offset * sizeof(type)

Compiler Translation Example

int arr[3][4][5];
int x = arr[1][2][3];                        /* the compiler treats this as */
int x = *((int *)arr + (1*4*5 + 2*5 + 3));   /* flat offset 33 */

Decay to Pointers

Expression   Type         Meaning
arr          int (*)[4]   pointer to an array of 4 ints (after decay)
arr[i]       int *        pointer to the first element of row i
arr[i][j]    int          the integer value itself
  • arr + 1 → advances by one whole row (4 ints = 16 bytes).
  • *(arr + 1) + 2 → points to arr[1][2].

Assembly (x86‑64) for arr[i][j]

; Assume:
;   RAX = base address of arr
;   RBX = i
;   RCX = j
;   COLS = compile‑time constant

mov    rdx, rbx           ; rdx = i
imul   rdx, COLS          ; rdx = i * COLS
add    rdx, rcx           ; rdx = i * COLS + j
shl    rdx, 2             ; rdx = (i*COLS + j) * 4   ; sizeof(int)=4
add    rdx, rax           ; rdx = base + offset
mov    eax, [rdx]         ; load arr[i][j]

Modern compilers often collapse the arithmetic with LEA:

lea    rdx, [rbx + rbx*4]   ; rdx = i * 5   (if COLS = 5)
lea    rdx, [rdx*4 + rax]   ; combine scaling and base
mov    eax, [rdx + rcx*4]   ; load arr[i][j]

Row‑Major vs. Column‑Major Traversal

Row‑major (optimal in C):

#include <stdio.h>

int main(void) {
    int arr[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    printf("Base address: %p\n", (void *)arr);

    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 3; ++j) {
            printf("arr[%d][%d] = %d at address %p\n",
                   i, j, arr[i][j], (void *)&arr[i][j]);
        }
    }

    /* Treat as a flat 1‑D array */
    int *flat = (int *)arr;
    printf("\nAs flat array:\n");
    for (int k = 0; k < 6; ++k) {
        printf("flat[%d] = %d at address %p\n",
               k, flat[k], (void *)&flat[k]);
    }

    return 0;
}

Sample output

Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54

As flat array:
flat[0] = 1 at address 0x7ffc8b3e1a40
flat[1] = 2 at address 0x7ffc8b3e1a44
flat[2] = 3 at address 0x7ffc8b3e1a48
flat[3] = 4 at address 0x7ffc8b3e1a4c
flat[4] = 5 at address 0x7ffc8b3e1a50
flat[5] = 6 at address 0x7ffc8b3e1a54

Notice the addresses increase by sizeof(int) (4 bytes) sequentially, confirming row‑major layout.

One‑Dimensional Access Shortcut

int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));   /* requires <stdlib.h> */

/* Access the element at row i, column j of the flat block */
int value = arr[i * cols + j];

Dynamic 2‑D Array Allocation

/* Approach 1: separate rows – each row is its own allocation */
int **arr = malloc(rows * sizeof(int *));
for (int i = 0; i < rows; i++) {
    arr[i] = malloc(cols * sizeof(int));
}
/* Access naturally: arr[i][j] */
/* Cleanup: free each row, then the array of row pointers */
for (int i = 0; i < rows; i++) {
    free(arr[i]);
}
free(arr);

/* Approach 2: contiguous block – one allocation for the whole matrix */
int **arr2 = malloc(rows * sizeof(int *));
arr2[0] = malloc(rows * cols * sizeof(int));   /* one big chunk */
for (int i = 1; i < rows; i++) {
    arr2[i] = arr2[0] + i * cols;              /* point to the start of each row */
}

/* Cleanup */
free(arr2[0]);   /* free the big data block */
free(arr2);      /* free the array of row pointers */

Memory Layout & Addressing

  • Multi‑dimensional arrays are linearized in memory.

  • Row‑major order (the default in C) means the rightmost index varies fastest.

  • Address of arr[i][j] is computed as

    base_address + (i * COLS + j) * sizeof(element_type)

Cache, TLB, and Access Patterns

  • Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access maximises cache‑hit rates.
  • Cache line example (64 bytes):
    • int = 4 bytes → 16 ints per line.
    • Accessing arr[0][0] also brings arr[0][1] … arr[0][15] into cache (assuming the row starts on a 64‑byte boundary).
  • The TLB caches virtual‑to‑physical address translations. Strided or random accesses can cause TLB thrashing.
  • CPUs prefetch upcoming cache lines when they detect sequential patterns. Traversing a matrix in row‑major order (inner loop over columns) lets the hardware prefetch efficiently.

Key Takeaways

  • Linearisation: Multi‑dimensional arrays are stored as a single contiguous block.
  • Row‑major order: Rightmost index changes fastest.
  • Address formula: base + (i * COLS + j) * sizeof(type).
  • Traversal matters: Iterate in the storage order for optimal performance.
  • True arrays vs. pointer‑to‑pointer: Different layouts → different trade‑offs (contiguity vs. flexible row sizes).
  • Compiler magic: arr[i][j] is automatically translated to pointer arithmetic.
  • Cache locality: Sequential accesses can be 10–100× faster than random accesses.
  • Understanding these low‑level details helps you:
    • Write faster code,
    • Debug memory‑related bugs,
    • Interface cleanly with hardware or other languages.