Multi-dimensional Arrays & Row-major Order: A Deep Dive

Published: January 12, 2026 at 12:54 AM EST
5 min read
Source: Dev.to

Multi‑Dimensional Arrays and Row‑Major Ordering

Conceptual View

int matrix[3][4];

Think of it as a 2‑D grid:

[0][0]  [0][1]  [0][2]  [0][3]
[1][0]  [1][1]  [1][2]  [1][3]
[2][0]  [2][1]  [2][2]  [2][3]

Physical View

Memory is linear (one-dimensional): every byte has a single numeric address. The compiler therefore maps a multi-dimensional array onto a contiguous block of memory.

Row‑major order stores elements row by row; the right‑most index changes fastest.

Memory Address   Element        Logical Position
0x1000           matrix[0][0]   row 0 (stored contiguously)
0x1004           matrix[0][1]
0x1008           matrix[0][2]
0x100C           matrix[0][3]
0x1010           matrix[1][0]   row 1
0x1014           matrix[1][1]
0x1018           matrix[1][2]
0x101C           matrix[1][3]
0x1020           matrix[2][0]   row 2
0x1024           matrix[2][1]
0x1028           matrix[2][2]
0x102C           matrix[2][3]

Address Calculation

For an array type array[ROWS][COLS]:

address = base_address + (i * COLS + j) * sizeof(type)
  • i * COLS – skip i whole rows.
  • + j – move j elements within the current row.
  • * sizeof(type) – scale to bytes.

Example: int matrix[3][4] at base address 0x1000, element matrix[2][3]:

address = 0x1000 + (2 * 4 + 3) * 4
        = 0x1000 + 11 * 4
        = 0x102C

Three‑Dimensional Arrays

For type array[D1][D2][D3]:

address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)
  • i * D2 * D3 – skip i whole “planes”.
  • j * D3 – skip j rows inside the current plane.
  • k – position inside the row.

General N‑Dimensional Formula

For array[i1][i2]...[iN] with dimensions [D1][D2]...[DN]:

offset = i1 * (D2*D3*...*DN) +
         i2 * (D3*D4*...*DN) +
         ...
         i(N‑1) * DN +
         iN
address = base + offset * sizeof(type)

Compiler Translation Example

int arr[3][4][5];
int x = arr[1][2][3];                        /* the compiler treats this as */
int x = *((int *)arr + (1*4*5 + 2*5 + 3));   /* flat offset 33 */

Decay to Pointers

Expression   Type         Meaning
arr          int (*)[4]   pointer to an array of 4 ints (after decay)
arr[i]       int *        pointer to the first element of row i
arr[i][j]    int          the integer value itself
  • arr + 1 → advances by one whole row (4 ints = 16 bytes).
  • *(arr + 1) + 2 → points to arr[1][2].

Assembly (x86‑64) for arr[i][j]

; Assume:
;   RAX = base address of arr
;   RBX = i
;   RCX = j
;   COLS = compile‑time constant

mov    rdx, rbx           ; rdx = i
imul   rdx, COLS          ; rdx = i * COLS
add    rdx, rcx           ; rdx = i * COLS + j
shl    rdx, 2             ; rdx = (i*COLS + j) * 4   ; sizeof(int)=4
add    rdx, rax           ; rdx = base + offset
mov    eax, [rdx]         ; load arr[i][j]

Modern compilers often collapse the arithmetic with LEA:

lea    rdx, [rbx + rbx*4]   ; rdx = i * 5   (if COLS = 5)
lea    rdx, [rdx*4 + rax]   ; combine scaling and base
mov    eax, [rdx + rcx*4]   ; load arr[i][j]

Row‑Major vs. Column‑Major Traversal

Row‑major (optimal in C):

#include <stdio.h>

int main(void) {
    int arr[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    printf("Base address: %p\n", (void *)arr);

    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 3; ++j) {
            printf("arr[%d][%d] = %d at address %p\n",
                   i, j, arr[i][j], (void *)&arr[i][j]);
        }
    }

    /* Treat as a flat 1‑D array */
    int *flat = (int *)arr;
    printf("\nAs flat array:\n");
    for (int k = 0; k < 6; ++k) {
        printf("flat[%d] = %d at address %p\n",
               k, flat[k], (void *)&flat[k]);
    }

    return 0;
}

Sample output

Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54

As flat array:
flat[0] = 1 at address 0x7ffc8b3e1a40
flat[1] = 2 at address 0x7ffc8b3e1a44
flat[2] = 3 at address 0x7ffc8b3e1a48
flat[3] = 4 at address 0x7ffc8b3e1a4c
flat[4] = 5 at address 0x7ffc8b3e1a50
flat[5] = 6 at address 0x7ffc8b3e1a54

Notice the addresses increase by sizeof(int) (4 bytes) sequentially, confirming row‑major layout.

One‑Dimensional Access Shortcut

int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));   /* requires <stdlib.h> */

/* Access the element at row i, column j of the flat block */
int value = arr[i * cols + j];

Dynamic 2‑D Array Allocation

/* Approach 1: separate rows – each row is its own allocation */
int **arr = malloc(rows * sizeof(int *));
for (int i = 0; i < rows; i++) {
    arr[i] = malloc(cols * sizeof(int));
}
/* Access naturally: arr[i][j] */
/* Cleanup: free each row, then the array of row pointers */
for (int i = 0; i < rows; i++) {
    free(arr[i]);
}
free(arr);

/* Approach 2: contiguous block – one allocation for the whole matrix */
int **arr2 = malloc(rows * sizeof(int *));
arr2[0] = malloc(rows * cols * sizeof(int));   /* one big chunk */
for (int i = 1; i < rows; i++) {
    arr2[i] = arr2[0] + i * cols;              /* point to the start of each row */
}

/* Cleanup */
free(arr2[0]);   /* free the big data block */
free(arr2);      /* free the array of row pointers */

Memory Layout & Addressing

  • Multi‑dimensional arrays are linearized in memory.

  • Row‑major order (the default in C) means the rightmost index varies fastest.

  • Address of arr[i][j] is computed as

    base_address + (i * COLS + j) * sizeof(element_type)

Cache, TLB, and Access Patterns

  • Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access maximises cache‑hit rates.
  • Cache line example (64 bytes):
    • int = 4 bytes → 16 ints per line.
    • Accessing arr[0][0] also brings arr[0][1] … arr[0][15] into cache (assuming the row starts on a 64‑byte boundary).
  • The TLB caches virtual‑to‑physical address translations. Strided or random accesses can cause TLB thrashing.
  • CPUs prefetch upcoming cache lines when they detect sequential patterns. Traversing a matrix in row‑major order (inner loop over columns) lets the hardware prefetch efficiently.

Key Takeaways

  • Linearisation: Multi‑dimensional arrays are stored as a single contiguous block.
  • Row‑major order: Rightmost index changes fastest.
  • Address formula: base + (i * COLS + j) * sizeof(type).
  • Traversal matters: Iterate in the storage order for optimal performance.
  • True arrays vs. pointer‑to‑pointer: Different layouts → different trade‑offs (contiguity vs. flexible row sizes).
  • Compiler magic: arr[i][j] is automatically translated to pointer arithmetic.
  • Cache locality: Sequential accesses can be 10–100× faster than random accesses.
  • Understanding these low‑level details helps you:
    • Write faster code,
    • Debug memory‑related bugs,
    • Interface cleanly with hardware or other languages.