Incremental Backup with PostgreSQL 17

Published: (March 20, 2026 at 04:09 AM EDT)
Source: Dev.to


PostgreSQL 17 introduced built-in incremental backups. This article describes the results of a workshop dedicated to studying this new feature.

We used the following Docker Compose setup to simulate a cluster with traffic that we want to back up. For this we created the following docker-compose.yml with several containers:

    services:
      postgres_main:
        image: postgres:17
        environment:
          POSTGRES_PASSWORD: postgres
          POSTGRES_USER: postgres
          POSTGRES_DB: testdb
          PGDATABASE: testdb
        volumes:
          - ./docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
          - pg_data:/var/lib/postgresql/data
          - wal_archive:/mnt/wal_archive
          - full_backup:/mnt/full_backup
          - incremental_backup:/mnt/incremental_backup
        command: >
          postgres
          -c archive_mode=on
          -c archive_command='cp %p /mnt/wal_archive/%f'
          -c summarize_wal=on

      postgres_restore:
        image: postgres:17
        profiles:
          - restore
        environment:
          POSTGRES_PASSWORD: postgres
          POSTGRES_USER: postgres
          POSTGRES_DB: testdb
        volumes:
          - pg_data_restore:/var/lib/postgresql/data
          - wal_archive:/mnt/wal_archive
          - full_backup:/mnt/full_backup
          - incremental_backup:/mnt/incremental_backup
        command: >
          postgres
          -c restore_command='cp /mnt/wal_archive/%f %p'

      cli:
        image: cli
        build: .
        stop_grace_period: 1s
        environment:
          PGPASSWORD: postgres
          PGUSER: postgres
          PGDATABASE: testdb
          PGHOST: postgres_main
        volumes:
          - pg_data_restore:/mnt/data/restore
          - pg_data:/var/lib/postgresql/data
          - wal_archive:/mnt/wal_archive
          - full_backup:/mnt/full_backup
          - incremental_backup:/mnt/incremental_backup
          - ./checksum.py:/usr/local/bin/checksum.py
        entrypoint: ["/bin/sh", "-c"]
        command:
          - |
            chown -R 999 /mnt/wal_archive
            chown -R 999 /mnt/full_backup
            chown -R 999 /mnt/incremental_backup
            chown -R 999 /mnt/data/restore
            sleep infinity

    volumes:
      pg_data:
      pg_data_restore:
      wal_archive:
      full_backup:
      incremental_backup:

Notes about the docker compose file:

- postgres_main contains the database we wish to back up
- postgres_restore is the database into which we want to restore
- cli prepares the data for restoration and lets us connect to all clusters

⚠️ Make sure to use PG17!

The backup procedure consists of the following steps:

1. Take a full backup as a starting point
2. Take an incremental backup
3. Take another incremental backup
4. Repeat step 3

The first step consists of creating an initial full backup with the following command:

    pg_basebackup --pgdata=/mnt/full_backup

pg_basebackup command

The target directory is a mount point for the full backup; it is also shared between containers (via the volumes). The --pgdata option specifies the target directory where the backup will be stored. In this case the output is written to /mnt/full_backup.

Output of the command

pg_basebackup creates a number of files; we will focus on the two important ones.

backup_label: this is a legacy description of the backup. Here is an extract:

    START WAL LOCATION: 0/4000028 (file 000000010000000000000004)
    CHECKPOINT LOCATION: 0/4000080
    BACKUP METHOD: streamed
    BACKUP FROM: primary
    START TIME: 2025-04-24 09:08:31 UTC
    LABEL: pg_basebackup base backup
    START TIMELINE: 1

This file records the WAL location where the backup starts, as well as the checkpoint location. The other fields speak for themselves.

backup_manifest: (available since PG 13) this file is linked to the feature we are discussing, the incremental backup. It serves as the reference against which an incremental backup is taken. For more detail, don't hesitate to order some training, or check out the documentation [https://www.postgresql.org/docs/current/backup-manifest-files.html]. Here is an extract of this backup_manifest:

    {
      "Path": "base/16384/3766",
      "Size": 16384,
      "Last-Modified": "2025-04-24 08:56:45 GMT",
      "Checksum-Algorithm": "CRC32C",
      "Checksum": "3c0ea625"
    },

For this specific extract, covering the file 'base/16384/3766', we have:

- the checksum, that is the fingerprint: 3c0ea625
- the last modification date: 2025-04-24 08:56:45 GMT

We can check this fingerprint ourselves with a short script.

Prerequisites:

- python3
- pip install crc32c (you may need --break-system-packages)

    #!/usr/bin/env python3
    """Print the CRC32C checksum of a file in both byte orders."""

    import sys

    import crc32c


    def main():
        if len(sys.argv) != 2:
            print(f"Usage: {sys.argv[0]} <filename>")
            sys.exit(1)

        filename = sys.argv[1]

        try:
            with open(filename, 'rb') as f:
                data = f.read()
        except Exception as e:
            print(f"Failed to read file: {e}")
            sys.exit(1)

        checksum = crc32c.crc32c(data)

        print(f"CRC32C (normal) : 0x{checksum:08x}")

        # Same four bytes, in reversed (little-endian) order
        le_bytes = checksum.to_bytes(4, byteorder='little')
        print(f"CRC32C (little-endian) : 0x{le_bytes.hex()}")


    if __name__ == "__main__":
        main()

As long as no traffic is recorded on the PG cluster, the data stays the same, and so does the checksum. To verify this, execute the checksum Python script (it is in the path) with the data file as its parameter.

NOTE: You can find the file where the data of a specific table is stored with the following query:

    SELECT relname,
           'base/' || pg_database.oid || '/' || relfilenode AS filename,
           pg_database.oid AS db_oid,
           pg_database.datname AS database,
           nspname AS schema
    FROM pg_class
    JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
    JOIN pg_database ON pg_database.datname = current_database()
    WHERE relfilenode IS NOT NULL
      AND relname LIKE 'pgbench%';

Which in this case returns the output:

relname                filename          database  schema
pgbench_accounts       base/16384/16397  testdb    public
pgbench_accounts_pkey  base/16384/16405  testdb    public
pgbench_branches       base/16384/16398  testdb    public
pgbench_branches_pkey  base/16384/16401  testdb    public
pgbench_history        base/16384/16399  testdb    public
pgbench_tellers        base/16384/16400  testdb    public
pgbench_tellers_pkey   base/16384/16403  testdb    public

So, based on this, we need to look at the file base/16384/16397 to determine the checksum for the pgbench_accounts relation. Now execute the Python script:

    ./checksum.py /var/lib/docker/volumes/demo-postgres-backup-incremental_pg_data/_data/base/16384/16397

    0x57679e47 # CRC32C

The corresponding entry in the backup_manifest carries the same checksum:

    {
      "Path": "base/16384/16397",
      "Size": 536870912,
      "Last-Modified": "2025-04-24 11:44:14 GMT",
      "Checksum-Algorithm": "CRC32C",
      "Checksum": "57679e47"
    },
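This comparison can also be scripted. The following is a minimal sketch, assuming the manifest is a JSON document whose per-file entries sit under a top-level "Files" key, with the fields shown in the extracts above. The helper names find_entry and checksum_matches are hypothetical and not part of any PostgreSQL tooling.

```python
import json


def find_entry(manifest, path):
    """Return the manifest entry for a relative path, or None if absent."""
    for entry in manifest.get("Files", []):
        if entry.get("Path") == path:
            return entry
    return None


def checksum_matches(manifest, path, computed_hex):
    """Compare a locally computed checksum (hex string) with the manifest."""
    entry = find_entry(manifest, path)
    if entry is None:
        return False
    # Accept both "0x57679e47" and "57679e47", case-insensitively.
    cleaned = computed_hex.lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return entry["Checksum"].lower() == cleaned


# Inline sample built from the entry shown in this article. A real manifest
# would instead be read from the backup directory with json.load().
sample = json.loads("""
{"Files": [
  {"Path": "base/16384/16397", "Size": 536870912,
   "Last-Modified": "2025-04-24 11:44:14 GMT",
   "Checksum-Algorithm": "CRC32C", "Checksum": "57679e47"}
]}
""")

print(checksum_matches(sample, "base/16384/16397", "0x57679e47"))
```

The function takes the checksum as a string on purpose, so the value printed by checksum.py can be pasted in directly without pulling the crc32c dependency into this helper.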

To see the checksum change, let's modify some data. We update a column in the pgbench_accounts table:

    UPDATE pgbench_accounts SET abalance = abalance + 10;

Let's check the fingerprint again:

    ./checksum.py /var/lib/docker/volumes/demo-postgres-backup-incremental_pg_data/_data/base/16384/16397

    0x06cc374f # CRC32C

The manifest entry after the change:

    {
      "Path": "base/16384/16397",
      "Size": 1073741824,
      "Last-Modified": "2025-04-24 12:37:28 GMT",
      "Checksum-Algorithm": "CRC32C",
      "Checksum": "06cc374f"
    },

‼️ When running a long update query, we can inspect the data directory and notice that the file is modified (its checksum differs) even if the transaction is not yet committed. If we cancel the query, the data is unchanged from a logical point of view (the transaction is rolled back), but the file on disk still contains the uncommitted modifications. Thanks to PostgreSQL's MVCC visibility rules those modifications are not visible to any transaction, but the file is modified and the checksum is different.

⚠️ WAL summarization needs to be activated (summarize_wal=on, see our docker-compose).

The first increment contains the diff from the full backup. The second increment then references the first one and should only contain the diff from that first increment. To set this up we use the following command:

    pg_basebackup --checkpoint=fast --incremental=/mnt/full_backup/backup_manifest --pgdata=/mnt/incremental_backup/0/

- --checkpoint=fast is set so that we do not wait for the next checkpoint
- --incremental is where the magic happens: it must point to the backup_manifest of the last increment or full backup, which is the origin of the diff
- --pgdata specifies the destination directory where the incremental backup will be stored

The creation of a new increment can be repeated multiple times. Each increment contains only a copy of the blocks/pages that have changed since the backup it references.

Now let's put the incremental backups back together with the full backup and begin the database restoration process. We use pg_combinebackup to merge the full backup with the incremental backups:

    pg_combinebackup -d -o /mnt/data/restore /mnt/full_backup /mnt/incremental_backup/0 /mnt/incremental_backup/1
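The chain of increments and the final merge can be sketched in Python. This is a minimal sketch that only builds the argument lists and does not execute anything. It assumes the directory layout used in this article (/mnt/full_backup and /mnt/incremental_backup/<n>), the function names are hypothetical, and the -d (debug) flag of pg_combinebackup is left out.

```python
def incremental_cmd(index, full_dir="/mnt/full_backup",
                    incr_root="/mnt/incremental_backup"):
    """Build the pg_basebackup call for increment number `index` (0-based)."""
    # Increment 0 references the full backup; every later increment
    # references the increment taken just before it.
    previous = full_dir if index == 0 else f"{incr_root}/{index - 1}"
    return [
        "pg_basebackup",
        "--checkpoint=fast",
        f"--incremental={previous}/backup_manifest",
        f"--pgdata={incr_root}/{index}",
    ]


def combine_cmd(increments, output_dir="/mnt/data/restore",
                full_dir="/mnt/full_backup",
                incr_root="/mnt/incremental_backup"):
    """Build the pg_combinebackup call for a full backup plus N increments."""
    # The full backup comes first, then the increments from oldest to newest.
    chain = [full_dir] + [f"{incr_root}/{i}" for i in range(increments)]
    return ["pg_combinebackup", "-o", output_dir] + chain


# Print the commands for two increments followed by the merge.
for i in range(2):
    print(" ".join(incremental_cmd(i)))
print(" ".join(combine_cmd(2)))
```

Returning lists rather than strings keeps the sketch easy to inspect, and such a list could be handed to subprocess.run(cmd, check=True) to actually run the command.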

pg_combinebackup gives us a full backup in /mnt/data/restore, so we can now start a restored database from the combined backup.

In terms of backup size, we can see that the full backup is much larger than the increment:

Size  Item
1.6G  full_backup
25M   incremental_backup

And the more changes there are in the database, the larger the incremental backup becomes:

Size  Changes  Comment
25M   0
25M   1        change of 2 records in a partitioned table
28M   2        change of 1000 records in a partitioned table

Let's talk about the impact on the recovery time objective (RTO). This is the time needed to rebuild a full backup from all the increments and recreate a recovery database. Different scenarios are possible:

- One full backup once a week and one incremental backup every day.
- One nightly full backup and incremental backups at an hourly rate.

Note: The second scenario is only relevant for a database with a lot of traffic.

The most time-consuming part is the replay of the WAL files, which means that we need to reduce the amount of WAL to replay. One way to do this is to take the last incremental backup as close as possible to the recovery target. The second scenario is a good candidate for this. With incremental backups, not all of the data in the database has to be transferred every hour, only the blocks that were modified since the last incremental backup.
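To get a feeling for how the two schedules differ, the length of the chain that has to be merged can be estimated. This is a minimal sketch, assuming backups are taken at fixed intervals and that every full backup starts a new chain; the function name and the interval values are illustrative only.

```python
def max_increments(full_interval_hours, incr_interval_hours):
    """Largest number of increments that can follow one full backup."""
    if incr_interval_hours <= 0 or full_interval_hours < incr_interval_hours:
        raise ValueError("intervals must be positive and full >= incremental")
    # One slot per incremental interval, minus the slot used by the full backup.
    return full_interval_hours // incr_interval_hours - 1


print(max_increments(7 * 24, 24))  # weekly full, daily increments
print(max_increments(24, 1))       # nightly full, hourly increments
```

Under these assumptions the weekly schedule merges at most 6 increments and the nightly schedule up to 23. The longer chain is the price paid for having the last increment closer to the recovery target.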

Another, independent scenario is to use the increments and the full backup to create a new full backup once in a while. The next increment is then based on this new full backup. In this way backup traffic on the cluster can be reduced.

In this blog post we have analysed how to use incremental backups and how to put them back together into a new directory from which we can start a restoration of the original database. Along the way, the checksums and the manifest files used for incremental backups were explained and analysed. As an outcome, we see that with incremental backups the subsequent backups get smaller, since only what changed since the last backup is saved.
