# Understanding TimescaleDB Background Workers and Jobs
Source: Dev.to
## The Mechanics of Background Workers
Every time you call add_compression_policy(), add_retention_policy(), or add_continuous_aggregate_policy(), TimescaleDB registers a scheduled job. Each job runs in a PostgreSQL background worker — an independent process that executes outside of any client connection.
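For example, a single hypertable accumulates one job per policy. The table and interval values below are illustrative:

```sql
-- Each call registers one scheduled job for the (hypothetical) 'metrics' hypertable
SELECT add_compression_policy('metrics', INTERVAL '7 days');
SELECT add_retention_policy('metrics', INTERVAL '90 days');
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```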
The number of available background workers is controlled by two settings:
| Setting | Description | Default |
|---|---|---|
| `timescaledb.max_background_workers` | Ceiling for TimescaleDB’s own scheduler | 16 (8 in older versions) |
| `max_worker_processes` | PostgreSQL’s global limit, shared by all extensions, parallel queries, and logical replication | 8 |
You can inspect your current configuration with:
```sql
SELECT name, setting, unit
FROM pg_settings
WHERE name IN (
  'timescaledb.max_background_workers',
  'max_worker_processes',
  'max_parallel_workers'
)
ORDER BY name;
```
And list all active jobs with:
```sql
SELECT
  job_id,
  proc_name,
  hypertable_name,
  schedule_interval,
  scheduled AS is_active
FROM timescaledb_information.jobs
WHERE scheduled = true
ORDER BY proc_name, hypertable_name;
```
## How Exhaustion Happens
The math is straightforward and unforgiving.
- Each hypertable with compression, retention, and a continuous aggregate creates three jobs.
- Eight hypertables → 24 jobs.
- Add TimescaleDB’s internal maintenance tasks, and you can reach well over 100 scheduled jobs.
A default max_background_workers of 16 was never designed for that load.
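You can see how this multiplication plays out on your own instance by grouping the jobs view by hypertable — a sketch; policy jobs carry the hypertable name, while internal maintenance jobs show `NULL`:

```sql
-- Job count per hypertable; internal jobs are grouped under '<internal>'
SELECT
  coalesce(hypertable_name, '<internal>') AS hypertable,
  count(*) AS job_count
FROM timescaledb_information.jobs
WHERE scheduled = true
GROUP BY hypertable_name
ORDER BY job_count DESC;
```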
When a job cannot acquire a worker, it does not raise an error. It simply waits in the scheduler’s queue. If a free worker never opens within the schedule interval, the next invocation stacks behind the current one. Over hours, a backlog forms and compounds with every cycle.
Worst case: a job’s execution time exceeds its schedule interval.
A compression job that takes 15 minutes but is scheduled every 10 minutes permanently occupies a worker slot and can never catch up. Each cycle adds another queued invocation, starving other job types.
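You can spot jobs already in this worst-case state by comparing each job's last run duration against its schedule interval — a sketch using the standard information views:

```sql
-- Jobs whose last run took longer than their schedule interval
SELECT
  j.job_id,
  j.proc_name,
  j.schedule_interval,
  js.last_run_duration
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats js ON j.job_id = js.job_id
WHERE js.last_run_duration > j.schedule_interval;
```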
## Diagnosing the Problem
### Compare job count to worker limit
```sql
WITH worker_config AS (
  SELECT current_setting('timescaledb.max_background_workers')::int AS max_workers
),
active_jobs AS (
  SELECT count(*) AS total_scheduled_jobs
  FROM timescaledb_information.jobs
  WHERE scheduled = true
)
SELECT
  wc.max_workers,
  aj.total_scheduled_jobs,
  j.job_id,
  j.proc_name,
  CASE
    WHEN js.total_runs = 0 THEN 'NEVER RUN -- likely queued'
    WHEN js.total_failures > 0 THEN 'FAILING'
    ELSE 'OK'
  END AS health_status
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats js ON j.job_id = js.job_id
CROSS JOIN worker_config wc
CROSS JOIN active_jobs aj
WHERE j.scheduled = true
ORDER BY js.total_failures DESC, j.proc_name;
```
### Key warning signs
- `total_runs = 0` – The job was registered but never acquired a worker (pure queue starvation).
- Rising `consecutive_failures` – The worker was acquired but the job failed (often from lock contention or OOM during compression).
- `last_run_duration` > `schedule_interval` – A job that can never finish before its next invocation permanently blocks a worker slot.
## Right‑Sizing Your Worker Pool
The formula is simple:
```
total_policies + 2 (internal jobs) = minimum max_background_workers
```
Compute the exact recommendation with:
```sql
WITH policy_count AS (
  SELECT count(*) AS total_jobs
  FROM timescaledb_information.jobs
  WHERE scheduled = true
)
SELECT
  total_jobs,
  total_jobs + 2 AS recommended_workers,
  'ALTER SYSTEM SET timescaledb.max_background_workers = '
    || (total_jobs + 2) AS sql_to_run
FROM policy_count;
```
Apply the change:
```sql
ALTER SYSTEM SET timescaledb.max_background_workers = 28;
ALTER SYSTEM SET max_worker_processes = 32;
-- Requires a full PostgreSQL restart -- pg_reload_conf() is NOT sufficient
```
Over‑provisioning is cheap. Each idle background worker consumes approximately 5‑10 MB of memory and zero CPU. Setting max_background_workers to 32 or 64 on a server running 20 jobs carries no measurable performance penalty. Under‑provisioning, on the other hand, silently breaks your entire automation pipeline.
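After the restart, it is worth confirming the new values actually took effect (the expected numbers here match the `ALTER SYSTEM` example above):

```sql
-- Both should reflect the values set via ALTER SYSTEM
SHOW timescaledb.max_background_workers;  -- expect 28
SHOW max_worker_processes;                -- expect 32
```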
## A Note on PostgreSQL 18 + TimescaleDB 2.24
If you are running PostgreSQL 18 with TimescaleDB 2.24, be aware that custom functions registered via add_job() fail with “cache lookup failed for function” errors. Background workers on this version combination cannot resolve public‑schema functions. The workaround is to use system cron for any custom scheduled tasks, while letting TimescaleDB handle its built‑in policies (compression, retention, aggregate refresh) normally.
## Prevention Checklist

- **Count jobs after every new hypertable.** Each hypertable with full policies adds 3 jobs. Update worker settings proactively.
- **Monitor `total_failures` and `consecutive_failures` regularly.** Query `timescaledb_information.job_stats` weekly or set up automated monitoring.
- **Verify job duration stays below schedule interval.** If `last_run_duration` approaches `schedule_interval`, consider increasing the interval, reducing chunk size, or adding workers.
- **Set `max_worker_processes` higher than `max_background_workers`.** Leave room for parallel queries and logical replication.
- **Remember: both settings require a full PostgreSQL restart.** Plan changes during maintenance windows.
- **Treat worker sizing as part of hypertable setup.** Add a policy → add a worker. Never treat it as an afterthought.
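The weekly monitoring item above can be reduced to one query to alert on — a sketch that surfaces any job with recorded failures, worst offenders first:

```sql
-- Jobs that have failed at least once, ordered by failure count
SELECT job_id, total_runs, total_successes, total_failures, last_run_status
FROM timescaledb_information.job_stats
WHERE total_failures > 0
ORDER BY total_failures DESC;
```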
Background worker exhaustion is entirely preventable. The fix takes one SQL statement and a restart. The hard part is knowing to look for it before your automation silently stops working.