Build Log: Shipping a Lean Python Telemetry Agent (CPU, Memory, Disk)

Published: April 8, 2026 at 05:29 AM EDT
Source: Dev.to

Build Log (April 8, 2026)

Implemented the first production‑ready telemetry collectors for heka‑insights‑agent and wired them into the main polling loop.

  • Added an optimized CPUCollector in src/collectors/cpu.py
  • Added a MemoryCollector in src/collectors/memory.py
  • Added a DiskCollector in src/collectors/disk.py
  • Integrated all collectors into src/main.py with a shared loop
  • Added environment‑based poll‑interval support via CPU_POLL_INTERVAL_SECONDS
  • Added python-dotenv to requirements.txt
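The post doesn't show how the interval is parsed. Below is a minimal sketch, assuming the value is a number of seconds and that a missing or invalid value should fall back to a default. `DEFAULT_POLL_INTERVAL_SECONDS`, `read_poll_interval`, and `load_settings` are hypothetical names, and 5.0 is only an illustrative default. `load_dotenv` is python-dotenv's real entry point.

```python
import os
from typing import Mapping, Optional

DEFAULT_POLL_INTERVAL_SECONDS = 5.0  # hypothetical default, not taken from the project


def read_poll_interval(environ: Optional[Mapping[str, str]] = None) -> float:
    """Parse CPU_POLL_INTERVAL_SECONDS, falling back to the default on bad input."""
    env = os.environ if environ is None else environ
    raw = env.get("CPU_POLL_INTERVAL_SECONDS", "").strip()
    try:
        value = float(raw)
    except ValueError:  # unset, empty, or not a number
        return DEFAULT_POLL_INTERVAL_SECONDS
    # Reject zero, negative values, NaN and infinity.
    if not (0 < value < float("inf")):
        return DEFAULT_POLL_INTERVAL_SECONDS
    return value


def load_settings() -> float:
    # python-dotenv copies KEY=value pairs from a .env file into os.environ.
    # By default, variables already set in the real environment take precedence.
    from dotenv import load_dotenv

    load_dotenv()
    return read_poll_interval()
```

Taking the environment as a plain mapping keeps the parsing testable without touching `os.environ` or needing a `.env` file.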

CPU Collector Design

The CPU collector is built around psutil.cpu_times(...) snapshots and delta math: utilization is derived from the difference between consecutive snapshots of a single data source, rather than from calling both cpu_percent and cpu_times_percent each cycle.

Key design points

  • No thread offloading (to_thread) for this workload
  • First cycle acts as a warm-up by design: it only records a baseline snapshot, because a delta needs two samples
  • Supports basic and detailed output modes
  • Optional per‑core output
  • Uses MonotonicTicker to keep a fixed cadence without drift
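MonotonicTicker's internals aren't shown in the post. The sketch below illustrates the general drift-free technique it names: deadlines are absolute points on `time.monotonic()`, and each one is advanced from the previous deadline instead of from "now". `FixedCadenceTicker` is a hypothetical stand-in, not the project's class. The clock and sleep functions are injectable only so the behavior can be tested.

```python
import time
from typing import Callable


class FixedCadenceTicker:
    """Hypothetical ticker in the spirit of MonotonicTicker (not the project's code)."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        # Deadlines are absolute points on the monotonic clock.
        self._next = clock() + interval

    def wait(self) -> None:
        """Sleep until the next deadline, then schedule the one after it."""
        delay = self._next - self._clock()
        if delay > 0:
            self._sleep(delay)
        # Advance from the previous deadline rather than from "now", so time
        # spent collecting (and sleep overshoot) does not accumulate as drift.
        self._next += self._interval
        now = self._clock()
        if now >= self._next:
            # More than a full interval behind: skip the missed ticks instead
            # of firing a burst of catch-up cycles; the phase is preserved.
            missed = int((now - self._next) // self._interval) + 1
            self._next += missed * self._interval
```

In a polling loop this would be used as `ticker = FixedCadenceTicker(interval)` followed by `while True: collect(); ticker.wait()`. A slow cycle then shortens the next sleep instead of pushing every later tick back.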

Implementation highlights

```python
# src/collectors/cpu.py (excerpt)
import psutil
cpu_times_snapshot = psutil.cpu_times()
# delta calculation performed on subsequent snapshots
```
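The sketch below makes the snapshot-and-delta approach and the warm-up cycle concrete. It assumes snapshots shaped like `psutil.cpu_times()` named tuples. `busy_percent` and `CpuDeltaCollector` are hypothetical names rather than the project's code. Subtracting guest time from the total and counting iowait as idle follows the convention psutil's own cpu_percent uses on Linux.

```python
from typing import Any, Callable, List, Optional


def _total(times: Any) -> float:
    """Total CPU time in a cpu_times()-style named tuple."""
    fields = times._asdict()
    total = sum(fields.values())
    # On Linux, guest/guest_nice are already included in user/nice, so they
    # are subtracted to avoid double counting (absent fields count as 0).
    return total - fields.get("guest", 0) - fields.get("guest_nice", 0)


def _busy(times: Any) -> float:
    """Non-idle time; iowait is treated as idle, not busy."""
    fields = times._asdict()
    return _total(times) - fields["idle"] - fields.get("iowait", 0)


def busy_percent(prev: Any, curr: Any) -> float:
    """CPU busy percentage between two snapshots taken from the same source."""
    total_delta = _total(curr) - _total(prev)
    if total_delta <= 0:
        return 0.0  # no elapsed CPU time between snapshots
    busy_delta = _busy(curr) - _busy(prev)
    # Clamp the result to [0, 100] in case a counter stepped backwards.
    return max(0.0, min(100.0, 100.0 * busy_delta / total_delta))


class CpuDeltaCollector:
    """Hypothetical collector: the first collect() is a warm-up that returns None."""

    def __init__(self, snapshot_fn: Callable[[], Any]) -> None:
        self._snapshot_fn = snapshot_fn
        self._prev: Optional[List[Any]] = None

    def collect(self) -> Optional[List[float]]:
        snap = self._snapshot_fn()
        # psutil.cpu_times() returns one named tuple; percpu=True returns a list.
        curr = snap if isinstance(snap, list) else [snap]
        prev, self._prev = self._prev, curr
        if prev is None or len(prev) != len(curr):
            return None  # warm-up (or CPU count changed): no baseline yet
        return [round(busy_percent(p, c), 1) for p, c in zip(prev, curr)]
```

For example, `CpuDeltaCollector(psutil.cpu_times)` would report the aggregate and `CpuDeltaCollector(lambda: psutil.cpu_times(percpu=True))` one value per core. Either way, only one psutil call is made per cycle.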

Memory Collection

Memory collection is intentionally lightweight:

  • One call each to psutil.virtual_memory() and psutil.swap_memory()
  • Basic mode returns a compact set of key fields
  • Detailed mode returns the full psutil fields
  • Raw byte values are preserved (server‑side compute handles transformations)

```python
# src/collectors/memory.py (excerpt)
import psutil
mem = psutil.virtual_memory()
swap = psutil.swap_memory()
```
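The post doesn't list which fields basic mode keeps. The sketch below shows the basic/detailed split, assuming a hypothetical field selection (`BASIC_VIRTUAL_FIELDS`, `BASIC_SWAP_FIELDS`). For detailed mode it relies only on psutil returning named tuples, so `_asdict()` yields every platform-specific field. `build_memory_payload` is an illustrative name.

```python
from typing import Any, Dict

# Hypothetical "basic" selections; the post does not name the actual fields.
BASIC_VIRTUAL_FIELDS = ("total", "available", "used", "percent")
BASIC_SWAP_FIELDS = ("total", "used", "percent")


def build_memory_payload(mem: Any, swap: Any, detailed: bool = False) -> Dict[str, Dict[str, Any]]:
    """Shape virtual_memory()/swap_memory() results; byte values stay raw."""
    mem_fields = mem._asdict()
    swap_fields = swap._asdict()
    if detailed:
        # Detailed mode: every field psutil reports on this platform.
        return {"memory": dict(mem_fields), "swap": dict(swap_fields)}
    return {
        "memory": {k: mem_fields[k] for k in BASIC_VIRTUAL_FIELDS},
        "swap": {k: swap_fields[k] for k in BASIC_SWAP_FIELDS},
    }
```

In the agent this would be called as `build_memory_payload(psutil.virtual_memory(), psutil.swap_memory())`, keeping the collector at one call per psutil function per cycle.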

Disk Collection

For disk telemetry, the agent collects cumulative I/O counters rather than rates, because rate computation is performed centrally on the server side.

  • Uses psutil.disk_io_counters(perdisk=True)
  • Returns aggregate and per‑disk counters
  • Filters to physical devices only; excludes partitions from the per‑disk payload
  • Added a device‑name cache with periodic refresh to reduce repeated filtering overhead
```python
# src/collectors/disk.py (excerpt)
import psutil
disk_io = psutil.disk_io_counters(perdisk=True)
```
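The post doesn't say how physical devices are identified. This sketch assumes one common Linux heuristic: whole disks appear as entries under `/sys/block` (partitions do not), and virtual devices such as loop or ram devices lack a `device` link there. It pairs that check with a time-based cache so the list is rebuilt only periodically. `list_physical_block_devices`, `DeviceNameCache`, `filter_perdisk`, and the 300-second TTL are all hypothetical.

```python
import os
import time
from typing import Any, Callable, Dict, FrozenSet


def list_physical_block_devices(sysfs_block: str = "/sys/block") -> FrozenSet[str]:
    """Heuristic (assumption): keep /sys/block entries that have a 'device' link."""
    try:
        names = os.listdir(sysfs_block)
    except FileNotFoundError:  # not Linux, or sysfs not mounted
        return frozenset()
    return frozenset(
        name for name in names if os.path.exists(os.path.join(sysfs_block, name, "device"))
    )


class DeviceNameCache:
    """Hypothetical TTL cache so the device list is not rebuilt every cycle."""

    def __init__(
        self,
        loader: Callable[[], FrozenSet[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._names: FrozenSet[str] = frozenset()
        self._expires_at = float("-inf")  # forces a load on first use

    def get(self) -> FrozenSet[str]:
        now = self._clock()
        if now >= self._expires_at:
            self._names = self._loader()
            self._expires_at = now + self._ttl
        return self._names


def filter_perdisk(perdisk: Dict[str, Any], physical: FrozenSet[str]) -> Dict[str, Dict[str, int]]:
    """Keep whole physical disks only; counters stay cumulative and raw."""
    return {name: counters._asdict() for name, counters in perdisk.items() if name in physical}
```

Each cycle would then call `filter_perdisk(psutil.disk_io_counters(perdisk=True), cache.get())`. That leaves one psutil call per cycle plus a cheap set lookup per device, with the sysfs scan happening only when the TTL expires.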

Profiling Summary

Ran a 120‑second profiling session and examined both process stats and cProfile output.

Key findings

  • Agent CPU cost is very low (near‑idle for this polling interval)
  • Max RSS ≈ 15 MB
  • Runtime is dominated by intentional sleep (expected)
  • Collector costs are small; disk collection is the heaviest of the three
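The profiling harness itself isn't included in the post. This sketch shows one way to gather the same two signals (cProfile hot spots and peak RSS), assuming a Unix host, since the `resource` module is unavailable on Windows. `profile_cycles` and `max_rss_mb` are hypothetical helpers. Unlike the session described above, they profile a fixed number of cycles rather than 120 seconds of wall-clock time.

```python
import cProfile
import io
import pstats
import resource  # Unix-only
import sys
from typing import Callable


def profile_cycles(cycle: Callable[[], None], iterations: int = 100, top: int = 10) -> str:
    """Run `cycle` repeatedly under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        for _ in range(iterations):
            cycle()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


def max_rss_mb() -> float:
    """Peak resident set size of the current process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Profiling collection cycles directly, without the ticker's sleep, keeps the intentional idle time from dominating the report, which is the effect the findings above describe.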

Optimizations applied

  • Cached the physical‑device list to avoid filtering every cycle
  • Kept the output shape unchanged (disk_io + disk_io_perdisk)

The agent now has a clean baseline telemetry pipeline with low overhead and clear extension points for transport/shipping.


Next Planned Work

  • Add payload shipping to a backend endpoint
  • Implement bounded retry/backoff logic
  • Write collector‑focused tests
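Retry/backoff is still planned, so nothing here reflects the agent's eventual design. The sketch below shows what "bounded" could mean: capped exponential backoff with full jitter and a hard attempt limit, after which the last error is re-raised. `retry_with_backoff`, its defaults, and treating `OSError` (which includes `ConnectionError`) as retryable are all assumptions.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `send` until it succeeds or max_attempts is reached (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return send()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # bounded: give up and surface the last error
            # Exponential backoff capped at max_delay, with "full jitter" so
            # many agents do not retry against the backend in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0.0, cap))
    raise AssertionError("unreachable")
```

Errors outside `retry_on` (for example a payload bug raising `ValueError`) propagate immediately instead of being retried.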

Project Overview

heka‑insights‑agent – a lightweight agent for collecting essential Linux system telemetry and shipping it to a configurable backend.
