Building a Multi-Tenant Observability Platform with SigNoz + OneUptime

Published: January 15, 2026 at 01:35 PM EST
4 min read
Source: Dev.to

Architecture Overview

Modern SaaS teams need deep observability without sacrificing tenant isolation or compliance. This post explains how we built a multi‑tenant monitoring platform that routes logs, metrics, and traces to isolated SigNoz and OneUptime stacks, enforces strong security controls, and aligns with SOC 2 and ISO 27001 practices. The result: each customer gets a dedicated monitoring experience while we keep the operational footprint lean and repeatable.

We designed a hub‑and‑spoke model:

  • Central monitoring VM hosts the observability stack.
  • Each tenant has either:
    • a fully isolated SigNoz stack (frontend, query, collector, ClickHouse), or
    • a shared stack with strict routing based on a tenant identifier (for lightweight tenants).
  • Each application VM runs an OpenTelemetry (OTEL) Collector that:
    • tails PM2 logs,
    • receives OTLP traces/metrics,
    • forwards everything to the monitoring VM.

This gives a consistent ingestion pipeline while allowing isolation‑by‑default where needed.

Ingestion Pipeline
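
At a high level, every app VM tails PM2 logs and accepts OTLP locally, then forwards everything to the per-tenant ingest endpoint on the monitoring VM. A minimal sketch of that collector (the PM2 log path is an assumption, SIGNOZ_ENDPOINT matches the env vars shown later, and the processors and pipeline wiring appear further down):

receivers:
  filelog:
    include: [/home/app/.pm2/logs/*.log]   # assumed PM2 log location
    operators:
      - type: json_parser                  # unwrap the PM2 JSON log wrapper
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlphttp/signoz:
    endpoint: ${env:SIGNOZ_ENDPOINT}       # e.g. http://signoz.tenant-a.example:4318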

Tenant Segregation Strategy

We support two isolation modes:

  1. Full isolation per tenant
    • Dedicated SigNoz stack per tenant
    • Separate ClickHouse instance
    • Separate OTEL collector upstream
    • Strongest data isolation
  2. Logical isolation on a shared stack
    • Single SigNoz + ClickHouse instance
    • Routing by business_id (header + resource attribute)
    • Suitable for smaller tenants

Full isolation is the default for regulated or high-traffic customers.
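
One way to realize full isolation is a separate Docker Compose project (with its own env file and volumes) per tenant on the monitoring VM; the project names, file paths, and env files below are illustrative rather than our exact layout:

# Dedicated stack per tenant: separate Compose project, env file, and volumes
docker compose -p signoz-tenant-a \
  --env-file tenants/tenant-a.env \
  -f signoz/docker-compose.yaml up -d

docker compose -p signoz-tenant-b \
  --env-file tenants/tenant-b.env \
  -f signoz/docker-compose.yaml up -d

# Shared stack for lightweight tenants, routed by business_id
docker compose -p signoz-shared \
  -f signoz/docker-compose.yaml up -d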

Key routing headers

  • x-business-id for SigNoz
  • x-oneuptime-token for OneUptime
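
These headers are attached at export time on the app-VM collector. A sketch using the otlphttp exporter's headers field (the endpoint env vars match the client-side section below; ONEUPTIME_TOKEN is an assumed variable name):

exporters:
  otlphttp/signoz:
    endpoint: ${env:SIGNOZ_ENDPOINT}
    headers:
      x-business-id: ${env:BUSINESS_ID}

  otlphttp/oneuptime:
    endpoint: ${env:ONEUPTIME_ENDPOINT}
    headers:
      x-oneuptime-token: ${env:ONEUPTIME_TOKEN}   # assumed env var for the OneUptime token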

Provisioning and Hardening the Monitoring VM

We treat the monitoring VM as a controlled production system:

  • SSH keys only, no password auth
  • Minimal inbound ports: 22, 80, 443, 4317, 4318
  • Nginx as a single TLS ingress
  • Docker Compose for immutable service layout

Example provisioning steps (high‑level)

# SSH key‑based access only
az vm user update \
  --resource-group <resource-group> \
  --name <monitoring-vm> \
  --username <admin-user> \
  --ssh-key-value "<ssh-public-key>"

# Open required ports (restrict SSH to trusted IPs)
az network nsg rule create ... \
  --destination-port-ranges 22 80 443 4317 4318
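
For reference, a fully spelled-out rule restricting SSH to a trusted admin IP might look like the following (resource names, rule name, and priority are placeholders):

az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <monitoring-nsg> \
  --name allow-ssh-admin \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 22 \
  --source-address-prefixes <trusted-admin-ip>/32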

Multi‑Tenant Routing at the Edge

We use Nginx maps to route traffic by hostname for both UI and OTLP ingestion:

# Map ingest hostnames to per-tenant collector upstreams
map $host $signoz_collector_upstream {
    signoz.tenant-a.example  signoz-otel-collector-tenant-a;
    signoz.tenant-b.example  signoz-otel-collector-tenant-b;
    default                  signoz-otel-collector-default;
}

# OTLP/HTTP ingest listener (simplified excerpt)
server {
    listen 4318;
    location / {
        # proxy_pass with a variable needs a resolver (or matching upstream{} blocks),
        # and the upstream must resolve to the collector's OTLP port (4318)
        proxy_pass http://$signoz_collector_upstream;
    }
}

This gives clean DNS‑based tenant routing while keeping a single IP address.
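
The UI side follows the same host-based pattern, with TLS terminated at Nginx. A simplified per-tenant server block (the upstream name is illustrative; certificate paths follow certbot defaults):

server {
    listen 443 ssl;
    server_name signoz.tenant-a.example;

    ssl_certificate     /etc/letsencrypt/live/signoz.tenant-a.example/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/signoz.tenant-a.example/privkey.pem;

    location / {
        proxy_pass http://signoz-frontend-tenant-a:3301;   # tenant-a SigNoz UI (illustrative upstream)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}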

Collector Configuration: Logs, Traces, Metrics

Each tenant VM runs an OTEL Collector with filelog + OTLP. We parse PM2 logs (JSON wrapper), normalize severity, and attach resource fields for fast filtering in SigNoz.

Core fields we enforce

  • severity_text (info / warn / error)
  • service.name
  • deployment.environment
  • host.name
  • business_id

Minimal config excerpt

processors:
  # Adds host.name and other host metadata
  resourcedetection:
    detectors: [system]

  # Stamps every record with the tenant identifier
  resource:
    attributes:
      - key: business_id
        value: ${env:BUSINESS_ID}
        action: upsert

  # Promotes the parsed PM2 "severity" attribute to the record's severity_text
  transform/logs:
    log_statements:
      - context: log
        statements:
          - set(severity_text, attributes["severity"]) where attributes["severity"] != nil

These enrichments make severity_text, service.name, and host.name searchable immediately in SigNoz.
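
They only take effect once wired into the service pipelines. Roughly (receiver and exporter names follow the sketches earlier in the post; batch is the standard batching processor, assumed to be defined in the full config):

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors: [resourcedetection, resource, transform/logs, batch]
      exporters: [otlphttp/signoz]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, resource, batch]
      exporters: [otlphttp/signoz]

The OneUptime exporter from the routing-headers sketch can be added to whichever pipelines should also feed OneUptime.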

Client‑Side Integration (Apps)

We use a consistent OTEL pattern across backend, web, and agent services:

  • Backend – OTLP exporter for traces
  • Web – Browser traces forwarded to backend (which re‑exports)
  • Agents – OTEL SDK configured with OTEL_EXPORTER_OTLP_ENDPOINT

Typical environment variables

BUSINESS_ID=tenant-a
SIGNOZ_ENDPOINT=http://signoz.tenant-a.example:4318
ONEUPTIME_ENDPOINT=http://status.tenant-a.example:4318
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:4318/v1/traces
DEPLOY_ENV=production

DNS and TLS (Public UX)

Each tenant gets its own subdomains:

  • signoz.<tenant>.example
  • status.<tenant>.example

TLS termination happens at Nginx with real certificates (ACME/Let’s Encrypt):

sudo certbot --nginx \
  -d signoz.tenant-a.example \
  -d status.tenant-a.example

We keep per‑tenant TLS policies aligned with strong ciphers and HSTS.
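
In Nginx terms, "strong ciphers and HSTS" boils down to a few directives in each tenant's server block; an illustrative baseline (the exact cipher list is a policy choice, not our verbatim config):

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;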

Verification and Observability QA

We validate the pipeline with:

  • OTEL health endpoint (/health on collector)
  • Test traffic from backend services
  • ClickHouse queries to confirm log attributes
  • SigNoz filters for severity_text, service.name, host.name

Example ClickHouse check (internal)

SELECT severity_text, count()
FROM signoz_logs.logs_v2
WHERE resources_string['business_id'] = 'tenant-a'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY severity_text;

Security and Compliance (SOC 2 + ISO 27001)

Controls aligned with SOC 2 and ISO 27001:

  • Access control – SSH keys only, least‑privilege IAM, MFA on cloud console.
  • Network segmentation – minimal open ports; SSH restricted by source IP.
  • Secrets management – runtime secrets stored in a vault, never in code.
  • Encryption in transit – TLS everywhere, no plaintext traffic.
  • Audit logging – all admin actions logged and retained per compliance windows.
  • Patch management – automated OS and container image updates with CVE scanning.

Security & Operational Controls

  • TLS enforcement: No plaintext endpoints exposed.
  • Encryption at rest: Disk encryption enabled on VMs and DB volumes.
  • Audit trails: System logs retained; infrastructure changes tracked in code.
  • Change management: All configuration stored in repositories; change reviews required before deployment.
  • Monitoring and alerting: OneUptime for SLOs and uptime checks.
  • Incident response: Documented procedures, retention, and escalation.
  • Backup strategy: ClickHouse backup policies per tenant.

Repeatability: Infra + Tenant Config as Code

We split configuration by responsibility:

  • Monitoring services repo: All infrastructure and Nginx routing.
  • Tenant repos: OTEL collector configuration and deployment hooks.

That means a new VM can be rebuilt with:

  1. Pull the monitoring repo and run:

    docker compose up -d

  2. Update DNS + TLS.

  3. Run tenant deployment scripts to install the collector and environment.

Final Takeaways

This architecture gives us the best of both worlds:

  • Strong tenant isolation for compliance‑focused clients.
  • Shared operations processes and standard configuration.
  • Fast log filtering (severity / service / env / host) for high signal‑to‑noise debugging.
  • A repeatable, audited deployment flow suitable for SOC 2 and ISO 27001 requirements.