Building a Multi-Tenant Observability Platform with SigNoz + OneUptime

Published: January 15, 2026 at 01:35 PM EST
4 min read
Source: Dev.to

Architecture Overview

Modern SaaS teams need deep observability without sacrificing tenant isolation or compliance. This post explains how we built a multi‑tenant monitoring platform that routes logs, metrics, and traces to isolated SigNoz and OneUptime stacks, enforces strong security controls, and aligns with SOC 2 and ISO 27001 practices. The result: each customer gets a dedicated monitoring experience while we keep the operational footprint lean and repeatable.

We designed a hub‑and‑spoke model:

  • Central monitoring VM hosts the observability stack.
  • Each tenant has either:
    • a fully isolated SigNoz stack (frontend, query, collector, ClickHouse), or
    • a shared stack with strict routing based on a tenant identifier (for lightweight tenants).
  • Each application VM runs an OpenTelemetry (OTEL) Collector that:
    • tails PM2 logs,
    • receives OTLP traces/metrics,
    • forwards everything to the monitoring VM.

This gives a consistent ingestion pipeline while allowing isolation‑by‑default where needed.

Ingestion Pipeline
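
At a high level, every app VM tails PM2 logs and accepts OTLP locally, then forwards everything to the per-tenant ingest endpoint on the monitoring VM. A minimal sketch of that collector (the PM2 log path is an assumption, SIGNOZ_ENDPOINT matches the env vars shown later, and the processors and pipeline wiring appear further down):

receivers:
  filelog:
    include: [/home/app/.pm2/logs/*.log]   # assumed PM2 log location
    operators:
      - type: json_parser                  # unwrap the PM2 JSON log wrapper
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlphttp/signoz:
    endpoint: ${env:SIGNOZ_ENDPOINT}       # e.g. http://signoz.tenant-a.example:4318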

Tenant Segregation Strategy

We support two isolation modes:

  1. Full isolation per tenant
    • Dedicated SigNoz stack per tenant
    • Separate ClickHouse instance
    • Separate OTEL collector upstream
    • Strongest data isolation
  2. Logical isolation on a shared stack
    • Single SigNoz + ClickHouse instance
    • Routing by business_id (header + resource attribute)
    • Suitable for smaller tenants

Full isolation is the default for regulated or high-traffic customers.
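
One way to realize full isolation is a separate Docker Compose project (with its own env file and volumes) per tenant on the monitoring VM; the project names, file paths, and env files below are illustrative rather than our exact layout:

# Dedicated stack per tenant: separate Compose project, env file, and volumes
docker compose -p signoz-tenant-a \
  --env-file tenants/tenant-a.env \
  -f signoz/docker-compose.yaml up -d

docker compose -p signoz-tenant-b \
  --env-file tenants/tenant-b.env \
  -f signoz/docker-compose.yaml up -d

# Shared stack for lightweight tenants, routed by business_id
docker compose -p signoz-shared \
  -f signoz/docker-compose.yaml up -d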

Key routing headers

  • x-business-id for SigNoz
  • x-oneuptime-token for OneUptime
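
These headers are attached at export time on the app-VM collector. A sketch using the otlphttp exporter's headers field (the endpoint env vars match the client-side section below; ONEUPTIME_TOKEN is an assumed variable name):

exporters:
  otlphttp/signoz:
    endpoint: ${env:SIGNOZ_ENDPOINT}
    headers:
      x-business-id: ${env:BUSINESS_ID}

  otlphttp/oneuptime:
    endpoint: ${env:ONEUPTIME_ENDPOINT}
    headers:
      x-oneuptime-token: ${env:ONEUPTIME_TOKEN}   # assumed env var for the OneUptime token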

Provisioning and Hardening the Monitoring VM

We treat the monitoring VM as a controlled production system:

  • SSH keys only, no password auth
  • Minimal inbound ports: 22, 80, 443, 4317, 4318
  • Nginx as a single TLS ingress
  • Docker Compose for immutable service layout

Example provisioning steps (high‑level)

# SSH key‑based access only
az vm user update \
  --resource-group <resource-group> \
  --name <monitoring-vm> \
  --username <admin-user> \
  --ssh-key-value "<ssh-public-key>"

# Open required ports (restrict SSH to trusted IPs)
az network nsg rule create ... \
  --destination-port-ranges 22 80 443 4317 4318
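
For reference, a fully spelled-out rule restricting SSH to a trusted admin IP might look like the following (resource names, rule name, and priority are placeholders):

az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <monitoring-nsg> \
  --name allow-ssh-admin \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 22 \
  --source-address-prefixes <trusted-admin-ip>/32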

Multi‑Tenant Routing at the Edge

We use Nginx maps to route traffic by hostname for both UI and OTLP ingestion:

# Map ingest hostnames to per-tenant collector upstreams
map $host $signoz_collector_upstream {
    signoz.tenant-a.example  signoz-otel-collector-tenant-a;
    signoz.tenant-b.example  signoz-otel-collector-tenant-b;
    default                  signoz-otel-collector-default;
}

# OTLP/HTTP ingest listener (simplified excerpt)
server {
    listen 4318;
    location / {
        # proxy_pass with a variable needs a resolver (or matching upstream{} blocks),
        # and the upstream must resolve to the collector's OTLP port (4318)
        proxy_pass http://$signoz_collector_upstream;
    }
}

This gives clean DNS‑based tenant routing while keeping a single IP address.
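
The UI side follows the same host-based pattern, with TLS terminated at Nginx. A simplified per-tenant server block (the upstream name is illustrative; certificate paths follow certbot defaults):

server {
    listen 443 ssl;
    server_name signoz.tenant-a.example;

    ssl_certificate     /etc/letsencrypt/live/signoz.tenant-a.example/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/signoz.tenant-a.example/privkey.pem;

    location / {
        proxy_pass http://signoz-frontend-tenant-a:3301;   # tenant-a SigNoz UI (illustrative upstream)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}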

Collector Configuration: Logs, Traces, Metrics

Each tenant VM runs an OTEL Collector with filelog + OTLP. We parse PM2 logs (JSON wrapper), normalize severity, and attach resource fields for fast filtering in SigNoz.

Core fields we enforce

  • severity_text (info / warn / error)
  • service.name
  • deployment.environment
  • host.name
  • business_id

Minimal config excerpt

processors:
  # Adds host.name and other host metadata
  resourcedetection:
    detectors: [system]

  # Stamps every record with the tenant identifier
  resource:
    attributes:
      - key: business_id
        value: ${env:BUSINESS_ID}
        action: upsert

  # Promotes the parsed PM2 "severity" attribute to the record's severity_text
  transform/logs:
    log_statements:
      - context: log
        statements:
          - set(severity_text, attributes["severity"]) where attributes["severity"] != nil

These enrichments make severity_text, service.name, and host.name searchable immediately in SigNoz.
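
They only take effect once wired into the service pipelines. Roughly (receiver and exporter names follow the sketches earlier in the post; batch is the standard batching processor, assumed to be defined in the full config):

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors: [resourcedetection, resource, transform/logs, batch]
      exporters: [otlphttp/signoz]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, resource, batch]
      exporters: [otlphttp/signoz]

The OneUptime exporter from the routing-headers sketch can be added to whichever pipelines should also feed OneUptime.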

Client‑Side Integration (Apps)

We use a consistent OTEL pattern across backend, web, and agent services:

  • Backend – OTLP exporter for traces
  • Web – Browser traces forwarded to backend (which re‑exports)
  • Agents – OTEL SDK configured with OTEL_EXPORTER_OTLP_ENDPOINT

Typical environment variables

BUSINESS_ID=tenant-a
SIGNOZ_ENDPOINT=http://signoz.tenant-a.example:4318
ONEUPTIME_ENDPOINT=http://status.tenant-a.example:4318
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:4318/v1/traces
DEPLOY_ENV=production

DNS and TLS (Public UX)

Each tenant gets its own subdomains:

  • signoz.<tenant>.example
  • status.<tenant>.example

TLS termination happens at Nginx with real certificates (ACME/Let’s Encrypt):

sudo certbot --nginx \
  -d signoz.tenant-a.example \
  -d status.tenant-a.example

We keep per‑tenant TLS policies aligned with strong ciphers and HSTS.
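
In Nginx terms, "strong ciphers and HSTS" boils down to a few directives in each tenant's server block; an illustrative baseline (the exact cipher list is a policy choice, not our verbatim config):

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;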

Verification and Observability QA

We validate the pipeline with:

  • OTEL health endpoint (/health on collector)
  • Test traffic from backend services
  • ClickHouse queries to confirm log attributes
  • SigNoz filters for severity_text, service.name, host.name

Example ClickHouse check (internal)

SELECT severity_text, count()
FROM signoz_logs.logs_v2
WHERE resources_string['business_id'] = 'tenant-a'
  AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY severity_text;

Security and Compliance (SOC 2 + ISO 27001)

Controls aligned with SOC 2 and ISO 27001:

  • Access control – SSH keys only, least‑privilege IAM, MFA on cloud console.
  • Network segmentation – minimal open ports; SSH restricted by source IP.
  • Secrets management – runtime secrets stored in a vault, never in code.
  • Encryption in transit – TLS everywhere, no plaintext traffic.
  • Audit logging – all admin actions logged and retained per compliance windows.
  • Patch management – automated OS and container image updates with CVE scanning.

Security & Operational Controls

  • TLS enforcement: No plaintext endpoints exposed.
  • Encryption at rest: Disk encryption enabled on VMs and DB volumes.
  • Audit trails: System logs retained; infrastructure changes tracked in code.
  • Change management: All configuration stored in repositories; change reviews required before deployment.
  • Monitoring and alerting: OneUptime for SLOs and uptime checks.
  • Incident response: Documented procedures, retention, and escalation.
  • Backup strategy: ClickHouse backup policies per tenant.

Repeatability: Infra + Tenant Config as Code

We split configuration by responsibility:

  • Monitoring services repo: All infrastructure and Nginx routing.
  • Tenant repos: OTEL collector configuration and deployment hooks.

That means a new VM can be rebuilt with:

  1. Pull the monitoring repo and run:

    docker compose up -d

  2. Update DNS + TLS.

  3. Run tenant deployment scripts to install the collector and environment.

Final Takeaways

This architecture gives us the best of both worlds:

  • Strong tenant isolation for compliance‑focused clients.
  • Shared operations processes and standard configuration.
  • Fast log filtering (severity / service / env / host) for high signal‑to‑noise debugging.
  • A repeatable, audited deployment flow suitable for SOC 2 and ISO 27001 requirements.