Building a Multi-Tenant Observability Platform with SigNoz + OneUptime
Source: Dev.to
Architecture Overview
Modern SaaS teams need deep observability without sacrificing tenant isolation or compliance. This post explains how we built a multi‑tenant monitoring platform that routes logs, metrics, and traces to isolated SigNoz and OneUptime stacks, enforces strong security controls, and aligns with SOC 2 and ISO 27001 practices. The result: each customer gets a dedicated monitoring experience while we keep the operational footprint lean and repeatable.
We designed a hub‑and‑spoke model:
- Central monitoring VM hosts the observability stack.
- Each tenant has either:
  - a fully isolated SigNoz stack (frontend, query, collector, ClickHouse), or
  - a shared stack with strict routing based on a tenant identifier (for lightweight tenants).
- Each application VM runs an OpenTelemetry (OTEL) Collector that:
  - tails PM2 logs,
  - receives OTLP traces/metrics,
  - forwards everything to the monitoring VM.
This gives a consistent ingestion pipeline while allowing isolation‑by‑default where needed.
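As a rough sketch, a fully isolated tenant stack on the monitoring VM might be laid out in Docker Compose like this (image tags are illustrative, and ports/volumes are omitted for brevity — the real compose files live in the monitoring repo):

```yaml
# Illustrative per-tenant SigNoz stack: frontend, query service,
# collector, and a dedicated ClickHouse instance.
services:
  clickhouse-tenant-a:
    image: clickhouse/clickhouse-server:latest
  signoz-query-tenant-a:
    image: signoz/query-service:latest
    depends_on: [clickhouse-tenant-a]
  signoz-frontend-tenant-a:
    image: signoz/frontend:latest
    depends_on: [signoz-query-tenant-a]
  signoz-otel-collector-tenant-a:
    image: signoz/signoz-otel-collector:latest
    depends_on: [clickhouse-tenant-a]
```

Duplicating this block per tenant is what keeps the layout repeatable: each stack differs only in its tenant suffix.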

Tenant Segregation Strategy
We support two isolation modes:
- Full isolation per tenant
  - Dedicated SigNoz stack per tenant
  - Separate ClickHouse instance
  - Separate OTEL collector upstream
  - Strongest data isolation
- Logical isolation on a shared stack
  - Single SigNoz + ClickHouse instance
  - Routing by business_id (header + resource attribute)
  - Suitable for smaller tenants

The default is full isolation for regulated or high‑traffic customers.

Key routing headers:
- x-business-id for SigNoz
- x-oneuptime-token for OneUptime
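On the shared stack, the x-business-id header can be copied onto the telemetry itself so it is queryable as business_id. A sketch using standard OTEL Collector components (assuming include_metadata is enabled on the OTLP receiver, which is what exposes incoming headers to processors):

```yaml
receivers:
  otlp:
    protocols:
      http:
        # Expose incoming HTTP headers as client metadata
        include_metadata: true

processors:
  attributes/tenant:
    actions:
      # Copy the routing header into a business_id attribute
      - key: business_id
        from_context: x-business-id
        action: upsert
```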
Provisioning and Hardening the Monitoring VM
We treat the monitoring VM as a controlled production system:
- SSH keys only, no password auth
- Minimal inbound ports: 22, 80, 443, 4317, 4318
- Nginx as a single TLS ingress
- Docker Compose for immutable service layout
Example provisioning steps (high‑level)
# SSH key‑based access only
az vm user update \
  --resource-group <resource-group> \
  --name <vm-name> \
  --username <admin-user> \
  --ssh-key-value "<public-key>"

# Open required ports (restrict SSH to trusted IPs)
az network nsg rule create ... \
  --destination-port-ranges 22 80 443 4317 4318
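Key‑only access can also be enforced at the OS level; a typical sshd_config fragment (illustrative, not our exact file):

```
# /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin prohibit-password
PubkeyAuthentication yes
```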
Multi‑Tenant Routing at the Edge
We use Nginx maps to route traffic by hostname for both UI and OTLP ingestion:
map $host $signoz_collector_upstream {
    signoz.tenant-a.example signoz-otel-collector-tenant-a;
    signoz.tenant-b.example signoz-otel-collector-tenant-b;
    default                 signoz-otel-collector-default;
}

server {
    listen 4318;

    location / {
        proxy_pass http://$signoz_collector_upstream;
    }
}
This gives clean DNS‑based tenant routing while keeping a single IP address.
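The UI side follows the same map pattern, terminated with TLS on 443; an illustrative server block (the 3301 frontend port and the certificate paths are assumptions):

```nginx
map $host $signoz_ui_upstream {
    signoz.tenant-a.example  signoz-frontend-tenant-a:3301;
    default                  signoz-frontend-default:3301;
}

server {
    listen 443 ssl;
    server_name signoz.tenant-a.example;

    ssl_certificate     /etc/letsencrypt/live/signoz.tenant-a.example/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/signoz.tenant-a.example/privkey.pem;

    location / {
        proxy_pass http://$signoz_ui_upstream;
    }
}
```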
Collector Configuration: Logs, Traces, Metrics
Each tenant VM runs an OTEL Collector with filelog + OTLP. We parse PM2 logs (JSON wrapper), normalize severity, and attach resource fields for fast filtering in SigNoz.
Core fields we enforce:
- severity_text (info / warn / error)
- service.name
- deployment.environment
- host.name
- business_id
Minimal config excerpt
processors:
  resourcedetection:
    detectors: [system]
  resource:
    attributes:
      - key: business_id
        value: ${env:BUSINESS_ID}
        action: upsert
  transform/logs:
    log_statements:
      - context: log
        statements:
          - set(severity_text, attributes["severity"]) where attributes["severity"] != nil
These enrichments make severity_text, service.name, and host.name searchable immediately in SigNoz.
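As a plain illustration of what the transform does, here is the same severity normalization sketched in Python (field names like level and app are assumptions about the PM2 JSON wrapper — the real work happens inside the collector, not in application code):

```python
import json

# Map app-level severity strings to the normalized severity_text values
# we filter on in SigNoz.
SEVERITY_MAP = {"info": "INFO", "warn": "WARN", "warning": "WARN", "error": "ERROR"}

def normalize_pm2_line(raw: str, business_id: str, host: str) -> dict:
    """Parse one PM2 JSON log line and attach the resource fields we enforce."""
    record = json.loads(raw)
    severity = record.get("severity") or record.get("level") or "info"
    return {
        "severity_text": SEVERITY_MAP.get(str(severity).lower(), "INFO"),
        "body": record.get("message", ""),
        "resources": {
            "business_id": business_id,
            "host.name": host,
            "service.name": record.get("app", "unknown"),
        },
    }

line = '{"level": "error", "message": "db timeout", "app": "api"}'
print(normalize_pm2_line(line, "tenant-a", "vm-01")["severity_text"])  # ERROR
```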
Client‑Side Integration (Apps)
We use a consistent OTEL pattern across backend, web, and agent services:
- Backend – OTLP exporter for traces
- Web – Browser traces forwarded to backend (which re‑exports)
- Agents – OTEL SDK configured with OTEL_EXPORTER_OTLP_ENDPOINT
Typical environment variables
BUSINESS_ID=tenant-a
SIGNOZ_ENDPOINT=http://signoz.tenant-a.example:4318
ONEUPTIME_ENDPOINT=http://status.tenant-a.example:4318
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:4318/v1/traces
DEPLOY_ENV=production
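The per‑tenant endpoints are derived mechanically from BUSINESS_ID; a small Python sketch of that convention (the fallback defaults are assumptions):

```python
# Resolve per-tenant observability endpoints from environment-style settings.
def resolve_endpoints(env: dict) -> dict:
    business_id = env.get("BUSINESS_ID", "default")
    return {
        "business_id": business_id,
        "signoz": env.get("SIGNOZ_ENDPOINT",
                          f"http://signoz.{business_id}.example:4318"),
        "oneuptime": env.get("ONEUPTIME_ENDPOINT",
                             f"http://status.{business_id}.example:4318"),
        "traces": env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
                          "http://127.0.0.1:4318/v1/traces"),
    }

print(resolve_endpoints({"BUSINESS_ID": "tenant-a"})["signoz"])
# http://signoz.tenant-a.example:4318
```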
DNS and TLS (Public UX)
Each tenant gets its own subdomains:
- signoz.<tenant>.example
- status.<tenant>.example
TLS termination happens at Nginx with real certificates (ACME/Let’s Encrypt):
sudo certbot --nginx \
  -d signoz.tenant-a.example \
  -d status.tenant-a.example
We keep per‑tenant TLS policies aligned with strong ciphers and HSTS.
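"Strong ciphers and HSTS" translates to a few lines in each tenant vhost; an illustrative fragment:

```nginx
# Per-tenant TLS policy (illustrative values)
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```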
Verification and Observability QA
We validate the pipeline with:
- OTEL health endpoint (/health on the collector)
- Test traffic from backend services
- ClickHouse queries to confirm log attributes
- SigNoz filters for severity_text, service.name, host.name
Example ClickHouse check (internal)
SELECT severity_text, count()
FROM signoz_logs.logs_v2
WHERE resources_string['business_id'] = 'tenant-a'
AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY severity_text;
Security and Compliance (SOC 2 + ISO 27001)
Controls aligned with SOC 2 and ISO 27001:
- Access control – SSH keys only, least‑privilege IAM, MFA on cloud console.
- Network segmentation – minimal open ports; SSH restricted by source IP.
- Secrets management – runtime secrets stored in a vault, never in code.
- Encryption in transit – TLS everywhere, no plaintext traffic.
- Audit logging – all admin actions logged and retained per compliance windows.
- Patch management – automated OS and container image updates with CVE scanning.
Security & Operational Controls
- TLS enforcement: no plaintext endpoints exposed.
- Encryption at rest: Disk encryption enabled on VMs and DB volumes.
- Audit trails: System logs retained; infrastructure changes tracked in code.
- Change management: All configuration stored in repositories; change reviews required before deployment.
- Monitoring and alerting: OneUptime for SLOs and uptime checks.
- Incident response: Documented procedures, retention, and escalation.
- Backup strategy: ClickHouse backup policies per tenant.
Repeatability: Infra + Tenant Config as Code
We split configuration by responsibility:
- Monitoring services repo: All infrastructure and Nginx routing.
- Tenant repos: OTEL collector configuration and deployment hooks.
That means a new VM can be rebuilt with:
1. Pull the monitoring repo and run: docker compose up -d
2. Update DNS + TLS.
3. Run tenant deployment scripts to install the collector and environment.
Final Takeaways
This architecture gives us the best of both worlds:
- Strong tenant isolation for compliance‑focused clients.
- Shared operations processes and standard configuration.
- Fast log filtering (severity / service / env / host) for high signal‑to‑noise debugging.
- A repeatable, audited deployment flow suitable for SOC 2 and ISO 27001 requirements.