[Paper] Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity
Source: arXiv - 2512.11643v1
Overview
The paper introduces Stateless Snowflake, a cloud‑agnostic ID generation protocol that removes the need for manually assigned or centrally coordinated worker IDs—an Achilles’ heel of classic Snowflake generators. By extracting a node’s uniqueness from its container’s private IPv4 address, the design works seamlessly in modern, autoscaling environments like Kubernetes, delivering high‑throughput, k‑ordered IDs without any external coordination service.
Key Contributions
- Network‑derived identity: Uses the container’s private IP address as a deterministic source of uniqueness, eliminating the need for static worker IDs or ZooKeeper‑style coordination.
- Modified bit layout (1‑41‑16‑6): Allocates 16 bits for the IP‑derived entropy while preserving the millisecond timestamp and a 6‑bit sequence counter, enabling up to 64 IDs per millisecond (about 64 K IDs per second) per node.
- Cloud‑agnostic implementation: Validated on AWS, GCP, and Azure, proving the approach works across major public clouds without cloud‑specific tweaks.
- Stateless microservice friendliness: The generator can be packaged as a lightweight library or sidecar that requires no persistent state, fitting naturally into container‑native deployment pipelines.
- Performance parity with stateful generators: Achieves ~31 K IDs/sec on a 3‑node cluster, comparable to traditional Snowflake, while scaling horizontally without coordination up to the 65,536 values the 16‑bit node field can represent.
Methodology
- Deriving uniqueness – When a container starts, the library reads its private IPv4 address (e.g., 10.0.3.5). The address is hashed and truncated to 16 bits, yielding a node‑specific identifier that is unique within the same VPC/subnet as long as no two addresses map to the same 16‑bit value.
- Bit allocation – The 64‑bit Snowflake ID is split as follows:
  - 1 sign bit (always 0)
  - 41 bits for a millisecond‑precision timestamp (relative to a custom epoch)
  - 16 bits for the network‑derived node ID
  - 6 bits for a per‑millisecond sequence counter (max 64 IDs per ms per node)
- Generation flow – On each request, the generator:
  - Reads the current timestamp.
  - If the timestamp matches the previous call, increments the 6‑bit sequence (rolling over to the next millisecond when exhausted).
  - Concatenates the timestamp, node ID, and sequence fields into a 64‑bit integer and returns it.
- Statelessness – No external storage or coordination is required; the only state kept in memory is the last timestamp and sequence counter, both of which reset automatically when the process restarts.
- Evaluation setup – The authors deployed the generator as a sidecar in Kubernetes clusters on AWS (EKS), GCP (GKE), and Azure (AKS). They measured throughput (transactions per second, TPS) and latency under varying pod counts and network loads, comparing against a classic Snowflake implementation backed by ZooKeeper.
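The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify the hash, so `node_id_from_ip` simply keeps the low 16 bits of the address as a stand-in; `EPOCH_MS`, the class name, the injectable `clock` parameter, and the choice to raise when the clock moves backwards are all hypothetical.

```python
import ipaddress
import threading
import time

EPOCH_MS = 1_700_000_000_000   # hypothetical custom epoch (ms since Unix epoch)
NODE_BITS = 16
SEQ_BITS = 6
MAX_SEQ = (1 << SEQ_BITS) - 1  # 63, i.e. 64 IDs per millisecond


def node_id_from_ip(ip: str) -> int:
    """Stand-in derivation (assumption): keep the low 16 bits of the IPv4 address."""
    return int(ipaddress.IPv4Address(ip)) & ((1 << NODE_BITS) - 1)


class StatelessSnowflake:
    def __init__(self, ip: str, clock=lambda: time.time_ns() // 1_000_000):
        self.node_id = node_id_from_ip(ip)
        self._clock = clock          # returns the current time in milliseconds
        self._last_ms = -1           # in-memory state only; lost on restart
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                if self._seq == MAX_SEQ:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
                    self._seq = 0
                else:
                    self._seq += 1
            else:
                self._seq = 0
            self._last_ms = now
            ts = now - EPOCH_MS
            # Layout: [0 | 41-bit timestamp | 16-bit node ID | 6-bit sequence]
            return (ts << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self._seq
```

In a pod, the address passed to the constructor would be the container's own private IP; how it is discovered (environment variable, socket lookup, or similar) is left out of the sketch.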
Results & Findings
| Environment | Nodes | Peak Throughput (TPS) | Avg Latency (µs) |
|---|---|---|---|
| AWS (EKS) | 3 | 31,200 | 45 |
| GCP (GKE) | 3 | 30,800 | 48 |
| Azure (AKS) | 3 | 31,050 | 46 |
- Throughput ceiling: The theoretical maximum per node (≈64 K TPS) is never reached in practice because network I/O and container scheduling dominate latency.
- Scalability: Adding more nodes linearly increases aggregate TPS, confirming the “effectively unbounded” horizontal scalability claim.
- Monotonicity: IDs remain k‑ordered across the entire cluster, even when pods are recreated or rescheduled, thanks to the deterministic IP‑derived node component.
- Operational simplicity: No external coordination service was required, reducing deployment complexity and failure surface.
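The capacity figures implied by the 1‑41‑16‑6 layout follow from simple arithmetic, shown below alongside the measured numbers. The sketch assumes the table's throughput values are aggregates across the three nodes, as the contribution summary suggests; the variable names are illustrative only.

```python
SEQ_BITS = 6
NODE_BITS = 16
TS_BITS = 41

ids_per_ms = 2 ** SEQ_BITS                # 64 IDs per millisecond per node
ceiling_per_node = ids_per_ms * 1000      # 64,000 IDs per second per node
max_nodes = 2 ** NODE_BITS                # 65,536 distinct node IDs
# Years until the 41-bit millisecond timestamp wraps, counted from the custom epoch.
lifetime_years = 2 ** TS_BITS / 1000 / 86_400 / 365.25

# Peak throughput from the results table (assumed to be cluster-wide totals).
measured = {"AWS (EKS)": 31_200, "GCP (GKE)": 30_800, "Azure (AKS)": 31_050}
nodes = 3

for env, tps in measured.items():
    per_node = tps / nodes
    share = per_node / ceiling_per_node
    print(f"{env}: {per_node:,.0f} IDs/s per node, {share:.0%} of the ceiling")
```

Under that assumption each node delivers roughly 10 K IDs per second, about 16% of the per‑node ceiling, which is consistent with the observation that the theoretical maximum is not reached in practice.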
Practical Implications
- Zero‑ops ID service: Teams can embed the generator directly into microservices or run it as a sidecar without provisioning ZooKeeper, etcd, or Consul.
- Seamless autoscaling: As pods scale up/down, each new instance automatically obtains a unique node ID from its IP, eliminating race conditions during rapid scaling events.
- Cost reduction: Removing a coordination layer cuts infrastructure spend and simplifies cloud‑agnostic CI/CD pipelines.
- Compatibility with existing Snowflake IDs: The 64‑bit format and k‑ordering mean downstream systems (databases, message queues, tracing tools) can continue using the same ID parsing logic.
- Edge & hybrid deployments: Since the method only needs a private IP, it works equally well on on‑prem VMs, edge devices, or multi‑cloud clusters, supporting truly distributed architectures.
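To make the compatibility point concrete, here is a hedged sketch of decoding the 1‑41‑16‑6 layout. The classic Twitter Snowflake layout (41‑bit timestamp, 10‑bit machine ID, 12‑bit sequence) also leaves 22 bits below the timestamp, so timestamp extraction carries over unchanged, while any code that reads the node or sequence fields needs the new widths. `Fields`, `decode`, and `classic_timestamp` are hypothetical names, not part of the paper.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    timestamp_ms: int
    node_id: int
    sequence: int


def decode(snowflake_id: int, epoch_ms: int = 0) -> Fields:
    """Split a 1-41-16-6 ID into its fields; epoch_ms is the generator's custom epoch."""
    return Fields(
        timestamp_ms=(snowflake_id >> 22) + epoch_ms,  # 16 + 6 low bits skipped
        node_id=(snowflake_id >> 6) & 0xFFFF,          # 16-bit node ID
        sequence=snowflake_id & 0x3F,                  # 6-bit sequence
    )


def classic_timestamp(snowflake_id: int, epoch_ms: int = 0) -> int:
    """Timestamp extraction as written for the classic 41-10-12 layout."""
    return (snowflake_id >> 22) + epoch_ms             # 10 + 12 low bits skipped
```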
Limitations & Future Work
- IP address collisions: The approach assumes unique private IPs within the same subnet; overlapping CIDR ranges across clusters could cause collisions and would need additional namespace handling.
- Sequence space: With only 6 bits for the per‑millisecond counter, a single node cannot exceed 64 IDs per millisecond; extremely bursty workloads might hit this ceiling.
- Clock synchronization: Like all Snowflake variants, the system relies on loosely synchronized clocks; large clock skews could break monotonicity.
- Future directions: The authors suggest exploring richer entropy sources (e.g., MAC address + IP hash) to expand the node‑ID space, integrating lightweight clock‑drift detection, and evaluating performance under massive (>10 k) node clusters.
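The collision risk can be illustrated with a short sketch. It assumes, purely for illustration, that the node ID is the low 16 bits of the address (the summary does not specify the hash), and it adds the standard birthday approximation for the case where the derivation behaves like a uniform random 16‑bit hash. Both function names are hypothetical.

```python
import ipaddress
import math
from collections import defaultdict


def node_id_from_ip(ip: str) -> int:
    """Stand-in derivation (assumption): low 16 bits of the IPv4 address."""
    return int(ipaddress.IPv4Address(ip)) & 0xFFFF


def find_collisions(ips):
    """Group addresses by derived node ID and return only the groups that clash."""
    buckets = defaultdict(list)
    for ip in ips:
        buckets[node_id_from_ip(ip)].append(ip)
    return {nid: group for nid, group in buckets.items() if len(group) > 1}


def birthday_collision_probability(n: int, bits: int = 16) -> float:
    """Approximate chance that n uniformly hashed nodes share at least one ID."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * 2 ** bits))


# Two pods in different /16 ranges end up with the same node ID.
print(find_collisions(["10.0.3.5", "10.1.3.5", "10.0.3.6"]))
```

With a truncation‑style derivation, collisions arise only when address ranges overlap in their low 16 bits; with a uniform hash, the birthday approximation suggests collisions become likely well before the 65,536‑value space is full, which is one reason a larger node‑ID space is listed as future work.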
Authors
- Manideep Reddy Chinthareddy
Paper Information
- arXiv ID: 2512.11643v1
- Categories: cs.DC
- Published: December 12, 2025