[Paper] Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity

Published: December 12, 2025 at 10:21 AM EST

Source: arXiv - 2512.11643v1

Overview

The paper introduces Stateless Snowflake, a cloud‑agnostic ID generation protocol that removes the need for manually assigned or centrally coordinated worker IDs, a long‑standing weak point of classic Snowflake generators. By deriving each node's identifier from its container's private IPv4 address, the design fits autoscaling environments such as Kubernetes and delivers high‑throughput, k‑ordered IDs without any external coordination service.

Key Contributions

Network‑derived identity: Uses the container’s private IP address as a deterministic source of uniqueness, eliminating the need for static worker IDs or ZooKeeper‑style coordination.
Modified bit layout (1‑41‑16‑6): Allocates 16 bits to the IP‑derived node ID and 6 bits to the sequence counter while keeping the 41‑bit millisecond timestamp, allowing up to 64 IDs per millisecond (about 64 K IDs per second) per node.
Cloud‑agnostic implementation: Validated on AWS, GCP, and Azure, indicating that the approach works across major public clouds without cloud‑specific tweaks.
Stateless microservice friendliness: The generator can be packaged as a lightweight library or sidecar that requires no persistent state, fitting naturally into container‑native deployment pipelines.
Performance parity with stateful generators: Achieves ~31 K IDs/sec on a 3‑node cluster, comparable to traditional Snowflake, with what the authors describe as effectively unbounded horizontal scaling.

Methodology

Deriving uniqueness – When a container starts, the library reads its private IPv4 address (e.g., 10.0.3.5). The address is hashed and truncated to 16 bits to form a node‑specific identifier that is treated as unique within the same VPC/subnet. Because 16 bits cover only 65,536 values, this holds only while no two live addresses map to the same 16‑bit value.
Bit allocation – The 64‑bit Snowflake ID is split as follows:
- 1 sign bit (always 0)
- 41 bits for a millisecond‑precision timestamp (relative to a custom epoch)
- 16 bits for the network‑derived node ID
- 6 bits for a per‑millisecond sequence counter (max 64 IDs per ms per node)
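As a rough illustration of the 1‑41‑16‑6 layout above, the following Python sketch packs and unpacks the three non‑sign fields. The constant and function names are hypothetical and are not taken from the paper.

```python
# Field widths from the 1-41-16-6 layout (the sign bit is always 0).
TIMESTAMP_BITS = 41
NODE_BITS = 16
SEQUENCE_BITS = 6

NODE_SHIFT = SEQUENCE_BITS                    # node ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + NODE_BITS   # timestamp sits above both

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_NODE = (1 << NODE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def pack_id(timestamp_ms: int, node_id: int, sequence: int) -> int:
    """Combine the fields into one non-negative integer that fits in 63 bits."""
    if not 0 <= timestamp_ms <= MAX_TIMESTAMP:
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= node_id <= MAX_NODE:
        raise ValueError("node ID does not fit in 16 bits")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence does not fit in 6 bits")
    return (timestamp_ms << TIMESTAMP_SHIFT) | (node_id << NODE_SHIFT) | sequence


def unpack_id(value: int) -> tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, node_id, sequence)."""
    return (
        (value >> TIMESTAMP_SHIFT) & MAX_TIMESTAMP,
        (value >> NODE_SHIFT) & MAX_NODE,
        value & MAX_SEQUENCE,
    )
```

Because the timestamp occupies the highest bits, comparing two packed IDs as integers compares their timestamps first, which is what gives the IDs their rough time ordering.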
Generation flow – On each request, the generator:
- Reads the current timestamp.
- If the timestamp matches the previous call, increments the 6‑bit sequence; once the sequence is exhausted, waits for the next millisecond. Otherwise, resets the sequence to 0.
- Concatenates the timestamp, node ID, and sequence fields into a 64‑bit integer and returns it.
Statelessness – No external storage or coordination is required; the only state kept in memory is the last timestamp and sequence counter, both of which reset automatically when the process restarts.
Evaluation setup – The authors deployed the generator as a sidecar in Kubernetes clusters on AWS (EKS), GCP (GKE), and Azure (AKS). They measured throughput (transactions per second, TPS) and latency under varying pod counts and network loads, comparing against a classic Snowflake implementation backed by ZooKeeper.
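The "Deriving uniqueness" and "Generation flow" steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not name the hash, so the sketch simply keeps the low 16 bits of the address; the epoch value, the class and function names, the injectable clock, and the error raised when the clock moves backwards are all hypothetical.

```python
import ipaddress
import threading
import time

CUSTOM_EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch
NODE_BITS = 16
SEQUENCE_BITS = 6
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def node_id_from_ip(ip: str) -> int:
    """Assumption: use the low 16 bits of the IPv4 address as the node ID."""
    return int(ipaddress.IPv4Address(ip)) & ((1 << NODE_BITS) - 1)


class StatelessSnowflake:
    """In-memory generator; its only state is the last timestamp and sequence."""

    def __init__(self, ip: str, clock=lambda: time.time_ns() // 1_000_000):
        self.node_id = node_id_from_ip(ip)
        self._clock = clock            # returns wall-clock milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: fail fast rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 64 values used in this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return (
                (timestamp << (NODE_BITS + SEQUENCE_BITS))
                | (self.node_id << SEQUENCE_BITS)
                | self._sequence
            )
```

A caller would construct one generator per process, for example `StatelessSnowflake("10.0.3.5")`, and call `next_id()` for each request. How the pod obtains its own address is left outside the sketch.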

Results & Findings

Environment	Nodes	Peak Throughput (TPS)	Avg Latency (µs)
AWS (EKS)	3	31,200	45
GCP (GKE)	3	30,800	48
Azure (AKS)	3	31,050	46

Throughput ceiling: The theoretical maximum per node (64 IDs per millisecond, or ≈64 K TPS) is never reached in practice because network I/O and container scheduling dominate latency.
Scalability: Adding more nodes linearly increases aggregate TPS, confirming the “effectively unbounded” horizontal scalability claim.
Monotonicity: IDs remain k‑ordered across the entire cluster, even when pods are recreated or rescheduled, because the timestamp occupies the most significant bits; the deterministic IP‑derived node component keeps IDs from different pods distinct.
Operational simplicity: No external coordination service was required, reducing deployment complexity and failure surface.
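The per‑node ceiling quoted above follows directly from the 6‑bit sequence field. The short Python calculation below works it out and compares it with the peak figures in the results table. The variable and function names are illustrative only, and the table does not say whether its figures are per node or aggregate, so the ratios are only a rough reference point.

```python
SEQUENCE_BITS = 6  # from the 1-41-16-6 layout

# Up to 2**6 = 64 sequence values per millisecond, and 1,000 ms per second.
per_node_ceiling_tps = (1 << SEQUENCE_BITS) * 1000

# Peak throughput figures copied from the results table.
measured_peak_tps = {
    "AWS (EKS)": 31_200,
    "GCP (GKE)": 30_800,
    "Azure (AKS)": 31_050,
}


def fraction_of_ceiling(tps: int) -> float:
    """Measured throughput as a fraction of the single-node ceiling."""
    return tps / per_node_ceiling_tps


for env, tps in measured_peak_tps.items():
    share = fraction_of_ceiling(tps)
    print(f"{env}: {tps:,} TPS is {share:.1%} of {per_node_ceiling_tps:,} TPS")
```

Read either way, every measured peak is below half of the single‑node ceiling, consistent with the observation that the sequence field is not the bottleneck in these runs.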

Practical Implications

Zero‑ops ID service: Teams can embed the generator directly into microservices or run it as a sidecar without provisioning ZooKeeper, etcd, or Consul.
Seamless autoscaling: As pods scale up/down, each new instance automatically derives its node ID from its IP, avoiding worker‑ID assignment races during rapid scaling events.
Cost reduction: Removing a coordination layer cuts infrastructure spend and simplifies cloud‑agnostic CI/CD pipelines.
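Compatibility with existing Snowflake IDs: The 64‑bit format and k‑ordering mean downstream systems (databases, message queues, tracing tools) can keep storing and sorting IDs as before. The timestamp still occupies the same upper 41 bits as in the classic 1‑41‑10‑12 layout, though code that decodes the worker and sequence fields must account for the 16/6 split.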
Edge & hybrid deployments: Since the method only needs a private IP, it works equally well on on‑prem VMs, edge devices, or multi‑cloud clusters, supporting truly distributed architectures.

Limitations & Future Work

IP address collisions: The approach assumes unique private IPs within the same subnet; overlapping CIDR ranges across clusters could cause collisions and would need additional namespace handling.
Sequence space: With only 6 bits for the per‑millisecond counter, a single node cannot exceed 64 IDs per millisecond; extremely bursty workloads might hit this ceiling.
Clock synchronization: Like all Snowflake variants, the system relies on loosely synchronized clocks; large clock skews could break monotonicity.
Future directions: The authors suggest exploring richer entropy sources (e.g., MAC address + IP hash) to expand the node‑ID space, integrating lightweight clock‑drift detection, and evaluating performance under massive (>10 k) node clusters.
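To make the IP address collision limitation concrete, the Python sketch below groups a list of addresses by their derived node ID and reports any ID shared by more than one address. As before, it assumes the node ID is the low 16 bits of the IPv4 address, since the summary does not specify the hash; the function names are hypothetical.

```python
import ipaddress
from collections import defaultdict

NODE_BITS = 16


def node_id_from_ip(ip: str) -> int:
    """Assumption: use the low 16 bits of the IPv4 address as the node ID."""
    return int(ipaddress.IPv4Address(ip)) & ((1 << NODE_BITS) - 1)


def find_node_id_collisions(ips: list[str]) -> dict[int, list[str]]:
    """Return {node_id: [addresses]} for node IDs claimed by several addresses."""
    by_node: dict[int, list[str]] = defaultdict(list)
    for ip in ips:
        by_node[node_id_from_ip(ip)].append(ip)
    return {nid: addrs for nid, addrs in by_node.items() if len(addrs) > 1}


# 10.0.3.5 and 10.1.3.5 differ only in the bits that are discarded,
# so both map to node ID 773 (0x0305).
pods = ["10.0.3.5", "10.1.3.5", "10.0.3.6"]
print(find_node_id_collisions(pods))
```

Under this assumption, two pods whose addresses share the last two octets would emit the same ID if they used the same timestamp and sequence value, which is why address ranges wider than a /16, or overlapping ranges across clusters, need extra namespace handling.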

Authors

Manideep Reddy Chinthareddy

Paper Information

arXiv ID: 2512.11643v1
Categories: cs.DC
Published: December 12, 2025
