[Paper] Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity

Published: December 12, 2025 at 10:21 AM EST
Source: arXiv - 2512.11643v1

Overview

The paper introduces Stateless Snowflake, a cloud‑agnostic ID generation protocol that removes the need for manually assigned or centrally coordinated worker IDs, long an Achilles’ heel of classic Snowflake generators. By deriving each node’s uniqueness from its container’s private IPv4 address, the design fits modern autoscaling environments such as Kubernetes and produces high‑throughput, k‑ordered (roughly time‑sorted) IDs without any external coordination service.

Key Contributions

  • Network‑derived identity: Uses the container’s private IP address as a deterministic source of uniqueness, eliminating the need for static worker IDs or ZooKeeper‑style coordination.
  • Modified bit layout (1‑41‑16‑6): Allocates 16 bits to the IP‑derived node ID while keeping a 41‑bit timestamp and a 6‑bit sequence counter, which allows up to 64 IDs per millisecond (64 K IDs per second) per node.
  • Cloud‑agnostic implementation: Validated on AWS, GCP, and Azure, demonstrating that the approach works across the three major public clouds without cloud‑specific tweaks.
  • Stateless microservice friendliness: The generator can be packaged as a lightweight library or sidecar that requires no persistent state, fitting naturally into container‑native deployment pipelines.
  • Performance parity with stateful generators: Achieves ~31 K IDs/sec on a 3‑node cluster, comparable to a traditional ZooKeeper‑backed Snowflake, while scaling horizontally without a coordinator (up to the 65,536 node IDs that a 16‑bit field can encode).
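The capacity figures above follow directly from the field widths. The short calculation below derives them from the 1‑41‑16‑6 layout; it assumes nothing beyond those widths and a 365.25‑day year for the timestamp lifetime.

```python
# Capacity implied by the 1-41-16-6 layout.
TIMESTAMP_BITS = 41
NODE_BITS = 16
SEQUENCE_BITS = 6

# The sign bit plus the three fields must fill exactly 64 bits.
assert 1 + TIMESTAMP_BITS + NODE_BITS + SEQUENCE_BITS == 64

max_nodes = 2 ** NODE_BITS          # distinct node IDs
ids_per_ms = 2 ** SEQUENCE_BITS     # per node, per millisecond
ids_per_sec = ids_per_ms * 1000     # per node, per second

# How long a 41-bit millisecond timestamp lasts before it wraps.
ms_per_year = 1000 * 60 * 60 * 24 * 365.25
lifetime_years = 2 ** TIMESTAMP_BITS / ms_per_year

print(f"node IDs: {max_nodes}")                        # 65536
print(f"IDs per ms per node: {ids_per_ms}")            # 64
print(f"IDs per second per node: {ids_per_sec}")       # 64000
print(f"timestamp lifetime: {lifetime_years:.1f} years")  # 69.7
```

For comparison, the classic Twitter layout (1‑41‑10‑12) makes the opposite trade: 1,024 node IDs and 4,096 IDs per millisecond per node.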

Methodology

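  1. Deriving uniqueness – When a container starts, the library reads its private IPv4 address (e.g., 10.0.3.5). The address is hashed and truncated to 16 bits to form a node‑specific identifier, which the authors treat as unique within the same VPC/subnet. Strictly, a truncated hash can collide; taking the low‑order 16 bits of the address directly is collision‑free only within a single /16 (or smaller) range.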
  2. Bit allocation – The 64‑bit Snowflake ID is split as follows:
    • 1 sign bit (always 0)
    • 41 bits for a millisecond‑precision timestamp (relative to a custom epoch)
    • 16 bits for the network‑derived node ID
    • 6 bits for a per‑millisecond sequence counter (max 64 IDs per ms per node)
  3. Generation flow – On each request, the generator:
    • Reads the current timestamp.
    • If the timestamp equals that of the previous call, increments the 6‑bit sequence; once all 64 values are used, waits for the next millisecond. Otherwise, resets the sequence to 0.
    • Concatenates the three fields into a 64‑bit integer and returns it.
  4. Statelessness – No external storage or coordination is required; the only state kept in memory is the last timestamp and sequence counter, both of which reset automatically when the process restarts.
  5. Evaluation setup – The authors deployed the generator as a sidecar in Kubernetes clusters on AWS (EKS), GCP (GKE), and Azure (AKS). They measured throughput (transactions per second, TPS) and latency under varying pod counts and network loads, comparing against a classic Snowflake implementation backed by ZooKeeper.
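Steps 1 to 4 can be sketched as a small in‑process generator. This is a minimal illustration under stated assumptions, not the authors' code: `node_id_from_ip` and `StatelessSnowflake` are hypothetical names, SHA‑256 stands in for the unspecified hash, the custom epoch (2024‑01‑01 UTC) is made up, and refusing to issue IDs when the clock steps backwards is an added safeguard. How the container discovers its own IP is left out; the address is simply passed in as a string.

```python
import hashlib
import ipaddress
import threading
import time

# Hypothetical custom epoch: 2024-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_704_067_200_000

NODE_BITS = 16
SEQUENCE_BITS = 6
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1       # 63
NODE_SHIFT = SEQUENCE_BITS                    # node ID sits just above the sequence
TIMESTAMP_SHIFT = NODE_BITS + SEQUENCE_BITS   # 22


def node_id_from_ip(ip: str, hashed: bool = True) -> int:
    """Step 1 (hypothetical helper): derive a 16-bit node ID from an IPv4 address.

    hashed=True:  SHA-256 of the packed address, truncated to 16 bits
                  (assumed hash; distinct addresses can collide).
    hashed=False: low-order 16 bits of the address (collision-free within a /16).
    """
    packed = ipaddress.IPv4Address(ip).packed
    if hashed:
        return int.from_bytes(hashlib.sha256(packed).digest()[:2], "big")
    return int.from_bytes(packed[2:], "big")


class StatelessSnowflake:
    """1-41-16-6 generator whose only state is last_ms and sequence (step 4)."""

    def __init__(self, node_id: int):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id must fit in 16 bits")
        self.node_id = node_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        # Milliseconds since the custom epoch.
        return time.time_ns() // 1_000_000 - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Added assumption: refuse rather than risk reissuing old IDs.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 64 sequence values used: busy-wait for the next millisecond.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            # Step 3: shift each field into place; the sign bit stays 0.
            return (now << TIMESTAMP_SHIFT) | (self.node_id << NODE_SHIFT) | self.sequence
```

For example, `StatelessSnowflake(node_id_from_ip("10.0.3.5", hashed=False))` uses node ID 0x0305 (773), and successive `next_id()` calls return strictly increasing integers. The lock makes one instance safe to share across threads; separate processes on the same IP would need distinct node IDs.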

Results & Findings

Environment     Nodes   Peak Throughput (TPS)   Avg Latency (µs)
AWS (EKS)       3       31,200                  45
GCP (GKE)       3       30,800                  48
Azure (AKS)     3       31,050                  46

  • Throughput ceiling: The theoretical maximum per node (≈64 K TPS) is not approached in practice; ~31 K TPS across three nodes averages roughly 10 K TPS per node, because network I/O and container scheduling dominate latency.
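  • Scalability: The authors report that aggregate TPS grows linearly as nodes are added, which they cite in support of the “effectively unbounded” horizontal scalability claim. The table above covers only 3‑node clusters, and the 16‑bit node field caps a deployment at 65,536 distinct node IDs.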
  • Ordering: IDs remain k‑ordered across the cluster, even when pods are recreated or rescheduled, because the timestamp occupies the most significant bits; the deterministic IP‑derived node component keeps IDs generated concurrently on different nodes distinct.
  • Operational simplicity: No external coordination service was required, reducing deployment complexity and failure surface.
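The gap between the measured throughput and the per‑node ceiling can be quantified from the table. The sketch below assumes load is spread evenly across the three nodes, since only aggregate figures are reported.

```python
# Peak aggregate throughput from the results table (3 nodes per environment).
peak_tps = {"AWS (EKS)": 31_200, "GCP (GKE)": 30_800, "Azure (AKS)": 31_050}
NODES = 3
# 6-bit sequence: 64 IDs per ms per node, i.e. 64,000 IDs/s per node.
CEILING_PER_NODE = 64 * 1000

# Assumption: load is split evenly across nodes.
per_node = {env: tps / NODES for env, tps in peak_tps.items()}
utilization = {env: rate / CEILING_PER_NODE for env, rate in per_node.items()}

for env in peak_tps:
    print(f"{env}: {per_node[env]:,.0f} IDs/s per node "
          f"({utilization[env]:.0%} of the sequence ceiling)")
```

Each environment works out to roughly 10.3 K to 10.4 K IDs/s per node, about 16% of the ceiling, which is consistent with the sequence field not being the bottleneck in these runs.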

Practical Implications

  • Zero‑ops ID service: Teams can embed the generator directly into microservices or run it as a sidecar without provisioning ZooKeeper, etcd, or Consul.
  • Seamless autoscaling: As pods scale up/down, each new instance automatically obtains a unique node ID from its IP, eliminating race conditions during rapid scaling events.
  • Cost reduction: Removing a coordination layer cuts infrastructure spend and simplifies cloud‑agnostic CI/CD pipelines.
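  • Compatibility with existing Snowflake IDs: The IDs are still 64‑bit, k‑ordered integers, so databases, message queues, and tracing tools can store and sort them unchanged. Because the node and sequence fields together still occupy the low 22 bits, extracting the timestamp (id >> 22) works as in classic Snowflake; code that splits out the node and sequence must switch from the classic 10/12 boundary to 16/6.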
  • Edge & hybrid deployments: Since the method only needs a private IP, it can also run on on‑prem VMs, edge devices, or multi‑cloud clusters, provided their private address ranges do not overlap.
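To make the parsing‑compatibility point concrete, here is a minimal decoder for the 1‑41‑16‑6 layout next to a classic 1‑41‑10‑12 decoder. `decode`, `classic_decode`, and the custom epoch are illustrative assumptions; a real deployment must use whatever epoch its generator uses. Both layouts place the timestamp above the low 22 bits (16 + 6 = 10 + 12), so only the node/sequence split differs.

```python
from datetime import datetime, timezone

# Hypothetical custom epoch (2024-01-01T00:00:00Z); must match the generator's.
CUSTOM_EPOCH_MS = 1_704_067_200_000


def decode(snowflake_id: int) -> dict:
    """Split a 1-41-16-6 ID into its fields."""
    sequence = snowflake_id & 0x3F            # low 6 bits
    node_id = (snowflake_id >> 6) & 0xFFFF    # next 16 bits
    ts_ms = snowflake_id >> 22                # top 41 bits (sign bit is 0)
    created = datetime.fromtimestamp((ts_ms + CUSTOM_EPOCH_MS) / 1000, tz=timezone.utc)
    return {"timestamp_ms": ts_ms, "created_at": created,
            "node_id": node_id, "sequence": sequence}


def classic_decode(snowflake_id: int) -> dict:
    """Classic 1-41-10-12 split, shown for contrast."""
    return {"timestamp_ms": snowflake_id >> 22,
            "node_id": (snowflake_id >> 12) & 0x3FF,
            "sequence": snowflake_id & 0xFFF}


# An ID minted 1000 ms after the epoch by node 773 with sequence 5.
sample = (1000 << 22) | (773 << 6) | 5
print(decode(sample))
print(classic_decode(sample))
```

Both decoders recover timestamp 1000, but `classic_decode` reports node 12 and sequence 325 for an ID this layout minted as node 773, sequence 5.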

Limitations & Future Work

  • IP address collisions: The approach assumes unique private IPs within the same subnet; overlapping CIDR ranges across clusters could cause collisions and would need additional namespace handling.
  • Sequence space: With only 6 bits for the per‑millisecond counter, a single node cannot exceed 64 IDs per millisecond; extremely bursty workloads might hit this ceiling.
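  • Clock synchronization: Like all Snowflake variants, the system relies on loosely synchronized clocks; large clock skews could break cross‑node ordering, and a clock that jumps backward on a single node could reissue IDs that node has already generated.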
  • Future directions: The authors suggest exploring richer entropy sources (e.g., MAC address + IP hash) to expand the node‑ID space, integrating lightweight clock‑drift detection, and evaluating performance under massive (>10 k) node clusters.
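The IP‑collision limitation can be checked ahead of deployment. The sketch below uses two hypothetical pod CIDRs (10.0.0.0/24 and 10.1.0.0/24) that differ only above the low 16 bits, and compares a low‑16‑bit derivation with a SHA‑256‑truncated one (the hash is an assumption, not the paper's stated choice). It also applies the standard birthday approximation to estimate how likely any collision is when n addresses are hashed into 65,536 values.

```python
import hashlib
import ipaddress
import math
from collections import defaultdict


def low16(ip: str) -> int:
    """Node ID from the low-order 16 bits of an IPv4 address."""
    return int(ipaddress.IPv4Address(ip)) & 0xFFFF


def hash16(ip: str) -> int:
    """Node ID from SHA-256 truncated to 16 bits (hash choice is an assumption)."""
    digest = hashlib.sha256(ipaddress.IPv4Address(ip).packed).digest()
    return int.from_bytes(digest[:2], "big")


def find_collisions(ips, derive):
    """Return {node_id: [ips...]} for every node ID claimed by more than one address."""
    groups = defaultdict(list)
    for ip in ips:
        groups[derive(ip)].append(ip)
    return {nid: members for nid, members in groups.items() if len(members) > 1}


def birthday_collision_probability(n: int, space: int = 1 << 16) -> float:
    """Approximate chance that n uniformly random 16-bit IDs contain a collision."""
    return 1 - math.exp(-n * (n - 1) / (2 * space))


# Hypothetical pod CIDRs for two clusters.
cluster_a = [str(ip) for ip in ipaddress.ip_network("10.0.0.0/24").hosts()]
cluster_b = [str(ip) for ip in ipaddress.ip_network("10.1.0.0/24").hosts()]

print(len(find_collisions(cluster_a, low16)))              # 0
print(len(find_collisions(cluster_a + cluster_b, low16)))  # 254
print(len(find_collisions(cluster_a, hash16)))             # may be nonzero
print(f"{birthday_collision_probability(254):.2f}")        # 0.39
```

The low‑16‑bit derivation is collision‑free inside one cluster but maps every 10.0.0.x host onto the same node ID as the matching 10.1.0.x host; a truncated hash avoids that structured overlap but already has roughly a 39% chance of some collision among 254 addresses.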

Authors

  • Manideep Reddy Chinthareddy

Paper Information

  • arXiv ID: 2512.11643v1
  • Categories: cs.DC
  • Published: December 12, 2025