Avoid UUIDv4 Primary Keys
Source: Hacker News
Introduction
Over the last decade, when working on databases with UUID Version 41 as the primary key data type, these databases have usually had bad performance and excessive I/O.
UUID is a native data type in PostgreSQL and can be stored as binary data. Various versions are defined in the RFC. Version 4 has mostly random bits, obfuscating information like when the value was created or where it was generated.
Version 4 UUIDs are easy to generate in PostgreSQL using the gen_random_uuid()2 function since version 13 (released in 2020).
I’ve learned there are misconceptions about UUID Version 4, and sometimes these are the reasons users pick this data type.
Because of the poor performance, misconceptions, and available alternatives, I’ve come around to a simple position: Avoid UUID Version 4 for primary keys.
My more controversial take is to avoid UUIDs in general, but I understand there are some legitimate reasons for them without practical alternatives.
As a database enthusiast, I wanted to have an articulated position on this classic “Integer vs. UUID” debate.
Among database folks, debating this may be tired and clichéd. However, from my consulting work I still see databases using UUID v4 in 2024 and 2025, and the issues discussed here remain relevant.
Let’s dig in.
UUID context for this post
- UUIDs (or GUID in Microsoft terminology)3 are long strings of 36 characters (32 digits and 4 hyphens), stored as 128‑bit (16‑byte) values using the binary
uuiddata type in PostgreSQL. - The RFC documents how the 128 bits are set.
- The bits for UUID Version 4 are mostly random values.
- UUID Version 7 includes a timestamp in the first 48 bits, which works much better with database indexes compared with random values.
Although unreleased as of this writing and pulled from PostgreSQL 17 previously, UUID v7 is part of PostgreSQL 184 scheduled for release in the fall of 2025.
Scope of web app usage and their scale
The kinds of web applications in scope are monolithic web apps that use PostgreSQL as their primary OLTP database. Typical categories include social media, e‑commerce, click tracking, and business‑process automation.
The performance issues discussed are related to inefficient storage and retrieval, so they apply across these categories.
Randomness is the issue
The core issue with UUID Version 4, given that 122 bits are “random or pseudo‑randomly generated values”1 and primary keys are backed by indexes, is the impact on inserts and retrieval of individual items or ranges of values from the index.
Random values don’t have natural sorting like integers or lexicographic sorting like character strings. UUID v4s do have “byte ordering,” but this has no useful meaning for how they’re accessed.
Why choose UUIDs at all? Generating values from one or more client applications
One use case for UUIDs is when there’s a need to generate an identifier on a client or from multiple services, then pass it to PostgreSQL for persistence.
In a microservices architecture where each service has its own database, the ability to generate identifiers without collisions is a use case for UUIDs. The UUID could also identify the source database later, unlike a simple integer.
For collision avoidance (see the HN discussion5), we can’t practically make the same guarantee with sequence‑backed integers. Hacks exist—e.g., using even and odd integers on two instances or allocating distinct ranges within the int8 space—but they add complexity.
Alternative identifiers such as composite primary keys (CPKs) exist, yet a pair of values still may not uniquely identify a particular row across services.
Wikipedia describes the collision probability as:
The number of random version‑4 UUIDs which need to be generated in order to have a 50 % probability of one collision: 2.71 quintillion
Generating 1 billion UUIDs per second for about 86 years.
Misconceptions: UUIDs are secure
One misconception is that UUIDs are secure. The RFC explicitly states they should not be considered secure “capabilities.”
RFC 4122, §6 – Security Considerations
Do not assume that UUIDs are hard to guess; they should not be used as security capabilities.
Creating obfuscated values using integers
While UUID v4s obfuscate their creation time, the values can’t be ordered to see when they were created relative to each other. Similar properties can be achieved with integers with a little more work.
One option is to generate a pseudo‑random code from an integer, then expose that value externally while still using integers internally.
The full details are described in Short alphanumeric pseudo‑random identifiers in PostgreSQL6. A summary:
- Convert a decimal integer (e.g.,
2) into binary bits (e.g., a 4‑byte, 32‑bit integer:00000000 00000000 00000000 00000010). - Perform an exclusive‑OR (XOR) operation on all the bits using a secret key.
- Encode the resulting bits using a Base62 alphabet.
The obfuscated ID is stored in a generated column. Insertion‑order values might be 01Y9I, 01Y9L, then 01Y9K; alphabetically, the last two would be flipped (01Y9I first, then 01Y9K, 01Y9L).
Footnotes
-
RFC 4122 – A Universally Unique IDentifier (UUID) URN Namespace. ↩ ↩2
-
gen_random_uuid()function introduced in PostgreSQL 13. ↩ -
GUID – Globally Unique Identifier (Microsoft terminology). ↩
-
PostgreSQL 18 planned release (Fall 2025). ↩
-
Hacker News discussion on UUID collision probability. ↩
-
“Short alphanumeric pseudo random identifiers in PostgreSQL” – detailed implementation guide. ↩