You Can't Resize a Bloom Filter. Here's What To Do Instead.

Published: December 28, 2025 at 11:34 AM EST
7 min read
Source: Dev.to

What Are Bloom Filters?

A Bloom filter is a probabilistic data structure that uses a bit array and a set of hash functions to answer set-membership queries; it is often placed in front of a primary database to reduce load.
You might wonder why not just use a caching system like Redis. The answer comes down to a trade‑off between accuracy and memory efficiency:

| System | Accuracy | Memory Efficiency |
| --- | --- | --- |
| Basic Redis (or similar caches) | ✅ Exact | ❌ Not size‑efficient |
| Bloom filter | ❓ Probabilistic (possible false positives) | ✅ Very size‑efficient |

Planning a Bloom Filter

Before using a bloom filter you must decide:

  1. Size of the bit array (m) – once set, it cannot be changed.
  2. Number of hash functions (k) – also fixed after creation.

Both parameters depend on:

  • n – expected number of items to store.
  • p – acceptable false‑positive probability.

Choosing the Number of Hash Functions

The number of hash functions should be neither too low (each lookup checks too few bits, so false positives rise) nor too high (the bit array fills with 1s quickly, which also raises false positives).
The optimal value is given by the formula:

[ k = \frac{m}{n} \ln 2 ]
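
The bit‑array size itself is derived from the expected item count and the target false‑positive rate (this is the same formula used in the code below):

[ m = -\frac{n \ln p}{(\ln 2)^2} ]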

Computing Optimal Parameters in Go

package main

import (
	"fmt"
	"math"
)

// CalculateOptimalParams returns the optimal bit‑array size (m) and
// number of hash functions (k) for a given item count (n) and false‑positive
// rate (p).
func CalculateOptimalParams(n int, p float64) (uint, uint) {
	// 1. Ideal bit‑array size (m)
	//    m = - (n * ln(p)) / (ln(2))²
	m := -float64(n) * math.Log(p) / math.Pow(math.Log(2), 2)

	// 2. Ideal number of hash functions (k)
	//    k = (m / n) * ln(2)
	k := (m / float64(n)) * math.Log(2)

	return uint(math.Ceil(m)), uint(math.Ceil(k))
}

func main() {
	n := 1000          // Expected number of items
	p := 0.01          // Desired false‑positive rate (1%)
	m, k := CalculateOptimalParams(n, p)

	// Print the recommended parameters
	println("Recommended Bit Array Size (m):", m)
	println("Recommended Hash Functions (k):", k)
}
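
Running this with n = 1000 and p = 0.01 should report m = 9586 bits and k = 7 hash functions – roughly 1.2 KB of bits to track a thousand items at a 1 % false‑positive rate.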

Core Logic of a Bloom Filter

A bloom filter answers the question “Does this item possibly exist in the set?”

If the answer is definitely NO, we can skip a database lookup.
If the answer is maybe, we must verify against the real data source.

Example Scenario: Username Availability

  1. Traditional approach – Run a SQL query:

    SELECT 1 FROM users WHERE username = 'ayush' LIMIT 1;

    Even with an index, the query cost is roughly O(log n).

  2. Bloom‑filter approach
    Check the filter first.
    If it says NO, accept the username immediately.
    If it says maybe, fall back to the database query.

This can dramatically reduce database load.
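
A rough sketch of that read path in Go (the function name and the queryDB callback are illustrative; the Exists method matches the Bloom filter implemented later in this post):

// IsUsernameTaken sketches the "filter first, database second" pattern.
// The filter can be anything with an Exists method, such as the BloomFilter
// implemented later in this post; queryDB stands in for the real SQL lookup.
func IsUsernameTaken(filter interface{ Exists(string) bool }, queryDB func(string) bool, username string) bool {
	if !filter.Exists(username) {
		return false // definitely not taken – skip the database entirely
	}
	return queryDB(username) // "maybe" – confirm against the source of truth
}

Whenever a username is actually registered, it must also be added to the filter so future checks stay accurate.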

How a Bloom Filter Works

A bloom filter consists of two components:

  1. Bit Array – An array of 0s and 1s (e.g., [0,0,0,0,0]).
  2. Hash Functions – Deterministic functions that map a value to an index in the bit array.

Step‑by‑Step Walkthrough

Assume a bit array of 10 bits, all initially 0.

| Step | Action | Resulting Bit Array / Verdict |
| --- | --- | --- |
| 1. Add “Ayush” | hash1("Ayush") = 2, hash2("Ayush") = 7 | [0,0,1,0,0,0,0,1,0,0] |
| 2. Check “Kumar” | hash1("Kumar") = 3, hash2("Kumar") = 4 | Bits 3 and 4 are 0 → definitely NOT present |
| 3. Check “Anand” (a false positive) | hash1("Anand") = 2, hash2("Anand") = 7 | Both bits are 1 → maybe present (actually not) → fall back to DB |

Go Implementation of a Simple Bloom Filter

package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter holds the bit array and its size.
type BloomFilter struct {
	bitset []bool // 0/1 array
	size   int    // length of the array
}

// NewBloomFilter creates a filter with the given size.
func NewBloomFilter(size int) *BloomFilter {
	return &BloomFilter{
		bitset: make([]bool, size),
		size:   size,
	}
}

// hash1 – first hash function (FNV‑1a).
func (bf *BloomFilter) hash1(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	// Reduce modulo the size while still unsigned so the index can never
	// be negative, even on platforms where int is 32 bits wide.
	return int(h.Sum32() % uint32(bf.size))
}

// hash2 – second hash function (FNV‑1a with a salt).
func (bf *BloomFilter) hash2(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s + "salt")) // simple salt to get a different hash
	return int(h.Sum32() % uint32(bf.size))
}

// Add inserts an item into the filter.
func (bf *BloomFilter) Add(item string) {
	i1 := bf.hash1(item)
	i2 := bf.hash2(item)
	bf.bitset[i1] = true
	bf.bitset[i2] = true
	fmt.Printf("Added: %s (indices %d, %d)\n", item, i1, i2)
}

// Exists checks whether an item *might* be in the set.
func (bf *BloomFilter) Exists(item string) bool {
	i1 := bf.hash1(item)
	i2 := bf.hash2(item)

	// Both bits must be true for a "maybe" answer.
	if bf.bitset[i1] && bf.bitset[i2] {
		return true // maybe
	}
	return false // definitely not
}

func main() {
	filter := NewBloomFilter(100)

	// 1. Add "Ayush"
	filter.Add("Ayush")

	// 2. Query "Ayush"
	fmt.Println("Does Ayush exist?", filter.Exists("Ayush")) // true (maybe)

	// 3. Query "Kumar"
	fmt.Println("Does Kumar exist?", filter.Exists("Kumar")) // false (definitely not)
}

Real‑World Uses

| Service | How It Uses Bloom Filters |
| --- | --- |
| Medium.com | Avoids recommending articles a user has already read. If the filter says “no”, the article can be shown immediately; if “maybe”, a DB check confirms. |
| Google Chrome | Checks URLs against a list of known malicious sites. A “no” result lets the page load instantly; a “maybe” triggers a deeper verification. |
| Cassandra, HBase, Redis (Bloom module) | Reduce disk I/O for read‑heavy workloads by quickly ruling out non‑existent keys. |
| Distributed caches (e.g., Memcached) | Prevent cache‑miss storms by pre‑filtering keys that are unlikely to be present. |

Takeaways

  • Bloom filters trade a small, controllable false‑positive rate for massive memory savings and fast membership checks.
  • They are insertion‑only (standard Bloom filters don’t support deletions) and fixed in size after creation.
  • Properly sizing m and k using the formulas above is crucial to achieving the desired error rate.

Feel free to adapt the Go code to your own projects, adjust the number of hash functions, or replace the simple FNV hashes with stronger alternatives (e.g., MurmurHash, SipHash) for better distribution. Happy filtering!

Bloom Filters in Practice

  • Web browsers – Chrome keeps a local Bloom filter of known malicious or suspicious URLs.

    • On a visit, Chrome checks the filter.
    • If the filter returns “maybe dangerous”, Chrome contacts Google’s servers for confirmation.
  • Databases (Cassandra / PostgreSQL) – Before scanning a hard‑disk block for a row, the engine checks a Bloom filter.

    • If the filter says the row is absent, the expensive disk I/O is avoided (a sketch of this read path follows below).
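
A rough Go sketch of that per‑block read path, reusing the BloomFilter type from the implementation above (the Block type and its in‑memory rows map are illustrative stand‑ins for real on‑disk structures):

// Block pairs a Bloom filter with the rows it covers. The map stands in
// for the expensive disk read a real storage engine would perform.
type Block struct {
	filter *BloomFilter      // built from the keys stored in this block
	rows   map[string]string // placeholder for on-disk data
}

// Get only touches a block's data when its filter answers "maybe".
func Get(blocks []*Block, key string) (string, bool) {
	for _, b := range blocks {
		if !b.filter.Exists(key) {
			continue // definitely not in this block – skip the disk I/O
		}
		if v, ok := b.rows[key]; ok { // "maybe" – do the real lookup
			return v, true
		}
	}
	return "", false
}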

Scaling Bloom Filters

The parameters of a Bloom filter (bit‑array length m and number of hash functions k) are chosen up front from an assumed maximum number of items n and a target false‑positive rate p.
When the actual data set grows beyond that assumption, the filter cannot be resized in place – it has to be scaled some other way.

Scalable Bloom Filters (SBF)

Redis Stack implements stacked Bloom filters (also called Scalable Bloom Filters).
Each new filter added to the stack is larger and strictly more accurate than the previous one.

How the Stacking Algorithm Works

A. Write Rule – Adding Data

| Step | Action |
| --- | --- |
| 1 | Identify the newest (active) filter (e.g., Filter 3). |
| 2 | Insert the new element (e.g., “Ayush”) only into this active filter. |
| 3 | Older filters (Filter 1, Filter 2) remain frozen – they are never modified. |

Why? Frozen filters preserve their original false‑positive guarantees; only the newest filter handles fresh inserts.

B. Read Rule – Checking Data

To answer the query “Does Ayush exist?”:

  1. Check Filter 3 – result: No (Ayush isn’t in the newest filter).
  2. Check Filter 2 – result: No.
  3. Check Filter 1 – result: Yes (Ayush was added two months ago).

The final answer is the logical OR of all checks:

Result = F3.Exists() || F2.Exists() || F1.Exists()
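
A minimal Go sketch of this stacking behaviour, reusing the BloomFilter type from earlier. The fixed per‑filter capacity and the size‑doubling growth rule are simplifying assumptions for illustration, not Redis’s exact policy:

// ScalableBloomFilter stacks fixed-size filters. Writes go only to the
// newest filter; reads OR the answers of every filter in the stack.
type ScalableBloomFilter struct {
	filters  []*BloomFilter
	counts   []int // items inserted into each sub-filter
	capacity int   // max items per sub-filter before a new one is added
}

func NewScalableBloomFilter(capacity, initialSize int) *ScalableBloomFilter {
	return &ScalableBloomFilter{
		filters:  []*BloomFilter{NewBloomFilter(initialSize)},
		counts:   []int{0},
		capacity: capacity,
	}
}

// Add inserts only into the newest (active) filter; older filters stay frozen.
func (s *ScalableBloomFilter) Add(item string) {
	last := len(s.filters) - 1
	if s.counts[last] >= s.capacity {
		// The active filter is "full": freeze it and start a larger one.
		s.filters = append(s.filters, NewBloomFilter(s.filters[last].size*2))
		s.counts = append(s.counts, 0)
		last++
	}
	s.filters[last].Add(item)
	s.counts[last]++
}

// Exists is the logical OR over all sub-filters, newest first.
func (s *ScalableBloomFilter) Exists(item string) bool {
	for i := len(s.filters) - 1; i >= 0; i-- {
		if s.filters[i].Exists(item) {
			return true // maybe present
		}
	}
	return false // definitely not present
}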

Problems with Naïve Stacking

  • Every additional filter introduces its own false‑positive probability.
  • If each filter has a 1 % error rate, two filters give ≈ 2 % overall error, ten filters give ≈ 10 % error – an unacceptable increase.

How Redis Solves the Problem

Redis makes each successive filter tighter (lower false‑positive rate) than the one before it:

| Filter | Allowed Error Rate |
| --- | --- |
| Filter 1 | 0.1 % |
| Filter 2 | 0.01 % (10× stricter) |
| Filter 3 | 0.001 % (100× stricter) |

Because later filters are dramatically more precise, even a stack of 20 filters keeps the overall false‑positive rate well below 1 %.
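
One way to see why: the overall false‑positive rate is at most the sum of the per‑filter rates. If the first filter allows p1 and each new filter tightens that by a ratio r (r = 0.1 in the table above), the sum is a geometric series:

[ p_{\text{total}} \le p_1 (1 + r + r^2 + \dots) < \frac{p_1}{1 - r} ]

For p1 = 0.1 % and r = 0.1 this bound is about 0.11 %, regardless of how many sub‑filters the stack eventually contains.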

Takeaway

  • Bloom filters are widely used to avoid costly look‑ups.
  • When the data set grows, Scalable Bloom Filters (stacked, increasingly strict filters) let you maintain a low overall error rate without rebuilding the entire structure.
  • Redis’s implementation demonstrates that progressively tighter filters are the key to practical, low‑error scaling.