You Can't Resize a Bloom Filter. Here's What To Do Instead.
Source: Dev.to
What Are Bloom Filters?
Bloom filters are a probabilistic data structure that uses a bit array and a handful of hash functions to answer set‑membership queries, most commonly to reduce the load on a primary database.
You might wonder why not just use a caching system like Redis. The answer comes down to a trade‑off between accuracy and memory efficiency:
| System | Accuracy | Memory Efficiency |
|---|---|---|
| Basic Redis (or similar caches) | ✅ Exact | ❌ Not size‑efficient |
| Bloom filter | ❓ Probabilistic (possible false positives) | ✅ Very size‑efficient |
Planning a Bloom Filter
Before using a bloom filter you must decide:
- Size of the bit array (m) – once set, it cannot be changed.
- Number of hash functions (k) – also fixed after creation.

Both parameters depend on:
- n – the expected number of items to store.
- p – the acceptable false‑positive probability.
Choosing the Number of Hash Functions
The number of hash functions cannot be too small or too large. With too few, each item sets only a handful of bits, so unrelated items collide easily; with too many, the array fills with 1s quickly, which also drives up false positives.
The optimal value is given by the formula:
\[ k = \frac{m}{n}\,\ln 2 \]
Computing Optimal Parameters in Go
```go
package main

import (
	"fmt"
	"math"
)

// CalculateOptimalParams returns the optimal bit-array size (m) and
// number of hash functions (k) for a given item count (n) and false-positive
// rate (p).
func CalculateOptimalParams(n int, p float64) (uint, uint) {
	// 1. Ideal bit-array size (m)
	//    m = - (n * ln(p)) / (ln(2))²
	m := -float64(n) * math.Log(p) / math.Pow(math.Log(2), 2)

	// 2. Ideal number of hash functions (k)
	//    k = (m / n) * ln(2)
	k := (m / float64(n)) * math.Log(2)

	return uint(math.Ceil(m)), uint(math.Ceil(k))
}

func main() {
	n := 1000 // Expected number of items
	p := 0.01 // Desired false-positive rate (1%)

	m, k := CalculateOptimalParams(n, p)

	// Print the recommended parameters
	fmt.Println("Recommended Bit Array Size (m):", m)
	fmt.Println("Recommended Hash Functions (k):", k)
}
```
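Running this with n = 1000 and p = 0.01 should print roughly m = 9586 and k = 7 – about 1.2 KB of bits to track a thousand items at a 1 % error rate.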
Core Logic of a Bloom Filter
A bloom filter answers the question “Does this item possibly exist in the set?”
If the answer is definitely NO, we can skip a database lookup.
If the answer is maybe, we must verify against the real data source.
Example Scenario: Username Availability
- Traditional approach – run a SQL query:
  `SELECT 1 FROM users WHERE username = 'ayush' LIMIT 1;`
  Even with an index, the query cost is roughly O(log n).
- Bloom‑filter approach – check the filter first:
  - If it says NO, accept the username immediately.
  - If it says maybe, fall back to the database query.
This can dramatically reduce database load.
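As a rough sketch of that flow in Go (the IsUsernameTaken function, the MembershipFilter interface, and the query placeholder syntax are illustrative, not part of the article's code; any filter with an Exists method, such as the one implemented later in this article, would fit):

```go
package availability

import "database/sql"

// MembershipFilter is any filter that can answer "maybe present?" queries,
// such as the BloomFilter built later in this article.
type MembershipFilter interface {
	Exists(item string) bool
}

// IsUsernameTaken consults the filter before touching the database.
func IsUsernameTaken(filter MembershipFilter, db *sql.DB, username string) (bool, error) {
	// Definitely not in the filter: the name was never inserted,
	// so we can skip the database round-trip entirely.
	if !filter.Exists(username) {
		return false, nil
	}

	// "Maybe" present: the filter can be wrong, so confirm against the
	// source of truth. (Placeholder syntax depends on your SQL driver.)
	var one int
	err := db.QueryRow("SELECT 1 FROM users WHERE username = ? LIMIT 1", username).Scan(&one)
	if err == sql.ErrNoRows {
		return false, nil // the filter gave a false positive
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```

The database is only touched on the "maybe" path, which for a well‑sized filter is a small fraction of all checks.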
How a Bloom Filter Works
A bloom filter consists of two components:
- Bit Array – an array of 0s and 1s (e.g., [0,0,0,0,0]).
- Hash Functions – deterministic functions that map a value to an index in the bit array.
Step‑by‑Step Walkthrough
Assume a bit array of 10 bits, all initially 0.
| Step | Action | Result |
|---|---|---|
| 1. Add “Ayush” | hash1("Ayush") = 2, hash2("Ayush") = 7 → set bits 2 and 7 | [0,0,1,0,0,0,0,1,0,0] |
| 2. Check “Kumar” | hash1("Kumar") = 3, hash2("Kumar") = 4 | Bits 3 and 4 are 0 → definitely NOT present |
| 3. Check “Anand” (a false positive) | hash1("Anand") = 2, hash2("Anand") = 7 | Both bits are 1 → maybe present (actually not) → fall back to DB |
Go Implementation of a Simple Bloom Filter
```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter holds the bit array and its size.
type BloomFilter struct {
	bitset []bool // 0/1 array
	size   int    // length of the array
}

// NewBloomFilter creates a filter with the given size.
func NewBloomFilter(size int) *BloomFilter {
	return &BloomFilter{
		bitset: make([]bool, size),
		size:   size,
	}
}

// hash1 – first hash function (FNV-1a).
func (bf *BloomFilter) hash1(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	return int(h.Sum32()) % bf.size
}

// hash2 – second hash function (FNV-1a with a salt).
func (bf *BloomFilter) hash2(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s + "salt")) // simple salt to get a different hash
	return int(h.Sum32()) % bf.size
}

// Add inserts an item into the filter.
func (bf *BloomFilter) Add(item string) {
	i1 := bf.hash1(item)
	i2 := bf.hash2(item)
	bf.bitset[i1] = true
	bf.bitset[i2] = true
	fmt.Printf("Added: %s (indices %d, %d)\n", item, i1, i2)
}

// Exists checks whether an item *might* be in the set.
func (bf *BloomFilter) Exists(item string) bool {
	i1 := bf.hash1(item)
	i2 := bf.hash2(item)
	// Both bits must be true for a "maybe" answer.
	if bf.bitset[i1] && bf.bitset[i2] {
		return true // maybe
	}
	return false // definitely not
}

func main() {
	filter := NewBloomFilter(100)

	// 1. Add "Ayush"
	filter.Add("Ayush")

	// 2. Query "Ayush"
	fmt.Println("Does Ayush exist?", filter.Exists("Ayush")) // true (maybe)

	// 3. Query "Kumar"
	fmt.Println("Does Kumar exist?", filter.Exists("Kumar")) // false (definitely not)
}
```
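The sketch above hard‑codes exactly two hash functions. To honour the k value that CalculateOptimalParams recommends, a common trick is double hashing: derive all k indices from two base hashes. The indexesFor method below is an illustrative extension of the BloomFilter above (it reuses the same hash/fnv import), not part of the original code:

```go
// indexesFor derives k bit positions for an item from two FNV-1a hashes.
// Index i is (h1 + i*h2) mod size – the standard double-hashing scheme.
func (bf *BloomFilter) indexesFor(item string, k int) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(item))
	a := h1.Sum64()

	h2 := fnv.New64a()
	h2.Write([]byte(item + "salt")) // same simple salt trick as hash2
	b := h2.Sum64()

	indices := make([]int, k)
	for i := 0; i < k; i++ {
		indices[i] = int((a + uint64(i)*b) % uint64(bf.size))
	}
	return indices
}
```

Add and Exists would then loop over these indices (setting or testing each bit) instead of calling hash1 and hash2 directly.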
Real‑World Uses
| Service | How It Uses Bloom Filters |
|---|---|
| Medium.com | Avoid recommending articles a user has already read. If the filter says “no”, the article is shown immediately; if “maybe”, a DB check confirms. |
| Google Chrome | Checks URLs against a list of known malicious sites. A “no” result means the URL is definitely not on the list, so the page loads immediately; a “maybe” triggers a deeper verification. |
| Cassandra, HBase, Redis (Bloom module) | Reduce disk I/O for read‑heavy workloads by quickly ruling out non‑existent keys. |
| Distributed caches (e.g., Memcached) | Prevent cache‑miss storms by pre‑filtering keys that are unlikely to be present. |
Takeaways
- Bloom filters trade a small, controllable false‑positive rate for massive memory savings and fast membership checks.
- Standard Bloom filters support inserts and membership checks but not deletions, and their size is fixed after creation.
- Properly sizing m and k using the formulas above is crucial to achieving the desired error rate.
Feel free to adapt the Go code to your own projects, adjust the number of hash functions, or replace the simple FNV hashes with stronger alternatives (e.g., MurmurHash, SipHash) for better distribution. Happy filtering!
Bloom Filters in Practice
- Web browsers – the list of known malicious URLs is too large to query remotely on every page load, so Chrome keeps it locally as a Bloom filter.
  - On a visit, Chrome checks the filter.
  - If the filter returns “maybe dangerous”, Chrome contacts Google’s servers for confirmation.
- Databases (Cassandra / PostgreSQL) – before scanning an on‑disk block for a row, the engine checks a Bloom filter.
  - If the filter says the row is absent, the expensive disk I/O is avoided.
Scaling Bloom Filters
The size of a Bloom filter (array length m and number of hash functions k) is usually chosen based on an assumed maximum number of items n and a target false‑positive rate p.
When the actual data set grows beyond the original assumption, the filter must be scaled.
Scalable Bloom Filters (SBF)
Redis Stack implements stacked Bloom filters (also called Scalable Bloom Filters).
Each new filter added to the stack is larger and strictly more accurate than the previous one.
How the Stacking Algorithm Works
A. Write Rule – Adding Data
| Step | Action |
|---|---|
| 1 | Identify the newest (active) filter (e.g., Filter 3). |
| 2 | Insert the new element (e.g., “Ayush”) only into this active filter. |
| 3 | Older filters (Filter 1, Filter 2) remain frozen – they are never modified. |
Why? Frozen filters preserve their original false‑positive guarantees; only the newest filter handles fresh inserts.
B. Read Rule – Checking Data
To answer the query “Does Ayush exist?”:
- Check Filter 3 – result: No (Ayush isn’t in the newest filter).
- Check Filter 2 – result: No.
- Check Filter 1 – result: Yes (Ayush was added two months ago).
The final answer is the logical OR of all checks:
Result = F3.Exists() || F2.Exists() || F1.Exists()
Problems with Naïve Stacking
- Every additional filter introduces its own false‑positive probability.
- If each filter has a 1 % error rate, two filters give ≈ 2 % overall error and ten filters give ≈ 10 % error – an unacceptable increase, as the quick estimate below shows.
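A rough estimate (assuming the layers misfire independently) shows why the per‑layer rates roughly add up: a query is a false positive if any layer misfires, so

\[ P_{\text{total}} = 1 - \prod_i (1 - p_i) \approx \sum_i p_i \quad \text{for small } p_i \]

With ten layers at 1 % each, 1 - 0.99^10 ≈ 9.6 %, which is essentially the 10 % figure above.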
How Redis Solves the Problem
Redis makes each successive filter tighter (lower false‑positive rate) than the one before it:
| Filter | Allowed Error Rate |
|---|---|
| Filter 1 | 0.1 % |
| Filter 2 | 0.01 % (10× stricter) |
| Filter 3 | 0.001 % (100× stricter) |
| … | … |
Because later filters are dramatically more precise, even a stack of 20 filters keeps the overall false‑positive rate well below 1 %.
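To make the stacking idea concrete, here is a rough Go sketch. It is not Redis's actual implementation: it assumes the BloomFilter type and the CalculateOptimalParams function from the earlier examples live in the same package, and the growth factor and tightening ratio are illustrative values.

```go
// ScalableBloomFilter stacks fixed-size filters: only the newest layer
// accepts writes, and each new layer is larger and stricter than the last.
type ScalableBloomFilter struct {
	layers   []*BloomFilter // layers[len(layers)-1] is the active filter
	counts   []int          // items inserted into each layer
	capacity int            // designed capacity of the active layer
	errRate  float64        // target error rate of the active layer
	growth   int            // capacity multiplier for each new layer
	tighten  float64        // error-rate multiplier for each new layer
}

func NewScalableBloomFilter(n int, p float64) *ScalableBloomFilter {
	m, _ := CalculateOptimalParams(n, p)
	return &ScalableBloomFilter{
		layers:   []*BloomFilter{NewBloomFilter(int(m))},
		counts:   []int{0},
		capacity: n,
		errRate:  p,
		growth:   2,   // double the capacity each time (illustrative)
		tighten:  0.1, // make each new layer 10x stricter (illustrative)
	}
}

// Add follows the write rule: insert only into the newest layer, creating a
// larger, stricter layer once the current one reaches its designed capacity.
func (s *ScalableBloomFilter) Add(item string) {
	last := len(s.layers) - 1
	if s.counts[last] >= s.capacity {
		s.capacity *= s.growth
		s.errRate *= s.tighten
		m, _ := CalculateOptimalParams(s.capacity, s.errRate)
		s.layers = append(s.layers, NewBloomFilter(int(m)))
		s.counts = append(s.counts, 0)
		last++
	}
	s.layers[last].Add(item)
	s.counts[last]++
}

// Exists follows the read rule: OR the answers from every layer.
func (s *ScalableBloomFilter) Exists(item string) bool {
	for _, layer := range s.layers {
		if layer.Exists(item) {
			return true // maybe present in at least one layer
		}
	}
	return false // definitely not present anywhere
}
```

Because the per‑layer error budgets form a geometric series (p, p·r, p·r², …), the total false‑positive rate stays bounded by roughly p / (1 - r) no matter how many layers accumulate – about 0.11 % for the 0.1 % / 10× schedule in the table above.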
Takeaway
- Bloom filters are widely used to avoid costly look‑ups.
- When the data set grows, Scalable Bloom Filters (stacked, increasingly strict filters) let you maintain a low overall error rate without rebuilding the entire structure.
- Redis’s implementation demonstrates that progressively tighter filters are the key to practical, low‑error scaling.