How a 'Simple' QR Code Generator Ate All My RAM: A Tale of 50,000 QR Codes
Source: Dev.to
What Went Wrong
My initial script pre‑generated every QR code, cached them all in memory, and then assembled the PDF. It looked logical:
```python
import io
from multiprocessing import Pool, cpu_count

from reportlab.lib.utils import ImageReader
from tqdm import tqdm

def generate_pdf(output_path: str, total: int = 50000):
    ids = generate_unique_ids(total)

    # Pre-generate ALL QR codes in parallel for "speed"
    print(f"Pre-generating {total} QR codes in parallel...")
    num_workers = cpu_count()

    # Split IDs into batches for parallel processing
    batch_size = max(1, total // (num_workers * 4))
    batches = [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

    # Generate QR codes in parallel using multiprocessing
    qr_cache = {}
    with Pool(num_workers) as pool:
        results = list(tqdm(
            pool.imap(generate_qr_batch, batches),
            total=len(batches),
            desc="Generating QR codes"
        ))

    # Store ALL images in memory
    for batch_result in results:
        for uid, img_bytes in batch_result:
            buf = io.BytesIO(img_bytes)
            qr_cache[uid] = ImageReader(buf)

    # NOW create the PDF using cached images
    # ... PDF generation code ...
```
I was proud of the buzzwords: multiprocessing, parallel execution, batch processing.
When I ran it, the progress bars moved and CPU hit 100% on all cores. Then my laptop (16 GB RAM) started choking:
2 GB...
4 GB...
8 GB...
12 GB...
The OOM killer terminated the process. No PDF, just a frozen machine and a hard‑earned lesson.
Why It Blew Up
| Item | Approx. Size |
|---|---|
| QR code (400 × 400 px) as compressed PNG | 15–30 KB |
| QR code as a `PIL.Image` object | ~500 KB – 1 MB |
| 50,000 QR codes × 500 KB | ~25 GB RAM |
| 50,000 PNG byte strings × 20 KB | ~1 GB RAM |
| ImageReader objects, BytesIO buffers, Python overhead, multiprocessing duplication | 2–4 GB observed |
Even the compressed bytes would have exhausted my RAM, and the parallel workers duplicated data, pushing usage even higher.
The fundamental flaw: optimizing for speed while ignoring resource consumption.
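The back-of-envelope math in the table can be reproduced in a few lines. The per-item sizes are rough assumptions (typical for this image size), not measurements:

```python
# Back-of-envelope memory estimate for caching 50,000 QR codes.
# Per-item sizes are rough assumptions, not measurements.
TOTAL = 50_000
PNG_BYTES = 20_000          # ~20 KB per compressed PNG
PIL_IMAGE_BYTES = 500_000   # ~500 KB per decoded PIL.Image

png_total_gb = TOTAL * PNG_BYTES / 1e9
pil_total_gb = TOTAL * PIL_IMAGE_BYTES / 1e9

print(f"Compressed PNGs: {png_total_gb:.0f} GB")  # 1 GB
print(f"Decoded images:  {pil_total_gb:.0f} GB")  # 25 GB
```

Five minutes with numbers like these would have predicted the crash before a single line of multiprocessing code was written.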
The Simple Fix – Stream‑Based Processing
Instead of loading everything at once, process one PDF page at a time (30 QR codes per page). Keep only the current page’s images in memory.
```python
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from tqdm import tqdm

PER_PAGE = 30  # QR codes per page

def generate_pdf(output_path: str, total: int = 50000):
    ids = generate_unique_ids(total)
    total_pages = (total + PER_PAGE - 1) // PER_PAGE

    # Create PDF canvas
    c = canvas.Canvas(output_path, pagesize=A4)

    # Process ONE PAGE at a time
    for page_start in tqdm(range(0, total, PER_PAGE), desc="Generating PDF pages"):
        page_ids = ids[page_start : page_start + PER_PAGE]

        # Generate QR codes ONLY for this page
        page_qr_cache = {}
        for uid in page_ids:
            img = make_qr_image(uid)
            page_qr_cache[uid] = img_to_reader(img)

        # Draw this page
        for idx, uid in enumerate(page_ids):
            # ... draw QR code to PDF ...
            c.drawImage(page_qr_cache[uid], qr_x, qr_y, ...)
        c.showPage()

        # CRITICAL: Clear the cache after each page!
        page_qr_cache.clear()

    c.save()
```
Key changes
- Generate per‑page – only 30 QR codes reside in memory at any moment.
- Explicit cache clearing after each page.
- Removed multiprocessing – eliminates data duplication and simplifies the flow.
Trade‑offs: Original vs. Optimized
| Metric | Original (Parallel) | Optimized (Per‑Page) |
|---|---|---|
| Memory Usage | 2–4 GB | 50–100 MB |
| Speed | Faster (theoretically) | Slower (sequential) |
| Stability | Crashes on large datasets | Stable |
| Scalability | Limited by RAM | Limited by disk space |
Yes, the new version is slower. Generating 50,000 QR codes sequentially took 30–45 minutes, still far better than a crash before completion. As the adage goes, a slow script that finishes is infinitely faster than a fast script that never finishes.
Takeaways
- Think at scale – a script that works for 100 items may explode at 10 000.
- Prefer streaming over bulk loading when dealing with large datasets.
- Measure memory, not just CPU; parallelism can amplify RAM usage.
- Simple is often best – removing unnecessary complexity (multiprocessing, massive caches) can make a script robust.
I ran the optimized version overnight. When I woke up, both PDF files (100,000 QR codes total) were ready, and my computer was still breathing easy.
If you ever find yourself tempted to “pre‑compute everything,” pause and ask: What happens when this scales 10×? 100×? 1000×?
The Memory‑Bomb Problem
When you scale up a script, memory issues can become catastrophic. Every object you create lives somewhere in memory, and image objects can be surprisingly large.
Example of a hidden memory hog
```python
# This innocent-looking line...
qr_cache[uid] = ImageReader(buf)
# ...executed 50,000 times becomes a memory bomb
```
Parallel processing is great for CPU‑bound tasks if you have enough memory for multiple workers. When each worker creates large objects, parallelism can actually multiply memory usage, making things worse. Sometimes a simple sequential loop is the right answer.
Tip: Python’s garbage collector is helpful, but not magical. If you keep references to large objects in a dictionary or list, that memory won’t be freed until you explicitly remove the references.
```python
# This single line saved gigabytes of RAM
page_qr_cache.clear()
```
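You can watch this effect with the standard library's `tracemalloc` module. Here's a small self-contained sketch that uses plain byte strings as stand-ins for images:

```python
import tracemalloc

tracemalloc.start()

# Simulate caching 1,000 "images" of ~100 KB each
cache = {i: bytes(100_000) for i in range(1_000)}
current_full, _ = tracemalloc.get_traced_memory()

# Drop the references, as page_qr_cache.clear() does above
cache.clear()
current_cleared, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"With cache:    {current_full / 1e6:.0f} MB")   # ~100 MB
print(f"After clear(): {current_cleared / 1e6:.0f} MB")
```

As soon as the dictionary stops referencing the objects, CPython's reference counting frees them; no `gc.collect()` call is needed for this simple case.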
Use a progress bar
When running long‑running tasks, always add a progress bar. The tqdm library makes this trivial:
```python
from tqdm import tqdm

for page_start in tqdm(
    range(0, total, PER_PAGE),
    desc="Generating PDF pages"
):
    # ... your code ...
```
A progress bar gives you feedback on how long the task will take and helps you spot stalls.
Three Questions to Ask Before Scaling
- What’s the memory footprint per item?
- How many items will I process?
- Can I process items one at a time instead of all at once?
These questions are especially important for:
- Image processing: images are memory‑hungry.
- Data pipelines: large CSV/JSON files.
- API responses: paginating through thousands of records.
- File operations: reading/writing big files.
Pattern: stream when you can, batch when you must, and never load everything into memory unless you absolutely have to.
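One way to answer the first question empirically is to measure a single representative item with `tracemalloc` and extrapolate. This is a rough sketch; `make_item` is a hypothetical stand-in for whatever you actually create per item:

```python
import tracemalloc

def make_item():
    # Hypothetical stand-in for the real per-item object
    # (e.g. a rendered QR image of ~500 KB)
    return bytes(500_000)

tracemalloc.start()
item = make_item()
current, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

n_items = 50_000
print(f"One item:  ~{current / 1e6:.1f} MB")
print(f"{n_items} items: ~{current * n_items / 1e9:.0f} GB")
```

Multiplying one measured item by the item count takes seconds and would have flagged the 25 GB problem immediately.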
Practical Tips
Avoid building huge in‑memory lists
```python
# Bad: creates a list of 50,000 items in memory
ids = [generate_id() for _ in range(50_000)]

# Better: generates one at a time
def id_generator(count):
    for _ in range(count):
        yield generate_id()
```
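A quick illustration of consuming the generator lazily, with a trivial stand-in for `generate_id` (hypothetical; your real ID function will differ):

```python
import itertools

def generate_id(counter=itertools.count()):
    # Hypothetical stand-in: sequential IDs instead of real unique codes
    return f"id-{next(counter)}"

def id_generator(count):
    for _ in range(count):
        yield generate_id()

# Consume lazily: only one ID exists at a time
processed = 0
for uid in id_generator(50_000):
    processed += 1  # process(uid) would go here

print(processed)  # 50000
```

Memory stays flat no matter how large the count, because nothing accumulates between iterations.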
Process items incrementally
```python
import gc

# Instead of processing all at once
for item in huge_list:
    process(item)

# Process in manageable chunks
chunk_size = 100
for i in range(0, len(huge_list), chunk_size):
    chunk = huge_list[i:i + chunk_size]
    for item in chunk:
        process(item)
    # Clean up after each chunk
    gc.collect()  # Force garbage collection if needed
```
Monitor memory usage
```python
import os
import psutil

def get_memory_usage():
    process = psutil.Process(os.getpid())
    # Return MB
    return process.memory_info().rss / 1024 / 1024

# In your loop
for i, item in enumerate(items):
    process(item)
    if i % 1_000 == 0:
        print(f"Processed {i} items, Memory: {get_memory_usage():.1f} MB")
```
Set memory limits (Unix)
```python
import resource

# Limit the address space to 1 GB (Unix only); allocations beyond
# the limit fail, which Python surfaces as MemoryError instead of
# letting the OS OOM killer terminate the process
resource.setrlimit(resource.RLIMIT_AS, (1_024 * 1_024 * 1_024, resource.RLIM_INFINITY))
```
Takeaway
My “simple” QR‑code generator turned into a valuable lesson about resource management. The original code was clever—parallel processing, batch operations, caching—but clever code that doesn’t work is worse than simple code that does.
The final version:
- Generates 100,000 QR codes across two PDF files.
- Takes about one hour to run.
- Uses …
Remember: think about memory first, speed second. A slow script that finishes is infinitely more valuable than a fast script that crashes.
TL;DR
I tried to generate 50,000 QR codes by loading them all into memory at once. My computer ran out of RAM and crashed.
Fix: generate QR codes page‑by‑page (e.g., 30 at a time). It’s slower, but it works.
Lesson: always consider memory usage when working with data at scale.