Back Up Multiple Drives to Backblaze with Deduplication – Introducing b2-dedup
Introduction
Backing up terabytes of data across multiple drives, NAS boxes, or different computers? You want everything in one safe, off‑site place without paying a fortune for redundant uploads or duplicate storage.
That’s exactly why I built b2-dedup — a parallel, streaming deduplicating uploader tailored for Backblaze B2.
What You’ll Learn
- Why Backblaze B2 is (in my opinion) the smartest choice for personal or large‑scale backups vs. AWS S3, Azure Blob, etc.
- How deduplication across drives saves you time, bandwidth, and money
- Step‑by‑step setup and usage of b2-dedup
Backblaze B2 vs. Other Cloud Storages (2025‑2026)
| Service | Storage Cost (≈) | Egress Cost |
|---|---|---|
| B2 | ~$6 /TB / month (recently adjusted from $5/TB) | Free egress up to 3× your monthly stored average (unlimited free to many CDNs/partners like Cloudflare, Fastly, etc.) |
| AWS S3 Standard | ~$23 /TB / month (first 50 TB tier) | $0.08–$0.09 / GB after free tier (restoring 1 TB ≈ $80) |
| Azure Blob Hot | Similar ballpark to S3 (~$18–$23 /TB) | Same as S3 |
Bottom line: B2 is roughly 1/4 to 1/5 the price of the other always‑hot, instantly accessible options.
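To put rough numbers on that, here's a back‑of‑the‑envelope comparison using the approximate list prices above (estimates, not quotes — exact figures vary by region and tier):

```python
# Back-of-the-envelope comparison for 2 TB stored, using the rough
# list prices from the table above (estimates, not quotes).
stored_tb = 2
b2_monthly = stored_tb * 6    # ~$6 / TB / month on B2       -> $12
s3_monthly = stored_tb * 23   # ~$23 / TB / month on S3 Std  -> $46

print(f"B2: ${b2_monthly}/mo (~${b2_monthly * 12}/yr)")
print(f"S3: ${s3_monthly}/mo (~${s3_monthly * 12}/yr)")

# One full 2 TB restore:
#   B2: free (well within 3x the monthly stored average)
#   S3: roughly 2 x $80 = $160 in egress at the rates above
```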
Other Gotchas
- No upload fees, delete penalties, minimum file sizes, or hidden API‑call charges that sting on large backups (B2 keeps Class A calls free).
- S3‑compatible API → works with rclone, restic, Veeam, etc.
- No complex storage tiers/classes to accidentally get stuck in (unless you deliberately use Glacier/Archive for cold data).
For personal users, homelab hoarders, photographers/videographers, or small businesses doing off‑site backups, B2 wins on predictable low cost + sane egress.
Why a New Tool? – The Need for Cross‑Drive Deduplication
When you back up multiple drives (e.g., main PC SSD, external HDDs, media NAS), you often have tons of duplicate files — same photos, movies, installers, OS images across machines.
Standard tools (rclone, Duplicati, etc.) usually deduplicate within one backup job, but not across entirely separate sources.
b2-dedup fixes that:
- Uses a local SQLite DB (~/b2_dedup.db) to remember SHA‑256 hashes of every file it has ever seen.
- When you point it at Drive #2, it skips anything already uploaded from Drive #1 (see the sketch after this list).
- Parallel uploads (default 10 workers, tunable) + streaming chunked uploads → low memory, high speed.
- Resumable — interrupted jobs pick up where they left off.
- Scan‑only / dry‑run modes for safety.
Result: One B2 bucket, many “drive‑name” prefixes (e.g., PC2025/, MediaNAS/, Laptop/) — but real storage usage is minimized because duplicates aren’t re‑uploaded.
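The core of this is small enough to sketch. Here's a simplified illustration of the hash‑DB check — it is not the actual b2_dedup.py source; the schema and function names are made up for the example:

```python
import hashlib
import sqlite3
from pathlib import Path

# Simplified sketch of the dedup check (not the real b2-dedup code).
# Table and column names here are illustrative only.
db = sqlite3.connect(Path.home() / "b2_dedup.db")
db.execute("CREATE TABLE IF NOT EXISTS files (sha256 TEXT PRIMARY KEY, b2_path TEXT)")

def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in 1 MiB chunks so memory stays flat even for huge files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def should_upload(path: Path, drive_name: str) -> bool:
    """Return False if a file with identical content was already uploaded from any drive."""
    digest = file_sha256(path)
    if db.execute("SELECT 1 FROM files WHERE sha256 = ?", (digest,)).fetchone():
        return False  # duplicate content: skip it, no matter which drive it came from
    db.execute(
        "INSERT INTO files (sha256, b2_path) VALUES (?, ?)",
        (digest, f"{drive_name}/{path.name}"),
    )
    db.commit()
    return True
```

That's the whole idea: the content hash is the identity, the drive name is just a prefix.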
Prerequisites
- Python 3.8+
- A Backblaze B2 account + bucket created
- B2 Application Key (KeyID + Application Key) — generate one with Read + Write access to your bucket
Installation
# Clone the repository
git clone https://github.com/n0nag0n/b2-dedup.git
cd b2-dedup
# Install Python dependencies
pip install -r requirements.txt
# Install the official B2 CLI (optional but recommended)
pip install b2
b2 account authorize # follow prompts with your KeyID + App Key
b2-dedup will automatically use those credentials.
(Alternatively, export environment variables: B2_KEY_ID and B2_APPLICATION_KEY.)
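If you ever want to script against the same bucket yourself, the b2sdk library (which the official CLI is built on) can pick up those variables like this. Purely illustrative — b2-dedup does its own credential handling:

```python
import os
from b2sdk.v2 import B2Api, InMemoryAccountInfo

# Illustrative only: read the env vars named above and authorize with b2sdk.
key_id = os.environ["B2_KEY_ID"]
app_key = os.environ["B2_APPLICATION_KEY"]

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", key_id, app_key)

bucket = api.get_bucket_by_name("my-backup-bucket-123")  # example bucket name from the usage section
print("Authorized, bucket:", bucket.name)
```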
Usage Examples
1️⃣ First drive (baseline)
# Optional: just scan & hash everything first (no upload)
python b2_dedup.py /mnt/primary-drive \
--drive-name PrimaryPC \
--bucket my-backup-bucket-123 \
--scan-only
# Then do the real upload
python b2_dedup.py /mnt/primary-drive \
--drive-name PrimaryPC \
--bucket my-backup-bucket-123
2️⃣ Second (or Nth) drive — duplicates are skipped!
python b2_dedup.py /mnt/media-drive \
--drive-name MediaNAS \
--bucket my-backup-bucket-123 \
--workers 20
Pro tip: Dry‑run first to preview
python b2_dedup.py /mnt/media-drive \
--drive-name MediaNAS \
--bucket my-backup-bucket-123 \
--dry-run
Useful Flags
| Flag | Description |
|---|---|
| --workers N | Number of parallel upload workers (default 10). Increase if your internet/upload can handle it (sketch below). |
| --dry-run | Show what would be uploaded without actually sending data. |
| --scan-only | Build/populate the hash DB without touching B2. |
| --refresh-count | Force a re‑count of files (useful if the source changed a lot). |
| --drive-name NAME | Prefix used inside the bucket (e.g., PrimaryPC/). Change it when you rename or reorganize drives. |
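For what it's worth, the --workers behaviour maps onto the standard Python worker‑pool pattern. A minimal sketch (not the tool's actual code; upload_one is a hypothetical stand‑in for the hash‑check‑then‑upload step):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_one(path):
    # Stand-in: in the real tool, the hash check and B2 upload happen here.
    return path

def upload_all(paths, workers=10):
    # One thread per in-flight upload; more workers means more parallel transfers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_one, p) for p in paths]
        for fut in as_completed(futures):
            print("done:", fut.result())

# Example with made-up paths:
upload_all(["/mnt/media-drive/a.mkv", "/mnt/media-drive/b.iso"], workers=20)
```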
How Deduplication Works
- Prefix = --drive-name (e.g., PrimaryPC/Documents/report.docx).
- Deduplication happens on content hash: identical files are stored only once, regardless of path or name.
- The DB lives at ~/b2_dedup.db. Back it up! (It’s tiny, but losing it means re‑hashing everything.)
For very large initial scans, start with --scan-only overnight, then run the upload.
Combining with Other Backup Tools
b2-dedup is purely for initial / incremental deduped uploads.
You can combine it with rclone, Borg, restic, etc., for versioning or additional features.
Benefits Recap
- One cheap, durable off‑site location
- Cross‑drive deduplication to slash upload time & storage bills
- Parallel, resumable, low‑memory operation
I’ve been running similar setups for years — it’s rock‑solid for hoarding photos, videos, ISOs, and irreplaceable documents.
Get Started
- Repository: https://github.com/n0nag0n/b2-dedup
- Star the repo if you find it useful.
- Open an issue or PR for questions, bugs, or contributions.
Happy (deduplicated) backing up! 🚀