Back Up Multiple Drives to Backblaze with Deduplication – Introducing b2-dedup

Published: January 18, 2026 at 11:06 AM EST
4 min read
Source: Dev.to

Introduction

Backing up terabytes of data across multiple drives, NAS boxes, or different computers? You want everything in one safe, off‑site place without paying a fortune for redundant uploads or duplicate storage.
That’s exactly why I built b2-dedup — a parallel, streaming deduplicating uploader tailored for Backblaze B2.

What You’ll Learn

  • Why Backblaze B2 is (in my opinion) the smartest choice for personal or large‑scale backups vs. AWS S3, Azure Blob, etc.
  • How deduplication across drives saves you time, bandwidth, and money
  • Step‑by‑step setup and usage of b2-dedup

Backblaze B2 vs. Other Cloud Storage Services (2025‑2026)

| Service | Storage Cost (≈) | Egress Cost |
| --- | --- | --- |
| B2 | ~$6/TB/month (recently adjusted from $5/TB) | Free up to 3× your monthly stored average; unlimited free to many CDNs/partners like Cloudflare, Fastly, etc. |
| AWS S3 Standard | ~$23/TB/month (first 50 TB tier) | $0.08–$0.09/GB after free tier (restoring 1 TB ≈ $80) |
| Azure Blob Hot | Similar ballpark to S3 (~$18–$23/TB) | Same as S3 |

Bottom line: B2 is roughly 1/4 to 1/5 the price of always‑hot, instantly accessible storage.

Other Gotchas B2 Avoids

  • No upload fees, delete penalties, minimum file sizes, or hidden API‑call charges that sting on large backups (B2 keeps Class A calls free).
  • S3‑compatible API → works with rclone, restic, Veeam, etc.
  • No complex storage tiers/classes to accidentally get stuck in (Glacier and Archive on S3/Azure are fine for cold data, but only if you choose them deliberately).

For personal users, homelab hoarders, photographers/videographers, or small businesses doing off‑site backups, B2 wins on predictable low cost + sane egress.

Why a New Tool? – The Need for Cross‑Drive Deduplication

When you back up multiple drives (e.g., main PC SSD, external HDDs, media NAS), you often have tons of duplicate files — same photos, movies, installers, OS images across machines.

Standard tools (rclone, Duplicati, etc.) usually deduplicate within one backup job, but not across entirely separate sources.
b2-dedup fixes that:

  • Uses a local SQLite DB (~/b2_dedup.db) to remember SHA‑256 hashes of every file it has ever seen (sketched below this list).
  • When you point it at Drive #2, it skips anything already uploaded from Drive #1.
  • Parallel uploads (default 10 workers, tunable) + streaming chunked uploads → low memory, high speed.
  • Resumable — interrupted jobs pick up where they left off.
  • Scan‑only / dry‑run modes for safety.

Result: One B2 bucket, many “drive‑name” prefixes (e.g., PC2025/, MediaNAS/, Laptop/) — but real storage usage is minimized because duplicates aren’t re‑uploaded.
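Conceptually, that cross‑drive skip is just a lookup in the local hash DB before each upload. A minimal sketch of the idea (the files table, its columns, and the helper names are my illustration, not necessarily the tool's real schema):

import sqlite3
from pathlib import Path

DB_PATH = Path.home() / "b2_dedup.db"  # the tool's local hash database

def open_db():
    conn = sqlite3.connect(str(DB_PATH))
    # Illustrative schema; b2-dedup's actual tables/columns may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, drive TEXT, path TEXT)"
    )
    return conn

def seen_before(conn, digest):
    """True if any previously scanned drive already produced this SHA-256."""
    return conn.execute(
        "SELECT 1 FROM files WHERE hash = ?", (digest,)
    ).fetchone() is not None

def remember(conn, digest, drive, rel_path):
    """Record a completed upload so later drives skip identical content."""
    conn.execute(
        "INSERT OR IGNORE INTO files VALUES (?, ?, ?)", (digest, drive, rel_path)
    )
    conn.commit()

This is also what makes jobs resumable: on a restart, anything already recorded is simply skipped.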

Prerequisites

  • Python 3.8+
  • A Backblaze B2 account + bucket created
  • B2 Application Key (KeyID + Application Key) — generate one with Read + Write access to your bucket

Installation

# Clone the repository
git clone https://github.com/n0nag0n/b2-dedup.git
cd b2-dedup

# Install Python dependencies
pip install -r requirements.txt
pip install b2
b2 account authorize   # follow prompts with your KeyID + App Key

b2-dedup will automatically use those credentials.
(Alternatively, export environment variables: B2_KEY_ID and B2_APPLICATION_KEY.)
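If you'd rather script the auth step, the b2sdk library (the b2 CLI is built on it) can authorize straight from those environment variables. A quick sketch, assuming b2sdk is installed and both variables are exported:

import os
from b2sdk.v2 import B2Api, InMemoryAccountInfo

key_id = os.environ["B2_KEY_ID"]            # same variables as above
app_key = os.environ["B2_APPLICATION_KEY"]

info = InMemoryAccountInfo()   # keeps auth tokens in memory only
api = B2Api(info)
api.authorize_account("production", key_id, app_key)

bucket = api.get_bucket_by_name("my-backup-bucket-123")
print("Authorized; bucket:", bucket.name)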

Usage Examples

1️⃣ First drive (baseline)

# Optional: just scan & hash everything first (no upload)
python b2_dedup.py /mnt/primary-drive \
    --drive-name PrimaryPC \
    --bucket my-backup-bucket-123 \
    --scan-only
# Then do the real upload
python b2_dedup.py /mnt/primary-drive \
    --drive-name PrimaryPC \
    --bucket my-backup-bucket-123

2️⃣ Second (or Nth) drive — duplicates are skipped!

python b2_dedup.py /mnt/media-drive \
    --drive-name MediaNAS \
    --bucket my-backup-bucket-123 \
    --workers 20

Pro tip: Dry‑run first to preview

python b2_dedup.py /mnt/media-drive \
    --drive-name MediaNAS \
    --bucket my-backup-bucket-123 \
    --dry-run

Useful Flags

| Flag | Description |
| --- | --- |
| --workers N | Number of parallel upload workers (default 10). Increase if your upload bandwidth can handle it (see the sketch after this table). |
| --dry-run | Show what would be uploaded without actually sending data. |
| --scan-only | Build/populate the hash DB without touching B2. |
| --refresh-count | Force re‑count of files (useful if the source changed a lot). |
| --drive-name NAME | Prefix used inside the bucket (e.g., PrimaryPC/). Change when you rename or reorganize drives. |
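For context on --workers: uploads are I/O‑bound, so a simple thread pool scales well, which is the classic pattern behind a flag like this. A sketch (upload_file is a hypothetical stand‑in for the real B2 upload call, not the tool's actual code):

from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_file(path):
    """Hypothetical stand-in for the actual B2 upload call."""
    ...

def upload_all(paths, workers=10):
    # Threads suit I/O-bound uploads; `workers` mirrors the --workers flag.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(upload_file, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()
            except Exception as exc:
                print(f"failed: {path}: {exc}")

Raising workers past what your uplink can saturate just adds contention, so tune it against real throughput.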

How Deduplication Works

  • Prefix = --drive-name (e.g., PrimaryPC/Documents/report.docx).
  • Deduplication happens on content hash — identical files are stored only once, regardless of path/name.
  • The DB lives at ~/b2_dedup.db — back it up! (It’s tiny, but losing it means re‑hashing everything.)

For very large initial scans, start with --scan-only overnight, then run the upload.
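That scan boils down to a streaming hash over every file, which is also why memory use stays flat no matter how large the files are. A minimal sketch (the function name is mine, for illustration):

import hashlib

def sha256_streaming(path, chunk_size=8 * 1024 * 1024):
    """Hash a file in 8 MiB chunks; memory stays constant even for huge files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()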

Combining with Other Backup Tools

b2-dedup is purely for initial / incremental deduped uploads.
You can combine it with rclone, Borg, restic, etc., for versioning or additional features.

Benefits Recap

  • One cheap, durable off‑site location
  • Cross‑drive deduplication to slash upload time & storage bills
  • Parallel, resumable, low‑memory operation

I’ve been running similar setups for years — it’s rock‑solid for hoarding photos, videos, ISOs, and irreplaceable documents.

Get Started

  • Repository: https://github.com/n0nag0n/b2-dedup
  • Star the repo if you find it useful.
  • Open an issue or PR for questions, bugs, or contributions.

Happy (deduplicated) backing up! 🚀
