From 3+ Days to 3.8 Hours: Scaling a .NET CSV Importer for SQL Server
The “Good Enough” Solution That Wasn’t
Every project has that one task: “Just load this massive CSV into the database once, and we’re done.”
That was my starting point. I had a 40 GB CSV file containing complex nested JSON structures (models, diagrams, hotspots, elements) that needed to be parsed and saved into a normalized SQL Server schema.
The first version of the importer was straightforward:
- Read a row.
- Parse the JSON.
- Create entities.
- Save to the DB.
It worked, but it took over 3 days (~92 hours) to finish. For a one‑time migration this was painful but acceptable—you launch it on Friday and hope it’s done by Monday.
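Roughly, the core of that first version was a loop like the following sketch (simplified for the post: `ImportDbContext` and `ModelEntity` are illustrative names, and the CSV handling is compressed into one JSON parse per line):

```csharp
// Simplified sketch of the original importer. ImportDbContext, ModelEntity,
// and the single Payload column are stand-ins for the real, richer schema.
using System.Text.Json;

public static async Task ImportSequentiallyAsync(string csvPath, ImportDbContext db)
{
    using var reader = new StreamReader(csvPath);

    string? line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        // Parse the nested JSON carried in the row.
        using JsonDocument doc = JsonDocument.Parse(line);

        // Create the entity and save it immediately: one transaction
        // and one network round-trip per row.
        db.Models.Add(new ModelEntity { Payload = doc.RootElement.GetRawText() });
        await db.SaveChangesAsync();
    }
}
```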
Then the requirements changed
The business decided this wasn’t a one‑off event. We needed to load several more files of similar size and potentially update them regularly. Suddenly, a 3‑day runtime became a blocker. Loading a queue of files would take weeks, paralyzing analytics and development. The naïve sequential importer was no longer just slow—it was unusable for the new workflow.
The Challenge: Why Was It Slow?
Parsing and inserting data sounds simple, but at scale (40 GB, millions of complex objects) the “standard” approach hits hard limits:
- Sequential processing – reading lines and parsing JSON one by one left the CPU idle while waiting for the DB, and vice‑versa.
- Database round‑trips – saving entities individually (or in tiny groups) caused massive overhead. The DB spent more time managing transactions and network calls than actually storing data.
- Memory pressure – loading full `JsonDocument` objects for every row created huge GC pressure.
- Fragility – a single error after 2 days of processing could crash the whole pipeline, forcing a restart.
The Solution: High‑Performance Architecture
To meet the new “multi‑file” requirement, I redesigned the system to be parallel, batched, and resilient.
1. Controlled Parallelism with SemaphoreSlim
Instead of a single thread, I implemented a producer‑consumer pattern using `SemaphoreSlim` to limit concurrency to 8 parallel workers.
- Why: It saturates the CPU and DB connection pool just enough to be fast without choking the server. Unbounded parallelism (`Parallel.ForEach`) would have killed database performance.
- Safety: Each worker gets its own `DbContext` via `IDbContextFactory`, ensuring thread safety without lock contention (see the sketch below).
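A minimal sketch of the pattern, assuming the factory is registered with `AddDbContextFactory`; `ProcessChunkAsync` is a placeholder for the actual parse‑and‑save work:

```csharp
using Microsoft.EntityFrameworkCore;

public sealed class ParallelImporter
{
    // At most 8 workers run at once; the rest wait at the gate.
    private readonly SemaphoreSlim _gate = new(initialCount: 8, maxCount: 8);
    private readonly IDbContextFactory<ImportDbContext> _contextFactory;

    public ParallelImporter(IDbContextFactory<ImportDbContext> contextFactory)
        => _contextFactory = contextFactory;

    public async Task RunAsync(IEnumerable<IReadOnlyList<string>> chunks)
    {
        var tasks = new List<Task>();

        foreach (var chunk in chunks)          // producer: pre-read chunks of CSV rows
        {
            await _gate.WaitAsync();           // throttle before spawning a consumer
            tasks.Add(Task.Run(async () =>
            {
                try
                {
                    // Each worker gets its own DbContext, so no locking is needed.
                    await using var db = await _contextFactory.CreateDbContextAsync();
                    await ProcessChunkAsync(chunk, db);
                }
                finally
                {
                    _gate.Release();
                }
            }));
        }

        await Task.WhenAll(tasks);
    }

    // Placeholder: the real body parses the JSON and batch-saves the entities.
    private Task ProcessChunkAsync(IReadOnlyList<string> rows, ImportDbContext db)
        => Task.CompletedTask;
}
```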
2. Batch Inserts via EF Core (The Big Win)
This was the most critical change. Instead of `context.Add(entity); context.SaveChanges();` inside the loop, the new system accumulates entities in memory and flushes them in batches of 100+ rows (sketched below).
- Impact: Reduces network round‑trips by ~100× and drastically lowers transaction‑log overhead.
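A sketch of the batching loop; `MapToEntities`, the batch size of 100, and the `ChangeTracker.Clear()` call are illustrative details rather than the project's exact code:

```csharp
// Buffer entities and flush once per batch instead of once per row.
public static async Task SaveInBatchesAsync(IEnumerable<string> rows, ImportDbContext db)
{
    const int batchSize = 100;
    var buffer = new List<ModelEntity>(batchSize);

    foreach (var row in rows)
    {
        buffer.AddRange(MapToEntities(row));   // hypothetical row -> entities mapping

        if (buffer.Count >= batchSize)
        {
            db.Models.AddRange(buffer);
            await db.SaveChangesAsync();       // one round-trip for ~100 rows
            db.ChangeTracker.Clear();          // drop tracked entities to keep memory flat
            buffer.Clear();
        }
    }

    if (buffer.Count > 0)                      // flush the tail
    {
        db.Models.AddRange(buffer);
        await db.SaveChangesAsync();
        db.ChangeTracker.Clear();
    }
}
```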
3. Architecture & SOLID Principles
To keep the code maintainable, I split the parsing logic into independent Processors, each responsible for a specific part of the JSON (e.g., `ModelProcessor`, `DiagramProcessor`).
- SRP (Single Responsibility): Each processor handles only its slice of the domain.
- DIP (Dependency Inversion): High‑level services depend on abstractions (`IEntityFactory`, `IUnitOfWork`), making the system easy to test and extend (illustrated below).
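Roughly, the abstractions take this shape (the signatures here are simplified for the post; only the names come from the design above):

```csharp
using System.Text.Json;

// Simplified signatures; the real factory builds concrete entity types.
public interface IEntityFactory
{
    object CreateDiagram(JsonElement source);
}

public interface ISectionProcessor
{
    // Each processor extracts entities for one slice of the JSON document (SRP).
    IEnumerable<object> Process(JsonElement root);
}

public sealed class DiagramProcessor : ISectionProcessor
{
    private readonly IEntityFactory _factory;   // DIP: depend on the abstraction

    public DiagramProcessor(IEntityFactory factory) => _factory = factory;

    public IEnumerable<object> Process(JsonElement root)
    {
        if (!root.TryGetProperty("diagrams", out var diagrams))
            yield break;                        // this slice is simply absent in some rows

        foreach (var diagram in diagrams.EnumerateArray())
            yield return _factory.CreateDiagram(diagram);
    }
}
```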
4. Reliability Features
- Retry policies: Up to 25 retries for transient DB errors (deadlocks, timeouts).
- Graceful degradation: If one processor fails on bad data, it logs the error and continues, rather than crashing the entire import.
- Optimized parsing: Switched to `JsonElement` and `TryGetProperty` for faster, low‑allocation JSON traversal (example below).
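An example of that traversal style; the property names (`"diagrams"`, `"name"`) are placeholders for the real schema:

```csharp
using System.Text.Json;

// Low-allocation traversal: no DTO deserialization and no exceptions
// for missing fields — absent sections are simply skipped.
public static List<string> ReadDiagramNames(string json)
{
    var names = new List<string>();

    using JsonDocument doc = JsonDocument.Parse(json);
    if (doc.RootElement.TryGetProperty("diagrams", out JsonElement diagrams) &&
        diagrams.ValueKind == JsonValueKind.Array)
    {
        foreach (JsonElement diagram in diagrams.EnumerateArray())
        {
            if (diagram.TryGetProperty("name", out JsonElement name))
                names.Add(name.GetString() ?? string.Empty);
        }
    }

    return names;
}
```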
The Results: 24× Faster
The performance jump was massive, turning a “weekend task” into a “lunch‑break task”.
| Metric | Original Version | Optimized Version | Improvement |
|---|---|---|---|
| Total Time (40 GB) | ~92 hours (3.8 days) | ~3.8 hours | ~24× |
| Throughput | 8–12 rows/sec | 192–300 rows/sec | ~25× |
| Time per 1 000 rows | 83–125 sec | 3–5 sec | ~25× |
| Parallelism | 1 thread | 8 workers | 8× |
| Memory Usage | 2 GB+ | ~400 MB | ~5× |

Key Takeaways
- Context matters: A 3‑day script is fine once. It’s fatal if you need to run it repeatedly. Always ask, “How often will we run this?”
- Batching is king: In EF Core, moving from single inserts to batching is often the single most effective performance upgrade you can make.
- Parallelism needs limits: Throwing 100 threads at SQL Server will just slow it down. Finding the “sweet spot” (e.g., 8 workers) is key.
- Resilience is a feature: When running for hours, networks will blink and deadlocks will happen. Retry policies turn crashes into minor log warnings.
Future Plans
- Add comprehensive tests (xUnit + Moq) with 85%+ coverage for all processors.
- Profile individual pipeline stages to find the next bottleneck (likely JSON‑parsing CPU time).
- Expose configuration (batch size, thread count) to adapt to different server specs dynamically.
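For the configuration item, one possible shape is the standard .NET options pattern (the names and defaults below are placeholders, not the project's actual config):

```csharp
// Planned knobs, bound from appsettings.json via the options pattern.
public sealed class ImporterOptions
{
    public int BatchSize { get; set; } = 100;
    public int MaxParallelWorkers { get; set; } = 8;
    public int MaxRetries { get; set; } = 25;
}

// Program.cs:
// builder.Services.Configure<ImporterOptions>(
//     builder.Configuration.GetSection("Importer"));
```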
Code Repository
Check out the code: [GitHub repository](https://github.com/belochka1-04/ParsCsvSaveInDb)