From 3+ Days to 3.8 Hours: Scaling a .NET CSV Importer for SQL Server

Published: January 4, 2026 at 10:29 AM EST
4 min read
Source: Dev.to

Татьяна Кузнецова

The “Good Enough” Solution That Wasn’t

Every project has that one task: “Just load this massive CSV into the database once, and we’re done.”
That was my starting point. I had a 40 GB CSV file containing complex nested JSON structures (models, diagrams, hotspots, elements) that needed to be parsed and saved into a normalized SQL Server schema.

The first version of the importer was straightforward (a reconstructed sketch follows the steps below):

  1. Read a row.
  2. Parse the JSON.
  3. Create entities.
  4. Save to the DB.
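
In code, that first version looked roughly like the loop below. This is a reconstruction for illustration, not the repository's actual code; ExtractJsonColumn and MapToEntity stand in for whatever parsing and mapping the real importer did.

```csharp
// Reconstructed sketch of the original per-row approach (illustrative, not the repo's code):
// every row pays for its own JSON parse, change tracking, and SaveChanges round-trip.
foreach (string line in File.ReadLines(csvPath))
{
    using JsonDocument doc = JsonDocument.Parse(ExtractJsonColumn(line)); // hypothetical helper
    var entity = MapToEntity(doc.RootElement);                            // hypothetical mapping

    context.Add(entity);
    context.SaveChanges();   // one database round-trip per row
}
```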

It worked, but it took over 3 days (~92 hours) to finish. For a one‑time migration this was painful but acceptable—you launch it on Friday and hope it’s done by Monday.

Then the requirements changed

The business decided this wasn’t a one‑off event. We needed to load several more files of similar size and potentially update them regularly. Suddenly, a 3‑day runtime became a blocker. Loading a queue of files would take weeks, paralyzing analytics and development. The naïve sequential importer was no longer just slow—it was unusable for the new workflow.

The Challenge: Why Was It Slow?

Parsing and inserting data sounds simple, but at scale (40 GB, millions of complex objects) the “standard” approach hits hard limits:

  • Sequential processing – reading lines and parsing JSON one by one left the CPU idle while waiting for the DB, and left the DB idle while the CPU was busy parsing.
  • Database round‑trips – saving entities individually (or in tiny groups) caused massive overhead. The DB spent more time managing transactions and network calls than actually storing data.
  • Memory pressure – loading full JsonDocument objects for every row created huge GC pressure.
  • Fragility – a single error after 2 days of processing could crash the whole pipeline, forcing a restart.

The Solution: High‑Performance Architecture

To meet the new “multi‑file” requirement, I redesigned the system to be parallel, batched, and resilient.

1. Controlled Parallelism with SemaphoreSlim

Instead of a single thread, I implemented a producer-consumer pattern using SemaphoreSlim to limit concurrency to 8 parallel workers; a sketch follows the bullets below.

  • Why: It saturates the CPU and DB connection pool just enough to be fast without choking the server. Unbounded parallelism (Parallel.ForEach) would have killed the database performance.
  • Safety: Each worker gets its own DbContext via IDbContextFactory, ensuring thread safety without lock contention.
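
A minimal sketch of that setup, assuming EF Core's IDbContextFactory and placeholder names (ImportDbContext, ProcessRowAsync); the repository's actual implementation may differ.

```csharp
using Microsoft.EntityFrameworkCore;

public sealed class ImportDbContext : DbContext { /* placeholder for the real context */ }

public sealed class ParallelImporter
{
    // At most 8 rows are processed concurrently; extra work waits at the gate.
    private readonly SemaphoreSlim _gate = new(initialCount: 8, maxCount: 8);
    private readonly IDbContextFactory<ImportDbContext> _contextFactory;

    public ParallelImporter(IDbContextFactory<ImportDbContext> contextFactory)
        => _contextFactory = contextFactory;

    public async Task ImportAsync(IAsyncEnumerable<string> csvLines, CancellationToken ct)
    {
        var inFlight = new List<Task>();   // kept simple here; a real run would prune completed tasks

        await foreach (var line in csvLines.WithCancellation(ct))
        {
            await _gate.WaitAsync(ct);     // the producer blocks once 8 workers are busy

            inFlight.Add(Task.Run(async () =>
            {
                try
                {
                    // One DbContext per worker: thread safety without shared locks.
                    await using var db = await _contextFactory.CreateDbContextAsync(ct);
                    await ProcessRowAsync(line, db, ct);
                }
                finally
                {
                    _gate.Release();
                }
            }, ct));
        }

        await Task.WhenAll(inFlight);
    }

    // Stand-in for the real parse-and-map logic.
    private static Task ProcessRowAsync(string line, ImportDbContext db, CancellationToken ct)
        => Task.CompletedTask;
}
```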

2. Batch Inserts via EF Core (The Big Win)

This was the most critical change. Instead of calling context.Add(entity); context.SaveChanges(); inside the loop, the new system accumulates entities in memory and flushes them in batches of 100+ rows (sketched below).

  • Impact: Reduces network round‑trips by ~100× and drastically lowers transaction‑log overhead.
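
A minimal sketch of the batching side, using the 100-row threshold from above; BatchWriter is an illustrative name, and the real code presumably wires this into the processors.

```csharp
using Microsoft.EntityFrameworkCore;

public sealed class BatchWriter<TEntity> where TEntity : class
{
    private const int BatchSize = 100;          // flush threshold
    private readonly List<TEntity> _buffer = new();
    private readonly DbContext _db;

    public BatchWriter(DbContext db) => _db = db;

    public async Task AddAsync(TEntity entity, CancellationToken ct)
    {
        _buffer.Add(entity);
        if (_buffer.Count >= BatchSize)
            await FlushAsync(ct);
    }

    public async Task FlushAsync(CancellationToken ct)
    {
        if (_buffer.Count == 0) return;

        _db.AddRange(_buffer);              // EF Core groups these into far fewer INSERT commands
        await _db.SaveChangesAsync(ct);     // one SaveChanges per batch instead of per row
        _db.ChangeTracker.Clear();          // stop tracking flushed entities to keep memory flat
        _buffer.Clear();
    }
}
```

One thing to remember with this pattern: call FlushAsync once more after the last row, so the final partial batch is not dropped.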

3. Architecture & SOLID Principles

To keep the code maintainable, I split the parsing logic into independent Processors, each responsible for a specific part of the JSON (e.g., ModelProcessor, DiagramProcessor); a sketch of the shape follows the bullets below.

  • SRP (Single Responsibility): Each processor handles only its slice of the domain.
  • DIP (Dependency Inversion): High‑level services depend on abstractions (IEntityFactory, IUnitOfWork), making the system easy to test and extend.
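
The post names the abstractions but not their signatures, so the shapes below are illustrative rather than the repository's actual interfaces.

```csharp
using System.Text.Json;

// Illustrative shapes only; the real interfaces in the repo may differ.
public sealed record Diagram(string Id, string Name);

public interface IEntityFactory
{
    Diagram CreateDiagram(JsonElement json);              // JSON slice -> domain entity
}

public interface IUnitOfWork
{
    Task AddAsync(object entity, CancellationToken ct);   // queued for the next batch flush
}

// One processor per JSON section (SRP); dependencies are abstractions (DIP).
public interface IJsonSectionProcessor
{
    string SectionName { get; }
    Task ProcessAsync(JsonElement section, CancellationToken ct);
}

public sealed class DiagramProcessor : IJsonSectionProcessor
{
    private readonly IEntityFactory _factory;
    private readonly IUnitOfWork _unitOfWork;

    public DiagramProcessor(IEntityFactory factory, IUnitOfWork unitOfWork)
        => (_factory, _unitOfWork) = (factory, unitOfWork);

    public string SectionName => "diagrams";

    public async Task ProcessAsync(JsonElement section, CancellationToken ct)
    {
        foreach (var item in section.EnumerateArray())
        {
            var diagram = _factory.CreateDiagram(item);   // map the JSON slice
            await _unitOfWork.AddAsync(diagram, ct);      // persistence stays behind the abstraction
        }
    }
}
```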

4. Reliability Features

  • Retry policies: Up to 25 retries for transient DB errors (deadlocks, timeouts).
  • Graceful degradation: If one processor fails on bad data, it logs the error and continues, rather than crashing the entire import.
  • Optimized parsing: Switched to JsonElement and TryGetProperty for faster, low‑allocation JSON traversal (both the retry setup and this parsing style are sketched below).
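
The post doesn't say which retry mechanism it uses; one way to get transient-fault retries with the EF Core SQL Server provider is its built-in execution strategy, shown below alongside the TryGetProperty parsing style. ImportWiring and TryReadModelName are illustrative names, and ImportDbContext is the placeholder context from the earlier sketch.

```csharp
using System.Text.Json;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

public static class ImportWiring
{
    // (a) Transient-fault retries: the SQL Server execution strategy retries deadlocks
    //     and timeouts; maxRetryCount matches the 25 retries mentioned above.
    public static IServiceCollection AddImportDb(this IServiceCollection services, string connectionString)
        => services.AddDbContextFactory<ImportDbContext>(options =>
            options.UseSqlServer(connectionString, sql =>
                sql.EnableRetryOnFailure(
                    maxRetryCount: 25,
                    maxRetryDelay: TimeSpan.FromSeconds(30),
                    errorNumbersToAdd: null)));

    // (b) Low-allocation traversal: read values straight off JsonElement;
    //     TryGetProperty tolerates missing fields instead of throwing.
    public static string? TryReadModelName(string rawJson)
    {
        using JsonDocument doc = JsonDocument.Parse(rawJson);
        if (doc.RootElement.TryGetProperty("model", out JsonElement model) &&
            model.TryGetProperty("name", out JsonElement name))
        {
            return name.GetString();
        }
        return null;
    }
}
```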

The Results: 24× Faster

The performance jump was massive, turning a “weekend task” into a “lunch‑break task”.

| Metric | Original Version | Optimized Version | Improvement |
| --- | --- | --- | --- |
| Total Time (40 GB) | ~92 hours (3.8 days) | ~3.8 hours | ~24× |
| Throughput | 8–12 rows/sec | 192–300 rows/sec | ~25× |
| Time per 1,000 rows | 83–125 sec | 3–5 sec | ~25× |
| Parallelism | 1 thread | 8 workers | |
| Memory Usage | 2 GB+ | ~400 MB | ~5× |

Comparison of Import Performance: Original vs Optimized Architecture

Key Takeaways

  • Context matters: A 3‑day script is fine once. It’s fatal if you need to run it repeatedly. Always ask, “How often will we run this?”
  • Batching is king: In EF Core, moving from single inserts to batching is often the single most effective performance upgrade you can make.
  • Parallelism needs limits: Throwing 100 threads at SQL Server will just slow it down. Finding the “sweet spot” (e.g., 8 workers) is key.
  • Resilience is a feature: When running for hours, networks will blink and deadlocks will happen. Retry policies turn crashes into minor log warnings.

Future Plans

  • Add comprehensive tests (xUnit + Moq) with 85%+ coverage for all processors.
  • Profile individual pipeline stages to find the next bottleneck (likely JSON‑parsing CPU time).
  • Expose configuration (batch size, thread count) to adapt to different server specs dynamically.

Code Repository

Check out the code: [GitHub repository](https://github.com/belochka1-04/ParsCsvSaveInDb)