# Stop Crashing Node.js: How to Process 10GB Files with 15MB of RAM
Source: Dev.to
## The Problem: The “Array.map()” Trap
Most developers process data like this:
```javascript
const data = JSON.parse(fs.readFileSync('huge-file.json')); // ❌ Memory spikes here
const processed = data.map(record => transform(record));    // ❌ Memory doubles here
fs.writeFileSync('output.json', JSON.stringify(processed));
```

This works for small files, but memory use scales linearly with input size. A 1 GB file requires at least 2 GB of RAM just to hold the input and output in memory at the same time.
## The Solution: Constant Memory (O(1))
Data‑Genie treats data as a continuous stream. Instead of loading an array, it uses Async Iterators to pull one record at a time, transform it, and push it to the destination.
The result? You can process a 100 GB file using the same amount of RAM as a 100 KB file.
### Data Size Comparison
| Data Size | Naïve Approach (Array‑based) | Data‑Genie (Streaming) |
|---|---|---|
| 100 KB | ~10 MB RAM | ~10 MB RAM |
| 100 MB | ~150 MB RAM | ~12 MB RAM |
| 10 GB | CRASH (OOM) | ~15 MB RAM |
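If you want to sanity-check numbers like these on your own data, you can sample the process’s resident set size while a job runs. This helper is plain Node.js, not part of Data‑Genie, and the 50 ms sampling interval is an arbitrary choice:

```javascript
// Track peak RSS (resident set size) while an async task runs.
async function withPeakRss(task) {
  let peak = process.memoryUsage().rss;
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss);
  }, 50);
  try {
    const result = await task();
    return { result, peakMb: Math.round(peak / (1024 * 1024)) };
  } finally {
    clearInterval(timer);
  }
}

// Usage: const { peakMb } = await withPeakRss(() => Job.run(reader, writer));
```

Note that Node’s baseline RSS (roughly the ~10 MB floor in the table) is present even for trivial workloads.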
## Unified API for Different Formats
```javascript
import { CSVReader, SQLWriter, Job } from '@pujansrt/data-genie';

const reader = new CSVReader('input.csv');
const writer = new SQLWriter(db, 'users'); // `db` is an existing database connection

await Job.run(reader, writer);
```

Whether your data lives in CSV, JSON, Excel, Parquet, or a SQL database, the code looks the same.
## Built‑in Dead Letter Queues (DLQ)
In the real world, data is “dirty.” A single malformed row can crash an entire job. Data‑Genie includes built‑in DLQs that automatically divert failed records to a “poison” file while the main job continues.
```javascript
import { z } from 'zod';
import { CSVReader, JsonWriter, SchemaValidatingReader } from '@pujansrt/data-genie';

const schema = z.object({
  id: z.coerce.number(),
  email: z.string().email(),
});

const reader = new CSVReader('input.csv');
const validator = new SchemaValidatingReader(reader, schema)
  .setDLQ(new JsonWriter('failed_rows.json'));
```

## Real‑time Progress with EventEmitter
The latest update turns the Job class into an EventEmitter, allowing you to build progress bars or dashboards without polling.
```javascript
import { Job, CSVReader, JsonWriter } from '@pujansrt/data-genie';

const job = new Job(new CSVReader('users.csv'), new JsonWriter('output.json'));

job.on('progress', (metrics) => {
  console.log(`Processed ${metrics.recordCount} records...`);
});

await job.run();
```

## Installation
```shell
npm install @pujansrt/data-genie
```

## Example Usage
```javascript
import { CSVReader, JsonWriter, Job } from '@pujansrt/data-genie';

const reader = new CSVReader('users.csv');
const writer = new JsonWriter('output.json');

(async () => {
  const metrics = await Job.run(reader, writer);
  console.log(`Processed ${metrics.recordCount} records in ${metrics.durationMs} ms`);
})();
```

## Further Resources
- GitHub repository: https://github.com/pujansrt/data-genie
- Full documentation: https://pujansrt.github.io/data-genie/