JSON Parsing for Large Payloads: Balancing Speed, Memory, and Scalability
Source: Dev.to
Introduction
Imagine that the marketing campaign you set up for Black Friday was a massive success, and customers start pouring into your website. Your Mixpanel setup, which would usually have around 1 000 customer events an hour, ends up receiving millions of events within the same hour. Consequently, your data pipeline is now tasked with parsing vast amounts of JSON data and storing it in your database.
Your standard JSON parsing library can’t keep up with the sudden data growth, and your near‑real‑time analytics reports fall behind. This is when you realize the importance of an efficient JSON parsing library. In addition to handling large payloads, a good library should be able to serialize and deserialize highly nested JSON structures.
In this article we explore Python parsing libraries for large payloads. We specifically look at the capabilities of ujson, orjson, and ijson, and we benchmark the standard library (json), ujson, and orjson for serialization and deserialization performance.
Serialization = converting Python objects to a JSON string.
Deserialization = rebuilding Python objects from a JSON string.
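As a quick illustration with the standard library (the event record here is made up), both directions look like this:

import json

record = {"event": "page_view", "user_id": 42}

# Serialization: Python dict -> JSON string
payload = json.dumps(record)

# Deserialization: JSON string -> Python dict
restored = json.loads(payload)
assert restored == record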
A decision‑flow diagram (shown later) helps you choose the right parser for your workflow. We also cover NDJSON and libraries that can parse NDJSON payloads. Let’s get started.
Stdlib json
The standard library supports serialization for all basic Python data types (dicts, lists, tuples, etc.). When you call json.loads(), the entire JSON document is loaded into memory at once. This works fine for small payloads, but for large payloads it can cause:
- Out‑of‑memory errors
- Choking of downstream workflows
import json
with open("large_payload.json", "r") as f:
    json_data = json.load(f)  # reads and parses the entire file into memory at once
ijson
For payloads in the hundreds of megabytes range, ijson (short for iterative json) reads files one token at a time, avoiding the memory overhead of loading the whole document.
import ijson
with open("json_data.json", "r") as f:
    # for a document like {"items": [...]}, yield one element of the array at a time
    for record in ijson.items(f, "items.item"):
        process(record)  # each record arrives as a fully built Python dict
ijson therefore streams each element, converts it to a Python dict, and hands it to your processing function (process(record)).
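If you need even finer control, ijson also exposes a lower-level event stream via ijson.parse(), which yields (prefix, event, value) tuples instead of fully built objects. A minimal sketch (the "email" field name is only an example of what you might filter on):

import ijson

with open("json_data.json", "r") as f:
    # parse() yields (prefix, event, value) tuples, one token at a time
    for prefix, event, value in ijson.parse(f):
        if prefix.endswith(".email"):  # react to individual fields without building whole dicts
            print(value)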

ujson

ujson has long been a popular choice for large JSON payloads because it is a C‑based implementation with Python bindings, making it considerably faster than the pure‑Python json module.
Note: The maintainers have placed ujson in maintenance-only mode, so new projects typically prefer orjson.
import ujson
taxonomy_data = (
    '{"id":1, "genus":"Thylacinus", "species":"cynocephalus", "extinct": true}'
)

# Deserialize
data_dict = ujson.loads(taxonomy_data)

# Serialize to a file
with open("taxonomy_data.json", "w") as fh:
    ujson.dump(data_dict, fh)

# Deserialize again from the file
with open("taxonomy_data.json", "r") as fh:
    data = ujson.load(fh)

print(data)
orjson
orjson is written in Rust, giving it both speed and memory‑safety guarantees that C‑based libraries (like ujson) lack. It also supports serializing additional Python types such as dataclass and datetime.
A key difference: orjson.dumps() returns bytes, whereas the other libraries return a string. Returning bytes eliminates an extra encoding step, contributing to orjson’s high throughput.
import json
import orjson
# Example payload
book_payload = (
    '{"id":1,"name":"The Great Gatsby","author":"F. Scott Fitzgerald"}'
)
# Serialize to bytes
json_bytes = orjson.dumps(json.loads(book_payload))
# Deserialize back to a Python object
obj = orjson.loads(json_bytes)
print(obj)
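orjson.dumps() also accepts an option bitmask for common tweaks; for example, OPT_INDENT_2 pretty-prints the output and OPT_SORT_KEYS sorts keys (both are orjson flags, combined with the bitwise OR operator):

import orjson

pretty = orjson.dumps(
    {"name": "The Great Gatsby", "id": 1},
    option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS,  # combine options with |
)
print(pretty.decode())  # dumps() returns bytes, so decode for display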
Decision Flow Diagram
Below is a simplified flow to help you pick the right parser:
Payload size?
- Fits comfortably in memory (up to roughly 100 MB) – use orjson for the best speed; the stdlib json is fine for small payloads.
- Larger than roughly 100 MB, or an unbounded stream – stream with ijson.
NDJSON (Newline‑Delimited JSON)
When dealing with log‑style data, NDJSON is often a better fit because each line is a valid JSON document. You can parse NDJSON with:
- Standard json – read line by line and call json.loads(line).
- orjson – fast line-by-line deserialization (orjson.loads(line)).
- ijson – also works (with multiple_values=True), but the line-by-line approach is usually simpler.
import orjson
with open("events.ndjson", "r") as f:
for line in f:
event = orjson.loads(line)
process(event)
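Writing NDJSON is just as simple: serialize each record and append a newline. Because orjson returns bytes, open the file in binary mode (a small sketch with made-up events):

import orjson

events = [
    {"id": 1, "type": "click"},
    {"id": 2, "type": "purchase"},
]

with open("events.ndjson", "wb") as f:  # binary mode because orjson.dumps() returns bytes
    for event in events:
        f.write(orjson.dumps(event) + b"\n")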
Summary
| Library | Language | Speed | Memory usage | Streaming support | Extra features |
|---|---|---|---|---|---|
| json (stdlib) | Python (C accelerator) | Baseline | High (loads whole doc) | No | None |
| ujson | C | Fast | High (loads whole doc) | No | Maintenance-only |
| orjson | Rust | Fastest | High (loads whole doc; bytes output saves one encoding copy) | No | dataclass, datetime, UUID, etc. |
| ijson | Python (C backend) | Moderate (streaming) | Very low | Yes | Event-based parsing |
For most new projects:
- Use orjson for speed and extra type support when the payload fits in memory.
- Switch to ijson for truly massive payloads or when you need to process data incrementally (see the size-based sketch below).
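To make that rule of thumb concrete, here is a hedged sketch of a small helper that picks a strategy based on file size. The 100 MB threshold and the function name iter_records are illustrative, not a hard rule:

import os
import orjson
import ijson

SIZE_THRESHOLD = 100 * 1024 * 1024  # ~100 MB; adjust for your memory budget

def iter_records(path):
    """Yield records from a top-level JSON array, streaming only when the file is large."""
    if os.path.getsize(path) < SIZE_THRESHOLD:
        with open(path, "rb") as f:
            yield from orjson.loads(f.read())  # whole document fits in memory
    else:
        with open(path, "rb") as f:
            yield from ijson.items(f, "item")  # stream one element at a time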
Happy parsing!
JSON Parsing and Serialization with json, ujson, and orjson
import json
import ujson
import orjson
# Sample JSON payload
book_payload = '{"Title":"The Great Gatsby","Author":"F. Scott Fitzgerald","Publishing House":"Charles Scribner\'s Sons"}'
# Deserialize with orjson
data_dict = orjson.loads(book_payload)
print(data_dict)
# Serialize to a file
with open("book_data.json", "wb") as f:
f.write(orjson.dumps(data_dict)) # Returns a bytes object
# Deserialize from the file
with open("book_data.json", "rb") as f:
book_data = orjson.loads(f.read())
print(book_data)
Testing Serialization Capabilities of json, ujson, and orjson
We create a sample dataclass object that contains an integer, a string, and a datetime value.
from dataclasses import dataclass
from datetime import datetime
@dataclass
class User:
    id: int
    name: str
    created: datetime

u = User(id=1, name="Thomas", created=datetime.now())
1. Standard Library json
import json
try:
    print("json:", json.dumps(u))
except TypeError as e:
    print("json error:", e)
Result: json raises a TypeError because it cannot serialize dataclass instances or datetime objects.
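The stdlib can still cope with a little help: convert the dataclass to a plain dict with dataclasses.asdict() and supply a default= hook for the datetime (both are standard-library features; this reuses the u instance defined above):

import json
from dataclasses import asdict

# asdict() flattens the dataclass; default= converts the remaining datetime
print(json.dumps(asdict(u), default=lambda o: o.isoformat()))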
2. ujson
import ujson
try:
    print("ujson:", ujson.dumps(u))
except TypeError as e:
    print("ujson error:", e)
Result: ujson also fails to serialize the dataclass and the datetime value.
3. orjson
import orjson
try:
    print("orjson:", orjson.dumps(u))
except TypeError as e:
    print("orjson error:", e)
Result: orjson successfully serializes both the dataclass and the datetime object.
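For types orjson does not handle natively (decimal.Decimal, for example), orjson.dumps() accepts a default= hook, much like the stdlib; a minimal sketch:

import orjson
from decimal import Decimal

price = {"amount": Decimal("19.99")}
# Decimal is not natively supported, so the default hook converts it to a string
print(orjson.dumps(price, default=str))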
Working with NDJSON (Newline‑Delimited JSON)
NDJSON is a format where each line is a separate JSON object, e.g.:
{"id": "A13434", "name": "Ella"}
{"id": "A13455", "name": "Charmont"}
{"id": "B32434", "name": "Areida"}
It is commonly used for logs and streaming data. Below are three approaches to handling NDJSON in Python.
NDJSON with the Standard Library json
import json
ndjson_payload = """{"id": "A13434", "name": "Ella"}
{"id": "A13455", "name": "Charmont"}
{"id": "B32434", "name": "Areida"}"""
# Write the payload to a file
with open("json_lib.ndjson", "w", encoding="utf-8") as fh:
for line in ndjson_payload.splitlines():
fh.write(line.strip() + "\n")
# Read and process line‑by‑line
with open("json_lib.ndjson", "r", encoding="utf-8") as fh:
for line in fh:
if line.strip(): # Skip empty lines
item = json.loads(line) # Deserialize
print(item) # Or pass to a caller function
NDJSON with ijson (streaming parser)
import ijson
ndjson_payload = """{"id": "A13434", "name": "Ella"}
{"id": "A13455", "name": "Charmont"}
{"id": "B32434", "name": "Areida"}"""
# Write the payload to a file
with open("ijson_lib.ndjson", "w", encoding="utf-8") as fh:
fh.write(ndjson_payload)
# Parse iteratively
with open("ijson_lib.ndjson", "r", encoding="utf-8") as fh:
for item in ijson.items(fh, "", multiple_values=True):
print(item)
Explanation: ijson.items(fh, "", multiple_values=True) treats each root element (each line) as a separate JSON object and yields them one at a time.
NDJSON with the Dedicated ndjson Library
import ndjson
ndjson_payload = """{"id": "A13434", "name": "Ella"}
{"id": "A13455", "name": "Charmont"}
{"id": "B32434", "name": "Areida"}"""
# Write the payload to a file
with open("ndjson_lib.ndjson", "w", encoding="utf-8") as fh:
fh.write(ndjson_payload)
# Load the file – returns a list of dictionaries
with open("ndjson_lib.ndjson", "r", encoding="utf-8") as fh:
ndjson_data = ndjson.load(fh)
print(ndjson_data)
Takeaways
- For small-to-moderate NDJSON payloads, the standard json module works fine when you read line by line.
- For very large payloads, ijson is the best choice because it streams data and uses minimal memory.
- If you need to generate NDJSON from Python objects, the ndjson library is convenient (ndjson.dumps() handles the conversion automatically, as shown below).
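For example, a minimal round trip with ndjson.dumps() (the record contents are made up):

import ndjson

records = [
    {"id": "A13434", "name": "Ella"},
    {"id": "A13455", "name": "Charmont"},
]

# dumps() emits one JSON document per line
text = ndjson.dumps(records)
print(text)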
Why ijson Is Not Included in Benchmarking
ijson is a streaming parser, fundamentally different from the bulk parsers (json, ujson, orjson) we benchmarked. Comparing a streaming parser with bulk parsers would be an “apples‑to‑oranges” comparison:
- Bulk parsers load the entire JSON document into memory, optimizing for speed.
- ijson processes the document incrementally, optimizing for memory efficiency.
Including ijson in a speed‑only benchmark would misleadingly label it as the slowest, ignoring its primary advantage—low memory consumption for massive JSON streams. Therefore, ijson is evaluated separately when memory usage is the primary concern.
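If you want to quantify that advantage yourself, tracemalloc from the standard library gives a rough picture of peak Python-level allocations. This is a sketch, not a rigorous benchmark (process-level RSS measured with a tool such as memory_profiler would be more complete), and it assumes the synthetic payload generated in the next section:

import tracemalloc
import json
import ijson

def peak_memory(fn):
    """Return the peak traced allocation (in MB) while fn() runs."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / (1024 * 1024)

def bulk_parse():
    with open("large_payload.json", "r") as f:
        json.load(f)  # whole document in memory

def stream_parse():
    with open("large_payload.json", "rb") as f:
        for _ in ijson.items(f, "item"):  # one array element at a time
            pass

print(f"json.load peak:   {peak_memory(bulk_parse):.1f} MB")
print(f"ijson.items peak: {peak_memory(stream_parse):.1f} MB")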
Generating a Synthetic JSON Payload for Benchmarking Purposes
We generate a large synthetic JSON payload containing 1 million records using the library mimesis. This data can be used to benchmark JSON libraries. The code below creates the payload; the resulting file is roughly 100 – 150 MB, which is large enough for meaningful performance tests.
from mimesis import Person, Address
import json
person_name = Person("en")
complete_address = Address("en")
with open("large_payload.json", "w") as fh: # Streaming to a file
fh.write("[") # JSON array start
for i in range(1_000_000):
payload = {
"id": person_name.identifier(),
"name": person_name.full_name(),
"email": person_name.email(),
"address": {
"street": complete_address.street_name(),
"city": complete_address.city(),
"postal_code": complete_address.postal_code()
}
}
json.dump(payload, fh)
# Add a comma after every element except the last one
if i < 999_999:
fh.write(",")
fh.write("]") # JSON array end
Sample Output
[
{
"id": "8177",
"name": "Willia Hays",
"email": "showers1819@yandex.com",
"address": {
"street": "Emerald Cove",
"city": "Crown Point",
"postal_code": "58293"
}
},
{
"id": "5931",
"name": "Quinn Greer",
"email": "professional2038@outlook.com",
"address": {
"street": "Ohlone",
"city": "Bridgeport",
"postal_code": "92982"
}
}
]
Let’s Start with Benchmarking
Benchmarking Prerequisites
We read the JSON file into a string and then use each library’s loads() function to deserialize it.
with open("large_payload1.json", "r") as fh:
payload_str = fh.read() # raw JSON text
A helper function runs a given loads implementation three times and returns the total elapsed time.
import time
def benchmark_load(func, payload_str):
    start = time.perf_counter()
    for _ in range(3):
        func(payload_str)
    end = time.perf_counter()
    return end - start
Benchmarking Deserialization Speed
import json, ujson, orjson
results = {
    "json.loads": benchmark_load(json.loads, payload_str),
    "ujson.loads": benchmark_load(ujson.loads, payload_str),
    "orjson.loads": benchmark_load(orjson.loads, payload_str),
}

for lib, t in results.items():
    print(f"{lib}: {t:.4f} seconds")
Result: orjson is the fastest for deserialization.
Benchmarking Serialization Speed
import json, ujson, orjson
def benchmark_dump(func, obj):
    start = time.perf_counter()
    for _ in range(3):
        func(obj)
    end = time.perf_counter()
    return end - start

# Example object (already loaded)
example_obj = json.loads(payload_str)

ser_results = {
    "json.dumps": benchmark_dump(json.dumps, example_obj),
    "ujson.dumps": benchmark_dump(ujson.dumps, example_obj),
    "orjson.dumps": benchmark_dump(orjson.dumps, example_obj),
}

for lib, t in ser_results.items():
    print(f"{lib}: {t:.4f} seconds")