Consuming APIs from a Backend POV: Normalizing Data Across Multiple Endpoints
Source: Dev.to
Where It All Started
The first API I used was the Open Library API. It worked… but not consistently.
- Some books came back without ISBNs.
- Others had no descriptions.
- In some cases, the author data felt a bit off or incomplete.
At first, I thought:
“Maybe this is just how it is.”
But I wanted richer data, so I added a second source: Google Books API. My thinking was simple:
“If one API is missing something, the other one probably has it.”
And that part was true. What I didn’t anticipate was the new set of problems that came with it.
Where Things Started Getting Messy
Once I started consuming data from both APIs, I noticed a few things almost immediately:
- The same book showed up more than once.
- Author names were formatted differently.
- ISBNs existed in one response but not the other.
- Descriptions didn’t always match.
Same book. Different versions of the truth.
A Simplified Example
Open Library response
{
"title": "The Hobbit",
"authors": [{ "name": "J.R.R. Tolkien" }],
"isbn_10": ["0345339681"]
}
Google Books response
{
"volumeInfo": {
"title": "The Hobbit",
"authors": ["J. R. R. Tolkien"],
"industryIdentifiers": [
{ "type": "ISBN_13", "identifier": "9780345339683" }
]
}
}
Both are correct and both describe the same book. But if you store this data as‑is, you’re asking for trouble.
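To see the trouble concretely, here's a minimal sketch of the ad-hoc branching you end up writing if you store both shapes as-is (the function name is mine, purely for illustration):

def get_title(record):
    # Open Library puts the title at the top level;
    # Google Books nests it under "volumeInfo".
    # One branch per source today, three branches per source next month.
    if "volumeInfo" in record:
        return record["volumeInfo"].get("title")
    return record.get("title")

Every field you touch grows a branch like this, and every new source multiplies them.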
The Real Problem (That Took Me a While to See)
The problem wasn’t Open Library, and it certainly wasn’t Google Books. The problem was me assuming external APIs would agree with each other. They don’t.
Each API has its own structure, priorities, and idea of what “complete” data looks like. That’s when I ran into the concept that quietly fixed everything: Normalization.
So… What Is Normalization?
In the simplest terms:
Normalization is deciding what your data should look like, then forcing everything else to conform to it.
For non‑techies
- It’s cleaning and standardizing information before saving it.
- It’s making sure one book doesn’t end up with five slightly different identities.
For techies
- It’s mapping external API responses into a single internal schema.
Either way, the idea is the same:
One system. One structure. One source of truth.
Why Normalization Actually Matters
Before normalization I had:
- Duplicate books in my database.
- Inconsistent author names.
- Unreliable ISBN lookups.
After normalization I got:
- One book = one record.
- Predictable fields.
- Much cleaner logic downstream.
It’s not flashy, but it quietly saves you hours of debugging later.
Achieving Normalization
Step One: Decide What a “Book” Means to You
Before touching any API logic, I had to answer a simple question:
“What does a book look like inside my system?”
Here’s the structure I settled on:
from typing import TypedDict

class Book(TypedDict):
    title: str
    authors: list[str]
    isbn_10: str | None
    isbn_13: str | None
    description: str | None
This became my reference point. Anything coming from outside had to be reshaped to fit this.
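As written above, the schema is documentation plus static typing. If you also want it enforced at runtime, a validation layer such as Pydantic does the same job at the boundary; this is my addition, not something the setup strictly requires:

from pydantic import BaseModel

class Book(BaseModel):
    title: str
    authors: list[str]
    isbn_10: str | None = None
    isbn_13: str | None = None
    description: str | None = None

# Anything that doesn't fit fails loudly at the boundary:
# Book(**normalized_record) raises a ValidationError instead of
# letting a malformed record reach the database.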
Step Two: Normalize Each API Separately
Instead of mixing logic, I treated each API independently.
Open Library Normalization
def normalize_openlibrary(data):
    # ISBNs arrive as lists that can be missing *or* empty,
    # so guard before taking the first element.
    return {
        "title": data.get("title"),
        "authors": [a.get("name") for a in data.get("authors", [])],
        "isbn_10": (data.get("isbn_10") or [None])[0],
        "isbn_13": (data.get("isbn_13") or [None])[0],
        "description": data.get("description"),
    }
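A quick sanity check against the Open Library sample from earlier (the dict literal mirrors that response; the variable name is my own):

openlibrary_sample = {
    "title": "The Hobbit",
    "authors": [{"name": "J.R.R. Tolkien"}],
    "isbn_10": ["0345339681"],
}

print(normalize_openlibrary(openlibrary_sample))
# {'title': 'The Hobbit', 'authors': ['J.R.R. Tolkien'],
#  'isbn_10': '0345339681', 'isbn_13': None, 'description': None}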
Google Books Normalization
def normalize_googlebooks(data):
    info = data.get("volumeInfo", {})
    # Google Books buries ISBNs in a list of typed identifiers,
    # so pull each type out explicitly.
    isbn_10 = None
    isbn_13 = None
    for identifier in info.get("industryIdentifiers", []):
        if identifier.get("type") == "ISBN_10":
            isbn_10 = identifier.get("identifier")
        elif identifier.get("type") == "ISBN_13":
            isbn_13 = identifier.get("identifier")
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "isbn_10": isbn_10,
        "isbn_13": isbn_13,
        "description": info.get("description"),
    }
At this point, both APIs were finally speaking the same language.
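The same check for the Google Books sample shows it landing in exactly the same shape:

googlebooks_sample = {
    "volumeInfo": {
        "title": "The Hobbit",
        "authors": ["J. R. R. Tolkien"],
        "industryIdentifiers": [
            {"type": "ISBN_13", "identifier": "9780345339683"}
        ],
    }
}

print(normalize_googlebooks(googlebooks_sample))
# {'title': 'The Hobbit', 'authors': ['J. R. R. Tolkien'],
#  'isbn_10': None, 'isbn_13': '9780345339683', 'description': None}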
Step Three: Merging Without Duplicating
Normalization gets your data into the same shape. Merging decides which data wins.
My rules were simple:
- Prefer ISBN‑13 when available.
- Use Google Books as a fallback for missing descriptions.
def merge_books(primary, fallback):
    # `or` falls through on None, "" and [] alike, which is exactly
    # the "use the other source if this one is missing it" rule.
    return {
        "title": primary["title"] or fallback["title"],
        "authors": primary["authors"] or fallback["authors"],
        "isbn_10": primary["isbn_10"] or fallback["isbn_10"],
        "isbn_13": primary["isbn_13"] or fallback["isbn_13"],
        "description": primary["description"] or fallback["description"],
    }
Nothing fancy—just clear rules.
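Running both samples through the whole pipeline (reusing the openlibrary_sample and googlebooks_sample dicts from the sanity checks above):

merged = merge_books(
    normalize_openlibrary(openlibrary_sample),
    normalize_googlebooks(googlebooks_sample),
)
# {'title': 'The Hobbit', 'authors': ['J.R.R. Tolkien'],
#  'isbn_10': '0345339681', 'isbn_13': '9780345339683', 'description': None}

One record, both ISBNs, no duplicate. Treating Open Library as primary here is my choice for the example; the function doesn't care which side wins as long as you pick an order and stick to it.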
The mental model that helped me was: APIs are raw ingredients, normalization is the recipe, and the database is the final dish. Skip the recipe, and you still get food, just not something you’d confidently serve.
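One piece I haven't shown is how "one book = one record" gets enforced at write time. My approach, a convention rather than anything either API prescribes, was to key the store on ISBN-13 when present and merge on collision:

def book_key(book):
    # ISBN-13 is the most stable identity available; fall back to a
    # normalized title for records that arrive without one.
    # (Simplified: a record missing its ISBN-13 keys by title, so
    # cross-source matching then depends on titles agreeing.)
    return book["isbn_13"] or (book["title"] or "").strip().lower()

def upsert(store, book):
    key = book_key(book)
    if key in store:
        # Seen this book before: merge instead of inserting a duplicate.
        store[key] = merge_books(store[key], book)
    else:
        store[key] = book

Here store is just a dict standing in for the database table.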
What I Took Away From This
- Never assume external APIs agree.
- Define a single source of truth early.
- Normalize each source before merging.
- Keep merging logic simple and deterministic.
- Invest in data hygiene now; it pays off later.
- APIs don't owe you consistency.
- More data sources = more responsibility.
- Normalization isn't optional once you scale.
Most importantly, I learned that backend work isn’t just about fetching data. It’s about deciding what truth looks like in your system and enforcing it.
If you’re consuming multiple APIs and things feel slightly off, normalization is probably the missing piece.
Happy building 🚀