Publishing Pipeline - Refactoring

Published: 3 weeks ago (January 15, 2026 at 11:53 AM EST)

Source: Dev.to

Where We Started

The original setup had clear limitations:

One script, one platform
Tight coupling between:
- content loading
- publishing logic
- database access
Minimal state tracking
Hard to debug
Hard to extend
Impossible to reason about once it grew past a few hundred lines

In short: it worked, but it didn’t scale — neither technically nor mentally.

Original Flow

Original pipeline diagram

Characteristics of the Old Design

One script owns everything
Platform logic intertwined
Hard to add new platforms
Any change risks breaking unrelated behavior
Re‑running safely is difficult

The New Architecture: Clear Responsibilities Everywhere

Over the past three weeks, the pipeline was redesigned around layers, each with a single responsibility.

1. Pipeline Layer (Orchestration)

Responsible only for flow:

Load posts
Process media
Inject backlinks
Publish to enabled platforms
Track results

No platform‑specific logic lives here.
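As a rough illustration of what "flow only" can look like, here is a minimal sketch. The names `Post`, `Publisher`, and `Pipeline` are hypothetical and not taken from the actual code base; the point is only that the orchestrator calls injected steps and never branches on a platform name.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Post:
    slug: str
    body: str


class Publisher(Protocol):
    """Anything with a name and a publish() method (hypothetical interface)."""

    name: str

    def publish(self, post: Post) -> str: ...


@dataclass
class Pipeline:
    # Every step is injected, so the pipeline itself knows nothing about platforms.
    load_posts: Callable[[], list[Post]]
    process_media: Callable[[Post], Post]
    inject_backlinks: Callable[[Post], Post]
    publishers: list[Publisher]
    results: list[tuple[str, str, str]] = field(default_factory=list)

    def run(self) -> list[tuple[str, str, str]]:
        for post in self.load_posts():
            post = self.inject_backlinks(self.process_media(post))
            for publisher in self.publishers:
                # The publisher decides what "publish" means for its platform.
                outcome = publisher.publish(post)
                self.results.append((post.slug, publisher.name, outcome))
        return self.results
```

Because the steps are plain callables, a test can run the whole flow with stubs and no network access.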

2. Publisher Layer (Per‑Platform Semantics)

Each platform gets:

a Publisher (what to publish, when, and how)
a Client (how to talk to the API)

This distinction turned out to be crucial. For example:

Platform	Update Support
WordPress	Supports updates
Dev.to	Supports updates with constraints
X (Twitter)	Publish‑once‑only

That difference now lives exactly where it belongs: inside the publisher, not smeared across the pipeline.
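One way to keep that difference inside the publisher is to express it as a small decision function. The sketch below is an assumption about how this could be modeled, not the project's actual code: `UpdatePolicy`, `PublishState`, and `decide_action` are made-up names, and the `update_allowed` flag stands in for whatever constraints Dev.to actually imposes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdatePolicy(Enum):
    UPDATE_IN_PLACE = "update_in_place"  # e.g. WordPress
    CONDITIONAL = "conditional"          # e.g. Dev.to
    PUBLISH_ONCE = "publish_once"        # e.g. X


@dataclass(frozen=True)
class PublishState:
    """What the database remembers about an earlier publish (hypothetical)."""

    remote_id: str
    content_hash: str


def decide_action(
    policy: UpdatePolicy,
    previous: Optional[PublishState],
    content_hash: str,
    update_allowed: bool = True,
) -> str:
    """Return "create", "update", or "skip" for one post on one platform."""
    if previous is None:
        return "create"  # never published here before
    if previous.content_hash == content_hash:
        return "skip"  # nothing changed
    if policy is UpdatePolicy.PUBLISH_ONCE:
        return "skip"  # changed, but this platform never re-posts
    if policy is UpdatePolicy.CONDITIONAL and not update_allowed:
        return "skip"  # changed, but the platform's constraints forbid it
    return "update"
```

With this shape, the pipeline only ever sees the returned action and never needs to know which platform produced it.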

3. Client Layer (Pure API Communication)

Clients have a single job, API communication, which breaks down into:

Authenticate
Send requests
Return normalized results

They contain no:

Database access
Publishing decisions
Content logic

This makes them:

Testable
Replaceable
Reusable
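A sketch of a client in this spirit follows. Everything in it is illustrative: `FooClient`, `ClientResult`, the `/posts` endpoint, and the bearer-token header are assumptions rather than any real platform's API. The HTTP call is injected as a `transport` callable so the example stays self-contained and testable without a network.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical transport signature:
# (method, url, headers, payload) -> (status_code, decoded_json_body)
Transport = Callable[[str, str, dict, dict], tuple[int, dict]]


@dataclass(frozen=True)
class ClientResult:
    """Normalized result, identical in shape for every platform."""

    ok: bool
    remote_id: Optional[str] = None
    url: Optional[str] = None
    error: Optional[str] = None


class FooClient:
    def __init__(self, base_url: str, token: str, transport: Transport):
        self._base_url = base_url.rstrip("/")
        self._token = token
        self._transport = transport

    def create_post(self, title: str, body: str) -> ClientResult:
        # Authenticate, send the request, normalize the response. Nothing else.
        headers = {"Authorization": f"Bearer {self._token}"}
        status, data = self._transport(
            "POST", f"{self._base_url}/posts", headers, {"title": title, "body": body}
        )
        if 200 <= status < 300:
            return ClientResult(ok=True, remote_id=str(data.get("id")), url=data.get("url"))
        return ClientResult(ok=False, error=f"HTTP {status}")
```

Swapping the transport for a stub is what makes such a client testable, and swapping the client behind a publisher is what makes it replaceable.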

4. Database Layer (State & Deduplication)

The database now tracks:

What was published
Where it was published
Which content hash was used
Which media assets already exist
Canonical URLs

This enables:

Idempotent runs
Safe re‑publishing
Deduplication of media
Correct canonical handling across platforms
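To make the idempotency idea concrete, here is a minimal sketch using SQLite and SHA-256. The table layout and the function names (`open_state`, `needs_publish`, `record_publish`) are assumptions for illustration; the post does not say which database or schema the project uses.

```python
import hashlib
import sqlite3
from typing import Optional


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # One row per (post, platform): what was published, where, and with which hash.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS publications (
               slug TEXT NOT NULL,
               platform TEXT NOT NULL,
               content_hash TEXT NOT NULL,
               canonical_url TEXT,
               PRIMARY KEY (slug, platform)
           )"""
    )
    return conn


def needs_publish(conn: sqlite3.Connection, slug: str, platform: str, body: str) -> bool:
    row = conn.execute(
        "SELECT content_hash FROM publications WHERE slug = ? AND platform = ?",
        (slug, platform),
    ).fetchone()
    # Publish if the post is new on this platform or its content has changed.
    return row is None or row[0] != content_hash(body)


def record_publish(
    conn: sqlite3.Connection,
    slug: str,
    platform: str,
    body: str,
    canonical_url: Optional[str] = None,
) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO publications VALUES (?, ?, ?, ?)",
        (slug, platform, content_hash(body), canonical_url),
    )
    conn.commit()
```

Running the pipeline twice with unchanged content then becomes a no-op, because `needs_publish` returns `False` the second time.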

New Flow Diagram

New pipeline diagram

What Changed

Pipeline only controls flow.
Publishers decide what to do.
Clients only talk to APIs.
Database provides state and idempotency.
Each part is testable and replaceable.

This is the turning point where the system stops being fragile.

Why This Is Faster (and Calmer)

One unexpected benefit: performance and reliability improved naturally—not because of clever optimizations, but because:

Small modules load faster
Less global state stays in memory
Failures are isolated
Logs are precise and meaningful
Re‑runs are cheap and safe

Instead of hoping nothing breaks, the pipeline now knows when nothing needs to happen.

Adding a New Platform Is Now Boring (That’s a Compliment)

To add a new platform today, the steps are:

Write a client (platforms/foo_client.py)
Write a publisher (publishers/foo.py)
Wire it into the composition root

No changes required to:

Media handling
Backlinks
Canonical logic
Database schema
Pipeline flow

That’s the kind of boring you want (well, I certainly do).
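The three steps could look roughly like this. It is a sketch under assumptions: `FooClient`, `FooPublisher`, `build_publishers`, and the config shape are invented for illustration, and both classes would live in their own files (`platforms/foo_client.py`, `publishers/foo.py`) rather than side by side.

```python
from dataclasses import dataclass


# Step 1: the client (would live in platforms/foo_client.py).
@dataclass
class FooClient:
    token: str

    def send(self, title: str) -> str:
        # Placeholder for a real API call.
        return f"foo:{title}"


# Step 2: the publisher (would live in publishers/foo.py).
@dataclass
class FooPublisher:
    client: FooClient
    name: str = "foo"

    def publish(self, title: str) -> str:
        return self.client.send(title)


# Step 3: the composition root, the only place that knows concrete classes.
def build_publishers(config: dict) -> list:
    factories = {
        "foo": lambda cfg: FooPublisher(FooClient(token=cfg["token"])),
    }
    return [
        factory(config[name])
        for name, factory in factories.items()
        if config.get(name, {}).get("enabled")
    ]
```

Adding another platform in this sketch means one new entry in `factories`; the rest of the pipeline is untouched.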

Media, Backlinks, Canonicals — All Solved Properly

Some highlights that are now just working:

Media deduplication via hashing → upload once, reuse everywhere
Featured images correctly attached per post
Backlinks injected deterministically
Canonical URLs handled centrally and propagated correctly
Per‑platform semantics respected (no more accidental re‑posting)

None of this required hacks—only the right separation of concerns.
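The "upload once, reuse everywhere" idea can be sketched in a few lines. `MediaStore` and its injected `upload` callable are hypothetical, and the in-memory dictionary stands in for the database table that tracks known media assets.

```python
import hashlib
from typing import Callable


class MediaStore:
    def __init__(self, upload: Callable[[bytes], str]):
        self._upload = upload  # performs the real upload and returns a URL
        self._urls: dict[str, str] = {}  # content hash -> uploaded URL

    def ensure_uploaded(self, data: bytes) -> str:
        # Identical bytes always hash to the same key, so they upload only once.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._urls:
            self._urls[digest] = self._upload(data)
        return self._urls[digest]
```

Keying on the content hash rather than the file name means a renamed or re-exported copy of the same image is still recognized as already uploaded.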

What I Learned (The Unexpected Part)

This project wasn’t meant to be about learning Python, but it turned into exactly that. In three weeks, I learned a great deal about:

Python module design
Data modeling
Immutability vs. mutation
Error handling
API semantics
Architectural boundaries

Key insight: Python becomes dramatically easier once the architecture is clean. Most complexity wasn’t Python at all—it was unclear responsibility.

What’s Next: Roadmap to v1.4.0

The current version 1.3.1 closes the stabilization phase.

The next milestone, 1.4.0, will focus on capabilities.

Planned Features

1. Platform‑Aware Update Strategies

… (details to be added)

Publish‑once (X)
Update‑in‑place (WordPress)
Conditional update (Dev.to)
Future: quote‑tweet or follow‑up logic

2. Rate Limiting & Scheduling

Per‑platform throttling
Optional delayed publishing
CI‑safe dry‑run mode

3. Observability Improvements

Structured logs
Optional JSON output
Better failure summaries
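None of these features exist yet, so the following is purely a sketch of how per-platform throttling and a dry-run switch might fit together. `Throttle` and `publish` are hypothetical names, and the clock and sleep functions are injected so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class Throttle:
    """Enforces a minimum interval between calls, tracked per platform."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}

    def wait(self, platform: str) -> float:
        now = self._clock()
        last = self._last.get(platform)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the call is actually allowed to proceed.
        self._last[platform] = now + delay
        return delay


def publish(platform: str, send: Callable[[], str], throttle: Throttle, dry_run: bool = False) -> str:
    if dry_run:
        # CI-safe: report the intent, touch neither the network nor the throttle.
        return "dry-run"
    throttle.wait(platform)
    return send()
```

Keeping the dry-run check ahead of the throttle means a CI run finishes immediately regardless of the configured intervals.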

New Platforms (Starting with One)

At least one new platform will be added in v1.4.0.

Candidates

LinkedIn (high priority)
Xing
Indeed (content‑syndication angle)

Later roadmap

Substack
Patreon
Possibly others

Thanks to the current architecture, adding these is now a contained task, not a rewrite.

And Yes — This Will Become a Series

This post is likely the first of a small series covering:

Python for infrastructure‑minded people
Real‑world refactoring
Automation that survives growth
Designing for change, not just success

Not tutorials — but honest write‑ups of systems that evolved under real constraints.

Final Thoughts

What started as a utility script has become a small platform.
Not because it needed to be fancy — but because it needed to be understandable, extendable, and trustworthy.

And that, more than anything, made the difference.

Down the road, it might become a service that interested people can use for their own writing. Since I am developing and running my own Kubernetes cloud, nothing should stand in the way of offering it to tenants as a containerized solution.
