Thinking in Pipelines: A Better Way to Structure Go Systems

Published: May 3, 2026 at 12:55 PM EDT
8 min read
Source: Dev.to

I was working on a personal project recently – a job scraper.

In the process, I came across a pattern that has genuinely changed how I think about structuring backend systems in Go.
It’s called the Pipeline Pattern, and it shows up in a lot of places – payments, analytics, APIs, etc.

In this article I’ll walk you through it using my job‑scraper project, which is a perfect use case for this pattern.

The Mess We’re Trying to Avoid

Before I show you the pattern, let’s see what the code could look like without it.

My scraper does four things:

  1. Scrape job listings from multiple sources
  2. Normalize them (i.e., clean them up)
  3. Score them based on keywords (the most relevant to my skill‑set get the highest scores)
  4. Save them to a database

A naïve implementation might look like this (simplified):

for _, raw := range rawJobs {
    // normalize
    raw.Title = strings.TrimSpace(raw.Title)
    raw.Location = strings.ReplaceAll(raw.Location, "NYC", "New York")

    // score
    score := 0
    for _, keyword := range keywords {
        if strings.Contains(raw.Title, keyword) {
            score++
        }
    }

    // save
    s.Repo.Create(raw.Title, raw.Location, score)
}

That technically works, but it creates several problems:

  • No clear stages. Where does normalization end and scoring begin? You have to read everything to understand anything.
  • Hard to test. How do you test just the scoring logic? You can’t – it’s glued to everything else inside that loop.
  • Hard to change. Want a new scoring rule? You’re digging through the loop, touching normalization, maybe breaking the save logic. Everything is coupled.
  • Hard to reuse. If you need that normalization logic elsewhere, you have to copy‑paste it, duplicating code.
  • Concurrency feels impossible. Want to process jobs concurrently? The loop is tangled, so it’s unclear what can safely run in parallel.

The Realization

When you look at that loop, you’ll notice it isn’t really one problem. It’s the same set of steps repeated for every job:

Scrape → Clean → Evaluate → Save

So the question becomes – what if we made those steps explicit?

The Pipeline

Instead of one big loop doing everything, we break the work into distinct stages:

Scrape → Normalize → Score → Store

Each stage does one thing. Data flows through, gets transformed, and moves on.

Below is the actual pipeline from my scraper.

type Pipeline struct {
    scorer         scoring.Scorer
    jobService     JobService
    companyService CompanyService
    logger         *slog.Logger
}

func NewPipeline(
    scorer scoring.Scorer,
    jobService JobService,
    companyService CompanyService,
    logger *slog.Logger,
) *Pipeline {
    return &Pipeline{
        scorer:         scorer,
        jobService:     jobService,
        companyService: companyService,
        logger:         logger,
    }
}

Notice what’s happening in NewPipeline. We’re not hard‑coding a specific scorer or store; we pass them in, and the scraper itself is handed to Run rather than baked into the struct. We’ll see why that matters shortly.
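
For context, the moving parts are just small interfaces. The exact definitions aren’t important; the shape is roughly this, inferred from how Run (shown next) uses them – NormalizedJob is a stand‑in name for whatever your normalized type is:

// Sketches of the pipeline's dependencies, inferred from how Run uses them.
// (CompanyService is omitted because it doesn't appear in the snippets here.)
type Scraper interface {
    Source() string
    Scrape(ctx context.Context) ([]RawJob, error)
}

type JobService interface {
    Save(ctx context.Context, job NormalizedJob) error
}

// scoring.Scorer, roughly: turn a normalized job into a relevance score.
type Scorer interface {
    Score(job NormalizedJob) int
}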

Running the pipeline

func (p *Pipeline) Run(ctx context.Context, scraper Scraper) error {
    // 1. Scrape
    rawJobs, err := scraper.Scrape(ctx)
    if err != nil {
        return fmt.Errorf("scraping %s: %w", scraper.Source(), err)
    }

    var saved, failed int

    for _, rawJob := range rawJobs {
        // 2. Normalize
        normalizedJob, err := normalize.Normalize(rawJob)
        if err != nil {
            failed++
            continue
        }

        // 3. Score
        normalizedJob.Score = p.scorer.Score(normalizedJob)

        // 4. Save
        if err := p.jobService.Save(ctx, normalizedJob); err != nil {
            failed++
            continue
        }
        saved++
    }

    p.logger.Info("pipeline finished", "saved", saved, "failed", failed)
    return nil
}

The logic is the same as the messy version, but now each stage lives in its own function. Reading the Run method top‑to‑bottom instantly tells you what the system does: scrape → normalize → score → save. No digging, no guessing. The structure itself tells the story.
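
The normalize step, for example, is just the cleanup code from the naive loop moved into its own function in a normalize package. A minimal sketch – the NormalizedJob fields here are stand‑ins:

// Normalize turns a raw scraped job into a cleaned-up NormalizedJob.
// A minimal sketch; a real version would do more validation and mapping.
func Normalize(raw RawJob) (NormalizedJob, error) {
    title := strings.TrimSpace(raw.Title)
    if title == "" {
        return NormalizedJob{}, fmt.Errorf("job has no title")
    }
    location := strings.ReplaceAll(raw.Location, "NYC", "New York")
    return NormalizedJob{Title: title, Location: location}, nil
}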

Swappability

Because each stage is separate and wired through interfaces, you can swap any part of the pipeline without touching the rest.

Swap the scraper

// testing
scraper := &FakeScraper{}

// production
scraper := &RemotiveScraper{}

The pipeline code stays exactly the same.
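
A fake scraper only has to satisfy the Scraper interface – no network, no HTML parsing. Something like this (the field name is mine, for illustration):

// FakeScraper returns canned jobs so tests never touch the network.
type FakeScraper struct {
    jobs []RawJob
}

func (f *FakeScraper) Source() string { return "fake" }

func (f *FakeScraper) Scrape(ctx context.Context) ([]RawJob, error) {
    return f.jobs, nil
}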

Swap the store

// development
store := NewInMemoryStore()

// production
store := NewPostgresStore()

Again, the pipeline remains unchanged.
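
An in‑memory store is little more than a slice behind the JobService interface. A rough sketch (names are illustrative):

// InMemoryStore keeps saved jobs in a slice – handy for development and tests.
// The mutex matters once workers run concurrently (see the worker pool below).
type InMemoryStore struct {
    mu   sync.Mutex
    jobs []NormalizedJob
}

func NewInMemoryStore() *InMemoryStore { return &InMemoryStore{} }

func (s *InMemoryStore) Save(ctx context.Context, job NormalizedJob) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.jobs = append(s.jobs, job)
    return nil
}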

Swap the scorer

// simple keyword scorer
scorer := &KeywordScorer{keywords: []string{"Go", "backend", "remote"}}

// later, a smarter scorer
scorer := &MLScorer{}

The pipeline still works without modification.
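
The keyword scorer is essentially the scoring snippet from the naive loop, moved behind the Scorer interface. A minimal sketch:

type KeywordScorer struct {
    keywords []string
}

// Score counts how many configured keywords appear in the job title.
func (k *KeywordScorer) Score(job NormalizedJob) int {
    score := 0
    for _, kw := range k.keywords {
        if strings.Contains(job.Title, kw) {
            score++
        }
    }
    return score
}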

Why This Matters

  • Clear separation of concerns makes the codebase easier to read and reason about.
  • Testability improves dramatically; you can unit‑test each stage in isolation (see the sketch after this list).
  • Extensibility is as simple as implementing a new interface and wiring it up.
  • Concurrency becomes straightforward – you can parallelize any stage that is stateless (e.g., normalization or scoring) without risking race conditions in other stages.
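
For instance, a unit test for the scoring stage never needs to touch scraping or the database. A rough sketch, using the keyword scorer from above:

func TestKeywordScorer(t *testing.T) {
    scorer := &KeywordScorer{keywords: []string{"Go", "remote"}}

    // the field name is illustrative – whatever your normalized job type uses
    job := NormalizedJob{Title: "Senior Go Engineer, remote"}
    if got := scorer.Score(job); got != 2 {
        t.Fatalf("expected score 2, got %d", got)
    }
}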

TL;DR

The Pipeline Pattern turns a tangled monolithic loop into a clean, composable series of stages. By defining explicit boundaries—scrape, normalize, score, store—you gain readability, testability, swappability, and the ability to run parts of the system concurrently.

Give it a try in your next Go backend project; you’ll wonder how you ever lived without it.

Concurrency – The Worker Pool

So now the system is clean and flexible. But what about performance?

Right now, everything runs one job at a time: scrape one, normalize it, score it, save it, then move to the next. For small datasets that’s fine, but for a thousand jobs it’s slow.

Leveling it up

The idea is simple: instead of processing one job at a time, spin up a pool of workers (goroutines) and let them process jobs concurrently. If you haven’t used goroutines before, they’re basically lightweight threads in Go. You can spin up many of them without much overhead. The key word there is many, not unlimited. More on that in a second.

Here’s the shape of it:

func (p *Pipeline) Run(ctx context.Context, scraper Scraper, numWorkers int) error {
    rawJobs, err := scraper.Scrape(ctx)
    if err != nil {
        return fmt.Errorf("scraping %s: %w", scraper.Source(), err)
    }

    jobs := make(chan RawJob)
    var wg sync.WaitGroup

    // spin up workers
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for raw := range jobs {
                // normalize
                normalizedJob, err := normalize.Normalize(raw)
                if err != nil {
                    continue
                }

                // score
                normalizedJob.Score = p.scorer.Score(normalizedJob)

                // save
                if err := p.jobService.Save(ctx, normalizedJob); err != nil {
                    continue
                }
            }
        }()
    }

    // feed jobs into the channel
    for _, raw := range rawJobs {
        jobs <- raw
    }
    close(jobs)

    wg.Wait()
    return nil
}

Think of it like a queue and a team. The channel is the queue: raw jobs go in one end, workers pull from the other. Each worker runs a job through the pipeline stages independently. sync.WaitGroup just makes sure we don’t return until every worker is done.

The numWorkers parameter is the important part. You decide how many workers run, not the Go runtime. That matters because unbounded concurrency has a real cost—a thousand goroutines all trying to write to your database at the same time will hurt more than help. Three to ten workers, controlled, is usually the right call.

The pipeline didn’t change conceptually; it just runs in parallel now.
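
Wiring it up looks about the same as before, just with a worker count. A rough sketch – runAll is a hypothetical helper, and 5 is an arbitrary bound:

// Run every source through the pipeline with a small, bounded worker pool.
func runAll(ctx context.Context, p *Pipeline, scrapers []Scraper, logger *slog.Logger) {
    for _, s := range scrapers {
        if err := p.Run(ctx, s, 5); err != nil { // 5 workers, not unlimited
            logger.Error("pipeline failed", "source", s.Source(), "err", err)
        }
    }
}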

Pipelines Are Everywhere

Once you internalize this pattern, you start seeing it everywhere.

  • Payments: validate the card → charge it → save the transaction.
  • Analytics: collect the event → clean it → store it.
  • APIs: receive the request → process it → send the response.

Different domains, same shape. Data comes in, moves through stages, and emerges transformed. That’s the pipeline pattern, and it keeps showing up because it maps well onto real‑world work: step‑by‑step with clear handoffs between stages.
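
Here’s the payments case written out, just to make the shape obvious – every name in this sketch is hypothetical:

// A toy payments pipeline with the same three-stage shape.
func ProcessPayment(ctx context.Context, card Card, amount int) error {
    if err := validateCard(card); err != nil { // stage 1: validate the card
        return fmt.Errorf("validating card: %w", err)
    }
    tx, err := chargeCard(ctx, card, amount) // stage 2: charge it
    if err != nil {
        return fmt.Errorf("charging card: %w", err)
    }
    return saveTransaction(ctx, tx) // stage 3: save the transaction
}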

If you learn to recognize it, you can apply it not just in job scrapers but anywhere you write code that takes something, does a series of things to it, and produces a result.

So, Why Bother?

You could write a system without any of this. Plenty of working software is just one big loop doing everything, and for a small throwaway project that’s fine.

But the moment your system needs to grow, that big loop starts fighting you:

  • You want to add a new data source, but the scraping logic is tangled with normalization.
  • You want to test scoring in isolation, but it’s buried three levels deep.
  • You want to swap your in‑memory store for a real database, but there’s no clean seam to grab onto.

I’ve run into all this while working on my scraper and other recent builds. The pipeline pattern has made those problems manageable.

The four ideas worth keeping

  1. Break into stages. Each stage does one thing.
  2. Keep stages focused. If a stage is hard to name, it’s probably doing too much.
  3. Make parts swappable. Wire through interfaces, not concrete types.
  4. Control concurrency. Use a worker pool, not unlimited goroutines.

Thinking in pipelines makes a system easier to reason about, which matters a lot when you’re in the middle of building something or making updates.

That’s the pattern. Hope it’s useful!
