Building a Two-Tower Recommendation System

Published: February 2, 2026 at 02:05 PM EST
5 min read
Source: Dev.to

Overview

I was using Algolia for search and recommendations on POSH, my e‑commerce app. It worked great, but the bill kept growing—every search request and recommendation call added up as users constantly browsed products.

So I built my own recommendation system using a two‑tower model (the same approach YouTube and Google use):

  • One tower represents products as vectors.
  • The other represents users based on their behavior.

To get recommendations, I simply find the products closest to the user’s vector.

Below is a step‑by‑step description of how I built it.


Data Pipeline

Everything starts with user behavior captured via Firebase Analytics:

| Event | Meaning |
| --- | --- |
| Product viewed | Just browsing |
| Product clicked | Showed interest |
| Added to cart | Strong intent |

Not all interactions are equal, so I assign weights:

| Event | Weight |
| --- | --- |
| View | 0.1 |
| Click | 2.0 |
| Add to cart | 5.0 |
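In code, the weighting boils down to a small lookup. A minimal sketch (the event-name strings are my assumptions; only `product_clicked` appears verbatim later in the article's SQS example):

```python
# Interaction weights from the table above; unknown events get zero weight.
EVENT_WEIGHTS = {
    "product_viewed": 0.1,
    "product_clicked": 2.0,
    "added_to_cart": 5.0,
}

def event_weight(event_name: str) -> float:
    """Return the training weight for a Firebase Analytics event."""
    return EVENT_WEIGHTS.get(event_name, 0.0)
```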

Product Vectorization

All products live in Elasticsearch. To make recommendations work, each product must be represented as a 384‑dimensional vector.

Model: all-MiniLM-L6-v2 from Sentence‑Transformers – fast, lightweight, good for semantic similarity.

How vectors are created

  1. Combine product attributes into a single text string, e.g.:

    Nike Air Max | by Nike | Shoes | Sneakers | Running | Blue color | premium
  2. The string includes:

    • Product name
    • Merchant name
    • Category hierarchy (parent → category → sub‑category)
    • Color
    • Price tier (budget / mid‑range / premium / luxury)
  3. Feed the string to the model → 384‑dimensional vector.

  4. Store the vector in Elasticsearch as a dense_vector field for similarity search.
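The attribute-to-text step might look like the sketch below. The field names and the price-tier cutoffs are my assumptions (the article only names the four tiers); the actual embedding call, which needs the `sentence-transformers` package, is shown commented out:

```python
def price_tier(price: float) -> str:
    # Cutoffs are illustrative; the article only names the four tiers.
    if price < 25:
        return "budget"
    if price < 100:
        return "mid-range"
    if price < 500:
        return "premium"
    return "luxury"

def product_to_text(p: dict) -> str:
    """Flatten product attributes into one string for the sentence encoder."""
    parts = (
        [p["name"], "by " + p["merchant"]]
        + list(p["categories"])            # parent -> category -> sub-category
        + [p["color"] + " color", price_tier(p["price"])]
    )
    return " | ".join(parts)

# Embedding (requires the sentence-transformers package):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# vector = model.encode(product_to_text(product))  # 384-dim numpy array
```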


User Tower Architecture

The user tower consumes a user’s recent interaction history and outputs a single vector that lives in the same space as product vectors.

Input

Up to 20 recent interactions, each consisting of:

  • The product’s 384‑dim vector
  • The interaction type (view / click / add‑to‑cart)

Output

A single 384‑dim user vector.

Model workflow

  1. Embed interaction type and concatenate with the product vector.
  2. Pass the sequence through a multi‑head attention layer so the model can learn which interactions matter most.
  3. Apply recency decay – newer interactions receive higher weight.
  4. Pool the attended representations into one vector and normalize it.

The resulting user vector sits in the same space as all product vectors.
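The pooling and recency-decay steps can be sketched without the learned attention layer. This is a simplified stand-in, not the trained model: it replaces multi-head attention with the fixed event weights from the data pipeline, then applies exponential decay and L2 normalization (the decay rate is my choice):

```python
import numpy as np

# Event weights reused from the data pipeline (view / click / add-to-cart).
TYPE_WEIGHTS = {"view": 0.1, "click": 2.0, "add_to_cart": 5.0}

def user_vector(interactions, decay=0.9):
    """Pool interactions into one L2-normalized vector in product space.

    `interactions` is ordered oldest -> newest; each item is a
    (product_vector, interaction_type) pair. The trained model uses a
    learned multi-head attention layer here; this sketch replaces it
    with fixed event weights plus exponential recency decay.
    """
    recent = interactions[-20:]            # the model sees at most 20 events
    n = len(recent)
    vectors, weights = [], []
    for i, (vec, kind) in enumerate(recent):
        recency = decay ** (n - 1 - i)     # newest interaction gets weight 1.0
        vectors.append(np.asarray(vec, dtype=float))
        weights.append(TYPE_WEIGHTS[kind] * recency)
    pooled = np.average(vectors, axis=0, weights=weights)
    return pooled / np.linalg.norm(pooled)
```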


Training

I used contrastive learning:

  • For each user:
    • Positive: the next product they actually interacted with.
    • Negatives: 10 random products they did not interact with.

The loss pushes the user vector closer to positives and farther from negatives.
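One common way to implement this objective is a sampled-softmax (InfoNCE-style) loss; the article doesn't name its exact loss function, so treat this as one plausible instantiation (the temperature value is my assumption):

```python
import numpy as np

def contrastive_loss(user_vec, positive, negatives, temperature=0.1):
    """InfoNCE-style loss over one positive and sampled negatives.

    Pushes the user vector toward the positive product vector and away
    from the negatives. All vectors are assumed L2-normalized, so the
    dot product equals cosine similarity.
    """
    candidates = np.vstack([positive] + list(negatives))  # positive is row 0
    logits = candidates @ np.asarray(user_vec) / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log-softmax
    return -log_probs[0]                                  # -log P(positive)
```

Minimizing this value is what "pushes closer / farther" means concretely: the positive's similarity rises in the softmax while the 10 negatives' similarities fall.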


Real‑Time Updates

Training is a one‑time (or periodic) job, but user preferences change constantly. I handle updates with AWS SQS.

Flow

  1. An interaction event is sent from Firebase → a message lands in SQS:

    {
      "customer_id": 12345,
      "product_id": 5678,
      "event_name": "product_clicked"
    }
  2. An SQS consumer processes the message:

    • Fetches the product vector from Elasticsearch.
    • Loads the user’s recent interaction history.
    • Runs the history through the trained user‑tower model.
    • Saves the new user vector back to Elasticsearch.

The whole pipeline takes milliseconds, so by the time the user scrolls to the next page their recommendations are already refreshed.

Pruning: interactions older than 2 days are dropped to keep the model focused on recent behavior.
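The consumer's bookkeeping (append the new interaction, drop anything older than 2 days, cap at 20 events) can be sketched as below. The history schema is my assumption, and the Elasticsearch fetch, model forward pass, and write-back are deliberately out of scope:

```python
import json
import time

MAX_AGE_SECONDS = 2 * 24 * 60 * 60   # interactions older than 2 days are dropped
MAX_HISTORY = 20                     # the user tower sees at most 20 events

def handle_message(body, history, now=None):
    """Process one SQS message body: append the interaction, prune history.

    `history` is a list of dicts with `product_id`, `event_name`, and a
    `ts` epoch timestamp. Fetching the product vector, running the user
    tower, and saving the new user vector are stubbed out here.
    """
    now = time.time() if now is None else now
    event = json.loads(body)
    history.append({
        "product_id": event["product_id"],
        "event_name": event["event_name"],
        "ts": now,
    })
    fresh = [h for h in history if now - h["ts"] <= MAX_AGE_SECONDS]
    return fresh[-MAX_HISTORY:]      # keep the most recent interactions
```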


Recommendations with Cosine Similarity

Both user and product vectors share the same 384‑dim space. To retrieve relevant products, I query Elasticsearch with a script_score that computes cosine similarity:

{
  "script_score": {
    "script": {
      "source": "cosineSimilarity(params.user_vector, 'product_vector') + 1.0",
      "params": { "user_vector": userVector }
    }
  }
}
  • + 1.0 shifts the score to a positive range because cosine similarity can be negative.

Fallback

If a user has no vector yet (new user or insufficient interactions), the system falls back to the default sorting (popularity + recency) or respects an explicit sort (e.g., price).

The result: logged‑in users with interaction history receive a personalized feed, while everyone else still gets a sensible default. Pagination works unchanged; only the order is re‑ranked by relevance.
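The branching between personalized and fallback queries could be assembled like this. The `script_score` source string mirrors the snippet above; the default-sort field names (`popularity`, `created_at`) and the wrapping `match_all` base query are my assumptions:

```python
def build_query(user_vector=None, explicit_sort=None):
    """Return an Elasticsearch request body.

    Personalized ranking when a user vector exists; otherwise an
    explicit sort (e.g. price) or the popularity + recency default.
    """
    if explicit_sort is not None:            # user picked a sort explicitly
        return {"query": {"match_all": {}}, "sort": [explicit_sort]}
    if user_vector is None:                  # new user: no vector yet
        return {
            "query": {"match_all": {}},
            "sort": [{"popularity": "desc"}, {"created_at": "desc"}],
        }
    return {"query": {"script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.user_vector, 'product_vector') + 1.0",
            "params": {"user_vector": user_vector},
        },
    }}}
```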


Results & Learnings

  • I’m not a data scientist—this was my first foray into building a recommendation system.
  • By self‑hosting everything (Elasticsearch, PyTorch model, SQS consumers) I eliminated third‑party recommendation APIs and managed‑ML costs.
  • Latency dropped dramatically because all components run on the same private network—no external round‑trips.
  • The system scales with existing infrastructure and gives fine‑grained, real‑time personalization without the exploding Algolia bill.

TL;DR

  1. Collect weighted events via Firebase.
  2. Vectorize products with a Sentence‑Transformer and store in Elasticsearch.
  3. Train a two‑tower model (product + user) using contrastive learning.
  4. Update user vectors in real time via SQS.
  5. Serve recommendations by ranking products with cosine similarity to the user vector.

The approach is cheap, fast, and fully under my control. 🚀

## Impact

Querying Elasticsearch over the local subnet also turned out to be much faster than hitting Algolia's servers.

Since launching the two‑tower model:

- **40% increase** in app orders  
- **10% increase** in user retention  

Users are finding products they actually want, and they're coming back more often.

### What's next

The model works, but there’s room to improve:

- **More events** – add `product_favorited`, `product_shared`, and `product_purchased` to capture stronger intent signals.  
- **Product labels** – tag products with attributes like *vintage*, *handmade*, *streetwear* and use those labels to fine‑tune the model.

### Takeaway

You don’t need a dedicated machine‑learning team to build personalized recommendations. The two‑tower architecture is well‑documented, PyTorch is approachable, and tools like Elasticsearch and SQS handle the infrastructure. If your recommendation costs are eating into your margins, it might be worth building your own.

> If you’ve built something similar or have suggestions to improve this approach, I’d love to hear from you.