Inside the feature store powering real-time AI in Dropbox Dash
Source: Dropbox Tech Blog
Dropbox Dash uses AI to understand questions about your files, work chats, and company content, bringing everything together in one place for deeper, more focused work. With tens of thousands of potential work documents to consider, both search and agents rely on a ranking system powered by real‑time machine learning to find the right files fast. At the core of that ranking is our feature store, a system that manages and delivers the data signals (“features”) our models use to predict relevance.
Why a custom feature store?
To help users find exactly what they need, Dash must read between the lines of user behavior across file types, company content, and the messy, fragmented realities of collaboration. It then surfaces the most relevant documents, images, and conversations when—and how—they’re needed.
The feature store is a critical part of how we rank and retrieve the right context across your work. It’s built to:
- Serve features quickly
- Keep pace as user behavior changes
- Let engineers move fast from idea to production
(For more on how feature stores connect to context engineering in Dash, check out our deep dive on context engineering.)
What’s in this post?
We’ll walk through:
- How we built the feature store behind Dash’s ranking system
- Why off‑the‑shelf solutions didn’t fit
- How we designed for speed and scale
- What it takes to keep features fresh as user behavior evolves
Along the way, we’ll share the trade‑offs we made and the lessons that shaped our approach.
Our Goals and Requirements
Building a feature store for Dash required a custom solution rather than an off‑the‑shelf product. The main constraints were:
| Area | Challenge | Why It Matters |
|---|---|---|
| Hybrid Infrastructure | On‑premises low‑latency service mesh ↔ Spark‑native cloud environment | Standard cloud‑native stores couldn’t span both worlds, so we needed a bridge that kept development velocity high. |
| Search Ranking Scale | One query → thousands of feature lookups (behavioral, contextual, real‑time signals) | The store must sustain massive parallel reads while staying within a sub‑100 ms latency budget. |
| Real‑Time Relevance | Signals (e.g., document open, Slack join) must be reflected in the next search within seconds | Requires an ingestion pipeline that can keep up with user‑behavior velocity at scale. |
| Mixed Computation Patterns | Some features are streaming‑first; others need batch processing of historical data | A unified framework must support both efficiently, reducing cognitive load for engineers and shortening the path from idea to production (see the sketch after this table). |
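To make that last row concrete, here's a minimal sketch of the write‑once idea: a single feature definition whose transform is shared by a batch driver and a streaming driver. The types and names below (Event, Feature, the two drivers) are our own illustration, not Dash's actual framework.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a simplified user-activity record; real signals
// (document opens, chat joins) carry far more context.
type Event struct {
	UserID string
	DocID  string
	At     time.Time
}

// Feature pairs a name with a transform from raw events to a
// value. Defining it once lets the batch and streaming drivers
// below share the exact same logic.
type Feature struct {
	Name      string
	Transform func(events []Event) float64
}

// opensLast7d counts how often a document was opened recently.
var opensLast7d = Feature{
	Name: "doc_opens_7d",
	Transform: func(events []Event) float64 {
		cutoff := time.Now().AddDate(0, 0, -7)
		n := 0.0
		for _, e := range events {
			if e.At.After(cutoff) {
				n++
			}
		}
		return n
	},
}

// runBatch recomputes the feature over a full historical slice.
func runBatch(f Feature, history []Event) float64 {
	return f.Transform(history)
}

// runStreaming folds each new event into a rolling window and
// re-applies the same transform.
func runStreaming(f Feature, window *[]Event, e Event) float64 {
	*window = append(*window, e)
	return f.Transform(*window)
}

func main() {
	history := []Event{{UserID: "u1", DocID: "d1", At: time.Now().Add(-time.Hour)}}
	fmt.Println("batch:", runBatch(opensLast7d, history))

	window := []Event{}
	fmt.Println("stream:", runStreaming(opensLast7d, &window, Event{UserID: "u1", DocID: "d1", At: time.Now()}))
}
```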
Summary
- Bridge on‑prem & cloud without sacrificing speed.
- Support massive parallel reads while guaranteeing low latency.
- Reflect new signals in the next search within seconds.
- Support both streaming and batch feature computation in one framework.
Serving Features Fast: From Python to Go
Our original serving layer was written in Python. Even with careful parallelism, the Global Interpreter Lock and multi‑process coordination capped throughput, so we rewrote the service in Go:
| Metric | Python (original) | Go (new) |
|---|---|---|
| Latency | ≤ 100 ms | 25–35 ms |
| Throughput | Thousands of req/s with high CPU | Thousands of req/s with lower CPU |
| Scalability | Limited by GIL & process coordination | Linear scaling with goroutine count |
The Go service now handles thousands of requests per second, adding only ~5–10 ms of processing overhead on top of Dynovault’s client latency and consistently achieving p95 latencies of ~25–35 ms.
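The pattern behind those numbers is Go's cheap concurrency: fan each query's lookups out across goroutines, bounded by a semaphore. The sketch below is illustrative only; fetchFeature stands in for the real Dynovault client, and the simulated latency and concurrency bound are arbitrary examples.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchFeature stands in for a single online-store lookup; the
// real client call, latency, and feature names differ.
func fetchFeature(name string) float64 {
	time.Sleep(2 * time.Millisecond) // simulated store round trip
	return 1.0
}

// fetchAll fans thousands of lookups out across goroutines,
// bounding in-flight requests with a semaphore channel. Because
// goroutines are cheap, concurrency scales with the bound rather
// than with OS threads or worker processes.
func fetchAll(names []string, maxConcurrent int) map[string]float64 {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		sem     = make(chan struct{}, maxConcurrent)
		results = make(map[string]float64, len(names))
	)
	for _, name := range names {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(n string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			v := fetchFeature(n)
			mu.Lock()
			results[n] = v
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	names := make([]string, 2000)
	for i := range names {
		names[i] = fmt.Sprintf("feature_%d", i)
	}
	start := time.Now()
	res := fetchAll(names, 256)
	fmt.Printf("fetched %d features in %v\n", len(res), time.Since(start))
}
```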
Impact
- Met Dash’s latency targets reliably.
- Prevented feature serving from becoming a bottleneck as search traffic and feature complexity grew.
Keeping Features Fresh
Speed matters only when the data itself is fresh. Stale features can lower ranking quality and hurt user experience, so our feature store must reflect new signals as soon as possible—often within minutes of user actions.
The Challenge
- Scale – Many of Dash’s most important features depend on large joins, aggregations, and historical context, making fully real‑time computation impractical.
- Balance – We needed an ingestion strategy that kept data fresh and reliable without overwhelming our infrastructure or slowing development.
Our Solution: A Three‑Part Ingestion System
| Ingestion Type | What It Handles | Key Benefits |
|---|---|---|
| Batch ingestion | Complex, high‑volume transformations built on the medallion architecture (raw → refined stages). | • Intelligent change detection → only modified records are written. • Hourly runs that once wrote hundreds of millions of records now complete in < 5 minutes. |
| Streaming ingestion | Fast‑moving signals (e.g., collaboration activity, content interactions). | • Near‑real‑time processing of unbounded datasets (sketched after this table). • Features stay aligned with users’ current actions. |
| Direct writes | Lightweight or pre‑computed features (e.g., relevance scores from an LLM evaluation pipeline). | • Bypass batch pipelines entirely. • Data appears in the online store within seconds. |
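As a rough illustration of the streaming path, a consumer can fold each signal into the online store the moment it arrives. The Signal type, the in‑memory store, and the channel standing in for the event stream are all simplifications; the real pipeline processes unbounded event streams and writes to Dynovault.

```go
package main

import (
	"fmt"
	"sync"
)

// Signal is one fast-moving user action, e.g., a document open
// or a chat join, reduced to the minimum needed here.
type Signal struct {
	UserID  string
	Feature string
	Delta   float64
}

// OnlineStore stands in for the low-latency serving store; the
// real online store (Dynovault) is a remote service, not a map.
type OnlineStore struct {
	mu   sync.RWMutex
	vals map[string]float64
}

func (s *OnlineStore) Apply(sig Signal) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.vals[sig.UserID+"/"+sig.Feature] += sig.Delta
}

func (s *OnlineStore) Get(key string) float64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.vals[key]
}

func main() {
	store := &OnlineStore{vals: map[string]float64{}}
	stream := make(chan Signal)
	done := make(chan struct{})

	// Consumer: every signal is folded into the online store as
	// it arrives, so the next search already reflects it.
	go func() {
		for sig := range stream {
			store.Apply(sig)
		}
		close(done)
	}()

	stream <- Signal{UserID: "u1", Feature: "doc_opens", Delta: 1}
	close(stream)
	<-done
	fmt.Println(store.Get("u1/doc_opens")) // 1
}
```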
Outcome
By combining these three ingestion paths, Dash can keep feature values fresh without forcing all computation onto a single pipeline. This preserves ranking quality while scaling to real‑world usage.
What We Learned
Building a feature store at Dropbox scale reinforced several hard‑earned lessons about systems design.
Serving‑side insights
- Python’s concurrency model became a limiting factor for high‑throughput, mixed CPU‑I/O workloads.
- Even with careful parallelism, the Global Interpreter Lock (GIL) capped performance for CPU‑bound work such as JSON parsing.
- Switching to multiple processes introduced new coordination bottlenecks.
- Rewriting the serving layer in Go removed those trade‑offs and let us scale concurrency predictably.
Data‑side insights
- Infrastructure changes mattered, but understanding access patterns mattered just as much.
- Only 1–5% of feature values change in a typical 15‑minute window.
- Exploiting this fact dramatically reduced write volume and ingestion time, turning hour‑long batch cycles into five‑minute updates and improving freshness without increasing system load (see the sketch below).
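That change‑detection step is easy to sketch. Assuming a hash‑comparison scheme (the row shape and hashing choice below are ours, not necessarily what our pipelines use), each batch is diffed against the previous snapshot and only rows whose hash moved are written:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Row is one entity's feature values, serialized for hashing.
type Row struct {
	Key   string
	Value string // serialized feature payload
}

func hash(r Row) [32]byte {
	return sha256.Sum256([]byte(r.Value))
}

// changedRows compares the new batch against the previous
// snapshot's hashes and returns only the rows that differ, so
// the online store sees ~1-5% of rows instead of all of them.
func changedRows(batch []Row, prev map[string][32]byte) []Row {
	var out []Row
	for _, r := range batch {
		h := hash(r)
		if old, ok := prev[r.Key]; !ok || old != h {
			out = append(out, r)
			prev[r.Key] = h
		}
	}
	return out
}

func main() {
	prev := map[string][32]byte{}
	first := []Row{{"u1", "opens=3"}, {"u2", "opens=7"}}
	fmt.Println(len(changedRows(first, prev))) // 2: everything is new

	second := []Row{{"u1", "opens=3"}, {"u2", "opens=8"}}
	fmt.Println(len(changedRows(second, prev))) // 1: only u2 changed
}
```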
Hybrid architecture
- Feast – orchestration & consistency
- Spark – large‑scale computation
- Dynovault – low‑latency online serving
Rather than rely on a single vendor solution, we tune each layer to its strengths while keeping training and serving aligned.
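One way to picture the layering is as a set of narrow contracts, so each layer can be tuned or swapped independently. The interfaces below are purely illustrative, not Dash's real APIs: the offline store stands in for the Spark layer, the online store for Dynovault, and the feature list for Feast's registry.

```go
package main

import "fmt"

// FeatureVector maps feature names to values for one entity.
type FeatureVector map[string]float64

// OfflineStore computes features at scale (the Spark layer).
type OfflineStore interface {
	Materialize(feature string) (map[string]FeatureVector, error) // entity -> values
}

// OnlineStore serves features at low latency (the Dynovault layer).
type OnlineStore interface {
	Write(entity string, v FeatureVector) error
}

// Sync pushes freshly materialized offline values into the
// online store, keeping training and serving data aligned.
func Sync(features []string, off OfflineStore, on OnlineStore) error {
	for _, f := range features {
		byEntity, err := off.Materialize(f)
		if err != nil {
			return err
		}
		for entity, v := range byEntity {
			if err := on.Write(entity, v); err != nil {
				return err
			}
		}
	}
	return nil
}

// In-memory stubs so the sketch runs end to end.
type memOffline struct{}

func (memOffline) Materialize(f string) (map[string]FeatureVector, error) {
	return map[string]FeatureVector{"u1": {f: 1.0}}, nil
}

type memOnline struct{ data map[string]FeatureVector }

func (m memOnline) Write(entity string, v FeatureVector) error {
	m.data[entity] = v
	return nil
}

func main() {
	on := memOnline{data: map[string]FeatureVector{}}
	if err := Sync([]string{"doc_opens_7d"}, memOffline{}, on); err != nil {
		panic(err)
	}
	fmt.Println(on.data["u1"])
}
```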
Takeaway
The work underscored the value of a middle path between building everything from scratch and adopting off‑the‑shelf systems wholesale. By combining open‑source foundations with internal infrastructure and tailoring them to real constraints, we built a feature store that meets today’s needs and can evolve with us in the future.
Acknowledgments
Special thanks to all current and past members of the AI/ML Platform and Data Platform teams for their contributions, as well as our machine‑learning engineers who spin up the magic with the tooling we build.
If building innovative products, experiences, and infrastructure excites you, come build the future with us! Visit jobs.dropbox.com to see our open roles.