AWS re:Invent 2025 - Fine-tuning models for accuracy and latency at Robinhood Markets (IND392)
Source: Dev.to
Introduction
Robinhood Markets leverages fine‑tuning to improve accuracy and latency for generative AI use cases such as Cortex Digest and their CX AI agent. Their methodology balances the generative‑AI trilemma of cost, quality, and latency through a three‑stage tuning roadmap: prompt tuning, trajectory tuning, and LoRA fine‑tuning.
Fine‑tuning Roadmap
| Stage | Goal | Typical Techniques |
|---|---|---|
| Prompt tuning | Quickly adapt model behavior with minimal data | Prompt engineering, few‑shot examples |
| Trajectory tuning | Align model outputs over longer sequences | Reinforcement learning from human feedback (RLHF), DPO |
| LoRA fine‑tuning | Efficiently update large models with low‑rank adapters | Low‑Rank Adaptation (LoRA) on SageMaker |
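The prompt‑tuning stage can be as lightweight as adding a handful of worked examples to the prompt. Below is a minimal sketch in Python; the event categories and example wording are illustrative assumptions, not Robinhood's actual prompts.

```python
# Hypothetical few-shot prompt for classifying financial events.
# Categories and examples are illustrative assumptions.
FEW_SHOT_EXAMPLES = [
    ("Company X beats Q3 revenue estimates by 12%", "earnings_beat"),
    ("Regulator opens investigation into Company Y", "regulatory_action"),
    ("Company Z announces 2-for-1 stock split", "corporate_action"),
]

def build_prompt(headline: str) -> str:
    """Assemble a few-shot classification prompt for a single headline."""
    lines = ["Classify each headline into one event category.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Headline: {text}\nCategory: {label}\n")
    lines.append(f"Headline: {headline}\nCategory:")
    return "\n".join(lines)

print(build_prompt("Company A cuts full-year guidance"))
```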
Evaluation System
Robinhood employs a three‑layer evaluation framework:
- LLM‑as‑judge – Automated scoring using a separate LLM (a minimal sketch follows this list).
- Human feedback – Expert reviewers validate model outputs.
- Task‑specific metrics:
  - Categorical correctness – Ensures financial events are classified correctly.
  - Semantic intent – Measures how well the model captures user intent, especially for the CX planner.
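The LLM‑as‑judge layer can be approximated with a second model that scores candidate outputs against a rubric. The sketch below assumes a judge model hosted on Amazon Bedrock and invoked via boto3's Converse API; the model ID, rubric, and scoring scale are placeholders rather than details from the talk.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # region/credentials come from your environment

RUBRIC = (
    "Score the answer from 1 to 5 for factual accuracy and clarity. "
    'Reply with JSON: {"score": <int>, "reason": "<short reason>"}'
)

def judge(question: str, answer: str,
          model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> dict:
    """Ask a separate 'judge' model to grade a candidate answer (model ID is a placeholder)."""
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nCandidate answer: {answer}"
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 200, "temperature": 0.0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
```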
Dataset Curation
- Quality over quantity – Data is stratified by relevance and reliability (e.g., analyst reports vs. random blogs).
- Strategic curation – Focuses on high‑impact sources to improve signal‑to‑noise ratio.
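One way to read "stratified by relevance and reliability" is to tag each record with a source tier and sample preferentially from the higher tiers. The tiers and weights below are illustrative assumptions, not Robinhood's actual curation rules; the sampling uses the standard weighted-reservoir key trick.

```python
import random

# Illustrative reliability tiers and weights (assumptions, not Robinhood's rules).
SOURCE_WEIGHTS = {"analyst_report": 1.0, "earnings_call": 0.9, "news_wire": 0.6, "blog": 0.1}

def curate(records: list[dict], target_size: int, seed: int = 0) -> list[dict]:
    """Keep a high-signal subset: weighted sampling without replacement,
    favoring reliable sources (key = U**(1/weight), take the largest keys)."""
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / SOURCE_WEIGHTS.get(r["source_type"], 0.05)), r)
        for r in records
    ]
    keyed.sort(key=lambda kr: kr[0], reverse=True)
    return [r for _, r in keyed[:target_size]]

corpus = [
    {"text": "Q3 margin analysis ...", "source_type": "analyst_report"},
    {"text": "Random hot take ...", "source_type": "blog"},
]
print(curate(corpus, target_size=1))
```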
Use Cases
Cortex Digest
- Provides concise, accurate explanations for rapid stock movements.
- Fine‑tuned models learn domain‑specific vocabulary (e.g., treating “advice” as financial guidance) and prioritize authoritative sources.
Custom Indicators & Scans
- Translates natural‑language queries into executable trading logic.
- Users can request indicators (e.g., “golden crossover”) in plain English; the model generates the corresponding code and displays it on charts.
- Scans apply the generated logic across the entire market in real time, democratizing algorithmic trading.
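For example, a "golden crossover" request could compile down to logic like the pandas sketch below: a short-window moving average crossing above a long-window one, applied across many symbols as a scan. The 50/200-day windows, data shapes, and synthetic prices are conventional defaults used here for illustration, not necessarily what Robinhood's models emit.

```python
import numpy as np
import pandas as pd

def golden_cross(close: pd.Series, short: int = 50, long: int = 200) -> pd.Series:
    """True on days where the short-term moving average crosses above the long-term one."""
    fast = close.rolling(short).mean()
    slow = close.rolling(long).mean()
    return (fast > slow) & (fast.shift(1) <= slow.shift(1))

def scan(prices: pd.DataFrame) -> list[str]:
    """Apply the indicator to every symbol (column) and return today's matches."""
    return [sym for sym in prices.columns if golden_cross(prices[sym]).iloc[-1]]

# Synthetic demo data: 300 trading days for a handful of symbols.
rng = np.random.default_rng(42)
prices = pd.DataFrame(
    rng.normal(0, 1, size=(300, 3)).cumsum(axis=0) + 100,
    columns=["AAA", "BBB", "CCC"],
)
print(scan(prices))
```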
CX AI Agent
- Serves millions of customers with a multi‑stage architecture:
  - Foundation – Amazon Bedrock handles heavy‑weight inference.
  - Planner – Converts user queries into tool calls and actionable plans.
  - Execution – Performs the planned actions and returns results.
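A minimal sketch of the planner and execution stages, under the assumption that the planner emits a structured list of tool calls for the execution stage to run; the tool names and keyword routing are hypothetical stand-ins, since a real planner would itself be an LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str                       # hypothetical tool name, e.g. "get_transfer_status"
    args: dict = field(default_factory=dict)

def plan(query: str) -> list[ToolCall]:
    """Toy planner: map a customer query to an ordered list of tool calls.
    Keyword routing is only a placeholder for an LLM-driven planner."""
    q = query.lower()
    steps: list[ToolCall] = []
    if "transfer" in q or "deposit" in q:
        steps.append(ToolCall("get_transfer_status", {"query": query}))
    if "statement" in q or "tax" in q:
        steps.append(ToolCall("fetch_documents", {"query": query}))
    if not steps:
        steps.append(ToolCall("answer_from_kb", {"query": query}))
    return steps

def execute(steps: list[ToolCall]) -> list[str]:
    """Execution-stage placeholder: run each planned tool call and collect results."""
    return [f"ran {s.tool} with {s.args}" for s in steps]

print(execute(plan("Where is my deposit?")))
```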
LoRA Implementation & Performance
- Deployed LoRA‑fine‑tuned models on Amazon SageMaker.
- Achieved a >50% latency reduction (from 3–6 s to 1–2 s) while maintaining quality parity with frontier models.
- Demonstrated production‑scale viability in a regulated financial‑services environment.
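A minimal LoRA setup using the Hugging Face peft library is sketched below. The base model, target modules, and hyperparameters are illustrative assumptions (the talk does not specify them), and the SageMaker training and deployment wiring is omitted.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder, not Robinhood's actual base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Low-rank adapters on the attention projections; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,                # rank of the low-rank update
    lora_alpha=32,       # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights
```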
Key Takeaways
- A structured tuning roadmap enables systematic trade‑offs between cost, quality, and latency.
- Layered evaluation (LLM‑as‑judge + human feedback + task metrics) ensures robust model performance.
- Strategic dataset curation and LoRA fine‑tuning can deliver substantial latency improvements without sacrificing accuracy, even in high‑stakes domains like finance.