AWS re:Invent 2025 - Fine-tuning models for accuracy and latency at Robinhood Markets (IND392)

Published: December 5, 2025 at 05:36 PM EST
2 min read
Source: Dev.to

Introduction

Robinhood Markets leverages fine‑tuning to improve accuracy and latency for generative AI use cases such as Cortex Digest and their CX AI agent. Their methodology balances the generative‑AI trilemma of cost, quality, and latency through a three‑stage tuning roadmap: prompt tuning, trajectory tuning, and LoRA fine‑tuning.

Fine‑tuning Roadmap

  • Prompt tuning – Goal: quickly adapt model behavior with minimal data. Typical techniques: prompt engineering, few‑shot examples.
  • Trajectory tuning – Goal: align model outputs over longer sequences. Typical techniques: reinforcement learning from human feedback (RLHF), DPO.
  • LoRA fine‑tuning – Goal: efficiently update large models with low‑rank adapters. Typical techniques: Low‑Rank Adaptation (LoRA) on SageMaker (see the sketch below).
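To make the LoRA stage concrete, here is a minimal setup sketch using the Hugging Face peft library. The base model ID, rank, scaling factor, and target modules are illustrative assumptions, not Robinhood's actual configuration.

```python
# Minimal LoRA setup sketch with Hugging Face peft; all hyperparameters are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model, not Robinhood's
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Training then proceeds with a standard causal-LM loop or trainer; only the adapter weights receive gradient updates, which is what keeps this stage cheap relative to full fine‑tuning.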

Evaluation System

Robinhood employs a three‑layer evaluation framework:

  1. LLM‑as‑judge – Automated scoring using a separate LLM (see the sketch after this list).
  2. Human feedback – Expert reviewers validate model outputs.
  3. Task‑specific metrics:
    • Categorical correctness – Ensures the right classification of financial events.
    • Semantic intent – Measures how well the model captures user intent, especially for the CX planner.
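A minimal LLM‑as‑judge sketch using the Amazon Bedrock Converse API via boto3; the judge model ID, rubric, and JSON parsing are assumptions for illustration, not the evaluation prompt Robinhood uses.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # region/credentials come from your environment

JUDGE_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # assumed judge model

def judge(question: str, answer: str) -> dict:
    """Score a candidate answer 1-5 with a short rationale (hypothetical rubric)."""
    prompt = (
        "Rate the answer to the question on a 1-5 scale for factual accuracy "
        'and helpfulness. Reply with JSON: {"score": <int>, "reason": <str>}.\n'
        f"Question: {question}\nAnswer: {answer}"
    )
    response = bedrock.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0, "maxTokens": 256},
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)  # assumes the judge model follows the JSON instruction
```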

Dataset Curation

  • Quality over quantity – Data is stratified by relevance and reliability (e.g., analyst reports vs. random blogs).
  • Strategic curation – Focuses on high‑impact sources to improve the signal‑to‑noise ratio (see the sketch after this list).
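A toy sketch of source‑stratified curation; the source tiers, weights, and caps are hypothetical and stand in for whatever taxonomy Robinhood actually applies.

```python
from collections import defaultdict

# Hypothetical reliability tiers; weights below 0.5 are treated as low-signal.
SOURCE_WEIGHTS = {
    "analyst_report": 1.0,
    "exchange_filing": 1.0,
    "reputable_news": 0.6,
    "blog": 0.1,
}

def curate(examples, per_tier_cap=2000):
    """Keep high-signal sources and cap each tier so no single source dominates."""
    by_tier = defaultdict(list)
    for ex in examples:  # ex is assumed to look like {"text": ..., "source_type": ...}
        if SOURCE_WEIGHTS.get(ex["source_type"], 0.0) >= 0.5:
            by_tier[ex["source_type"]].append(ex)
    return [ex for tier in by_tier.values() for ex in tier[:per_tier_cap]]
```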

Use Cases

Cortex Digest

  • Provides concise, accurate explanations for rapid stock movements.
  • Fine‑tuned models learn domain‑specific vocabulary (e.g., “advice” → financial guidance) and prioritize authoritative sources.

Custom Indicators & Scans

  • Translates natural‑language queries into executable trading logic.
  • Users can request indicators (e.g., “golden crossover”) in plain English; the model generates the corresponding code (see the sketch after this list) and displays it on charts.
  • Scans apply the generated logic across the entire market in real time, democratizing algorithmic trading.
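As an illustration of the kind of logic such a request might translate into, here is a conventional golden‑cross check in pandas (50‑day SMA crossing above the 200‑day SMA); the windows and data shape are the textbook defaults, not necessarily what Robinhood's models emit.

```python
import pandas as pd

def golden_crossover(prices: pd.Series) -> pd.Series:
    """True on days where the 50-day SMA crosses above the 200-day SMA."""
    sma_fast = prices.rolling(50).mean()
    sma_slow = prices.rolling(200).mean()
    above = sma_fast > sma_slow
    # Flag the crossing day itself, not every day the fast SMA is already above.
    return above & ~above.shift(1, fill_value=False)

# Example usage with a daily close series (index is a DatetimeIndex):
# signals = golden_crossover(closes)
```

A market‑wide scan would simply apply this function to each symbol's price series and surface the symbols where the most recent value is True.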

CX AI Agent

  • Serves millions of customers with a multi‑stage architecture:
    1. Foundation – Amazon Bedrock handles heavy‑weight inference.
    2. Planner – Converts user queries into tool calls and actionable plans (see the sketch after this list).
    3. Execution – Performs the planned actions and returns results.
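A minimal sketch of the planner step using the Bedrock Converse API's tool support; the tool name, schema, and model ID are assumptions for illustration and are not Robinhood's actual tools or models.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # assumed planner model

# Hypothetical tool the planner may propose; not a real Robinhood API.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_order_status",
            "description": "Look up the status of a customer's recent order.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            }},
        }
    }]
}

def plan(user_query: str):
    """Return the tool calls the model proposes for a customer query."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": user_query}]}],
        toolConfig=tool_config,
    )
    content = response["output"]["message"]["content"]
    return [block["toolUse"] for block in content if "toolUse" in block]
```

The execution stage would then run each proposed tool call and feed the results back to the model to produce the final customer-facing answer.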

LoRA Implementation & Performance

  • Deployed LoRA‑fine‑tuned models on Amazon SageMaker (see the merging sketch after this list).
  • Achieved a >50% latency reduction (from 3–6 s to 1–2 s) while maintaining quality parity with frontier models.
  • Demonstrated production‑scale viability in a regulated financial‑services environment.
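One common way to avoid adapter overhead at inference time is to merge the LoRA weights back into the base model before packaging it for a SageMaker endpoint. The sketch below shows that step with peft; the model ID and paths are placeholders, and this is not necessarily how Robinhood deploys.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Llama-3.1-8B-Instruct"       # placeholder base model
ADAPTER_DIR = "path/to/lora-adapter"               # placeholder adapter checkpoint

base = AutoModelForCausalLM.from_pretrained(BASE_ID)
merged = PeftModel.from_pretrained(base, ADAPTER_DIR).merge_and_unload()

# Save a single merged checkpoint; this artifact can then be packaged
# (e.g., as model.tar.gz) for a SageMaker real-time endpoint.
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained(BASE_ID).save_pretrained("merged-model")
```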

Key Takeaways

  • A structured tuning roadmap enables systematic trade‑offs between cost, quality, and latency.
  • Layered evaluation (LLM‑as‑judge + human feedback + task metrics) ensures robust model performance.
  • Strategic dataset curation and LoRA fine‑tuning can deliver substantial latency improvements without sacrificing accuracy, even in high‑stakes domains like finance.