Running a RAG Pipeline in a Production Full-Stack Application (Without a Vector Database)

Published: January 19, 2026 at 03:00 AM EST
4 min read
Source: Dev.to

Note: If you haven’t read the first article, you can check it out here: [Insert link to previous post]

Overview

In this post we take the RAG pipeline from the backend‑only version and wrap it in a complete front‑end application. The result is a full‑stack app that lets you:

  1. Sign up / log in (via AWS Cognito).
  2. Upload PDF documents.
  3. Wait for the documents to be indexed (asynchronously).
  4. Ask questions that are answered solely from the content of the uploaded PDFs.

The goal is to demonstrate how the low‑budget RAG pipeline behaves with real users and traffic, without any hidden latency tricks or massive‑scale assumptions. It’s ideal for early‑stage experimentation, internal tools, or MVPs where feature validation matters more than peak performance.

The source code lives in the same GitHub repository as the previous post, on the full-stack-implementation branch.

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js (App Router) | React framework with server‑side rendering & routing |
| Frontend | Tailwind CSS 4 | Utility‑first styling |
| Frontend | shadcn/ui | Component library built on Radix UI primitives (New York style) |
| Frontend | NextAuth.js | Authentication integration for Next.js |
| Frontend | Lucide React | Icon set |
| Backend | AWS Lambda | Serverless API layer |
| Backend | AWS Cognito | User pool & authentication |
| Backend | Amazon S3 (presigned URLs) | PDF storage |
| Backend | Amazon DynamoDB | Embeddings & metadata store |
| Backend | Amazon Bedrock | LLM & embedding models |
| Backend | Amazon EventBridge + SQS | Decoupled, asynchronous document indexing |
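
As a concrete example of how two of these layers connect, NextAuth.js ships a Cognito provider. A minimal sketch of the wiring in an App Router route handler (the environment variable names are placeholders, not the repo's actual configuration):

```typescript
// app/api/auth/[...nextauth]/route.ts — minimal sketch, NextAuth v4 style.
// The env var names are placeholders; use whatever your deployment provides.
import NextAuth from "next-auth";
import CognitoProvider from "next-auth/providers/cognito";

const handler = NextAuth({
  providers: [
    CognitoProvider({
      clientId: process.env.COGNITO_CLIENT_ID!,
      clientSecret: process.env.COGNITO_CLIENT_SECRET!,
      // The issuer is the user pool's OIDC endpoint:
      // https://cognito-idp.<region>.amazonaws.com/<user-pool-id>
      issuer: process.env.COGNITO_ISSUER,
    }),
  ],
});

export { handler as GET, handler as POST };
```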

User Flow

  1. Authentication – The landing page shows a Sign‑up / Login form.
    Powered by the Authentication stack (Cognito + Login / Register Lambdas).

  2. Upload PDFs – After signing in, users can drag‑and‑drop PDFs.
    Files are uploaded to S3 via presigned URLs (a sketch follows this list); an EventBridge rule pushes a message to an SQS queue, which triggers the indexing Lambda.

  3. Indexing – The Lambda extracts text, creates embeddings (via Bedrock), and stores the vectors & metadata in DynamoDB.
    This step is fully asynchronous – users are not blocked.

  4. Chat UI – Once indexing finishes, the documents appear in the chat interface.
    Select one or more documents, type a question, and the Ask‑Questions Lambda retrieves the relevant chunks, runs a Bedrock LLM inference, and returns the answer.
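
To make step 2 concrete, here is a minimal sketch of a Lambda that hands the browser a presigned PUT URL. The bucket variable, key layout, and five‑minute expiry are assumptions for illustration, not the repo's exact values:

```typescript
// upload-url.ts — minimal sketch using AWS SDK v3.
// BUCKET_NAME, the key layout, and the 5-minute expiry are assumptions.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import type { APIGatewayProxyHandler } from "aws-lambda";

const s3 = new S3Client({});

export const handler: APIGatewayProxyHandler = async (event) => {
  const { fileName } = JSON.parse(event.body ?? "{}");

  // Scope the object key to the authenticated user so documents stay isolated.
  const userId = event.requestContext.authorizer?.claims?.sub ?? "anonymous";
  const key = `uploads/${userId}/${fileName}`;

  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({
      Bucket: process.env.BUCKET_NAME,
      Key: key,
      ContentType: "application/pdf",
    }),
    { expiresIn: 300 } // URL valid for 5 minutes
  );

  return { statusCode: 200, body: JSON.stringify({ url, key }) };
};
```

Because the browser uploads straight to S3, no PDF bytes ever stream through the API layer, which keeps the upload request itself fast and cheap.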

Demo GIF

Insert GIF showing PDF upload flow here

Architecture Diagram

Insert diagram of the Lambda indexing flow (S3 → EventBridge → SQS → Lambda → DynamoDB) here
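
To pair the diagram with code, here is a minimal sketch of what the SQS‑triggered indexing handler could look like. The embedding model ID, pdf-parse for text extraction, the naive fixed‑size chunking, and the table's key shape are all assumptions; the post doesn't name its exact choices:

```typescript
// index-document.ts — illustrative sketch of the S3 → EventBridge → SQS → Lambda → DynamoDB flow.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import type { SQSHandler } from "aws-lambda";
import pdf from "pdf-parse";

const s3 = new S3Client({});
const bedrock = new BedrockRuntimeClient({});
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Embed one chunk of text via Bedrock (the Titan model ID is an assumption).
async function embed(text: string): Promise<number[]> {
  const res = await bedrock.send(new InvokeModelCommand({
    modelId: "amazon.titan-embed-text-v2:0",
    contentType: "application/json",
    body: JSON.stringify({ inputText: text }),
  }));
  return JSON.parse(new TextDecoder().decode(res.body)).embedding;
}

export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    // The SQS body wraps the EventBridge "Object Created" event.
    const { detail } = JSON.parse(record.body);
    const bucket: string = detail.bucket.name;
    const key: string = detail.object.key;

    // Download the PDF and extract its text (pdf-parse is an assumed choice).
    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const { text } = await pdf(Buffer.from(await obj.Body!.transformToByteArray()));

    // Naive fixed-size chunking; a real pipeline would respect sentence boundaries.
    const chunks = text.match(/[\s\S]{1,1500}/g) ?? [];

    // One DynamoDB item per chunk; the embedding is stored as a plain number list.
    for (const [i, chunk] of chunks.entries()) {
      await ddb.send(new PutCommand({
        TableName: process.env.TABLE_NAME,
        Item: { documentId: key, chunkId: i, text: chunk, embedding: await embed(chunk) },
      }));
    }
  }
};
```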

Performance Highlights

Document Indexing

| PDF Size | # of PDFs | Avg. Lambda Duration |
| --- | --- | --- |
| ~400 KB | 1 | ~8 s |
| ~200 KB each | 2 | ~8 s total |

Indexing remains comfortably asynchronous; users are never blocked.

I have indexed 20+ documents of similar size and observed roughly 4 s per document for the indexing Lambda (including DynamoDB write).

Get‑Documents API

Testing with Postman

  • First request: slower due to cold start.
  • Subsequent requests: warm start, consistently fast.

Insert screenshot of Postman response & Lambda duration here

Cold starts are infrequent in a multi‑user environment, so most users will see rapid responses.

Ask‑Questions Lambda

| Scenario | Input | Avg. Duration (cold) | Avg. Duration (warm) |
| --- | --- | --- | --- |
| 1 document | “What is this document about?” | ~1.2 s | ~0.6 s |
| 4 documents | Same question, multiple docs | ~2.0 s | ~1.1 s |

Insert graph of Lambda execution times (cold vs. warm) here

The Lambda simply receives a payload containing the selected document IDs and the user’s question, fetches the relevant embeddings from DynamoDB, runs a Bedrock LLM inference, and returns the answer.
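
With no vector database in the picture, "retrieval" is just a similarity scan inside the Lambda. A minimal sketch of that core idea (cosine similarity, a top‑k of 5, and the table's key shape are assumptions; the question is embedded with the same Bedrock model used at indexing time):

```typescript
// ask-question.ts — illustrative sketch of retrieval without a vector database.
// Cosine similarity, topK = 5, and the key shape are assumptions.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Plain cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Pull every chunk of the selected documents and rank them in memory.
async function topChunks(documentIds: string[], questionEmbedding: number[], topK = 5) {
  const scored: { text: string; score: number }[] = [];
  for (const id of documentIds) {
    const res = await ddb.send(new QueryCommand({
      TableName: process.env.TABLE_NAME,
      KeyConditionExpression: "documentId = :d",
      ExpressionAttributeValues: { ":d": id },
    }));
    for (const item of res.Items ?? []) {
      scored.push({ text: item.text, score: cosine(questionEmbedding, item.embedding) });
    }
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

The winning chunks are then stuffed into the prompt for the Bedrock LLM call. This per‑document linear scan is also why latency grows with the number of selected documents, exactly as the table above shows.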

Cost & Operational Benefits

  • No long‑running services → Zero fixed monthly fees.
  • Serverless only (Lambda, S3, DynamoDB, Bedrock) → Pay‑as‑you‑go.
  • Separate stacks for authentication and document processing → Independent scaling & easy API Gateway protection (the Cognito User Pool ID is imported across stacks; see the sketch below).
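
To illustrate that last point, here is a minimal sketch of protecting the document‑processing API with the authentication stack's user pool. AWS CDK and the cross‑stack export name `UserPoolId` are both assumptions; the post doesn't specify its infrastructure tooling:

```typescript
// documents-stack.ts — minimal CDK sketch; the export name "UserPoolId" is assumed.
import * as cdk from "aws-cdk-lib";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import * as cognito from "aws-cdk-lib/aws-cognito";
import { Construct } from "constructs";

export class DocumentsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Import the user pool exported by the authentication stack.
    const userPool = cognito.UserPool.fromUserPoolId(
      this, "ImportedPool", cdk.Fn.importValue("UserPoolId"));

    const api = new apigateway.RestApi(this, "DocumentsApi");
    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(this, "Auth", {
      cognitoUserPools: [userPool],
    });

    // Every route added with this authorizer requires a valid Cognito JWT.
    api.root.addResource("documents").addMethod(
      "GET",
      new apigateway.MockIntegration(), // placeholder: swap in the Get-Documents Lambda integration
      { authorizer, authorizationType: apigateway.AuthorizationType.COGNITO },
    );
  }
}
```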

Every architectural decision is aimed at minimizing cost and operational overhead while still delivering a functional RAG experience.

Next Steps

  • Add response caching (e.g., DynamoDB with TTL or CloudFront) to soften cold‑start impact for repeat questions (a sketch follows this list).
  • Implement pagination for large document sets.
  • Explore fine‑tuning Bedrock models for domain‑specific accuracy.
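
For the caching idea in the first item, one possible shape is to store answers keyed by a hash of the question and the selected documents, and let DynamoDB's TTL feature expire them. A sketch under those assumptions (table name, key shape, and the one‑hour lifetime are all illustrative):

```typescript
// answer-cache.ts — minimal sketch of the proposed DynamoDB TTL cache.
// Table name, key shape, and the 1-hour lifetime are assumptions.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, PutCommand } from "@aws-sdk/lib-dynamodb";
import { createHash } from "node:crypto";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = process.env.CACHE_TABLE_NAME;

// Cache key: hash of the question plus the sorted document IDs.
const cacheKey = (question: string, docIds: string[]) =>
  createHash("sha256").update(question + docIds.sort().join(",")).digest("hex");

export async function getCachedAnswer(question: string, docIds: string[]) {
  const res = await ddb.send(new GetCommand({
    TableName: TABLE,
    Key: { cacheKey: cacheKey(question, docIds) },
  }));
  return res.Item?.answer as string | undefined; // DynamoDB TTL evicts stale items
}

export async function putCachedAnswer(question: string, docIds: string[], answer: string) {
  await ddb.send(new PutCommand({
    TableName: TABLE,
    Item: {
      cacheKey: cacheKey(question, docIds),
      answer,
      // "ttl" must be configured as the table's TTL attribute (epoch seconds).
      ttl: Math.floor(Date.now() / 1000) + 3600,
    },
  }));
}
```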

Feel free to clone the repo, switch to the full-stack-implementation branch, and experiment with your own PDFs!

Happy building!

Performance Summary: One vs. Four Documents

When querying a single document, the cold‑start duration is around 3 seconds, while a warm Lambda answers in under 1.5 seconds.

For four documents, a cold start takes just over 4 seconds and a warm start just under 4 seconds.

Key takeaway: An average user won’t experience many Lambda cold starts. For a budget RAG system that uses very few AWS services (to keep complexity minimal), this performance is acceptable.

This application confirms what the first blog post theorized: a DynamoDB‑based RAG system is not just cheap but genuinely usable, with acceptable performance as long as you understand its limits.

Trade‑offs

  • Lambda cold starts are real.
  • Document query latency behaves as expected and increases as the document count grows.

For early‑stage applications, internal tools, experiments, or even a hackathon competition where you don’t want to break the bank but still need a working RAG system, this setup does its job well. It lets you ship the product, gather user feedback, and delay expensive architectural decisions until the data forces your hand.
