Running a RAG Pipeline in a Production Full-Stack Application (Without a Vector Database)

Published: 2 hours ago (January 19, 2026 at 03:00 AM EST)

4 min read

Source: Dev.to

Note: If you haven’t read the first article, you can check it out here: [Insert link to previous post]

Overview

In this post we take the RAG pipeline from the backend‑only version and wrap it in a complete front‑end application. The result is a full‑stack app that lets you:

Sign up / log in (via AWS Cognito).
Upload PDF documents.
Wait for the documents to be indexed (asynchronously).
Ask questions that are answered solely from the content of the uploaded PDFs.

The goal is to demonstrate how the low‑budget RAG pipeline behaves with real users and traffic, without any hidden latency tricks or massive‑scale assumptions. It’s ideal for early‑stage experimentation, internal tools, or MVPs where feature validation matters more than peak performance.

The source code lives in the same GitHub repository as the previous post, on the full-stack-implementation branch.

Tech Stack

Layer	Technology	Purpose
Frontend	Next.js (App Router)	React framework with server‑side rendering & routing
	Tailwind CSS 4	Utility‑first styling
	shadcn/ui	Component library built on Radix UI primitives (New York style)
	NextAuth.js	Authentication integration for Next.js
	Lucide React	Icon set
Backend	AWS Lambda	Serverless API layer
	AWS Cognito	User pool & authentication
	Amazon S3 (presigned URLs)	PDF storage
	Amazon DynamoDB	Embeddings & metadata store
	Amazon Bedrock	LLM & embedding models
	Amazon EventBridge + SQS	Decoupled, asynchronous document indexing

User Flow

Authentication – The landing page shows a Sign‑up / Login form.
Powered by the Authentication stack (Cognito + Login / Register Lambdas).
Upload PDFs – After signing in, users can drag‑and‑drop PDFs.
Files are uploaded to S3 via presigned URLs; an EventBridge rule pushes a message to an SQS queue, which triggers the indexing Lambda.
Indexing – The Lambda extracts text, creates embeddings (via Bedrock), and stores the vectors & metadata in DynamoDB.
This step is fully asynchronous – users are not blocked.
Chat UI – Once indexing finishes, the documents appear in the chat interface.
Select one or more documents, type a question, and the Ask‑Questions Lambda retrieves the relevant chunks, runs a Bedrock LLM inference, and returns the answer.

Demo GIF

Insert GIF showing PDF upload flow here

Architecture Diagram

Insert diagram of the Lambda indexing flow (S3 → EventBridge → SQS → Lambda → DynamoDB) here

Performance Highlights

Document Indexing

PDF Size	# of PDFs	Avg. Lambda Duration
~400 KB	1	~8 s
~200 KB each	2	~8 s total

Indexing remains comfortably asynchronous; users are never blocked.

I have indexed 20+ documents of similar size and observed roughly 4 s per document for the indexing Lambda (including DynamoDB write).

Get‑Documents API

Testing with Postman

First request: slower due to cold start.
Subsequent requests: warm start, consistently fast.

Insert screenshot of Postman response & Lambda duration here

Cold starts are infrequent in a multi‑user environment, so most users will see rapid responses.

Ask‑Questions Lambda

Scenario	Input	Avg. Duration (cold)	Avg. Duration (warm)
1 document	“What is this document about?”	~1.2 s	~0.6 s
4 documents	Same question, multiple docs	~2.0 s	~1.1 s

Insert graph of Lambda execution times (cold vs. warm) here

The Lambda simply receives a payload containing the selected document IDs and the user’s question, fetches the relevant embeddings from DynamoDB, runs a Bedrock LLM inference, and returns the answer.

Cost & Operational Benefits

No long‑running services → Zero fixed monthly fees.
Serverless only (Lambda, S3, DynamoDB, Bedrock) → Pay‑as‑you‑go.
Separate stacks for authentication and document processing → Independent scaling & easy API‑gateway protection (import Cognito User Pool ID).

Every architectural decision is aimed at minimizing cost and operational overhead while still delivering a functional RAG experience.

Next Steps

Add caching (e.g., DynamoDB TTL or CloudFront) to further reduce cold‑start impact.
Implement pagination for large document sets.
Explore fine‑tuning Bedrock models for domain‑specific accuracy.

Feel free to clone the repo, switch to the full-stack-implementation branch, and experiment with your own PDFs!

Happy building!

Summary of the Four Documents

When inquiring about one document, the cold‑start duration is around 3 seconds, while a warm Lambda executes in under 1.5 seconds.

For four documents, the cold‑start duration is just above 4 seconds, and the warm‑start duration is just under 4 seconds.

Key takeaway: An average user won’t experience many Lambda cold starts. For a budget RAG system that uses very few AWS services (to keep complexity minimal), these performances are acceptable.

This application confirms what the first blog post theorized — a DynamoDB‑based RAG system is not just cheap, but usable, and if you understand its limits, it has acceptable performance.

Trade‑offs

Lambda cold starts are real.
Document query latency behaves as expected and increases as the document count grows.

For early‑stage applications, internal tools, experiments, or even a hackathon competition where you don’t want to break the bank but still need a working RAG system, this setup does its job well. It lets you ship the product, gather user feedback, and delay expensive architectural decisions until the data forces your hand.