Web/Mobile Personal AI CFO

Published: February 21, 2026 at 05:00 AM EST
8 min read
Source: Dev.to

FinTrack
A technical, high‑level walkthrough of building a personal‑finance application that combines traditional CRUD with AI features—and why I split the services.

The Problem

I wanted to build a personal‑finance app that could do more than just track transactions. I wanted it to understand your finances:

  • answer questions in natural language,
  • categorize expenses automatically, and
  • even take voice input, like a real personal assistant (ask Finny).

The challenge? Most AI work happens in Python, while typical web back‑ends live in Node.js. How do you get the best of both worlds without a tangled mess?

Note: There’s more than one way to skin a cat. The right solution depends on concurrency, volume, cost, and the context in which your app will run.

High‑Level Architecture

I chose a dual‑backend architecture: one service for the core application, another dedicated entirely to AI.

(Figure: high‑level architecture diagram)

| Component | Tech Stack | Responsibility |
| --- | --- | --- |
| Frontend | React + TypeScript + Vite | UI, routing, state, and a single API abstraction layer |
| ExpressoTS API | Node.js (ExpressoTS framework) + PostgreSQL | All non‑AI operations: transactions, accounts, goals, budgets, auth |
| Python AI Service | Python (FastAPI/Flask) + OpenRouter (LLM provider) + vector store (e.g., Pinecone) | RAG chat, transaction classification, voice transcript parsing, document extraction |

Why Split the Backend?

  1. Keep API Keys Server‑Side
    AI providers require secret keys. The Python service holds those credentials and proxies all AI requests, so the browser never talks directly to the provider.

  2. Python’s Strengths for AI
    Classification, embeddings, and RAG pipelines are natural fits for Python’s mature ecosystem (NumPy, pandas, OpenAI SDK, etc.). Replicating that in Node would add custom code and maintenance overhead.

  3. Independent Scaling & Concurrency
    CRUD traffic and AI traffic have different load profiles. With two services we can scale each independently—spikes in chat usage won’t affect the core transaction APIs.

  4. Clear Separation of Concerns
    Business data & CRUD live in ExpressoTS; AI/ML lives in Python. This makes onboarding, debugging, and refactoring far easier.

Frontend Structure

The app follows a clean layering pattern:

| Layer | Location | Responsibility |
| --- | --- | --- |
| UI | components/, pages/ | Presentational components only |
| State | contexts/, hooks/ | Global state and reusable logic |
| Data | api/ | All data‑fetching and API calls |
| Utils | utils/ | Pure functions, no side effects |

The api/ folder is the single entry point for backend communication. It abstracts whether a request goes to ExpressoTS or the Python service, exposing functions such as fetchTransactions(), classifyTransactions(), and askFinny(). The rest of the app stays unaware of the dual‑backend setup.

The Python AI Service: What It Does

1. RAG for Financial Q&A

Users ask questions like “How much did I spend on groceries last month?” or “What’s left in my dining budget?” The RAG service:

  1. Generates an embedding for the user’s question.
  2. Searches a vector store for relevant transactions, goals, and spending caps.
  3. Injects that context into the LLM prompt.
  4. Returns an answer grounded in the user’s actual data.

Embeddings are stored for transactions, categories, goals, and budgets. Vector search returns the most relevant slices of data, so answers are grounded in real figures rather than hallucinated numbers.
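
The retrieval step can be sketched with a toy in‑memory index. In production the embeddings would come from a provider and the index would live in a vector store such as Pinecone; the function names (`retrieve`, `build_prompt`) and the prompt wording below are illustrative assumptions, not the app's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question_vec, index, top_k=3):
    """index: list of (text, vector) pairs; returns the top_k most similar texts."""
    scored = sorted(index, key=lambda item: cosine(question_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]

def build_prompt(question, context_rows):
    """Inject the retrieved rows into the LLM prompt so answers stay grounded."""
    context = "\n".join(f"- {row}" for row in context_rows)
    return (
        "Answer using ONLY the user's data below.\n"
        f"Data:\n{context}\n\n"
        f"Question: {question}"
    )
```

The assembled prompt then goes to the LLM via OpenRouter; only the retrieval and prompt‑building steps are shown here.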

2. Batch Transaction Classification

When a user uploads a CSV from their bank, dozens or hundreds of uncategorized transactions appear. The classifier:

  • Uses the user’s existing category hierarchy.
  • Learns from past classifications (e.g., “Starbucks” → Coffee).
  • Returns structured JSON with categoryId and subcategoryId.

Transactions are batched to stay within token limits and processed asynchronously for speed. The frontend shows progress and refreshes the list when done.
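
The batching step might look something like this minimal sketch; the 4‑characters‑per‑token estimate and the `batch_transactions` helper are assumptions for illustration, not a real tokenizer:

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
    return max(1, len(text) // 4)

def batch_transactions(descriptions, max_tokens=500):
    """Greedily pack transaction descriptions into batches under a token budget."""
    batches, current, used = [], [], 0
    for desc in descriptions:
        cost = estimate_tokens(desc)
        # Flush the current batch if adding this item would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(desc)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch would then be submitted as one classification job, with the frontend polling for progress.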

3. Voice Input

Users can speak a transaction: “Fifty dollars at the gas station for fuel.” The voice parser turns that into a structured object:

{
  "amount": 50,
  "description": "Gas station",
  "category": "Transportation",
  "subcategory": "Fuel"
}

The transcript is sent to the Python service, which returns a transaction object ready for the user to confirm or edit before saving.

Patterns That Worked

  1. Context‑Aware AI – The RAG service receives user_id and household_id from the auth token, scoping every query to that user’s data and preventing cross‑household leakage.
  2. Rate Limiting & Usage Tiers – AI calls are expensive. We enforce per‑user limits (e.g., X chat queries, Y voice sessions, Z classifications) to control cost and ensure fair usage.
  3. Asynchronous Job Queue – Long‑running AI tasks (batch classification, voice parsing) are queued with a worker system (e.g., Celery or RQ). The frontend polls for job status, keeping the UI responsive.
  4. Caching Embeddings – Frequently accessed embeddings (e.g., recent transactions) are cached in Redis, reducing vector‑store latency and cutting provider costs.
  5. Observability – Structured logs, OpenTelemetry tracing, and Prometheus metrics give visibility into request latency, error rates, and token usage across both services.
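
As one example, the per‑user limit from pattern 2 can be sketched as a fixed‑window counter. A production setup would more likely keep these counters in Redis so they survive restarts and work across replicas; the `RateLimiter` class here is an illustrative assumption:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Fixed-window per-user limiter (illustrative; production would use Redis)."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        # user_id -> [window_start_time, request_count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counters[user_id]
        if now - start >= self.window:
            # Window expired: start a new one with this request counted.
            self.counters[user_id] = [now, 1]
            return True
        if count < self.limit:
            self.counters[user_id][1] = count + 1
            return True
        return False
```

The same shape works for chat queries, voice sessions, and classification jobs, each with its own limit.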

TL;DR

  • Split the backend into a Node/ExpressoTS service for fast, concurrent CRUD and a Python service for AI‑heavy workloads.
  • Keep API keys server‑side, leverage Python’s AI ecosystem, and scale each service independently.
  • A clean frontend layering and a single api/ abstraction hide the dual‑backend complexity from the UI.

The result is a responsive, AI‑enhanced personal‑finance app that can handle real‑world traffic without sacrificing performance or security.

Caching Where It Makes Sense

Stateless questions like “What’s my total income this year?” can be cached. We cache by query + user_id with a short TTL. Conversational threads aren’t cached—they’re too dynamic.
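
A minimal sketch of that cache, assuming an in‑process dict (a shared Redis instance with TTLs would replace it in production); the `AnswerCache` class and its normalization rule are illustrative:

```python
import time

class AnswerCache:
    """Short-TTL cache for stateless questions, keyed by user and normalized query."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}

    @staticmethod
    def _key(user_id, query):
        # Normalize whitespace and case so trivially different phrasings hit the cache.
        return (user_id, " ".join(query.lower().split()))

    def get(self, user_id, query, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(self._key(user_id, query))
        if entry and now - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, user_id, query, answer, now=None):
        now = time.monotonic() if now is None else now
        self.store[self._key(user_id, query)] = (now, answer)
```

Keying by user_id keeps one household's cached answers from ever leaking to another.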

Tool Calling for Actions

The chat isn’t just Q&A. Users can say “Add twenty dollars for lunch” and the AI triggers a tool that creates a transaction.

  • We define tool schemas (add_transaction, get_balance, etc.).
  • The LLM decides when to call them.
  • The Python service executes the call and returns the result for the model to summarize.
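
The execution side of that loop can be sketched with a small tool registry. The handlers and their return values below are stubs for illustration; the real `add_transaction` would write through the ExpressoTS API:

```python
TOOLS = {}

def tool(name):
    """Register a callable the LLM may request by name."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("add_transaction")
def add_transaction(amount, description):
    # Stubbed handler; the real one would persist via the ExpressoTS API.
    return {"status": "created", "amount": amount, "description": description}

@tool("get_balance")
def get_balance():
    return {"balance": 1240.50}  # stubbed value for illustration

def dispatch(call):
    """Execute a tool call of the shape {'name': ..., 'arguments': {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))
```

The LLM provider returns the tool name and arguments; the service dispatches, then feeds the result back for the model to summarize.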

What I’d Do Differently

Start with the dual‑backend split from day one.
I flirted with a monolith and then split. Migrating mid‑build was painful.

Standardize error handling early.
ExpressoTS and Python return errors in different shapes. I added a shared error‑format layer later; doing it upfront would have saved time.

Invest in embedding pipelines sooner.
RAG quality depends on good embeddings. We iterated on what to embed (full transaction text vs. summaries) and when to re‑index. Doing that earlier would have improved the chat experience faster.

There’s more to it (obviously I can’t reveal all strategies used), but this is a good start. :)

Takeaways

Building an AI‑powered app doesn’t mean throwing everything into one stack, especially if you’re “vibe coding” (be cautious). Splitting the backend—ExpressoTS for CRUD, Python for AI—gave me:

  • Clear ownership of API keys and AI logic
  • The right tool for each job
  • Independent scaling, concurrency, and deployment
  • A maintainable codebase

If you’re considering adding AI to an existing app, or starting fresh with AI in mind, a dedicated Python, Rust, or C++ service for all AI workloads is a pattern that scales well.

Performance and Cost: Why This Matters When You’re Starting Out

When you’re building your first product or bootstrapping a side project, every dollar counts. Hosting an app and running AI can get expensive fast. This architecture helps on both fronts.

Concurrency without Over‑Provisioning

The dual‑backend split means CRUD and AI run on different processes. The ExpressoTS API can handle thousands of lightweight requests (transactions, account fetches, auth checks) on a modest instance, while the Python service runs AI jobs asynchronously. We’re not paying for one beefy server that sits idle waiting for AI calls—we right‑size each service. A small Node instance for the API and a small Python instance for AI often cost less than a single large monolith trying to do both.

Lower Infrastructure Cost

By isolating AI workloads, we avoid scaling the entire app when chat usage spikes. A free‑tier or low‑cost Python host can handle the AI service early on. As usage grows, we scale only that service. The CRUD API stays stable and cheap.

Cost‑Effective from Day One

I built this understanding that hosting has to be cost‑effective, especially for developers starting their own business. You don’t need Kubernetes or a fleet of containers to launch. Two small services, clear boundaries, and the ability to scale each independently—that’s a path that keeps bills low while you validate your product.

I’m the developer behind FinTrack, a personal finance app with an AI called Finny as your best advisor. If you’re building something similar, I’d love to hear how you approached it—drop a comment below.
