ScanVault

Published: February 16, 2026 at 02:24 AM EST
Source: Dev.to

Overview

ScanVault is a CLI‑first document intelligence platform that transforms unstructured files—receipts, invoices, PDFs, screenshots, whiteboard photos—into structured, searchable data. It solves the problem of accumulating mountains of documents across devices with no fast way to locate specific information later (e.g., “what was that invoice number?” or “how much did I spend on travel last month?”).

Architecture

  • Full‑stack TypeScript monorepo

    • Azure Functions serverless API (Cosmos DB, Blob Storage, Azure AI Search)
    • Next.js web dashboard with a category‑management UI
    • Commander.js CLI as the primary user interface
  • Shared ai-extract package abstracts extraction across OpenAI, Anthropic, Google, and a zero‑config fallback using Tesseract.js OCR.

  • Category system & field extraction enable powerful queries such as:

    vault search "total > 50 category:finance"
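
The provider abstraction described above could be sketched roughly as follows. This is a minimal illustration, not the actual ai-extract code: the `ProviderName` type, the `pickProvider` function, and the environment variable names are all assumptions made for the example.

```typescript
// Hypothetical provider names; the real ai-extract package may use different ones.
type ProviderName = "openai" | "anthropic" | "google" | "tesseract" | "copilot";

// Assumed environment variable names, checked in priority order.
const KEY_VARS: Array<[ProviderName, string]> = [
  ["openai", "OPENAI_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["google", "GOOGLE_API_KEY"],
];

// Returns the first provider whose API key is set, otherwise the
// zero-config fallback supplied by the caller.
function pickProvider(
  env: Record<string, string | undefined>,
  fallback: ProviderName = "tesseract"
): ProviderName {
  for (const [name, varName] of KEY_VARS) {
    const key = env[varName];
    if (key !== undefined && key.length > 0) {
      return name;
    }
  }
  return fallback;
}
```

Keeping the selection in one pure function means the CLI and the API could share the same precedence rules, with the fallback passed in by the caller.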

Copilot CLI Impact

Infrastructure & API Development

  • Copilot’s suggestions cut the verbose boilerplate involved in scaffolding Azure resources (Bicep modules for Cosmos DB, Blob Storage, Azure AI Search, and Key Vault).
  • Rapid creation of Azure Functions HTTP handlers, auth middleware, Zod validation, and Cosmos DB queries.
  • Automated generation of CRUD endpoints for assets, categories, and settings.

Search Query Parser

  • Copilot helped build a regex‑based tokenizer and comprehensive test cases for filters like category:finance status:ready total>50.
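
A regex-based tokenizer for filters of this kind might look like the sketch below. The token shapes, the set of operators, and the `tokenize` function name are assumptions for illustration; the project's actual parser may differ.

```typescript
// Hypothetical token shapes for a filter query.
type Operator = ":" | ">" | "<" | ">=" | "<=" | "=";

interface FilterToken {
  kind: "filter";
  field: string;
  op: Operator;
  value: string;
}

interface TextToken {
  kind: "text";
  value: string;
}

type Token = FilterToken | TextToken;

function tokenize(query: string): Token[] {
  // First alternative: field, operator, value (whitespace around the
  // operator is allowed, and the value may be double-quoted).
  // Second alternative: any other run of non-whitespace, kept as free text.
  const re = /([A-Za-z_]\w*)\s*(>=|<=|:|>|<|=)\s*("[^"]*"|[^\s"]+)|(\S+)/g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(query)) !== null) {
    if (m[1] !== undefined) {
      tokens.push({
        kind: "filter",
        field: m[1],
        op: m[2] as Operator,
        value: m[3].replace(/^"|"$/g, ""), // strip surrounding quotes
      });
    } else {
      tokens.push({ kind: "text", value: m[4] });
    }
  }
  return tokens;
}
```

With this sketch, `total > 50` and `total>50` produce the same filter token, and words that are not followed by an operator fall through as free-text terms.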

CLI Enhancements

  • Accelerated construction of the Commander.js command tree and output formatting (--json, --quiet, table modes).
  • Assisted with the Copilot extraction integration: child‑process spawning, JSON response parsing, and fallback heuristics for category inference.

Web Dashboard

  • Tailwind CSS styling, collapsible sections, context menus, and API client wiring were all streamlined with Copilot’s code suggestions.

Across the monorepo—from shared Zod schemas to infrastructure templates—Copilot CLI cut boilerplate friction, letting the focus stay on architecture and product decisions.

Copilot Extraction Integration

When no third‑party API key is configured, the vault upload command automatically delegates extraction to Copilot via a custom extraction skill. This provides out‑of‑the‑box structured field extraction, entity recognition, and auto‑categorization without additional cost or configuration.

  • Input/Output JSON contract
    • Input: a file (binary or base64).
    • Output: a summary, extracted fields with confidence scores, named entities, and a suggested category.

The CLI pipes the file context to Copilot, parses the structured response, and feeds it into the same upload‑confirm pipeline used by other providers. Consequently, search, export, and the web dashboard behave identically whether extraction is performed by OpenAI, Anthropic, or Copilot.
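
Parsing the structured response could be sketched as below. The field names (`summary`, `fields`, `entities`, `suggestedCategory`), the 0..1 confidence range, and the `parseExtraction` function are assumptions derived from the contract described above, not the project's actual schema (which, per the post, is defined with Zod).

```typescript
// Hypothetical result shape; field names are assumptions, not ScanVault's schema.
interface ExtractedField {
  name: string;
  value: string | number;
  confidence: number; // assumed to be in the range 0..1
}

interface ExtractionResult {
  summary: string;
  fields: ExtractedField[];
  entities: string[];
  suggestedCategory: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validates one entry of the "fields" array.
function parseField(raw: unknown, index: number): ExtractedField {
  if (!isRecord(raw)) throw new Error("fields[" + index + "] is not an object");
  const name = raw["name"];
  const value = raw["value"];
  const confidence = raw["confidence"];
  if (typeof name !== "string") throw new Error("fields[" + index + "].name must be a string");
  if (typeof value !== "string" && typeof value !== "number") {
    throw new Error("fields[" + index + "].value must be a string or number");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("fields[" + index + "].confidence must be between 0 and 1");
  }
  return { name, value, confidence };
}

// Parses the text captured from the extraction process and rejects
// anything that does not match the assumed contract.
function parseExtraction(stdout: string): ExtractionResult {
  let data: unknown;
  try {
    data = JSON.parse(stdout);
  } catch (err) {
    throw new Error("extraction output is not valid JSON");
  }
  if (!isRecord(data)) throw new Error("extraction output is not a JSON object");
  const summary = data["summary"];
  const category = data["suggestedCategory"];
  const rawFields = data["fields"];
  const rawEntities = data["entities"];
  if (typeof summary !== "string") throw new Error("summary must be a string");
  if (typeof category !== "string") throw new Error("suggestedCategory must be a string");
  if (!Array.isArray(rawFields)) throw new Error("fields must be an array");
  if (!Array.isArray(rawEntities)) throw new Error("entities must be an array");
  const entities: string[] = [];
  for (const e of rawEntities) {
    if (typeof e !== "string") throw new Error("entities must be strings");
    entities.push(e);
  }
  return { summary, fields: rawFields.map(parseField), entities, suggestedCategory: category };
}
```

Validating at this boundary is what would let the downstream upload-confirm pipeline treat every provider's output the same way.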

Usage Examples

  • Search with filters

    vault search "total > 50 category:finance"
  • Summarize a vault

    vault summarize --category finance --since 1w
  • Ask natural‑language questions

    vault ask --category finance --since 1w "how much did I spend last week?"

These commands leverage Copilot at runtime to answer natural‑language queries over the indexed documents.

Repository

https://github.com/ronak-guliani/scanvault
