Beyond Simple Prompts: Architecting an AI Agent
Source: Dev.to
The Problem Space
Contract review follows a predictable pattern. A legal team receives a counter‑party’s redlined contract, reviews each change against the organization’s risk tolerance, and either accepts, rejects, or modifies each suggestion. This process can take hours for a single contract.
When I set out to automate it, I realized the system would need to, at a minimum:
- Analyze contracts against specific guidelines
- Generate specific text suggestions with rationale
- Apply changes as Word tracked changes – not as plain‑text replacements
- Survive document mutations – users edit contracts while analysis runs
The last two requirements are where most tools stumble. They output suggestions in a chat interface, leaving users to copy‑paste and re‑format manually. That’s not automation; it’s a fancier Ctrl + F.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Web App │ │ Word Add‑in │ │ Backend │ │
│ │ (Next.js) │ │ (Office.js) │ │ (FastAPI) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ REST + SSE │ REST + SSE │ │
│ └────────────────────┴────────────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ Analysis Engine │ │
│ │ ┌───────────────┐ │ │
│ │ │ DSPy + LLM │ │ │
│ │ │ (OpenAI / │ │ │
│ │ │ Mistral) │ │ │
│ │ └───────────────┘ │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Three Core Components
- Web Dashboard – Next.js application for rule management, analytics, and administration.
- Word Add‑in – Microsoft Office plugin (React + Office.js) where users actually review contracts.
- Backend API – FastAPI service handling analysis, LLM orchestration, and document processing.
The interesting engineering lives in the Word Add‑in (document manipulation) and the Backend API (analysis pipeline).
Challenge #1: The Mutable Document Problem
Scenario that breaks naive implementations
| Step | What Happens |
|---|---|
| 1️⃣ | User uploads a 50‑page contract. |
| 2️⃣ | System analyzes paragraphs 1‑50 and stores suggestions keyed by paragraph index. |
| 3️⃣ | While waiting, the user deletes paragraph 12. |
| 4️⃣ | System returns: “Paragraph 47 needs revision.” |
| 5️⃣ | Paragraph 47 is now paragraph 46 → the suggestion is applied to the wrong location. |
Solution – Paragraph Anchoring
I built an anchoring layer that assigns a persistent ID to each paragraph during preprocessing. The IDs are stored in the OOXML as content controls, so they survive:
- Deleting adjacent paragraphs
- Cutting and pasting sections
- Accepting/rejecting other tracked changes
On the frontend, a Zustand store maintains bidirectional mappings:
interface ParagraphStore {
  /** paragraph index → UUID */
  indexToPersistentIdMap: Map<number, string>;
  /** UUID → paragraph index */
  persistentIdToIndexMap: Map<string, number>;
  /** Fallback matching by verbatim text when an anchor is lost */
  findAnchorByText(text: string): string | null;
}
When analysis results return, they reference UUIDs. The store resolves the current paragraph indices at application time—not at analysis time.
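The same resolution logic can be sketched in Python (names here are illustrative, not the production code): suggestions are keyed by UUID, the current index is looked up only at application time, and verbatim text matching serves as the fallback when an anchor is lost.

```python
import uuid


class ParagraphAnchors:
    """Maps persistent paragraph UUIDs to current indices, with a text fallback."""

    def __init__(self, paragraphs):
        # Assign a persistent ID to each paragraph at preprocessing time.
        self.id_to_text = {}
        self.ids_in_order = []
        for text in paragraphs:
            pid = str(uuid.uuid4())
            self.ids_in_order.append(pid)
            self.id_to_text[pid] = text

    def delete_paragraph(self, index):
        # Simulates the user deleting a paragraph while analysis runs.
        pid = self.ids_in_order.pop(index)
        del self.id_to_text[pid]

    def resolve_index(self, pid, text=None):
        """Return the *current* index for a suggestion's anchor, or None."""
        if pid in self.ids_in_order:
            return self.ids_in_order.index(pid)
        # Fallback: match by verbatim text if the ID was lost.
        if text is not None:
            for i, candidate in enumerate(self.ids_in_order):
                if self.id_to_text[candidate] == text:
                    return i
        return None
```

With this scheme, deleting paragraph 2 shifts every later index down by one, but a suggestion anchored to the fourth paragraph still resolves to the right place.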
Challenge #2: Generating Word Tracked Changes
This is the hard part. Office.js provides no API for creating tracked changes; paragraph.insertText() simply replaces text. To produce real redlines (strikethrough deletions, colored insertions) you must:
- Generate a diff between the original and suggested text.
- Convert that diff to OOXML elements (e.g., <w:del> and <w:ins>).
- Apply those OOXML elements to the document via Office.js.
Token‑Based Diffing
Character‑level diffs generate garbage in Word.
T̶h̶e̶A quick b̶r̶o̶w̶n̶red fox
Token‑level diffs are far cleaner:
The → A quick brown → red fox
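Token‑level diffing can be sketched with Python’s standard `difflib` (a simplification of whatever algorithm the real system uses): split both versions into word tokens, then compare the token sequences instead of characters.

```python
import difflib


def token_diff(original: str, revised: str):
    """Diff at word granularity; returns a list of (op, tokens) pairs."""
    a, b = original.split(), revised.split()
    ops = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", a[i1:i2]))
        else:
            # A "replace" becomes a deletion plus an insertion, which maps
            # directly onto <w:del> / <w:ins> runs later in the pipeline.
            if i1 < i2:
                ops.append(("delete", a[i1:i2]))
            if j1 < j2:
                ops.append(("insert", b[j1:j2]))
    return ops
```

For "The quick brown fox" → "A quick red fox", this yields whole-word delete/insert pairs around the unchanged "quick" and "fox", which is exactly the readable redline shown above.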
Preserving Paragraph Properties
Contracts rely heavily on numbering, indentation, and styles. A naïve replacement destroys this formatting. The diff‑to‑OOXML conversion therefore preserves the original paragraph properties and only injects the tracked‑change markup.
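A minimal sketch of the diff‑to‑OOXML step, with heavily simplified markup (real tracked‑change OOXML also needs namespace declarations, revision IDs, and author/date attributes): each diff op becomes a run, deletions and insertions get wrapped in `<w:del>` / `<w:ins>`, and the paragraph’s original `<w:pPr>` properties block is carried over unchanged.

```python
from xml.sax.saxutils import escape


def ops_to_ooxml(ops, paragraph_props: str = "") -> str:
    """Render (op, tokens) pairs as tracked-change runs inside one <w:p>."""
    runs = []
    for op, tokens in ops:
        text = escape(" ".join(tokens))
        run = f'<w:r><w:t xml:space="preserve">{text} </w:t></w:r>'
        if op == "delete":
            # Deleted text uses <w:delText> inside a <w:del> wrapper.
            run = (f'<w:del><w:r><w:delText xml:space="preserve">{text} '
                   f'</w:delText></w:r></w:del>')
        elif op == "insert":
            run = f"<w:ins>{run}</w:ins>"
        runs.append(run)
    # paragraph_props is the original <w:pPr>...</w:pPr>, preserved verbatim
    # so numbering, indentation, and styles survive the rewrite.
    return f"<w:p>{paragraph_props}{''.join(runs)}</w:p>"
```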
Challenge #3: Long‑Running Analysis
A 50‑page contract with 30 playbook rules can take 2–3 minutes to analyze. Blocking an HTTP request for that long is unacceptable.
Session‑Based Async Processing
Client Server
│ │
│ POST /analysis/start → {sessionId}
│ │
│←─────────────────────────────│
│ │
│ GET /analysis/status?sessionId → {status, progress}
│ │
│←─────────────────────────────│
│ │
│ GET /analysis/result?sessionId → {suggestions}
│ │
│←─────────────────────────────│
- The client initiates an analysis session and receives a sessionId.
- The server runs the heavy LLM‑driven pipeline in a background worker (e.g., Celery, RQ).
- The client polls for status or receives Server‑Sent Events (SSE) updates.
- Once complete, the client fetches the suggestions, which reference the persistent paragraph IDs.
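The session lifecycle can be sketched as a small in‑memory store (a stand‑in for whatever the real backend persists; the names are illustrative). The background worker writes status and progress; clients only ever read the session record.

```python
import threading
import uuid


class SessionStore:
    """Tracks long-running analysis sessions by ID."""

    def __init__(self):
        self._sessions = {}
        self._lock = threading.Lock()

    def start(self) -> str:
        """Create a session; the caller hands its ID to a background worker."""
        session_id = uuid.uuid4().hex
        with self._lock:
            self._sessions[session_id] = {
                "status": "processing", "progress": 0, "results": None,
            }
        return session_id

    def update(self, session_id: str, progress: int):
        with self._lock:
            self._sessions[session_id]["progress"] = progress

    def complete(self, session_id: str, results):
        with self._lock:
            self._sessions[session_id].update(
                status="complete", progress=100, results=results)

    def status(self, session_id: str) -> dict:
        """What a status-polling endpoint would return for this session."""
        with self._lock:
            return dict(self._sessions[session_id])
```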
Takeaways
| ✅ What Worked | ❌ What Was Tricky |
|---|---|
| Persistent paragraph IDs survive user edits. | Office.js lacks native tracked‑change APIs. |
| Token‑level diffs keep Word output readable. | Mapping async results back to the live document. |
| Session‑based async processing keeps the UI responsive. | Handling edge cases (tables, footnotes, headers). |
By combining robust anchoring, token‑level diff → OOXML conversion, and asynchronous session handling, we can deliver a truly automated contract‑review experience that respects the document’s original formatting and tolerates user‑driven mutations.
TL;DR
- Assign UUIDs to each paragraph and store them in OOXML.
- Diff at the token level, convert the diff to tracked‑change OOXML, and inject it via Office.js.
- Run analysis asynchronously and return results keyed by the persistent IDs.
The result: a seamless, end‑to‑end system that lets legal teams review contracts with AI‑generated redlines without ever leaving Word.
Session‑Based Polling Example
Client                                Server
  │ POST                               │
  ├───────────────────────────────────►│  Create session,
  │                                    │  start background task
  │  { session_id: "abc123" }          │
  │◄───────────────────────────────────┤
  │                                    │
  │ GET /sessions/abc123               │
  ├───────────────────────────────────►│
  │  { status: "processing",           │
  │    progress: 45% }                 │
  │◄───────────────────────────────────┤
  │                                    │
  │     ... poll every 3 seconds ...   │
  │                                    │
  │ GET /sessions/abc123               │
  ├───────────────────────────────────►│
  │  { status: "complete",             │
  │    results: [...] }                │
  │◄───────────────────────────────────┤
Cache Validation with Content Hashing
Users often analyze the same contract multiple times: with different guidelines, or to check again after minor edits. Re‑analyzing unchanged content wastes time and runs up API costs.
The hash comparison catches:
- Re‑uploads of identical files
- “Analyze again” clicks without actual changes
- Multiple users analyzing the same template
Cache‑hit rate in production: ~40 % for typical contract‑review workflows.
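A sketch of how such a cache key could be built (the normalization and key scheme here are assumptions, not the production code): hash the normalized document text together with the guideline set, so changing either one invalidates the cache.

```python
import hashlib
import json


def cache_key(paragraphs, rule_ids) -> str:
    """Derive a deterministic cache key from document content plus guidelines."""
    # Normalize whitespace so trivial reflows don't defeat the cache.
    normalized = "\n".join(" ".join(p.split()) for p in paragraphs)
    payload = json.dumps(
        {"text": normalized, "rules": sorted(rule_ids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the rule IDs means that selecting the same guidelines in a different order still hits the cache.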
Challenge #4 – Grounding and Hallucination Prevention
Legal documents require precision. An AI suggesting “Vendor liability is capped at $1M” when the contract says “$500K” is worse than no suggestion at all.
Solution: Use Structured Output with Explicit Citations.
Every suggestion must reference the exact source text. This catches cases where the model paraphrases instead of quoting verbatim.
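A minimal sketch of that validation (the field name is illustrative): every suggestion carries a source quote, and the validator rejects any suggestion whose quote does not appear verbatim in the anchored paragraph.

```python
def validate_suggestion(suggestion: dict, paragraph_text: str) -> bool:
    """Accept a suggestion only if its cited source text exists verbatim."""
    quote = suggestion.get("source_quote", "")
    # Empty citations and paraphrases are both treated as hallucinations.
    return bool(quote) and quote in paragraph_text
```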
The Analysis Pipeline
┌────────────────────────────────────────────────────────────────┐
│ Redline Analysis Pipeline │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. DOCUMENT INGESTION │
│ ┌─────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ DOCX │────>│ Extract │────>│ Paragraph │ │
│ │ File │ │ OOXML │ │ Anchoring │ │
│ └─────────┘ └─────────────┘ └──────────────┘ │
│ │
│ 2. CONTENT NORMALIZATION │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ OOXML with │────>│ Unified │ │
│ │ Tracked │ │ Markdown │ │
│ │ Changes │ │ (Original + │ │
│ │ │ │ Revised views) │ │
│ └─────────────┘ └─────────────────┘ │
│ │
│ 3. LLM ANALYSIS │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ │────>│ DSPy │────>│ Structured │ │
│ │ Rules │ │ Signatures │ │ Suggestions │ │
│ └─────────────┘ └─────────────┘ └──────────────┘ │
│ │
│ 4. OUTPUT GENERATION │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ Suggestions │────>│ Token Diff │────>│ OOXML │ │
│ │ + Rationale │ │ Algorithm │ │ │ │
│ └─────────────┘ └─────────────┘ └──────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
OOXML‑to‑Markdown Conversion
Incoming contracts often already contain tracked changes from counter‑party negotiations. The converter:
- Parses OOXML elements
- Generates two synchronized views: Original (deletions, no insertions) and Revised (insertions, no deletions)
- Preserves paragraph IDs from content controls
This abstraction lets the LLM work on clean Markdown rather than raw XML, keeping the XML‑handling complexity isolated in a single conversion layer.
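The two synchronized views can be sketched as a single pass over parsed runs (the run representation below is a simplification of the parsed OOXML): the Original view keeps deletions and drops insertions, and the Revised view does the opposite.

```python
def build_views(runs):
    """runs: list of (kind, text) where kind is 'text', 'ins', or 'del'."""
    original, revised = [], []
    for kind, text in runs:
        if kind == "text":      # unchanged text appears in both views
            original.append(text)
            revised.append(text)
        elif kind == "del":     # deleted text only exists pre-change
            original.append(text)
        elif kind == "ins":     # inserted text only exists post-change
            revised.append(text)
    return "".join(original), "".join(revised)
```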
Results
| Metric | Value |
|---|---|
| Processing time (20‑page contract) | 30‑45 seconds (depends on rule complexity) |
| Cache‑hit rate | ~40 % (saves re‑analysis on unchanged content) |
| Hallucination rate | < 5 % (caught by validation, not shown to users) |
| Format preservation | 95 % (paragraph properties maintained) |
| Tracked‑change accuracy | Token‑level precision |
Lessons Learned
- Office.js is powerful but under‑documented. The OOXML manipulation pattern isn’t in any official guide; I reverse‑engineered it by exporting documents and reading the XML.
- Character‑level diffs are wrong for documents. Always tokenize first; generic diff libraries don’t understand word boundaries.
- Async patterns matter more than you think. Session‑based polling sounds simple, but handling edge cases (browser refresh, network drops, server restarts) required careful state management.
- Ground everything. LLMs will confidently cite text that doesn’t exist. Validation layers catch this, but only if the output schema forces explicit source references.
- Content hashing is cheap insurance. SHA‑256 computation is negligible compared to LLM costs; cache validation paid for itself in the first week.
Tech Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Backend API | FastAPI (Python) | Async‑native, great for long‑running tasks |
| LLM Orchestration | DSPy | Structured outputs, provider‑agnostic |
| LLM Providers | OpenAI, Mistral | Redundancy, cost optimisation |
| Database | Supabase (PostgreSQL) | Real‑time subscriptions, hosted |
| Web Frontend | Next.js | SSR for dashboard, API routes |
| Word Add‑in | React + Office.js | Only option for Word integration |
| Document Processing | python-docx, custom OOXML | No library handles tracked changes |
Closing Thoughts
The interesting engineering in “AI for X” products is rarely the AI part.
Calling an LLM API is straightforward. The challenge is everything around it:
- Maintaining document fidelity
- Handling state across long‑running operations
- Building validation layers that catch model failures before users see them
Legal redlining pushed me to solve problems I didn’t anticipate—paragraph anchoring, OOXML manipulation, token‑based diffing. Each solution came from understanding the domain deeply, not from finding a better prompt.
If you’re building in this space, I’d be interested to hear about your approach.
Arun Venkataramanan is a Senior Software Engineer at Ottimate, where he works on architecting solutions for accounts‑payable automation. With a background spanning core banking systems (TCS), fintech platforms, and enterprise automation, he focuses on building solutions and tools to help users automate repetitive tasks in their day‑to‑day work.
Connect on LinkedIn.