Beyond Simple Prompts: Architecting an AI Agent
Source: Dev.to
The Problem Space
Contract review follows a predictable pattern. A legal team receives a counter‑party’s redlined contract, reviews each change against the organization’s risk tolerance, and either accepts, rejects, or modifies each suggestion. This process can take hours for a single contract.
When I set out to automate it, I realized the system would need to, at a minimum:
- Analyze contracts against specific guidelines
- Generate specific text suggestions with rationale
- Apply changes as Word tracked changes – not as plain‑text replacements
- Survive document mutations – users edit contracts while analysis runs
The last two requirements are where most tools stumble. They output suggestions in a chat interface, leaving users to copy‑paste and re‑format manually. That’s not automation; it’s a fancier Ctrl + F.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Web App │ │ Word Add‑in │ │ Backend │ │
│ │ (Next.js) │ │ (Office.js) │ │ (FastAPI) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ REST + SSE │ REST + SSE │ │
│ └────────────────────┴────────────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ Analysis Engine │ │
│ │ ┌───────────────┐ │ │
│ │ │ DSPy + LLM │ │ │
│ │ │ (OpenAI / │ │ │
│ │ │ Mistral) │ │ │
│ │ └───────────────┘ │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Three Core Components
- Web Dashboard – Next.js application for rule management, analytics, and administration.
- Word Add‑in – Microsoft Office plugin (React + Office.js) where users actually review contracts.
- Backend API – FastAPI service handling analysis, LLM orchestration, and document processing.
The interesting engineering lives in the Word Add‑in (document manipulation) and the Backend API (analysis pipeline).
Challenge #1: The Mutable Document Problem
Scenario that breaks naive implementations
| Step | What Happens |
|---|---|
| 1️⃣ | User uploads a 50‑page contract. |
| 2️⃣ | System analyzes paragraphs 1‑50 and stores suggestions keyed by paragraph index. |
| 3️⃣ | While waiting, the user deletes paragraph 12. |
| 4️⃣ | System returns: “Paragraph 47 needs revision.” |
| 5️⃣ | Paragraph 47 is now paragraph 46 → the suggestion is applied to the wrong location. |
Solution – Paragraph Anchoring
I built an anchoring layer that assigns a persistent ID to each paragraph during preprocessing. The IDs are stored in the OOXML as content controls, so they survive:
- Deleting adjacent paragraphs
- Cutting and pasting sections
- Accepting/rejecting other tracked changes
On the frontend, a Zustand store maintains bidirectional mappings:
interface ParagraphStore {
  /** paragraph index → UUID */
  indexToPersistentIdMap: Map<number, string>;
  /** UUID → paragraph index */
  persistentIdToIndexMap: Map<string, number>;
  /** Fallback matching by verbatim text when an anchor is lost */
  findAnchorByText(text: string): string | null;
}
When analysis results return, they reference UUIDs. The store resolves the current paragraph indices at application time—not at analysis time.
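The same resolution logic can be sketched in Python (names here are illustrative, not the production code): suggestions are keyed by UUID, the current index is looked up only at application time, and verbatim text matching serves as the fallback when an anchor is lost.

```python
import uuid


class ParagraphAnchors:
    """Maps persistent paragraph UUIDs to current indices, with a text fallback."""

    def __init__(self, paragraphs):
        # Assign a persistent ID to each paragraph at preprocessing time.
        self.id_to_text = {}
        self.ids_in_order = []
        for text in paragraphs:
            pid = str(uuid.uuid4())
            self.ids_in_order.append(pid)
            self.id_to_text[pid] = text

    def delete_paragraph(self, index):
        # Simulates the user deleting a paragraph while analysis runs.
        pid = self.ids_in_order.pop(index)
        del self.id_to_text[pid]

    def resolve_index(self, pid, text=None):
        """Return the *current* index for a suggestion's anchor, or None."""
        if pid in self.ids_in_order:
            return self.ids_in_order.index(pid)
        # Fallback: match by verbatim text if the ID was lost.
        if text is not None:
            for i, candidate in enumerate(self.ids_in_order):
                if self.id_to_text[candidate] == text:
                    return i
        return None
```

With this scheme, deleting paragraph 2 shifts every later index down by one, but a suggestion anchored to the fourth paragraph still resolves to the right place.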
Challenge #2: Generating Word Tracked Changes
This is the hard part. Office.js provides no API for creating tracked changes; paragraph.insertText() simply replaces text. To produce real redlines (strikethrough deletions, colored insertions) you must:
- Generate a diff between the original and suggested text.
- Convert that diff to OOXML elements (e.g., <w:del> and <w:ins>).
- Apply those OOXML elements to the document via Office.js.
Token‑Based Diffing
Character‑level diffs generate garbage in Word.
T̶h̶e̶A quick b̶r̶o̶w̶n̶red fox
Token‑level diffs are far cleaner:
The → A quick brown → red fox
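Token‑level diffing can be sketched with Python’s standard `difflib` (a simplification of whatever algorithm the real system uses): split both versions into word tokens, then compare the token sequences instead of characters.

```python
import difflib


def token_diff(original: str, revised: str):
    """Diff at word granularity; returns a list of (op, tokens) pairs."""
    a, b = original.split(), revised.split()
    ops = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", a[i1:i2]))
        else:
            # A "replace" becomes a deletion plus an insertion, which maps
            # directly onto <w:del> / <w:ins> runs later in the pipeline.
            if i1 < i2:
                ops.append(("delete", a[i1:i2]))
            if j1 < j2:
                ops.append(("insert", b[j1:j2]))
    return ops
```

For "The quick brown fox" → "A quick red fox", this yields whole-word delete/insert pairs around the unchanged "quick" and "fox", which is exactly the readable redline shown above.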
Preserving Paragraph Properties
Contracts rely heavily on numbering, indentation, and styles. A naïve replacement destroys this formatting. The diff‑to‑OOXML conversion therefore preserves the original paragraph properties and only injects the tracked‑change markup.
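A minimal sketch of the diff‑to‑OOXML step, with heavily simplified markup (real tracked‑change OOXML also needs namespace declarations, revision IDs, and author/date attributes): each diff op becomes a run, deletions and insertions get wrapped in `<w:del>` / `<w:ins>`, and the paragraph’s original `<w:pPr>` properties block is carried over unchanged.

```python
from xml.sax.saxutils import escape


def ops_to_ooxml(ops, paragraph_props: str = "") -> str:
    """Render (op, tokens) pairs as tracked-change runs inside one <w:p>."""
    runs = []
    for op, tokens in ops:
        text = escape(" ".join(tokens))
        run = f'<w:r><w:t xml:space="preserve">{text} </w:t></w:r>'
        if op == "delete":
            # Deleted text uses <w:delText> inside a <w:del> wrapper.
            run = (f'<w:del><w:r><w:delText xml:space="preserve">{text} '
                   f'</w:delText></w:r></w:del>')
        elif op == "insert":
            run = f"<w:ins>{run}</w:ins>"
        runs.append(run)
    # paragraph_props is the original <w:pPr>...</w:pPr>, preserved verbatim
    # so numbering, indentation, and styles survive the rewrite.
    return f"<w:p>{paragraph_props}{''.join(runs)}</w:p>"
```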
Challenge #3: Long‑Running Analysis
A 50‑page contract with 30 playbook rules can take 2–3 minutes to analyze. Blocking an HTTP request for that long is unacceptable.
Session‑Based Async Processing
Client Server
│ │
│ POST /analysis/start → {sessionId}
│ │
│←─────────────────────────────│
│ │
│ GET /analysis/status?sessionId → {status, progress}
│ │
│←─────────────────────────────│
│ │
│ GET /analysis/result?sessionId → {suggestions}
│ │
│←─────────────────────────────│
- The client initiates an analysis session and receives a sessionId.
- The server runs the heavy LLM‑driven pipeline in a background worker (e.g., Celery, RQ).
- The client polls for status or receives Server‑Sent Events (SSE) updates.
- Once complete, the client fetches the suggestions, which reference the persistent paragraph IDs.
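The session lifecycle can be sketched as a small in‑memory store (a stand‑in for whatever the real backend persists; the names are illustrative). The background worker writes status and progress; clients only ever read the session record.

```python
import threading
import uuid


class SessionStore:
    """Tracks long-running analysis sessions by ID."""

    def __init__(self):
        self._sessions = {}
        self._lock = threading.Lock()

    def start(self) -> str:
        """Create a session; the caller hands its ID to a background worker."""
        session_id = uuid.uuid4().hex
        with self._lock:
            self._sessions[session_id] = {
                "status": "processing", "progress": 0, "results": None,
            }
        return session_id

    def update(self, session_id: str, progress: int):
        with self._lock:
            self._sessions[session_id]["progress"] = progress

    def complete(self, session_id: str, results):
        with self._lock:
            self._sessions[session_id].update(
                status="complete", progress=100, results=results)

    def status(self, session_id: str) -> dict:
        """What a status-polling endpoint would return for this session."""
        with self._lock:
            return dict(self._sessions[session_id])
```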
Takeaways
| ✅ What Worked | ❌ What Was Tricky |
|---|---|
| Persistent paragraph IDs survive user edits. | Office.js lacks native tracked‑change APIs. |
| Token‑level diffs keep Word output readable. | Mapping async results back to the live document. |
| Session‑based async processing keeps the UI responsive. | Handling edge cases (tables, footnotes, headers). |
By combining robust anchoring, token‑level diff → OOXML conversion, and asynchronous session handling, we can deliver a truly automated contract‑review experience that respects the document’s original formatting and tolerates user‑driven mutations.
TL;DR
- Assign UUIDs to each paragraph and store them in OOXML.
- Diff at the token level, convert the diff to tracked‑change OOXML, and inject it via Office.js.
- Run analysis asynchronously and return results keyed by the persistent IDs.
The result: a seamless, end‑to‑end system that lets legal teams review contracts with AI‑generated redlines without ever leaving Word.
Session‑Based Polling Example
Client                                Server
  │ POST                               │
  ├───────────────────────────────────►│  Create session,
  │                                    │  start background task
  │  { session_id: "abc123" }          │
  │◄───────────────────────────────────┤
  │                                    │
  │ GET /sessions/abc123               │
  ├───────────────────────────────────►│
  │  { status: "processing",           │
  │    progress: 45% }                 │
  │◄───────────────────────────────────┤
  │                                    │
  │     ... poll every 3 seconds ...   │
  │                                    │
  │ GET /sessions/abc123               │
  ├───────────────────────────────────►│
  │  { status: "complete",             │
  │    results: [...] }                │
  │◄───────────────────────────────────┤
Cache Validation with Content Hashing
Users often analyze the same contract multiple times: with different guidelines, or to check again after minor edits. Re‑analyzing unchanged content wastes time and runs up API costs.
The hash comparison catches:
- Re‑uploads of identical files
- “Analyze again” clicks without actual changes
- Multiple users analyzing the same template
Cache‑hit rate in production: ~40 % for typical contract‑review workflows.
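A sketch of how such a cache key could be built (the normalization and key scheme here are assumptions, not the production code): hash the normalized document text together with the guideline set, so changing either one invalidates the cache.

```python
import hashlib
import json


def cache_key(paragraphs, rule_ids) -> str:
    """Derive a deterministic cache key from document content plus guidelines."""
    # Normalize whitespace so trivial reflows don't defeat the cache.
    normalized = "\n".join(" ".join(p.split()) for p in paragraphs)
    payload = json.dumps(
        {"text": normalized, "rules": sorted(rule_ids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the rule IDs means that selecting the same guidelines in a different order still hits the cache.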
Challenge #4 – Grounding and Hallucination Prevention
Legal documents require precision. An AI suggesting “Vendor liability is capped at $1M” when the contract says “$500K” is worse than no suggestion at all.
Solution: Use Structured Output with Explicit Citations.
Every suggestion must reference the exact source text. This catches cases where the model paraphrases instead of quoting verbatim.
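A minimal sketch of that validation (the field name is illustrative): every suggestion carries a source quote, and the validator rejects any suggestion whose quote does not appear verbatim in the anchored paragraph.

```python
def validate_suggestion(suggestion: dict, paragraph_text: str) -> bool:
    """Accept a suggestion only if its cited source text exists verbatim."""
    quote = suggestion.get("source_quote", "")
    # Empty citations and paraphrases are both treated as hallucinations.
    return bool(quote) and quote in paragraph_text
```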
The Analysis Pipeline
┌────────────────────────────────────────────────────────────────┐
│ Redline Analysis Pipeline │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. DOCUMENT INGESTION │
│ ┌─────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ DOCX │────>│ Extract │────>│ Paragraph │ │
│ │ File │ │ OOXML │ │ Anchoring │ │
│ └─────────┘ └─────────────┘ └──────────────┘ │
│ │
│ 2. CONTENT NORMALIZATION │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ OOXML with │────>│ Unified │ │
│ │ Tracked │ │ Markdown │ │
│ │ Changes │ │ (Original + │ │
│ │ │ │ Revised views) │ │
│ └─────────────┘ └─────────────────┘ │
│ │
│ 3. LLM ANALYSIS │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ │────>│ DSPy │────>│ Structured │ │
│ │ Rules │ │ Signatures │ │ Suggestions │ │
│ └─────────────┘ └─────────────┘ └──────────────┘ │
│ │
│ 4. OUTPUT GENERATION │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ Suggestions │────>│ Token Diff │────>│ OOXML │ │
│ │ + Rationale │ │ Algorithm │ │ │ │
│ └─────────────┘ └─────────────┘ └──────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
OOXML‑to‑Markdown Conversion
Incoming contracts often already contain tracked changes from counter‑party negotiations. The converter:
- Parses OOXML elements
- Generates two synchronized views: Original (deletions, no insertions) and Revised (insertions, no deletions)
- Preserves paragraph IDs from content controls
This abstraction lets the LLM work on clean Markdown rather than raw XML, keeping the XML‑handling complexity isolated in a single conversion layer.
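The two synchronized views can be sketched as a single pass over parsed runs (the run representation below is a simplification of the parsed OOXML): the Original view keeps deletions and drops insertions, and the Revised view does the opposite.

```python
def build_views(runs):
    """runs: list of (kind, text) where kind is 'text', 'ins', or 'del'."""
    original, revised = [], []
    for kind, text in runs:
        if kind == "text":      # unchanged text appears in both views
            original.append(text)
            revised.append(text)
        elif kind == "del":     # deleted text only exists pre-change
            original.append(text)
        elif kind == "ins":     # inserted text only exists post-change
            revised.append(text)
    return "".join(original), "".join(revised)
```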
Results
| Metric | Value |
|---|---|
| Processing time (20‑page contract) | 30‑45 seconds (depends on rule complexity) |
| Cache‑hit rate | ~40 % (saves re‑analysis on unchanged content) |
| Hallucination rate | < 5 % (caught by validation, not shown to users) |
| Format preservation | 95 % (paragraph properties maintained) |
| Tracked‑change accuracy | Token‑level precision |
Lessons Learned
- Office.js is powerful but under‑documented. The OOXML manipulation pattern isn’t in any official guide; I reverse‑engineered it by exporting documents and reading the XML.
- Character‑level diffs are wrong for documents. Always tokenize first; generic diff libraries don’t understand word boundaries.
- Async patterns matter more than you think. Session‑based polling sounds simple, but handling edge cases (browser refresh, network drops, server restarts) required careful state management.
- Ground everything. LLMs will confidently cite text that doesn’t exist. Validation layers catch this, but only if the output schema forces explicit source references.
- Content hashing is cheap insurance. SHA‑256 computation is negligible compared to LLM costs; cache validation paid for itself in the first week.
Tech Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Backend API | FastAPI (Python) | Async‑native, great for long‑running tasks |
| LLM Orchestration | DSPy | Structured outputs, provider‑agnostic |
| LLM Providers | OpenAI, Mistral | Redundancy, cost optimisation |
| Database | Supabase (PostgreSQL) | Real‑time subscriptions, hosted |
| Web Frontend | Next.js | SSR for dashboard, API routes |
| Word Add‑in | React + Office.js | Only option for Word integration |
| Document Processing | python-docx, custom OOXML | No library handles tracked changes |
Closing Thoughts
The interesting engineering in “AI for X” products is rarely the AI part.
Calling an LLM API is straightforward. The challenge is everything around it:
- Maintaining document fidelity
- Handling state across long‑running operations
- Building validation layers that catch model failures before users see them
Legal redlining pushed me to solve problems I didn’t anticipate—paragraph anchoring, OOXML manipulation, token‑based diffing. Each solution came from understanding the domain deeply, not from finding a better prompt.
If you’re building in this space, I’d be interested to hear about your approach.
Arun Venkataramanan is a Senior Software Engineer at Ottimate, where he works on architecting solutions for accounts‑payable automation. With a background spanning core banking systems (TCS), fintech platforms, and enterprise automation, he focuses on building solutions and tools to help users automate repetitive tasks in their day‑to‑day work.
Connect on LinkedIn.