Battle for Context: How We Implemented AI Coding in an Enterprise Project

Published: December 30, 2025 at 03:57 PM EST
8 min read
Source: Dev.to

Introduction: A Task Nobody Had Solved

Imagine this: you need to give an analyst the ability to code. Not “write a prompt to ChatGPT,” but actually make changes to an enterprise product with a three‑year history and a million lines of code.

The developer isn’t sitting next to them dictating every line. They set up the environment, control quality, and only intervene when something goes wrong.

Sounds like science fiction? We thought so too—until we tried.

Why This Is Harder Than It Seems

When a programmer uses an AI assistant, they control every step. They see what’s happening under the hood and notice oddities in the code immediately.

With an analyst, everything is different. They see only the result: “the form appeared” or “the form doesn’t work.” Code quality, architectural decisions, potential bugs—all of this remains behind the scenes.

We decided to create a system that compensates for this blindness—a system where AI can’t “cause trouble” even if it really wants to.

Tool Selection: Why Cursor

We tried several options (GitHub Copilot, Claude Code, various API wrappers) and settled on Cursor for several reasons.

Multi‑model Support

Cursor lets us use different models for different tasks:

flowchart LR
    A[Task] --> B{Type?}
    B -->|Planning| C[Claude Opus 4.5]
    B -->|Implementation| D[Gemini 3 Flash]
    C --> E[200K tokens]
    D --> F[1M tokens]
  • Claude Opus 4.5 – architectural planning (smart but “expensive” in tokens)
  • Gemini 3 Flash – implementation (fast, cheap, and most importantly — 1 million tokens of context)

MCP Integration

Model Context Protocol (MCP) is a way to connect external tools to AI.

| Component | Purpose |
| --- | --- |
| Jira | Task management |
| Context7 | Library documentation |
| Memory Bank | Context preservation between sessions |
| Beads | Atomic task tracking |
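
For reference, wiring these servers into Cursor comes down to a small JSON config in `.cursor/mcp.json`. The sketch below is illustrative only: `@upstash/context7-mcp` is the published Context7 server, while the Jira entry is a placeholder you would swap for whichever Jira MCP server you actually use.

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "jira": {
      "command": "npx",
      "args": ["-y", "<your-jira-mcp-server>"]
    }
  }
}
```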

Flexible Rules

Cursor allows creating .mdc files with rules that automatically load depending on context:

  • Working on a React component → the React rules load.
  • Writing a script → the Node.js rules load.
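
For example, a rule file for React components might look roughly like this. The frontmatter keys (`description`, `globs`, `alwaysApply`) follow Cursor’s project-rules format as we use it; the glob and the rule text itself are illustrative, not a prescription.

```md
---
description: Conventions for React components
globs: src/components/**/*.tsx
alwaysApply: false
---

- Use function components and hooks; no class components.
- Touch only the files named in the current task; do not refactor or reformat unrelated code.
- Follow the import order enforced by ESLint.
```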

Security Requirements: Working Locally

Our security team imposed strict requirements: no access to the corporate network during development and no cloning of the production database.
We built a full mocking system on MSW (Mock Service Worker):

flowchart TD
    A[Frontend App] --> B[MSW Interceptor]
    B --> C{Request Type}
    C -->|API Call| D[Mock Handlers]
    C -->|Static| E[Pass‑Through]
    D --> F[Fake Data Generators]
    F --> G[@faker-js]
    D --> H[Response]
  • 50+ handlers for all API endpoints
  • Realistic data generators using @faker-js
  • Full business‑logic emulation
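
As a concrete illustration, here is what one handler looks like. This is a minimal sketch assuming MSW 2.x and @faker-js/faker 8+; the `/api/users` endpoint and the response shape are made up for the example, not our real API.

```ts
// src/mocks/handlers.ts
import { http, HttpResponse } from 'msw';
import { faker } from '@faker-js/faker';

export const handlers = [
  // Intercept GET /api/users and return realistic fake data
  // instead of touching the corporate network or a production DB.
  http.get('/api/users', () => {
    const users = Array.from({ length: 20 }, () => ({
      id: faker.string.uuid(),
      name: faker.person.fullName(),
      email: faker.internet.email(),
      createdAt: faker.date.past().toISOString(),
    }));
    return HttpResponse.json(users);
  }),
];
```

The handlers are registered once at startup with `setupWorker(...handlers)` from `msw/browser`, so the frontend runs entirely against generated data.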

Quality Gates: The Stricter, The Better

Key insight: AI needs strict constraints. Without them, it starts to “create”: sees outdated code → refactors; notices a potential vulnerability → “fixes” it; finds a style mismatch → reformats.
In practice, a simple task like “add a field to a form” can turn into a PR with 100 000 lines.

Our Quality‑Gates Pipeline

flowchart TD
    A[Commit] --> B[commitlint]
    B -->|Pass| C[ESLint]
    B -->|Fail| X[❌ Rejected]
    C -->|Pass| D[TypeScript]
    C -->|Fail| X
    D -->|Pass| E[Vitest]
    D -->|Fail| X
    E -->|Pass| F[Secretlint]
    E -->|Fail| X
    F -->|Pass| G[✅ Push]
    F -->|Fail| X

| Tool | Role |
| --- | --- |
| commitlint | Checks commit‑message format |
| ESLint | Strict TypeScript rules, import order |
| TypeScript | Strict mode, no `any` |
| Vitest | Unit tests must pass |
| Secretlint | Detects accidentally committed secrets |

If the code doesn’t pass these checks, the commit never happens.
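
As an example of the very first gate, the commitlint config stays tiny. This is a sketch: `@commitlint/config-conventional` is the real published preset, and the extra rule is just an illustrative tightening on top of it.

```ts
// commitlint.config.ts
import type { UserConfig } from '@commitlint/types';

const config: UserConfig = {
  extends: ['@commitlint/config-conventional'],
  rules: {
    // Keep subjects short enough to scan in `git log --oneline`
    'header-max-length': [2, 'always', 72],
  },
};

export default config;
```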

The Context Problem: The Main Pain Point

Context is the thing that almost killed the entire project.

  • With a simple 10‑file app, AI sees the whole project and works perfectly.
  • With a million‑line, three‑year‑old codebase, AI only sees a fragment—the tip of the iceberg.

| Project Size | Tokens | AI Effective Work Time |
| --- | --- | --- |
| Tutorial project | 100 K | Unlimited |
| Medium product | 500 K | 2‑3 hours |
| Enterprise (3+ years) | 1 M+ | 20‑30 minutes |

After ~30 minutes, AI starts to “forget,” repeats mistakes, proposes already‑rejected solutions, and breaks things that were just working.
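
To see where your own repository lands in that table, a rough rule of thumb is about four characters per token. The sketch below applies this heuristic to a source tree; the extension filter and the 4‑characters‑per‑token ratio are assumptions, not an exact tokenizer.

```ts
// estimate-tokens.ts — run with: npx tsx estimate-tokens.ts
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { extname, join } from 'node:path';

const SOURCE_EXTENSIONS = new Set(['.ts', '.tsx', '.js', '.jsx', '.css', '.md']);
const SKIP_DIRS = new Set(['node_modules', '.git', 'dist', 'build']);

function countChars(dir: string): number {
  let chars = 0;
  for (const entry of readdirSync(dir)) {
    if (SKIP_DIRS.has(entry)) continue;
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      chars += countChars(path);
    } else if (SOURCE_EXTENSIONS.has(extname(entry))) {
      chars += readFileSync(path, 'utf8').length;
    }
  }
  return chars;
}

// ~4 characters per token is a rough heuristic, not a real tokenizer.
const estimatedTokens = Math.round(countChars(process.cwd()) / 4);
console.log(`≈${estimatedTokens.toLocaleString()} tokens of source code`);
```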

Four Rakes We Stepped On

Rake #1: “It Worked on a Simple Example”

Experiment: Ask an analyst to create a registration form on a clean boilerplate (minimal React project, reference rules, 10 files).

Result: 15 minutes, everything works perfectly.

Same task on the real project: nothing works. AI gets confused by dependencies, uses outdated patterns, and conflicts with existing code.

Lesson: It’s not that AI is “dumb”; it’s the lack of context.

Rake #2: AI “Fixed” the Entire Project

Task: add one feature. AI completed it and:

  • Replaced all any with specific types
  • “Fixed” potential vulnerabilities
  • Reformatted half the project
  • Updated outdated dependencies

Result: PR with 100 000+ lines; GitLab couldn’t even display the diff. We spent two weeks untangling it, and the product was broken.

Lesson: Explicitly limit the scope of AI work with rules.

Rake #3: Token Limitation

We didn’t initially realize how tight the limits are: most models give you at most 100‑200 K tokens of context, and when the codebase far exceeds that, the model can’t see the whole picture, which leads to exactly the problems described above.

200K Tokens – Not Enough for Enterprise

  • For an Enterprise project, 200 K tokens are only enough for 3‑5 iterations.
  • After that the AI starts “forgetting” the beginning of the conversation, proposes solutions you’ve already rejected, and repeats mistakes.

Lesson: For Enterprise work you need models with at least 1 million tokens of context.

Rake #4: Auto‑Mode Is a Trap

Cursor can automatically select a model. It sounds convenient, but in practice it often picks a cheap model with a small context window.

We wasted a lot of time before realizing that for serious work you have to pick the model manually.

Lesson: Use Claude Opus for planning and Gemini Flash for implementation. Never rely on auto‑mode.

How We Solved the Context Problem

After all the “rakes” we built a system that isn’t perfect but works.

Two‑Level Task Tracking

flowchart TD
    subgraph Jira [Jira – Team Level]
        J1[VP‑385: Add Registration Form]
    end

    subgraph Beads [Beads – Atomic Level]
        B1[bd‑1: Review UserForm.tsx]
        B2[bd‑2: Add email field]
        B3[bd‑3: Write test]
    end

    J1 --> B1
    B1 --> B2
    B2 --> B3
  • Jira – top‑level tasks for the team (e.g., VP‑385: Add registration form).
  • Beads – atomic tasks for the AI:
    • bd‑1: Review file UserForm.tsx
    • bd‑2: Add email field
    • bd‑3: Write test

Beads are stored locally and synced with Git, so the AI always knows which step it stopped at.

Memory Bank

An “external memory” for the AI that stores:

  • Current context – what we’re working on now
  • Progress – what’s already done
  • Research – findings and references
  • Archives – completed tasks

When the AI “forgets” something, it can pull the needed information from the Memory Bank and restore its understanding.
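
In practice a Memory Bank is usually just a set of Markdown files the model reads and updates between sessions. One possible layout (illustrative file names, not a fixed standard) looks like this:

```text
memory-bank/
  active-context.md   # what we are working on right now
  progress.md         # what is already done and what is left
  research.md         # findings, links, rejected approaches
  archive/            # one file per completed task
```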

Model Combination

We split the work between two models:

| Model | Role | Reason |
| --- | --- | --- |
| Claude Opus 4.5 | Architect – creates plans, writes specs, conducts reviews | 200 K token window is enough for planning |
| Gemini 3 Flash | Executor – implements code according to the plan | 1 M token window lets it work for hours without losing the thread |

flowchart LR
    A[Task] --> B[Opus: Planning]
    B --> C[Opus: Technical Spec]
    C --> D[Gemini: Implementation]
    D --> E[Opus: Review]
    E -->|Issues| D
    E -->|OK| F[Done]

Cycle: Opus plans → Gemini implements → Opus reviews.

Project Statistics (1.5 weeks on feature/msw-mocks)

| Metric | Value |
| --- | --- |
| Commits | 425 |
| Files changed | 672 |
| Lines added | +85 000 |
| Lines removed | –11 000 |
| Tests added | ~200 |
| Tokens spent | 1.5 billion |

Implemented features

  • ✅ Full MSW mocking system (50+ handlers)
  • ✅ Schedule timeline with Gantt chart
  • ✅ Quality Gates (ESLint, TypeScript, Husky)
  • ✅ Beads integration
  • ✅ 200+ unit tests

Comparison: Traditional Development vs. AI‑Assisted Development

| Parameter | Traditional | With AI |
| --- | --- | --- |
| Time per feature | 2‑3 days | 1.5 weeks* |
| Code quality | Depends on developer | High (Quality Gates) |
| Tests | Often skipped | 200+ automatically |
| Documentation | Often none | Generated |

*Includes infrastructure setup, learning curve, and all the “rakes.”

Important nuance: The first feature is expensive. We spent 1.5 weeks learning the workflow, setting up rules, and stepping on rakes. Subsequent features take ≈10× less time.

Role Evolution

  • Analyst – no longer just “writes specs.” Becomes a junior developer who must understand SQL, work with Git, and read code at a basic level.
  • Developer – no longer just “writes code.” Becomes an architect who focuses on design patterns rather than language syntax. AI can write in Java, Node.js, Python, Go, etc., so developers become universal specialists.

Conclusions & Recommendations

What Works

  • Opus + Gemini combination – smart architect + fast executor
  • Quality Gates – stricter constraints → better results
  • Two‑level tracking – Jira for the team, Beads for the AI
  • Memory Bank – external memory to avoid losing context
  • Data mocking – complete development autonomy

What Doesn’t Work

  • Auto‑mode for model selection
  • AI without constraints (it will “fix” the entire project)
  • Models with context windows < 1 M tokens for Enterprise work

Checklist for Getting Started

  • Set up local development environment
  • Implement Quality Gates (ESLint, strict TypeScript)
  • Create a data‑mocking system
  • Connect MCP (Jira, Context7, Memory Bank)
  • Train analysts on Git and SQL
  • Choose the right models (Opus + Gemini)

Final Thoughts

The battle for context isn’t won yet. Context windows keep growing, but Enterprise projects remain too large for an AI to see them in full. We therefore need systems that help AI maintain focus: task trackers, Memory Bank, and Quality Gates.

We spent 1.5 billion tokens to reach these insights. Hopefully our experience helps you spend far fewer.

What’s your experience with AI coding in large projects? Share in the comments!

About the Author

Working on UI with React. Tools: Cursor IDE, Claude Opus 4.5, Gemini 3 Flash.

Tags: ai cursor enterprise programming devjournal
