Battle for Context: How We Implemented AI Coding in an Enterprise Project
Source: Dev.to
Introduction: A Task Nobody Had Solved
Imagine this: you need to give an analyst the ability to code. Not “write a prompt to ChatGPT,” but actually make changes to an enterprise product with a three‑year history and a million lines of code.
The developer isn’t sitting next to them dictating every line. They set up the environment, control quality, and only intervene when something goes wrong.
Sounds like science fiction? We thought so too—until we tried.
Why This Is Harder Than It Seems
When a programmer uses an AI assistant, they control every step. They see what’s happening under the hood and notice oddities in the code immediately.
With an analyst, everything is different. They see only the result: “the form appeared” or “the form doesn’t work.” Code quality, architectural decisions, potential bugs—all of this remains behind the scenes.
We decided to create a system that compensates for this blindness—a system where AI can’t “cause trouble” even if it really wants to.
Tool Selection: Why Cursor
We tried several options (GitHub Copilot, Claude Code, various API wrappers) and settled on Cursor for several reasons.
Multi‑model Support
Cursor lets us use different models for different tasks:
```mermaid
flowchart LR
    A[Task] --> B{Type?}
    B -->|Planning| C[Claude Opus 4.5]
    B -->|Implementation| D[Gemini 3 Flash]
    C --> E[200K tokens]
    D --> F[1M tokens]
```
- Claude Opus 4.5 – architectural planning (smart but “expensive” in tokens)
- Gemini 3 Flash – implementation (fast, cheap, and most importantly — 1 million tokens of context)
MCP Integration
Model Context Protocol (MCP) is a way to connect external tools to AI.
| Component | Purpose |
|---|---|
| Jira | Task management |
| Context7 | Library documentation |
| Memory Bank | Context preservation between sessions |
| Beads | Atomic task tracking |
Flexible Rules
Cursor allows creating .mdc files with rules that automatically load depending on context.
Working on a React component → load React rules.
Writing a script → load Node.js rules.
Security Requirements: Working Locally
Our security team imposed strict requirements: no access to the corporate network during development and no cloning of the production database.
We built a full mocking system on MSW (Mock Service Worker):
```mermaid
flowchart TD
    A[Frontend App] --> B[MSW Interceptor]
    B --> C{Request Type}
    C -->|API Call| D[Mock Handlers]
    C -->|Static| E[Pass‑Through]
    D --> F[Fake Data Generators]
    F --> G[@faker-js]
    D --> H[Response]
```
- 50+ handlers for all API endpoints
- Realistic data generators using @faker-js
- Full business‑logic emulation
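To make this concrete, here is a minimal sketch of a single handler, assuming MSW v2 and a hypothetical /api/users endpoint (the real system has 50+ of these):

```typescript
// Minimal sketch: one MSW handler backed by faker-generated data.
// The /api/users endpoint and the user shape are illustrative, not our real API.
import { http, HttpResponse } from 'msw';
import { setupWorker } from 'msw/browser';
import { faker } from '@faker-js/faker';

// Realistic fake data instead of a cloned production database.
const makeUser = () => ({
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  createdAt: faker.date.past().toISOString(),
});

export const handlers = [
  // API calls are intercepted and answered with generated data.
  http.get('/api/users', () =>
    HttpResponse.json(Array.from({ length: 20 }, makeUser)),
  ),
];

// Requests without a matching handler (static assets, etc.) pass through to the network.
export const worker = setupWorker(...handlers);
```

In development the app entry point would call worker.start() before rendering, so component code never needs to know it is talking to mocks.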
Quality Gates: The Stricter, The Better
Key insight: AI needs strict constraints. Without them, it starts taking initiative: it sees outdated code → refactors it; notices a potential vulnerability → “fixes” it; finds a style mismatch → reformats it.
In practice, a simple task like “add a field to a form” can turn into a PR with 100 000 lines.
Our Quality‑Gates Pipeline
```mermaid
flowchart TD
    A[Commit] --> B[commitlint]
    B -->|Pass| C[ESLint]
    B -->|Fail| X[❌ Rejected]
    C -->|Pass| D[TypeScript]
    C -->|Fail| X
    D -->|Pass| E[Vitest]
    D -->|Fail| X
    E -->|Pass| F[Secretlint]
    E -->|Fail| X
    F -->|Pass| G[✅ Push]
    F -->|Fail| X
```
| Tool | Role |
|---|---|
| commitlint | Checks commit‑message format |
| ESLint | Strict TypeScript rules, import order |
| TypeScript | Strict mode, no any |
| Vitest | Unit tests must pass |
| Secretlint | Detects accidentally committed secrets |
If the code doesn’t pass these checks, the commit never happens.
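The same chain can be pictured as one small script; this is only a sketch of the shape (run, for example, from a Husky hook via tsx), not our exact hook, and it assumes each tool's standard CLI:

```typescript
// Sketch of the quality-gate chain: the first failing gate rejects the push.
import { execSync } from 'node:child_process';

const gates = [
  { name: 'commitlint', cmd: 'npx commitlint --from HEAD~1 --to HEAD' },
  { name: 'ESLint', cmd: 'npx eslint . --max-warnings 0' },
  { name: 'TypeScript', cmd: 'npx tsc --noEmit' },
  { name: 'Vitest', cmd: 'npx vitest run' },
  { name: 'Secretlint', cmd: 'npx secretlint "**/*"' },
];

for (const gate of gates) {
  try {
    execSync(gate.cmd, { stdio: 'inherit' });
  } catch {
    // Any failing gate stops the chain; nothing reaches the remote.
    console.error(`❌ ${gate.name} failed – push rejected`);
    process.exit(1);
  }
}
console.log('✅ All gates passed');
```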
The Context Problem: The Main Pain Point
Context is the thing that almost killed the entire project.
- With a simple 10‑file app, AI sees the whole project and works perfectly.
- With a million‑line, three‑year‑old codebase, AI only sees a fragment—the tip of the iceberg.
| Project Size | Tokens | AI Effective Work Time |
|---|---|---|
| Tutorial project | 100 K | Unlimited |
| Medium product | 500 K | 2‑3 hours |
| Enterprise (3+ years) | 1 M+ | 20‑30 minutes |
After ~30 minutes, AI starts to “forget,” repeats mistakes, proposes already‑rejected solutions, and breaks things that were just working.
Four Rakes We Stepped On
Rake #1: “It Worked on a Simple Example”
Experiment: Ask an analyst to create a registration form on a clean boilerplate (minimal React project, reference rules, 10 files).
Result: 15 minutes, everything works perfectly.
Same task on the real project: nothing works. AI gets confused by dependencies, uses outdated patterns, and conflicts with existing code.
Lesson: It’s not that AI is “dumb”; it’s the lack of context.
Rake #2: AI “Fixed” the Entire Project
Task: add one feature. AI completed it and:
- Replaced all any with specific types
- “Fixed” potential vulnerabilities
- Reformatted half the project
- Updated outdated dependencies
Result: PR with 100 000+ lines; GitLab couldn’t even display the diff. We spent two weeks untangling it, and the product was broken.
Lesson: Explicitly limit the scope of AI work with rules.
Rake #3: 200K Tokens – Not Enough for Enterprise
We didn’t initially appreciate how quickly a context window fills up. Claude Opus gives us 200 K tokens, and on an Enterprise project that is only enough for 3‑5 iterations.
- After that the AI starts “forgetting” the beginning of the conversation, proposes solutions you’ve already rejected, and repeats earlier mistakes.
- It also begins to second‑guess architectural decisions that were settled long ago, which is exactly how regressions creep in.
Lesson: For Enterprise work you need models with at least 1 million tokens of context, plus human review of any architectural change.
Rake #4: Auto‑Mode Is a Trap
Cursor can automatically select a model. It sounds convenient, but in practice it often picks a cheap model with a small context window.
We wasted a lot of time before we understood that for serious work you need to manually select the model.
Lesson: Use Claude Opus for planning and Gemini Flash for implementation. Never rely on auto‑mode.
How We Solved the Context Problem
After all the “rakes” we built a system that isn’t perfect but works.
Two‑Level Task Tracking
```mermaid
flowchart TD
    subgraph Jira [Jira – Team Level]
        J1[VP‑385: Add Registration Form]
    end
    subgraph Beads [Beads – Atomic Level]
        B1[bd‑1: Review UserForm.tsx]
        B2[bd‑2: Add email field]
        B3[bd‑3: Write test]
    end
    J1 --> B1
    B1 --> B2
    B2 --> B3
```
- Jira – top‑level tasks for the team (e.g., VP‑385: Add registration form).
- Beads – atomic tasks for the AI:
  - bd‑1: Review file UserForm.tsx
  - bd‑2: Add email field
  - bd‑3: Write test
Beads are stored locally and synced with Git, so the AI always knows which step it stopped at.
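To give a feel for the granularity, an atomic task can be modelled roughly like this; the field names below are our illustration, not Beads’ actual schema:

```typescript
// Hypothetical sketch of a Beads-style atomic task kept in the repo.
type BeadStatus = 'open' | 'in_progress' | 'blocked' | 'done';

interface Bead {
  id: string;          // e.g. "bd-2"
  parent: string;      // Jira key this bead belongs to, e.g. "VP-385"
  title: string;       // "Add email field"
  status: BeadStatus;
  dependsOn: string[]; // beads that must be finished first
}

// Because the beads live next to the code and travel with Git,
// the AI can re-read them after a context reset and see exactly where it stopped.
const beads: Bead[] = [
  { id: 'bd-1', parent: 'VP-385', title: 'Review UserForm.tsx', status: 'done', dependsOn: [] },
  { id: 'bd-2', parent: 'VP-385', title: 'Add email field', status: 'in_progress', dependsOn: ['bd-1'] },
  { id: 'bd-3', parent: 'VP-385', title: 'Write test', status: 'open', dependsOn: ['bd-2'] },
];
```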
Memory Bank
An “external memory” for the AI that stores:
- Current context – what we’re working on now
- Progress – what’s already done
- Research – findings and references
- Archives – completed tasks
When the AI “forgets” something, it can pull the needed information from the Memory Bank and restore its understanding.
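A minimal sketch of how such an external memory can be laid out and re-read, assuming one markdown file per category (the file names are a convention invented here for illustration):

```typescript
// Illustrative file-based Memory Bank: one markdown file per category,
// concatenated and pasted back into the chat when context is lost.
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

const MEMORY_BANK = 'memory-bank';

const files = {
  activeContext: 'active-context.md', // what we're working on now
  progress: 'progress.md',            // what's already done
  research: 'research.md',            // findings and references
  archive: 'archive.md',              // completed tasks
};

export async function restoreContext(): Promise<string> {
  const parts = await Promise.all(
    Object.values(files).map((f) =>
      readFile(join(MEMORY_BANK, f), 'utf8').catch(() => ''),
    ),
  );
  // The combined text re-seeds a fresh session with everything the AI needs to continue.
  return parts.filter(Boolean).join('\n\n---\n\n');
}
```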
Model Combination
We split the work between two models:
| Model | Role | Reason |
|---|---|---|
| Claude Opus 4.5 | Architect – creates plans, writes specs, conducts reviews | 200 K token window is enough for planning |
| Gemini 3 Flash | Executor – implements code according to the plan | 1 M token window lets it work for hours without losing the thread |
```mermaid
flowchart LR
    A[Task] --> B[Opus: Planning]
    B --> C[Opus: Spec]
    C --> D[Gemini: Implementation]
    D --> E[Opus: Review]
    E -->|Issues| D
    E -->|OK| F[Done]
```
Cycle: Opus plans → Gemini implements → Opus reviews.
Project Statistics (1.5 weeks on feature/msw-mocks)
| Metric | Value |
|---|---|
| Commits | 425 |
| Files changed | 672 |
| Lines added | +85 000 |
| Lines removed | –11 000 |
| Tests added | ~200 |
| Tokens spent | 1.5 billion |
Implemented features
- ✅ Full MSW mocking system (50+ handlers)
- ✅ Schedule timeline with Gantt chart
- ✅ Quality Gates (ESLint, TypeScript, Husky)
- ✅ Beads integration
- ✅ 200+ unit tests
Comparison: Traditional Development vs. AI‑Assisted Development
| Parameter | Traditional | With AI |
|---|---|---|
| Time per feature | 2‑3 days | 1.5 weeks* |
| Code quality | Depends on developer | High (Quality Gates) |
| Tests | Often skipped | 200+ automatically |
| Documentation | Often none | Generated |
*Includes infrastructure setup, learning curve, and all the “rakes.”
Important nuance: The first feature is expensive. We spent 1.5 weeks learning the workflow, setting up rules, and stepping on rakes. Subsequent features take ≈10× less time.
Role Evolution
- Analyst – no longer just “writes specs.” Becomes a junior developer who must understand SQL, work with Git, and read code at a basic level.
- Developer – no longer just “writes code.” Becomes an architect who focuses on design patterns rather than language syntax. AI can write in Java, Node.js, Python, Go, etc., so developers become universal specialists.
Conclusions & Recommendations
What Works
- Opus + Gemini combination – smart architect + fast executor
- Quality Gates – stricter constraints → better results
- Two‑level tracking – Jira for the team, Beads for the AI
- Memory Bank – external memory to avoid losing context
- Data mocking – complete development autonomy
What Doesn’t Work
- Auto‑mode for model selection
- AI without constraints (it will “fix” the entire project)
- Models with context windows < 1 M tokens for Enterprise work
Checklist for Getting Started
- Set up local development environment
- Implement Quality Gates (ESLint, strict TypeScript)
- Create a data‑mocking system
- Connect MCP (Jira, Context7, Memory Bank)
- Train analysts on Git and SQL
- Choose the right models (Opus + Gemini)
Final Thoughts
The battle for context isn’t won yet. Context windows keep growing, but Enterprise projects remain too large for an AI to see them in full. We therefore need systems that help AI maintain focus: task trackers, Memory Bank, and Quality Gates.
We spent 1.5 billion tokens to reach these insights. Hopefully our experience helps you spend far fewer.
What’s your experience with AI coding in large projects? Share in the comments!
About the Author
Working on UI with React. Tools: Cursor IDE, Claude Opus 4.5, Gemini 3 Flash.
Tags: ai cursor enterprise programming devjournal