What Hampers IT Project Execution
Source: Dev.to

I have published the details at .
I’ve spent over 18 years building systems. This is what I learned studying the ones that failed. It’s not about code quality; it’s about decisions made 6–12 months before the failure.
The Failures Fall Into Seven Categories
1. Execution Model Mismatch
The mistake
Adopting Scrum because "everyone does Agile," even though the business sells fixed-scope contracts.
Reality: Customer contracts require fixed scope.
Result: Permanent scope creep, frustrated clients, and a switch back to waterfall after ~18 months.
Case – A manufacturing‑software company used fixed‑price contracts together with Scrum. Features could not be delivered within a sprint because the scope kept changing. When they tried to cap scope, customers replied, “That wasn’t in the original agreement.” It took them two years to realize Scrum doesn’t work with a fixed‑scope business model.
The better approach
| Business model | Recommended delivery model |
|---|---|
| Fixed‑scope contracts | Waterfall or hybrid (plan‑driven + iterative delivery) |
| Outcome‑based (SaaS, platform) | Agile |
| Scope changes mid‑project | Rolling‑wave + time‑and‑materials |
| AI products | Experimentation‑driven (test + learn rapidly) |
Which framework you pick matters less than the fit: the business model determines the framework.
2. Estimation Theater
The mistake
Choosing story points, function points, T‑shirt sizing, or AI‑based estimation makes everyone feel confident. Six months later velocity drops 40 % and the team hits integration hell.
Why every method fails – They conflate three distinct dimensions:
- Effort – How many engineer‑hours?
- Complexity – How many unknowns?
- Execution risk – What could go wrong?
Example – Microservices migration
| Service range | Original estimate | Actual effort |
|---|---|---|
| 1–5 | 13 story points each | Completed in 3 months |
| 6–10 | 13 story points each | Took 6 months (integration complexity) |
| 11–25 | 13 story points each | 3–4 months each (cascading API dependencies) |
Effort stayed consistent; complexity exploded.
The better approach – Estimate each dimension separately and add a contingency based on complexity/risk.
- **Effort**: 120 engineer‑hours
- **Complexity**: Low (5 unknowns) / Medium (15 unknowns) / High (40+ unknowns)
- **Risk**: Technical, integration, vendor, talent
- **Contingency**: +30‑50 % buffer (depending on complexity/risk)
Commit to the effort, not the timeline.
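A minimal sketch of what this can look like in practice; the thresholds and buffer percentages below are illustrative assumptions, not figures from the examples above:

```python
# Contingency is chosen from complexity/risk and added on top of raw effort.
# Thresholds and buffer sizes are illustrative assumptions.
COMPLEXITY_BUFFER = {"low": 0.30, "medium": 0.40, "high": 0.50}

def classify_complexity(unknowns: int) -> str:
    if unknowns <= 5:
        return "low"
    if unknowns <= 20:
        return "medium"
    return "high"

def estimate(effort_hours: float, unknowns: int, extra_risk_buffer: float = 0.0) -> dict:
    """Return raw effort plus a complexity/risk-based contingency, not a delivery date."""
    complexity = classify_complexity(unknowns)
    contingency = COMPLEXITY_BUFFER[complexity] + extra_risk_buffer
    return {
        "effort_hours": effort_hours,
        "complexity": complexity,
        "contingency": contingency,
        "committed_hours": round(effort_hours * (1 + contingency)),
    }

# Example: 120 engineer-hours, ~15 unknowns, a little extra vendor risk.
print(estimate(120, unknowns=15, extra_risk_buffer=0.05))
# -> complexity "medium", 45% contingency, commit to ~174 engineer-hours
```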
3. Testing Confidence
The mistake
Team metric: 80 % code coverage + shift‑left testing + AI‑generated tests.
Production metric: Same incident rate as two years ago.
Why metrics lie
- Code coverage only tells you “did the code run?”, not “does it work?”
- Shift‑left testing focuses on happy paths; roughly 99 % of what production sees is edge cases.
- AI‑generated tests inherit the same blind spots as human‑written tests (wrong data, wrong scale).
Case – Retail company with 85 % automation coverage. On Black Friday:
- Tests ran on clean data with stable traffic.
- Real traffic spiked 100×, caches were invalidated, and the connection pool was exhausted.
- Production was down for 6 hours → $5 M lost revenue.
The better approach – Switch the question from “does it work?” to “when does it break?” – i.e., chaos engineering and failure injection.
```python
# Instead of this simple happy-path test:
def test_user_creation():
    user = create_user("john@example.com", "password123")
    assert user.id is not None

# Do this:
def test_user_creation_under_load():
    # What breaks at 1 000 req/sec?
    load_test(create_user, requests=1000)

def test_user_creation_with_slow_db():
    # What if the DB is slow?
    slow_db_connection()
    user = create_user("john@example.com", "password123")
    assert user.id is not None  # or does it time out?

def test_user_creation_concurrent_writes():
    # What if duplicate emails hit simultaneously?
    concurrent(lambda: create_user("john@example.com", "pass"))
    assert no_duplicates()
```
Teams that adopt this style see ~70 % fewer incidents.
4. AI‑Generated Code Debt
The mistake
| Timeline | Observation |
|---|---|
| Month 1 | “We’re using Copilot! Velocity +30 %!” |
| Month 6 | Code reviews become impossible, bugs double, refactoring nightmare. |
| Month 9 | Code‑base cleanup takes 2 months → productivity goes negative. |
Why it happens – AI writes code that works for the happy path but often lacks error handling and type checking, and it introduces hidden bugs.
Example generated code
```python
def process_orders(orders):
    results = []
    for order in orders:
        price = order['price'] * order['quantity']
        if price > 1000:
            price = price * 0.9  # 10% discount
        results.append({'order_id': order['id'], 'total': price})
    return results
```
Works immediately but:
- No error handling (e.g., missing keys).
- No type checking.
- Discount logic duplicated elsewhere → technical debt (47 places now contain discount code).
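For contrast, here is a minimal sketch of the same function with basic hardening: type hints, explicit error handling, and the discount rule pulled into one place. The names, types, and thresholds are illustrative assumptions, not code from the case above.

```python
from typing import TypedDict

class Order(TypedDict):
    id: str
    price: float
    quantity: int

BULK_DISCOUNT_THRESHOLD = 1000
BULK_DISCOUNT_RATE = 0.10

def apply_bulk_discount(price: float) -> float:
    """The one place the discount rule lives, instead of 47 copies."""
    if price > BULK_DISCOUNT_THRESHOLD:
        return price * (1 - BULK_DISCOUNT_RATE)
    return price

def process_orders(orders: list[Order]) -> list[dict]:
    results = []
    for order in orders:
        try:
            order_id = order['id']
            price = order['price'] * order['quantity']
        except (KeyError, TypeError) as exc:
            # Fail loudly on malformed input instead of crashing mid-batch with a cryptic error.
            raise ValueError(f"Malformed order: {order!r}") from exc
        results.append({'order_id': order_id, 'total': apply_bulk_discount(price)})
    return results
```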
The better approach
| Use AI for | Avoid AI for |
|---|---|
| Low‑stakes code (utilities, boilerplate, tests) | Decision‑making code (architecture, core algorithms) |
| Well‑defined problems (implement spec, not design) | Novel problems (no known solution) |
| Repetitive patterns (CRUD endpoints, error handlers) | Critical paths (security, payment processing, data integrity) |
Measure
- Code‑quality metrics (cyclomatic complexity, test coverage)
- Bug density (bugs per 1 000 LOC)
- Maintenance cost (refactoring hours per quarter)
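Bug density, in particular, is cheap to track; a tiny sketch (the numbers are made up for illustration):

```python
def bug_density(bug_count: int, lines_of_code: int) -> float:
    """Bugs per 1,000 lines of code."""
    return bug_count / (lines_of_code / 1000)

# Example: 120 bugs reported against a 300k-LOC service -> 0.4 bugs per kLOC.
print(bug_density(120, 300_000))  # 0.4
```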
5. Observability Theater
The mistake
Team collects 10 000 metrics but actually uses only 3; the rest sit unused in dashboards.
Why engineers ignore observability
- Overloaded dashboards make it hard to find the signal.
The better approach – Keep dashboards lean and actionable:
- Define clear SLOs/SLIs.
- Select 5‑7 key metrics that directly indicate health (e.g., error rate, latency p95, CPU usage, request volume, queue depth).
- Group related metrics into a single visual.
- Add alerts only on thresholds that matter; avoid alert fatigue.
When engineers can answer “Is the system healthy?” quickly, they can act faster.
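As a concrete illustration of "define clear SLOs/SLIs," an availability check can be this small; the target and request counts below are assumptions, not numbers from the article:

```python
# Minimal availability SLI/SLO check over some rolling window.
SLO_AVAILABILITY = 0.999  # target: 99.9% of requests succeed (illustrative)

def availability_sli(total_requests: int, failed_requests: int) -> float:
    if total_requests == 0:
        return 1.0
    return 1 - (failed_requests / total_requests)

def error_budget_remaining(sli: float, slo: float = SLO_AVAILABILITY) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is blown."""
    allowed_failure_rate = 1 - slo
    observed_failure_rate = 1 - sli
    return (allowed_failure_rate - observed_failure_rate) / allowed_failure_rate

# Example: 1,000,000 requests with 600 failures -> SLI 0.9994, 40% of the budget left.
sli = availability_sli(1_000_000, 600)
print(round(error_budget_remaining(sli), 2))  # 0.4
```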
Engineer workflow example
```bash
tail -f /var/log/app.log | grep "slow"
```
- Finds the problem in 2 minutes
- Dashboard says "all systems nominal"
- Log shows: "database query took 45 seconds"
Design observability for decisions, not just data collection.
Ask: What does the CTO need to know to make a decision?
Answer – 5 essential questions:
- Is this an outage? (yes/no)
- How many users are affected? (number)
- What’s broken? (service name, error type)
- What’s the blast radius? (cascading failures?)
- Can we roll back? (undo last deploy?)
Build dashboards that answer those five items. Everything else is optional.
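A sketch of a status summary built around those five questions; the thresholds, field names, and example values are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class IncidentSummary:
    is_outage: bool          # 1. Is this an outage?
    users_affected: int      # 2. How many users are affected?
    whats_broken: str        # 3. Which service / error type?
    blast_radius: list[str]  # 4. Which dependent services are degraded?
    can_roll_back: bool      # 5. Can we undo the last deploy?

def summarize(error_rate: float, affected_users: int, failing_service: str,
              degraded_services: list[str], last_deploy_revertible: bool,
              outage_threshold: float = 0.05) -> IncidentSummary:
    """Collapse raw signals into the five answers a decision-maker needs."""
    return IncidentSummary(
        is_outage=error_rate > outage_threshold,
        users_affected=affected_users,
        whats_broken=failing_service,
        blast_radius=degraded_services,
        can_roll_back=last_deploy_revertible,
    )

# Example: 12% checkout errors, 4,300 affected users, one degraded downstream service.
print(summarize(0.12, 4_300, "checkout-service: 502s", ["payments-api"], True))
```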
Case Study: Media Company – Alert Fatigue
- Built “state‑of‑the‑art” observability.
- Alert fatigue killed adoption.
- 45‑minute outage: alerts fired, nobody saw them.
- Cost: $2 M.
6. Organization Transformation Gravity
The mistake
You modernize the tech stack:
- Microservices? ✓
- DevOps? ✓
- Cloud? ✓
But you keep the 1995 org structure:
- Approval process: 12 months
- Hiring: 3 months per person
- Team structure: Silos by function
Result: DevOps can deploy in 5 minutes but must wait 12 months for approval.
Real‑world example: Telecom Company
- Modernized to DevOps + microservices, hired “fancy” architects, built beautiful infrastructure.
- But:
- Feature requests still routed through a legacy billing‑system approval process.
- Billing system sold as “modules” (no flexibility).
- Customer contracts were 12‑month fixed scope.
The Agile dev team clashed with the fixed‑scope requirements.
Outcome (3 years later):
- $50 M spent, zero improvement in time‑to‑market, three leadership changes.
The better approach
Transform the organization first—technology follows, not the reverse.
Ask:
- How fast can we make decisions? (1 week? 1 month? 1 quarter?)
- How much autonomy do teams have? (full? subject to approval?)
- How aligned are incentives? (shipping fast? cost control? risk aversion?)
- Can we move people between teams? (reorg cost? retention risk?)
Legacy gravity is stronger than new technology. You can’t microservice your way out of broken incentives.
7. Vendor Lock‑In Sold as Innovation
The mistake
Choose a “leading vendor, industry‑standard, great case studies.”
Vendor optimizes for:
- Lock‑in (proprietary APIs, custom languages, high switching cost)
- Roadmap driven by highest‑paying customers (usually not you)
Vendor gets acquired → product line killed.
Result: Stuck with $5 M+ switching cost or forced to rewrite.
Real‑world example: Fintech Company
- Chose a vendor for “core platform.”
- Vendor messaging:
- “Built for fintech”
- “Enterprise‑grade”
- “Zero‑downtime deployments”
3 years later:
- Vendor acquired → new owner killed product line.
- Migration required a complete rewrite.
- Cost: $8 M, 18 months, three departures.
The better approach
Assume vendors will disappoint.
- Use open standards (SQL, REST, standard frameworks).
- Own critical data flows (never let a vendor own your data).
- Keep switching costs low (avoid proprietary APIs; see the adapter sketch below).
- Plan the exit (what would it cost to migrate?).
- Diversify risk (use multiple vendors where possible).
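A sketch of what "keep switching costs low" looks like in code: business logic depends on a small interface you own, and the vendor is confined to one adapter. All names here (AcmePay, create_charge) are hypothetical, not a real SDK:

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The interface we own; the rest of the codebase only depends on this."""
    def charge(self, customer_id: str, amount_cents: int) -> str:
        ...

class AcmePayAdapter:
    """The only module that touches the vendor SDK. Swapping vendors means
    rewriting this class, not the whole codebase."""
    def __init__(self, client):  # client: hypothetical vendor SDK instance
        self._client = client

    def charge(self, customer_id: str, amount_cents: int) -> str:
        # Translate our domain call into the vendor's proprietary API here.
        response = self._client.create_charge(customer=customer_id, amount=amount_cents)
        return response["id"]

def checkout(gateway: PaymentGateway, customer_id: str, amount_cents: int) -> str:
    # Business logic sees only the interface, never the vendor.
    return gateway.charge(customer_id, amount_cents)
```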
The Decision Framework That Actually Works
| Step | What to Do | Focus |
|---|---|---|
| 1. Name the decision | Vendor, execution model, tech stack, scaling, migration strategy, etc. | Clarify scope |
| 2. Model primary consequences (6–18 months) | • Engineering effort • Operational burden • Learning curve • Cost trajectory | Immediate impact |
| 3. Model secondary consequences (18–36 months) | • Migration cost (undo difficulty) • Vendor risk (acquisition, shutdown) • Talent risk (hiring/retention) • Organizational risk (cultural change) | Near‑future risk |
| 4. Model tertiary consequences (3+ years) | • Lock‑in (stuck forever?) • Scaling limits (break points) • Obsolescence (outdated in 5 years?) • Opportunity cost (what can’t we do?) | Long‑term outlook |
| 5. Decide transparently | Choose the best trade‑off given your constraints, not "the best" in absolute terms. Document the rationale. | Future accountability |
Six months from now, you’ll need to remember why you made this call.
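To make step 5 concrete, here is a lightweight decision record; the fields and example values are illustrative, not a prescribed template:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    decision: str                # 1. Name the decision
    primary_6_18m: list[str]     # 2. Primary consequences (6-18 months)
    secondary_18_36m: list[str]  # 3. Secondary consequences (18-36 months)
    tertiary_3y_plus: list[str]  # 4. Tertiary consequences (3+ years)
    rationale: str               # 5. Why this trade-off, given our constraints
    reviewed_by: list[str] = field(default_factory=list)

record = DecisionRecord(
    decision="Adopt Vendor X for the core billing platform",
    primary_6_18m=["3 engineer-months of integration", "new on-call runbooks"],
    secondary_18_36m=["migration cost if Vendor X is acquired", "hiring for a niche stack"],
    tertiary_3y_plus=["proprietary API lock-in", "scaling ceiling as tenants grow"],
    rationale="Fastest path to compliance; exit cost capped by keeping data exports in SQL.",
)
```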
Test Your Own Judgment
I built a simple simulator that walks you through this framework in 2 minutes.
- Try it:
- No login. No signup. Just test yourself.
