Building Tri-Fort: Why We Abandoned Pure Machine Learning and Built a Construction Intelligence Engine Instead
Source: Dev.to
Introduction
Over the last several months, I’ve been building Tri-Fort, an AI-powered construction cost estimation platform designed for the Kenyan construction industry. At first, the goal seemed straightforward: gather historical construction data, train a machine learning model, and let AI predict project costs. Like many founders building AI products today, I assumed the machine learning model would be the product.

I was wrong.

The deeper I went into the construction industry, the more I realized that the biggest challenge wasn’t model selection, neural networks, or feature engineering. The challenge was data. And that realization fundamentally changed the architecture of Tri-Fort.

This article documents the engineering journey, the mistakes, the discoveries, and how we evolved from an ML-first architecture into a hybrid construction intelligence platform.

The first version of Tri-Fort was designed around a traditional machine learning pipeline. Users would enter:

- Location
- Project type
- Built-up area
- Number of floors
- Finish level
- Material preferences

The system would then:

- Generate features
- Feed them into a regression model
- Return estimated construction costs

The architecture looked something like this:

User Input
↓
Feature Engineering
↓
ML Model
↓
Cost Prediction
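To make that pipeline concrete, here is a minimal Python sketch of its first two stages. It is an illustration under assumptions, not Tri-Fort's actual code: the category vocabularies, the field names, and the helpers `to_features` and `predict_cost` are all hypothetical, material preferences are left out for brevity, and the "model" is just the scoring formula of a linear regression with weights supplied by the caller.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the real platform's categories are not listed in the article.
LOCATIONS = ("Nairobi", "Kiambu")
PROJECT_TYPES = ("residential", "commercial")
FINISH_LEVELS = ("standard", "luxury")


@dataclass(frozen=True)
class ProjectInput:
    location: str
    project_type: str
    built_up_area_sqm: float
    floors: int
    finish_level: str


def to_features(p: ProjectInput) -> dict[str, float]:
    """Turn estimator inputs into a flat numeric feature map."""
    features = {"area_sqm": p.built_up_area_sqm, "floors": float(p.floors)}
    # One-hot encode each categorical field against its vocabulary.
    for prefix, value, vocabulary in (
        ("location", p.location, LOCATIONS),
        ("type", p.project_type, PROJECT_TYPES),
        ("finish", p.finish_level, FINISH_LEVELS),
    ):
        for name in vocabulary:
            features[f"{prefix}={name}"] = 1.0 if value == name else 0.0
    return features


def predict_cost(features: dict[str, float], weights: dict[str, float], bias: float = 0.0) -> float:
    """Score a linear regression: bias + sum(weight * feature value)."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
```

The weights are the only thing a training step would supply, which is why the quality of the training data ends up deciding everything.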
Simple. At least on paper.

Most machine learning tutorials assume you already have clean data. Construction doesn’t work that way. The data we had access to included:

- Bills of Quantities (BoQs)
- Work schedules
- Cost books
- Quantity Surveyor reports
- Project specifications
- Market research datasets
- Historical pricing documents
- Contractor estimates

At first glance, this looked like a goldmine. In reality, it was chaos. Files existed as:

- PDFs
- Scanned PDFs
- Excel workbooks
- OCR outputs
- Cost schedules
- Multiple revisions of the same project

The same project often existed in three or four versions. For example:

- Kiambu Mall BoQ
- Kiambu Mall Revised BoQ
- Kiambu Mall Perimeter Wall BoQ
- Kiambu Mall 2nd Floor Provision BoQ
To a human, these are clearly related. To a machine learning pipeline, they appear as entirely different projects.

Rather than blindly training a model, we built a data discovery and audit pipeline. The pipeline performed:

- File inventory
- Project grouping
- Duplicate detection
- OCR quality assessment
- Cost recovery analysis
- Dataset readiness scoring

What we found was surprising. Out of dozens of documents and thousands of extracted rows:

- Only 9 distinct projects were recoverable
- Only 2 projects contained evidence of actual final costs
- The remaining projects were estimates

This was a critical distinction. Most datasets contained:

Estimated Cost

What we actually needed was:

Final Actual Cost
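The project grouping and cost recovery steps of such an audit can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline we run: the qualifier list, the `title` and `cost_kind` fields, and the `estimate` / `final_actual` labels are assumptions made for the example, and real grouping needs fuzzier matching than a fixed regular expression.

```python
import re
from collections import defaultdict

# Hypothetical qualifiers that mark a revision or sub-scope rather than a new project.
QUALIFIERS = re.compile(
    r"\b(revised|perimeter wall|2nd floor provision|boq)\b", re.IGNORECASE
)


def project_key(title: str) -> str:
    """Strip qualifiers so related documents collapse to one project key."""
    return " ".join(QUALIFIERS.sub(" ", title).lower().split())


def audit(documents: list[dict]) -> dict:
    """Group documents by project and find projects with final actual costs."""
    cost_kinds_by_project = defaultdict(set)
    for doc in documents:
        cost_kinds_by_project[project_key(doc["title"])].add(doc["cost_kind"])
    with_actuals = sorted(
        key for key, kinds in cost_kinds_by_project.items() if "final_actual" in kinds
    )
    return {
        "documents": len(documents),
        "distinct_projects": len(cost_kinds_by_project),
        "projects_with_final_actual_cost": with_actuals,
    }


titles = [
    "Kiambu Mall BoQ",
    "Kiambu Mall Revised BoQ",
    "Kiambu Mall Perimeter Wall BoQ",
    "Kiambu Mall 2nd Floor Provision BoQ",
]
report = audit([{"title": t, "cost_kind": "estimate"} for t in titles])
print(report)
```

Run on the four Kiambu Mall documents, the report shows four documents, one distinct project, and no project with a final actual cost. Row counts say nothing about that last number, and it is the one that matters, because an estimated cost and a final actual cost measure different things.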
An estimated cost and a final actual cost are not the same thing. Training on estimates teaches a model to reproduce estimates. It does not teach a model to predict reality.

At one point, the platform appeared production-ready. The APIs worked. Authentication worked. Reporting worked. Infrastructure passed testing. Even the ML pipeline passed synthetic validation.

But the data audit exposed an uncomfortable truth. The model wasn’t learning from reality. It was learning from other estimates. Shipping at that point would have created an illusion of intelligence. So deployment was paused. The machine learning model was no longer the priority. The data became the priority.

While auditing the data, we acquired an official Quantity Surveying cost handbook. This changed everything. Instead of treating the handbook as a PDF, we treated it as a structured knowledge source. The handbook contained:

- Regional construction rates
- Cost benchmarks
- Building classifications
- Measurement standards
- Cost adjustment factors
- Material pricing references

Suddenly we had something more valuable than a small ML dataset. We had domain expertise.

The next challenge was engineering. How do you transform a static handbook into software? We built an extraction pipeline that converts handbook data into structured rules. The system identifies:

- Regions
- Rate schedules
- Building classes
- Construction categories
- Cost multipliers

These are stored in a machine-readable rule graph. Conceptually:

Handbook PDF
↓
Extraction
↓
Rule Graph
↓
Cost Intelligence Engine
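One way to picture the rule graph is as a small typed lookup structure. The Python sketch below is an assumption about its shape, not the production schema: `RateRule`, `RuleGraph`, the `source_page` field, and the sample entries are hypothetical, and the numbers are placeholders rather than handbook figures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RateRule:
    region: str
    building_class: str
    rate_kes_per_sqm: float
    source_page: int  # hypothetical: where in the handbook the rate was extracted


@dataclass
class RuleGraph:
    rates: dict[tuple[str, str], RateRule] = field(default_factory=dict)
    multipliers: dict[str, float] = field(default_factory=dict)

    def add_rate(self, rule: RateRule) -> None:
        self.rates[(rule.region, rule.building_class)] = rule

    def base_rate(self, region: str, building_class: str) -> RateRule:
        """Return the extracted rate, or fail loudly instead of guessing."""
        try:
            return self.rates[(region, building_class)]
        except KeyError:
            raise LookupError(
                f"no handbook rate for {building_class!r} in {region!r}"
            ) from None


graph = RuleGraph()
# Placeholder values for illustration only.
graph.add_rate(RateRule("Nairobi", "residential", 1000.0, source_page=1))
graph.multipliers["luxury_finish"] = 1.15
print(graph.base_rate("Nairobi", "residential"))
```

Keeping a source reference on every rule means each number in an estimate can be traced back to the handbook, and a missing rule raises an error rather than silently falling back to a default.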
Instead of hardcoding numbers throughout the application, the cost engine can now reason from structured construction knowledge.

The current architecture no longer relies exclusively on machine learning. Instead, it combines three intelligence sources:

- Official QS benchmark rates
- Recovered BoQs and project data
- Inputs collected through the estimator

The architecture now looks like this:

User Inputs
↓
Feature Engine
↓
Handbook Intelligence
↓
Historical Cost Intelligence
↓
Cost Engine
↓
Explainable Estimate
With so little ground-truth data available, this approach is far more stable than pure ML.

Construction projects involve large sums of money. Users don’t trust black boxes. If a system says:

KES 18,400,000

the next question is: Why? Modern AI systems often struggle with this. Tri-Fort now generates reasoning traces. For example:

- Base rate: 54,000 KES/sqm
- Location adjustment: Nairobi +20%
- Luxury finish adjustment: +15%
- Two-storey adjustment: +8%
- Historical correction: -2%
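A trace like this falls out naturally if every adjustment is applied as an explicit, labelled step. The Python sketch below assumes the adjustments compound multiplicatively in the order listed, which the example above does not specify. The `Adjustment` type and `explain_rate` helper are hypothetical names. The sketch stops at the rate per square metre because the built-up area behind the KES 18,400,000 figure is not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    label: str
    percent: float  # +20 means +20%, -2 means -2%


def explain_rate(base_rate: float, adjustments: list[Adjustment]) -> tuple[float, list[str]]:
    """Apply each adjustment in turn and record the running rate."""
    rate = base_rate
    trace = [f"Base rate: {base_rate:,.0f} KES/sqm"]
    for adj in adjustments:
        rate *= 1 + adj.percent / 100  # assumption: adjustments compound
        trace.append(f"{adj.label}: {adj.percent:+.0f}% -> {rate:,.0f} KES/sqm")
    return rate, trace


rate, trace = explain_rate(
    54_000,
    [
        Adjustment("Location adjustment (Nairobi)", 20),
        Adjustment("Luxury finish adjustment", 15),
        Adjustment("Two-storey adjustment", 8),
        Adjustment("Historical correction", -2),
    ],
)
print("\n".join(trace))
```

Under the compounding assumption the adjusted rate comes to about 78,872 KES/sqm. If the percentages were instead summed (+41%), it would be 76,140 KES/sqm. Either way, the trace makes the choice visible to the user instead of hiding it inside a model.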
Users see not only the estimate but the rationale. That transparency creates trust.

Alongside the estimation engine, the platform required production-grade infrastructure. The stack includes:

- FastAPI
- PostgreSQL
- Domain-driven architecture
- Background task processing
- Next.js
- TypeScript
- Responsive dashboard
- Docker Compose
- Caddy HTTPS automation
- Environment-driven configuration

Everything is configured so a VPS deployment requires only:

git pull
docker compose up -d --build
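Environment-driven configuration is what makes that two-command deploy possible. As a rough Python sketch of the idea (the variable names `DATABASE_URL`, `APP_ENV`, and `DEBUG`, and the `Settings` and `load_settings` names, are illustrative assumptions, not Tri-Fort's actual settings):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    environment: str
    debug: bool


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables, failing fast if one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # hypothetical variable name
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        environment=env.get("APP_ENV", "development"),
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Because every deployment-specific value arrives through the environment, the same build can run on a laptop and on the VPS, with only the environment file differing.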
No code changes. No production-specific branches. No manual edits.

If I could restart this project tomorrow, I’d follow three rules.

1. Never trust dataset size. Audit it. A thousand rows can represent five projects.
2. Domain knowledge beats machine learning when data is scarce. A handbook written by experienced Quantity Surveyors can outperform a poorly trained model.
3. Users care about answers, not algorithms. Nobody hires a construction estimator because it uses AI. They hire it because the estimate is accurate.

The long-term vision remains machine learning. But now the roadmap is grounded in reality. The next stage focuses on collecting:

- Final accounts
- Completion certificates
- Contractor invoices
- Variation orders
- Actual project costs

As the dataset grows, machine learning can become increasingly important. Eventually the platform will evolve into a true hybrid system:

Domain Knowledge + Historical Projects + Machine Learning + Human Explainability
That’s the future. Not AI replacing expertise. AI amplifying it.

The biggest lesson from building Tri-Fort is that successful AI products are rarely about the model. They’re about understanding the problem deeply enough to know when a model is not the answer. For construction estimation, intelligence comes from a combination of:

- Engineering
- Quantity surveying
- Historical data
- Domain expertise
- Software architecture

Machine learning is just one piece of that puzzle. And sometimes the smartest engineering decision is knowing when not to rely on it.