Google’s Cloud AI lead on the three frontiers of model capability
Source: TechCrunch
As a product VP at Google Cloud, Michael Gerstenhaber works mostly on Vertex, the company’s unified platform for deploying enterprise AI. It gives him a high‑level view of how companies are actually using AI models, and what still needs to be done to unleash the potential of agentic AI.
When I spoke with Michael, I was particularly struck by one idea I hadn’t heard before. As he put it, AI models are pushing against three frontiers at once: raw intelligence, response time, and a third quality that has less to do with raw capability than with cost — whether a model can be deployed cheaply enough to run at massive, unpredictable scale. It’s a new way of thinking about model capabilities, and a particularly valuable one for anyone trying to push frontier models in a new direction.
This interview has been edited for length and clarity.
Interview
Why don’t you start by walking us through your experience in AI so far, and what you do at Google?
I’ve been in AI for about two years now. I was at Anthropic for a year and a half, and I’ve been at Google almost half a year. I run Vertex, Google’s developer platform. Most of our customers are engineers building their own applications. They want access to agentic patterns, an agentic platform, and inference from the smartest models in the world. I provide that, but I don’t provide the applications themselves—that’s up to Shopify, Thomson Reuters, and our various customers in their own domains.
What drew you to Google?
Google is, I think, unique in the world because we have everything from the interface to the infrastructure layer. We can build data centers, buy electricity, and even build power plants. We have our own chips, our own model, the inference layer we control, and the agentic layer we control. We offer APIs for memory and interleaved code writing, an agent engine that ensures compliance and governance, and chat interfaces with Gemini Enterprise and Gemini Chat for consumers. That vertical integration felt like a major strength.
It feels like all three of the big labs are really close in capabilities. Is it just a race for more intelligence, or is it more complicated than that?
I see three boundaries:
- Raw intelligence – Models like Gemini Pro are tuned for raw intelligence. For tasks such as writing code, you want the best possible output, even if it takes a while, because you’ll eventually need to maintain and deploy that code.
- Latency – In use cases like customer support, you need intelligence to apply a policy (e.g., processing a return or upgrading a seat), but you also have a strict latency budget. If the answer takes 45 minutes, the user will have hung up long before it arrives, so you want the most intelligent model that fits within that budget.
- Cost at scale – Companies like Reddit or Meta need to moderate massive amounts of content. They have large budgets, but they can’t take enterprise-level risk on a model that won’t scale predictably. They have to choose the highest-intelligence model they can afford to run at virtually unlimited scale, which makes cost the critical factor.
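The trade-off behind these three boundaries can be sketched as a simple constrained selection: maximize intelligence subject to a latency budget and a cost ceiling. The model names, scores, latencies, and prices below are entirely hypothetical, purely to illustrate the decision rule, not real benchmarks or Google pricing.

```python
# Hypothetical catalog: (name, intelligence score, p95 latency in seconds,
# cost per million tokens in dollars). All numbers are illustrative.
MODELS = [
    ("frontier-pro", 95, 30.0, 12.00),
    ("frontier-flash", 82, 2.5, 0.60),
    ("frontier-lite", 70, 0.8, 0.08),
]

def pick_model(latency_budget_s: float, cost_ceiling_per_mtok: float):
    """Return the most intelligent model that fits both constraints, or None."""
    candidates = [
        m for m in MODELS
        if m[2] <= latency_budget_s and m[3] <= cost_ceiling_per_mtok
    ]
    return max(candidates, key=lambda m: m[1], default=None)

# Coding assistant: generous latency and cost budgets -> the smartest model wins.
print(pick_model(60.0, 20.0)[0])  # frontier-pro
# Customer support: a tight latency budget rules out the largest model.
print(pick_model(5.0, 20.0)[0])   # frontier-flash
# Content moderation at massive scale: cost dominates the choice.
print(pick_model(5.0, 0.10)[0])   # frontier-lite
```

Each use case lands on a different model not because one frontier matters in the abstract, but because its constraints bind differently.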
Why are agentic systems taking so long to catch on despite impressive demos?
The technology is only about two years old, and there’s still a lot of missing infrastructure. We lack robust patterns for auditing what agents do and for authorizing data access by agents. Production‑ready patterns take time to develop, and production adoption always lags behind what the technology can theoretically achieve.
That said, adoption has been unusually fast in software engineering because it fits neatly into the software development lifecycle. Google’s dev environment allows safe experimentation, and code moves from dev to test only after multiple human‑in‑the‑loop audits. This low‑risk, high‑audit process accelerates deployment within software teams, but similar patterns need to be created for other professions and domains.