Closing the knowledge gap with agent skills
Source: Google Developers Blog
Large language models (LLMs) have fixed knowledge, being trained at a specific point in time. Software engineering practices are fast‑paced and change often, with new libraries launched every day and best practices evolving quickly.
This leaves a knowledge gap that language models can’t solve on their own. At Google DeepMind we see this in a few ways: our models don’t know about themselves when they’re trained, and they aren’t necessarily aware of subtle changes in best practices (like thought circulation) or SDK updates.
Many solutions exist, from web‑search tools to dedicated MCP services, but more recently, agent skills have surfaced as an extremely lightweight but potentially effective way to close this gap.
While there are strategies that we, as model builders, can implement, we wanted to explore what is possible for any SDK maintainer. Below is what we did to build the Gemini API developer skill and the results it had on performance.
What we built
To help coding agents building with the Gemini API, we built a skill that:
- explains the high‑level feature set of the API,
- describes the current models and SDKs for each language,
- demonstrates basic sample code for each SDK, and
- lists the documentation entry points (as sources of truth).
The skill is a basic set of primitive instructions that guide an agent toward using our latest models and SDKs, while referring to the documentation to encourage retrieving fresh information from the source of truth.
The skill is available on GitHub or can be installed directly into a project:
# Install with Vercel skills
npx skills add google-gemini/gemini-skills --skill gemini-api-dev --global
# Install with Context7 skills
npx ctx7 skills install /google-gemini/gemini-skills gemini-api-dev
Skill tester
We created an evaluation harness with 117 prompts that generate Python or TypeScript code using the Gemini SDKs. The prompts cover various categories, including agentic coding tasks, building chatbots, document processing, streaming content, and a number of specific SDK features.
Tests were run in two modes:
- Vanilla – directly prompting the model.
- With skill – the model receives the same system instruction used by the Gemini CLI (see the source) and two tools:
activate_skillandfetch_url(for downloading the docs).
A prompt is considered a failure if it uses one of our old SDKs.
Skills work, but they need reasoning

- The latest Gemini 3 series of models achieve excellent results with the addition of the
gemini-api-devskill, improving from a low baseline (6.8 % for both 3.0 Pro and Flash, 28 % for 3.1 Pro). - The older 2.5 series also benefit, but not as dramatically. Using modern models with strong reasoning support makes a noticeable difference.
All categories performed well

Adding the skill was effective across almost all domains for the top‑performing model (gemini-3.1-pro-preview).
SDK Usage had the lowest pass rate, at 95 %. The failures span a range of tasks, including some difficult or unclear requests, and notably include prompts that explicitly request Gemini 2.0 models.
Example (failed SDK‑usage prompt)
When I use the Python API with the Gemini 2.0 Flash model, and the output is quite long, the returned content is an array of output chunks instead of the whole thing. I guess it was doing some kind of streaming input. How do I turn this off and get the whole output together?
Skill issues
These initial results are encouraging, but we know from Vercel’s work that direct instruction through AGENTS.md can be more effective than using skills. Consequently, we are exploring other ways to supply live knowledge of SDKs, such as directly using MCPs for documentation.
Skill simplicity is a huge benefit, yet there isn’t a great update story—users must update manually. In the long term, stale skill information could remain in users’ workspaces, potentially causing more harm than good.
Despite these minor issues, we’re excited to start using skills in our workflows. The Gemini API skill is still fairly new, but we’ll keep it maintained as we push model updates and explore avenues for improvement. Follow Mark and Phil for updates, and don’t forget to try it out and share your feedback.