A Guide to Fine-Tuning FunctionGemma

Published: 3 days ago (February 18, 2026 at 07:54 PM EST)

5 min read

Source: Google Developers Blog

Jan 16, 2026

In the world of Agentic AI, the ability to call tools is what translates natural language into executable software actions. Last month, we released FunctionGemma, a specialized version of our Gemma 3 270M model explicitly fine‑tuned for function calling. It is designed for developers building fast and cost‑effective agents that translate natural language into executable API actions.

Specific applications often require specialist models. In this post, we demonstrate how to fine‑tune FunctionGemma to handle tool‑selection ambiguity—when a model must choose between one or more seemingly similar functions to call. We also introduce the FunctionGemma Tuning Lab, a demo tool that makes this process accessible without writing a single line of training code.

Why Fine‑Tune for Tool Calling?

If FunctionGemma already supports tool calling, why is fine‑tuning necessary?

The answer lies in context and policy. A generic model doesn’t know your business rules. Common use cases for fine‑tuning include:

Resolving selection ambiguity – When a user asks, “What is the travel policy?”, a base model might default to a Google search. An enterprise model, however, should search the internal knowledge base.
Ultra‑specialization – Train a model to master niche tasks or proprietary formats not found in public data, such as handling domain‑specific mobile actions (e.g., controlling device features) or parsing internal APIs to construct highly complex regulatory reports.
Model distillation – Use a large model to generate synthetic training data, then fine‑tune a smaller, faster model to run that specific workflow efficiently.

The Case Study: Internal Docs vs. Google Search

Let’s look at a practical example from the technical guide on fine‑tuning FunctionGemma using the Hugging Face TRL library.

The Challenge

The goal was to train a model to distinguish between two specific tools:

search_knowledge_base – internal documents
search_google – public information

When asked “What are the best practices for writing a simple recursive function in Python?” a generic model defaults to Google.
When asked “What is the reimbursement limit for travel meals?” the model needs to know that this is an internal‑policy question.

The Solution

To evaluate performance we used the bebechien/SimpleToolCalling dataset, which contains sample conversations that require a choice between the two tools above.

The dataset is split into training and testing sets. Keeping the test set separate lets us evaluate the model on unseen data, ensuring it learns the underlying routing logic rather than merely memorising examples.

⚠️ A Critical Note on Data Distribution

How you split your data is just as important as the data itself.

from datasets import load_dataset

# Load the raw dataset
dataset = load_dataset("bebechien/SimpleToolCalling", split="train")

# Convert to conversational format
dataset = dataset.map(
    create_conversation,
    remove_columns=dataset.features,
    batched=False,
)

# 50 % training / 50 % testing split (no shuffling)
dataset = dataset.train_test_split(test_size=0.5, shuffle=False)

In the guide a 50/50 split with shuffle=False was used deliberately to highlight the model’s improvement on a large volume of unseen data.

Warning: If your source data is ordered by category (e.g., all search_google examples first, then all search_knowledge_base examples), disabling shuffling will train the model on one tool only and test it on the other, leading to catastrophic performance.

Best practice:

Ensure the source data is already mixed, or
Set shuffle=True when the ordering is unknown, so the model sees a balanced representation of all tools during training.

The Result

The model was fine‑tuned with SFTTrainer (Supervised Fine‑Tuning) for 8 epochs. The training data explicitly taught the model which queries belong to which domain.

Training loss curve
The loss (error rate) drops sharply at the start, indicating rapid adaptation to the new routing logic.

After fine‑tuning, the model’s behaviour changed dramatically. It now adheres strictly to the enterprise policy. For example, when asked:

“What is the process for creating a new Jira project?”

the fine‑tuned model correctly emits:

call:search_knowledge_base{query:Jira project creation process}

Thus, with proper data handling and targeted fine‑tuning, the model reliably routes queries to the appropriate internal or external knowledge source.

Introducing the FunctionGemma Tuning Lab

Not everyone wants to manage Python dependencies, configure SFTConfig, or write training loops from scratch. Introducing the FunctionGemma Tuning Lab.

screenshot

The FunctionGemma Tuning Lab is a user‑friendly demo hosted on Hugging Face Spaces. It streamlines the entire process of teaching the model your specific function schemas.

Key Features

No‑Code Interface – Define function schemas (JSON) directly in the UI, no Python scripts required.
Custom Data Import – Upload a CSV containing your User Prompt, Tool Name, and Tool Arguments.
One‑Click Fine‑Tuning – Adjust learning rate and epochs with sliders and start training instantly. Default settings work well for most use cases.
Real‑Time Visualization – Watch training logs and loss curves update live to monitor convergence.
Auto‑Evaluation – The lab automatically evaluates performance before and after training, giving immediate feedback on improvements.

Getting Started with the Tuning Lab

To run the lab locally, clone the repository with the Hugging Face CLI and start the app:

hf download google/functiongemma-tuning-lab --repo-type=space --local-dir=functiongemma-tuning-lab
cd functiongemma-tuning-lab
pip install -r requirements.txt
python app.py

That’s it—you’re ready to fine‑tune FunctionGemma with your own function schemas!

Conclusion

Whether you choose to write your own training script using TRL or to use the demo visual interface of the FunctionGemma Tuning Lab, fine‑tuning is the key to unlocking the full potential of FunctionGemma. It transforms a generic assistant into a specialized agent capable of:

Adhering to strict business logic
Handling complex, proprietary data structures

Thanks for reading!

References

Blog post

FunctionGemma: Bringing bespoke function calling to the edge

Code examples

Hugging Face Space

FunctionGemma Tuning Lab

A Guide to Fine-Tuning FunctionGemma

Jan 16, 2026

Why Fine‑Tune for Tool Calling?

The Case Study: Internal Docs vs. Google Search

The Challenge

The Solution

⚠️ A Critical Note on Data Distribution

The Result

Introducing the FunctionGemma Tuning Lab

Key Features

Getting Started with the Tuning Lab

Conclusion

References

Blog post

Code examples

Hugging Face Space

Related posts

Tailor Gemini CLI to your workflow with hooks

Access public data insights faster: Data Commons MCP is now hosted on Google Cloud

Conductor Update: Introducing Automated Reviews

Easy FunctionGemma finetuning with Tunix on Google TPUs

Jan 16, 2026

Why Fine‑Tune for Tool Calling?

The Case Study: Internal Docs vs. Google Search

The Challenge

The Solution

⚠️ A Critical Note on Data Distribution

The Result

Introducing the FunctionGemma Tuning Lab

Key Features

Getting Started with the Tuning Lab

Conclusion

References

Blog post

Code examples

Hugging Face Space

Related posts

Tailor Gemini CLI to your workflow with hooks

Access public data insights faster: Data Commons MCP is now hosted on Google Cloud

Conductor Update: Introducing Automated Reviews

Easy FunctionGemma finetuning with Tunix on Google TPUs

Jan 16, 2026

Hugging Face Space