A Guide to Fine-Tuning FunctionGemma
Source: Google Developers Blog
Why Fine‑Tune for Tool Calling?
If FunctionGemma already supports tool calling, why is fine‑tuning necessary?
The answer lies in context and policy. A generic model doesn’t know your business rules. Common reasons to fine‑tune include:
- Resolving selection ambiguity – When a user asks, “What is the travel policy?”, a base model might default to a public Google search. An enterprise‑tuned model should instead query the internal knowledge base.
- Ultra‑specialization – Train the model to master niche tasks or proprietary formats that aren’t present in public data, for example handling domain‑specific mobile actions (controlling device features) or parsing internal APIs to generate complex regulatory reports.
- Model distillation – Use a large model to generate synthetic training data, then fine‑tune a smaller, faster model to run that specific workflow efficiently.
The Case Study: Internal Docs vs. Google Search
Let’s look at a practical example from the technical guide on fine‑tuning FunctionGemma using the Hugging Face TRL library.
The Challenge
The goal was to train a model to distinguish between two specific tools:
- search_knowledge_base – internal documents
- search_google – public information
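As a rough sketch, the two function declarations might look like the following. Only the tool names come from the case study; the descriptions and parameter schema are illustrative assumptions.
# Illustrative tool declarations; only the names match the case study.
search_knowledge_base_decl = {
    "name": "search_knowledge_base",
    "description": "Search internal company documents such as policies and runbooks.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "The search query."}},
        "required": ["query"],
    },
}
search_google_decl = {
    "name": "search_google",
    "description": "Search the public web for general-knowledge questions.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "The search query."}},
        "required": ["query"],
    },
}
tools = [search_knowledge_base_decl, search_google_decl]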
When asked “What are the best practices for writing a simple recursive function in Python?” a generic model defaults to Google.
For a query like “What is the reimbursement limit for travel meals?” the model must know that this is an internal‑policy question.
The Solution
To evaluate performance we used the bebechien/SimpleToolCalling dataset, which contains sample conversations that require a choice between the two tools above.
The dataset is split into training and testing sets. Keeping the test set separate lets us evaluate the model on unseen data, ensuring it learns the underlying routing logic rather than merely memorizing examples.
When we evaluated the base FunctionGemma model with a 50 / 50 split between training and testing, the results were sub‑optimal: the base model chose the wrong tool or offered to “discuss” the policy instead of executing the function call.
⚠️ A Critical Note on Data Distribution
How you split your data is just as important as the data itself.
from datasets import load_dataset
# Load the raw dataset
dataset = load_dataset("bebechien/SimpleToolCalling", split="train")
# Convert to conversational format
dataset = dataset.map(
create_conversation,
remove_columns=dataset.features,
batched=False,
)
# 50 % train / 50 % test split (no shuffling)
dataset = dataset.train_test_split(test_size=0.5, shuffle=False)
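Here, create_conversation is the helper that converts each raw row into the chat-message format TRL expects. The guide's exact implementation isn't reproduced here; the column names in the sketch below are assumptions, so adjust them to whatever dataset.features actually reports.
# Minimal sketch of create_conversation; not the guide's exact code.
# The column names ("question", "tool", "arguments") are assumptions about the raw dataset.
def create_conversation(example):
    return {
        "messages": [
            {"role": "user", "content": example["question"]},
            # The target output names the tool the model should call.
            {"role": "assistant", "content": f'call:{example["tool"]}{{query:{example["arguments"]}}}'},
        ]
    }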
In this case study the guide used a 50 / 50 split with shuffle=False because the original dataset is already shuffled.
Warning: If your source data is ordered by category (e.g., all search_google examples first, then all search_knowledge_base examples), disabling shuffling will train the model on one tool only and test it on the other, leading to catastrophic performance.
Best practice:
- Ensure your source data is pre‑mixed, or
- Set shuffle=True when the ordering is unknown, so the model sees a balanced representation of all tools during training (a quick balance check is sketched below).
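For an illustrative sanity check, you can count which tool each split targets. This assumes the conversational format produced by create_conversation above, where the assistant message contains the tool name.
from collections import Counter

# Count the target tool per split to confirm both tools appear in train and test.
def target_tool(example):
    text = str(example["messages"])
    return "search_knowledge_base" if "search_knowledge_base" in text else "search_google"

for split_name in ("train", "test"):
    print(split_name, dict(Counter(target_tool(ex) for ex in dataset[split_name])))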
The Result
The model was fine‑tuned with SFTTrainer (Supervised Fine‑Tuning) for 8 epochs. The training data explicitly taught the model which queries belong to which domain.
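A minimal training sketch with TRL is shown below. The 8 epochs match the case study; the other hyperparameters are illustrative defaults, and the checkpoint ID is a placeholder for whichever FunctionGemma model you are tuning.
from trl import SFTConfig, SFTTrainer

# Placeholder: replace with the FunctionGemma checkpoint you are fine-tuning.
base_model_id = "google/<functiongemma-checkpoint>"

training_args = SFTConfig(
    output_dir="functiongemma-tool-routing",
    num_train_epochs=8,              # matches the case study
    per_device_train_batch_size=4,   # illustrative
    learning_rate=2e-5,              # illustrative
    logging_steps=10,
)

trainer = SFTTrainer(
    model=base_model_id,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()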

The graph shows loss (error rate) decreasing over time. The sharp drop at the beginning indicates rapid adaptation to the new routing logic.
After fine‑tuning, the model’s behavior changed dramatically. It now adheres strictly to the enterprise policy. For example, when asked “What is the process for creating a new Jira project?” the fine‑tuned model correctly emits:
call:search_knowledge_base{query:Jira project creation process}
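To sanity-check the tuned model yourself, you can load the checkpoint and pass the tool declarations through the chat template. The path below is a placeholder, and this assumes the checkpoint's chat template accepts a tools argument, as tool-calling templates in Transformers generally do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: path to the fine-tuned checkpoint saved by the trainer above.
model_id = "functiongemma-tool-routing"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is the process for creating a new Jira project?"}]

# `tools` is the list of function declarations sketched earlier.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))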
Introducing the FunctionGemma Tuning Lab
Not everyone wants to manage Python dependencies, configure SFTConfig, or write training loops from scratch. That's where the FunctionGemma Tuning Lab comes in.

The FunctionGemma Tuning Lab is a user‑friendly demo hosted on Hugging Face Spaces. It streamlines the entire process of teaching the model your specific function schemas.
Key Features
- No‑Code Interface – Define function schemas (JSON) directly in the UI; no Python scripts required.
- Custom Data Import – Upload a CSV containing your User Prompt, Tool Name, and Tool Arguments columns (see the example rows after this list).
- One‑Click Fine‑Tuning – Adjust learning rate and epochs with sliders and start training instantly. Default settings work well for most use cases.
- Real‑Time Visualization – Watch training logs and loss curves update live to monitor convergence.
- Auto‑Evaluation – The lab automatically evaluates performance before and after training, giving immediate feedback on improvements.
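As a rough illustration of the Custom Data Import format, a CSV could look like the rows below; the argument encoding is an assumption, so check the lab's sample file for the exact format it expects.
User Prompt,Tool Name,Tool Arguments
What is the reimbursement limit for travel meals?,search_knowledge_base,"{""query"": ""travel meal reimbursement limit""}"
What are the best practices for writing a recursive function in Python?,search_google,"{""query"": ""Python recursive function best practices""}"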
Getting Started with the Tuning Lab
To run the lab locally, download the Space with the Hugging Face CLI and start the app:
hf download google/functiongemma-tuning-lab --repo-type=space --local-dir=functiongemma-tuning-lab
cd functiongemma-tuning-lab
pip install -r requirements.txt
python app.py
Now you can experiment with fine‑tuning FunctionGemma without writing any code!
Conclusion
Whether you write your own training script with TRL or use the visual interface of the FunctionGemma Tuning Lab, fine‑tuning is the key to unlocking the full potential of FunctionGemma. It transforms a generic assistant into a specialized agent capable of:
- Adhering to strict business logic
- Handling complex, proprietary data structures
Thanks for reading!