Distilling Knowledge into Tiny LLMs

Published: January 15, 2026 at 12:35 PM EST
3 min read
Source: Dev.to

Large Language Models (LLMs) are the technology behind most of today's AI applications. These massive billion- and trillion-parameter models generalize well when trained on enough data.

The problem is that these models are expensive and hard to run yourself, so many developers call them through hosted APIs such as OpenAI or Claude. In practice, developers also spend a lot of time crafting complex prompt logic to cover edge cases, assuming they need a huge model to handle all the rules.

If you truly want control over your business processes, running a local model is a better choice. The good news is that it doesn’t have to be a multi‑billion‑parameter beast. By fine‑tuning a smaller LLM you can handle specific business logic, reduce prompt complexity, and keep everything in‑house.

This article shows how to distill knowledge into tiny LLMs.

Install dependencies

Install txtai and the required libraries:

pip install "txtai[pipeline-train]" datasets
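
To confirm the optional training dependencies are available, a quick import check is enough. A minimal sanity check; both imports are used later in this article:

# Sanity check: these imports fail if the pipeline-train extra or datasets is missing
from txtai.pipeline import HFTrainer
from datasets import load_dataset

print("Training dependencies are ready")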

The LLM

We'll use the 600M-parameter Qwen3 model (Qwen/Qwen3-0.6B) for this example. The target task is translating user requests into Linux commands.

from txtai import LLM

llm = LLM("Qwen/Qwen3-0.6B")

Test the base model

llm("""
Translate the following request into a linux command. Only print the command.

Find number of logged in users
""", maxlength=1024)

Output

ps -e

The model understands the request but the command isn’t correct. Let’s improve it through fine‑tuning.

Fine-tuning the LLM with knowledge

Even a 600M-parameter model can be enhanced by distilling domain-specific knowledge into it. We'll use the Linux commands dataset from Hugging Face (mecha-org/linux-command-dataset) and txtai's training pipeline.
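
Before formatting anything, it's worth a quick look at the raw data. A minimal sketch that loads the same split used below and prints the first request/command pair:

from datasets import load_dataset

# Quick peek at the raw dataset used for fine-tuning
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

# Each row pairs a natural-language request ("input") with a shell command ("output")
print(dataset)
print(dataset[0])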

Create the training dataset

Each training example is built from the following prompt template, with the dataset's request substituted for {user request}:

"""
Translate the following request into a linux command. Only print the command.

{user request}
"""
from datasets import load_dataset
from transformers import AutoTokenizer

# Model path
path = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(path)

# Load the training dataset
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

def prompt(row):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]}
    ], tokenize=False, enable_thinking=False)

    return {"text": text}

# Map to training prompts
train = dataset.map(prompt, remove_columns=["input", "output"])
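
Before training, it helps to print one formatted example and confirm the chat template looks as expected:

# Inspect the first formatted training prompt
print(train[0]["text"])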

Train the model

from txtai.pipeline import HFTrainer

trainer = HFTrainer()

# Fine-tune the base model on the formatted training prompts
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
)
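
If you want to reuse the fine-tuned weights later without retraining, they can be written to disk with the standard Hugging Face save_pretrained calls. A sketch that assumes trainer() returns a (model, tokenizer) pair, as described in txtai's documentation; the qwen3-linux-commands directory name is just an example:

# Assumption: trainer(...) returns a (model, tokenizer) tuple per txtai's docs
ftmodel, fttokenizer = model

# Persist the fine-tuned weights and tokenizer for later reuse
ftmodel.save_pretrained("qwen3-linux-commands")
fttokenizer.save_pretrained("qwen3-linux-commands")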

Evaluate the fine‑tuned model

from txtai import LLM

llm = LLM(model)

# Example 1
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Find number of logged in users"}
])

Output

who | wc -l

# Example 2
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "List the files in my home directory"}
])

Output

ls ~/

# Example 3
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Zip the data directory with all its contents"}
])

Output

zip -r data.zip data

The model also works without the explicit system prompt:

llm("Calculate the total amount of disk space used for my home directory. Only print the total.")

Output

du -sh ~
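
To use this in an application, a small helper can wrap the system prompt and return the command text. A minimal sketch reusing the llm instance from above; the translate name is just for illustration:

def translate(request):
    # Wrap the user request with the same system prompt used during training
    return llm([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": request}
    ])

print(translate("Show the 10 largest files in the current directory"))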

Wrapping up

This article demonstrated how straightforward it is to distill knowledge into a tiny LLM using txtai. You don't always need a giant model; a little time spent fine-tuning a small one can be well worth the effort.
