Distilling Knowledge into Tiny LLMs

Published: January 15, 2026 at 12:35 PM EST
3 min read
Source: Dev.to

Large Language Models (LLMs) power much of today's AI. These massive billion‑ and trillion‑parameter models generalize well when trained on enough data.

The problem is that such models are expensive and hard to run locally, so many developers call hosted LLMs through APIs such as OpenAI's or Anthropic's Claude. In practice, developers also spend a lot of time crafting complex prompt logic to cover edge cases, believing they need a huge model to handle all the rules.

If you truly want control over your business processes, running a local model is a better choice. The good news is that it doesn’t have to be a multi‑billion‑parameter beast. By fine‑tuning a smaller LLM you can handle specific business logic, reduce prompt complexity, and keep everything in‑house.

This article shows how to distill knowledge into tiny LLMs.

Install dependencies

Install txtai and the required libraries:

pip install txtai[pipeline-train] datasets

The LLM

We'll use the 600M‑parameter Qwen3 model for this example. The target task is translating natural‑language requests into Linux commands.

from txtai import LLM

llm = LLM("Qwen/Qwen3-0.6B")

Test the base model

llm("""
Translate the following request into a linux command. Only print the command.

Find number of logged in users
""", maxlength=1024)

Output

ps -e

The model understands the request, but the command isn't correct: ps -e lists running processes rather than counting logged‑in users. Let's improve it through fine‑tuning.
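Before fine‑tuning, it helps to quantify how often a model produces the exact expected command. Below is a minimal sketch of an exact‑match evaluation helper; the llm argument here is a stand‑in callable and the test cases are illustrative, not drawn from the dataset used in this article.

```python
def exact_match_score(llm, cases):
    """Return the fraction of requests whose generated command
    exactly matches the expected command."""
    hits = 0
    for request, expected in cases:
        hits += llm(request).strip() == expected
    return hits / len(cases)

# Stand-in "model" that only gets the first case right
def fake_llm(request):
    return "who | wc -l" if "logged in" in request else "ps -e"

cases = [
    ("Find number of logged in users", "who | wc -l"),
    ("List the files in my home directory", "ls ~/"),
]

print(exact_match_score(fake_llm, cases))  # 0.5
```

Running the same helper against the base and fine‑tuned models gives a rough before/after comparison.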

Fine‑tuning the LLM with knowledge

Even a 600M model can be enhanced by distilling domain‑specific knowledge into it. We'll use the Linux commands dataset from Hugging Face together with txtai's training pipeline.

Create the training dataset

"""
Translate the following request into a linux command. Only print the command.

{user request}
"""
from datasets import load_dataset
from transformers import AutoTokenizer

# Model path
path = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(path)

# Load the training dataset
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

def prompt(row):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]}
    ], tokenize=False, enable_thinking=False)

    return {"text": text}

# Map to training prompts
train = dataset.map(prompt, remove_columns=["input", "output"])
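To see the shape of one training record without loading the tokenizer, here is an illustrative hand‑rolled ChatML‑style formatter approximating what apply_chat_template produces for Qwen models. This is an assumption for illustration only; the real template comes from the tokenizer and includes additional special tokens.

```python
def chatml(messages):
    """Approximate ChatML formatting: each message wrapped in im_start/im_end markers."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

# One row from the dataset, reproduced from the article's examples
row = {"input": "Find number of logged in users", "output": "who | wc -l"}

text = chatml([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": row["input"]},
    {"role": "assistant", "content": row["output"]},
])
print(text)
```

Each record packs the system instruction, the user request, and the expected command into a single text field, which is what the language‑generation trainer consumes.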

Train the model

from txtai.pipeline import HFTrainer

trainer = HFTrainer()

model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
)

Evaluate the fine‑tuned model

from txtai import LLM

llm = LLM(model)

# Example 1
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Find number of logged in users"}
])

Output

who | wc -l

# Example 2
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "List the files in my home directory"}
])

Output

ls ~/

# Example 3
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Zip the data directory with all its contents"}
])

Output

zip -r data.zip data

The model also works without the explicit system prompt:

llm("Calculate the total amount of disk space used for my home directory. Only print the total.")

Output

du -sh ~

Wrapping up

This article demonstrated how straightforward it is to distill knowledge into LLMs using txtai. You don't always need a giant model: spending a little time fine‑tuning a tiny LLM can be well worth the effort.
