Distilling Knowledge into Tiny LLMs

Published: January 16, 2026, 01:35 GMT+8

Original: Dev.to

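Large language models (LLMs) are the magic behind AI. These massive models, with billions or even trillions of parameters, generalize well once they are trained on enough data.

A major problem is that they are hard to run and expensive, so many developers call LLMs through APIs such as OpenAI and Claude. In practice, developers also spend a lot of time writing complex prompt logic to cover edge cases, on the belief that a huge model is required to handle all the rules.

If you truly want control over your business processes, running a local model is the better choice. The good news is that it doesn't have to be a multi-billion-parameter behemoth. By fine-tuning a smaller LLM, you can handle specific business logic, reduce prompt complexity, and keep everything in-house.

This article shows how to distill knowledge into tiny LLMs.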

Install dependencies

Install txtai and the required libraries:

pip install txtai[pipeline-train] datasets

LLM

This example uses a 600M-parameter Qwen3 model. The target task is translating user requests into Linux commands.

from txtai import LLM

llm = LLM("Qwen/Qwen3-0.6B")

Test the base model

llm("""
Translate the following request into a linux command. Only print the command.

Find number of logged in users
""", maxlength=1024)

Output

ps -e

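The model understands the request, but the command it generates is wrong: ps -e lists running processes, not logged-in users. Let's improve it with fine-tuning.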
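Raw responses from a small chat model are not always a bare command. Qwen3 can emit a `<think>...</think>` reasoning block when thinking mode is on, and models sometimes wrap commands in Markdown fences. The following is a minimal sketch of a cleanup step, assuming those two patterns are the only noise to remove; `extract_command` is a hypothetical helper and not part of txtai.

```python
import re

# Markdown fence marker, built programmatically to keep this snippet easy to embed
FENCE = "`" * 3

def extract_command(text):
    """Return the first non-empty line of a response, minus reasoning and fences."""
    # Drop any <think>...</think> reasoning block
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

    # Keep non-empty lines that are not Markdown fence markers
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith(FENCE)]

    # Strip inline backticks around a single command
    return lines[0].strip("`").strip() if lines else ""

print(extract_command("<think>\nusers?\n</think>\n\nwho | wc -l"))
```

This only normalizes formatting. It does not make a wrong command such as ps -e correct, which is what the fine-tuning step addresses.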

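Fine-tune the LLM with knowledge

Even a 600M model can be improved by distilling domain-specific knowledge into it. We'll use the Linux commands dataset on Hugging Face together with txtai's training pipeline.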

Create the training dataset

Each training example uses the following prompt format:

"""
Translate the following request into a linux command. Only print the command.

{user request}
"""

from datasets import load_dataset
from transformers import AutoTokenizer

# Model path
path = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(path)

# Load the training dataset
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

def prompt(row):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]}
    ], tokenize=False, enable_thinking=False)

    return {"text": text}

# Map to training prompts
train = dataset.map(prompt, remove_columns=["input", "output"])
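The same system prompt and message layout appear again at inference time, just without the assistant turn. As a rough sketch, a single helper can build both variants so the training and inference prompts cannot drift apart. `SYSTEM_PROMPT` and `build_messages` are hypothetical names introduced here for illustration.

```python
SYSTEM_PROMPT = "Translate the following request into a linux command. Only print the command."

def build_messages(request, command=None):
    """Build chat messages; include the assistant turn only for training rows."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": request.strip()},
    ]

    # Training examples carry the expected command as the assistant reply
    if command is not None:
        messages.append({"role": "assistant", "content": command.strip()})

    return messages

# Training-style row (three messages) vs. inference-style prompt (two messages)
print(len(build_messages("Find number of logged in users", "who | wc -l")))
print(len(build_messages("Find number of logged in users")))
```

With a helper like this, the training side would pass `build_messages(row["input"], row["output"])` to `tokenizer.apply_chat_template`, and the inference side would pass `build_messages(request)` to the LLM.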

Train the model

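from txtai.pipeline import HFTrainer

trainer = HFTrainer()

# Fine-tune the base model on the chat-formatted prompts.
# The result can be passed directly to LLM for inference.
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,                  # Maximum sequence length per example
    bf16=True,                      # Requires hardware with bfloat16 support
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,               # Log training loss every 50 steps
)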
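To get a feel for how long a run with these settings takes, it helps to estimate the number of optimizer steps up front. The sketch below assumes a single device and no gradient accumulation. The row count is a hypothetical placeholder, not the actual size of the dataset, so substitute `len(train)`.

```python
import math

def optimizer_steps(rows, batch_size, epochs=1):
    """Optimizer steps for single-device training without gradient accumulation."""
    return math.ceil(rows / batch_size) * epochs

rows = 8000  # hypothetical row count, replace with len(train)
steps = optimizer_steps(rows, batch_size=4, epochs=1)

print(steps)        # 2000
print(steps // 50)  # 40 log lines with logging_steps=50
```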

Evaluate the fine-tuned model

from txtai import LLM

llm = LLM(model)

# Example 1
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Find number of logged in users"}
])

Output

who | wc -l

# Example 2
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "List the files in my home directory"}
])

Output

ls ~/

# Example 3
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Zip the data directory with all its contents"}
])

Output

zip -r data.zip data

The model also works without an explicit system prompt:

llm("Calculate the total amount of disk space used for my home directory. Only print the total.")

Output

du -sh ~
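Spot checks like the ones above are useful, but a small scoring loop makes before-and-after comparisons repeatable. The following is a minimal sketch of an exact-match metric over (request, expected command) pairs. `normalize` and `exact_match` are hypothetical helpers, and `generate` stands for any callable that maps a request string to a command string, such as a thin wrapper around the llm calls shown earlier.

```python
def normalize(command):
    """Collapse runs of whitespace so spacing differences don't count as errors."""
    return " ".join(command.split())

def exact_match(generate, examples):
    """Fraction of examples where the generated command equals the expected one."""
    if not examples:
        return 0.0

    hits = sum(
        normalize(generate(request)) == normalize(expected)
        for request, expected in examples
    )
    return hits / len(examples)

# Stand-in generator with canned answers, used here instead of a real model
canned = {
    "Find number of logged in users": "who  |  wc -l",
    "List the files in my home directory": "ls -la",
}

examples = [
    ("Find number of logged in users", "who | wc -l"),
    ("List the files in my home directory", "ls ~/"),
]

print(exact_match(canned.get, examples))  # 0.5
```

Exact match is a strict lower bound: equivalent commands such as `ls ~` and `ls ~/` count as misses, so low scores deserve a manual look before drawing conclusions.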

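Wrapping up

This article showed how straightforward it is to distill knowledge into an LLM with txtai. You don't always need a giant model; a little time spent fine-tuning a tiny LLM can be well worth it.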
