Running Machine Learning on Microcontrollers: Example Usage of embml

Published: March 1, 2026, 18:06 (GMT+8)

10-minute read


Overview

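Most embedded developers have heard the TinyML pitch by now: train a model in Python, quantize and convert it, flash a frozen binary blob onto the device, and let the microcontroller run inference. The model never learns and never adapts; it only executes.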

That works for one class of problems, but it misses a lot of real-world scenarios.

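What if your sensor drifts after six months in the field?
What if you want the device to tune itself to the specific motor it is attached to, instead of using a generic model trained on someone else's dataset?
What if there is no server involved at all?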

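embml is an example repository that explores machine learning on the device itself: pure C, no dynamic allocation, no external dependencies beyond the standard library, and no Python runtime anywhere in the pipeline.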

📦 Example repository: https://github.com/hejhdiss/embml

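It is not a production framework. It is a well-structured, readable starting point: a reference that embedded developers can clone, read, understand, and adapt. Every algorithm is implemented from scratch in C99, and the caller owns and manages every buffer.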
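The "caller owns every buffer" rule is what keeps the library allocation-free. As a minimal sketch of the general pattern, using a hypothetical ToyModel type and toy_init() function that are not part of embml: the model struct stores only pointers and sizes, and every array it touches is declared by the caller (here as a static global), so RAM usage is fixed at link time.

```c
#include <stddef.h>

/* Hypothetical model type illustrating the caller-owned-buffer pattern.
   It holds pointers and sizes only; it never calls malloc(). */
typedef struct {
    float  *w;    /* weight vector, owned by the caller */
    size_t  n;    /* number of weights                  */
    float   lr;   /* learning rate                      */
} ToyModel;

/* Bind caller-provided storage to the model and zero it. */
static void toy_init(ToyModel *m, size_t n, float lr, float *w_buf)
{
    m->w  = w_buf;
    m->n  = n;
    m->lr = lr;
    for (size_t i = 0; i < n; ++i)
        m->w[i] = 0.0f;
}

/* Storage lives in .bss, so its size is known at link time. */
static float    toy_weights[4];
static ToyModel toy_model;
```

The examples in this article follow the same shape: arrays such as weights[] or P[] are globals, and each init function receives pointers to them.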

What's in the repository

The library contains eight modules, all under src/:

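Module	Algorithm
`embml_linear`	Online linear regression via SGD
`embml_logistic`	Binary logistic regression via SGD
`embml_lms`	LMS and normalized LMS (NLMS) adaptive filters
`embml_rls`	Recursive least squares (RLS) with a forgetting factor
`embml_iqr`	Incremental QR via Givens rotations
`embml_nn`	Feed-forward multilayer perceptron: backpropagation, Xavier initialization, gradient clipping
`embml_gru`	Minimal GRU cell for time-series inference
`embml_esn`	Echo state network: fixed reservoir, RLS-trained readout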

Each module is a .c/.h pair that you can drop straight into your firmware project.

Example usage

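The examples below show realistic usage. They are not pseudocode: they compile and run on ESP32, STM32F4, RP2040, and Arduino Mega-class hardware. Board-specific helpers such as read_sensor() or trigger_alert() stand in for your own I/O code.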

Linear regression: on-device temperature compensation

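Sensor readings drift linearly with board temperature. The correction model is trained in real time, one sample at a time, against a calibration reference, with no server involved.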

#include "embml.h"

#define N_FEAT 2   /* [raw_reading, board_temp] → corrected_value */

float weights[N_FEAT];
LinearModel model;

void setup(void) {
    linear_init(&model, N_FEAT, 0.01f, weights);
}

void loop(void) {
    float x[N_FEAT] = { read_sensor(), read_board_temp() };
    float y_true    = read_reference();   /* calibration reference */

    /* learn from each sample — no batch needed */
    linear_update(&model, x, y_true);

    float corrected = linear_predict(&model, x);
    log_value(corrected);
}

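After a few hundred samples (the exact number depends on the learning rate and on how the inputs are scaled), the model converges to the compensation curve. No laptop. No Python. The device learns on its own.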
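Under the hood, each linear_update() call amounts to one stochastic-gradient step. The sketch below assumes a squared-error loss and a separate bias term; sgd_linear_step() is a hypothetical name, and embml's actual internals (for example, whether it keeps a bias) may differ.

```c
#include <stddef.h>

/* One SGD step for linear regression with squared-error loss
   L = 0.5 * (y - (w.x + b))^2. Returns the prediction made before the update. */
static float sgd_linear_step(float *w, float *b, const float *x,
                             size_t n, float y_true, float lr)
{
    float y_hat = *b;
    for (size_t i = 0; i < n; ++i)
        y_hat += w[i] * x[i];

    float err = y_true - y_hat;          /* equals -dL/dy_hat */
    for (size_t i = 0; i < n; ++i)
        w[i] += lr * err * x[i];         /* dL/dw_i = -err * x_i */
    *b += lr * err;
    return y_hat;
}
```

Each step touches one sample, so memory stays O(n) however many samples the device sees. The step is only stable while lr is small relative to the squared magnitude of the inputs, which is why input scaling matters.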

Logistic regression: fault detection

Classify a motor as healthy (0) or showing early signs of a fault (1) from two vibration features plus motor temperature.

#include "embml.h"

#define N_FEAT 3   /* [rms_vibration, peak_freq, temp] */

float weights[N_FEAT];
LogisticModel model;

void setup(void) {
    logistic_init(&model, N_FEAT, 0.005f, weights);
}

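void loop(void) {
    float x[N_FEAT] = { rms(), peak_freq(), motor_temp() };

    if (in_commissioning_window()) {   /* hypothetical helper you supply */
        /* Known-good period: learn from label 0. The model also needs
           labelled fault samples (label 1) to learn a useful boundary. */
        logistic_update(&model, x, 0.0f);
        return;
    }

    /* In operation: inference only. logistic_classify() gives a hard
       0/1 label; logistic_predict() gives the fault probability. */
    float prob = logistic_predict(&model, x);
    if (prob > 0.75f)
        trigger_alert();
}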

LMS: background noise cancellation

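A least-mean-squares (LMS) filter adapts to and suppresses periodic noise in a signal. Its update costs one multiply-accumulate per weight, which makes it the lightest online learner in the library.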

#include "embml.h"

#define FILTER_LEN 16

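float weights[FILTER_LEN];
float ref_taps[FILTER_LEN];   /* last FILTER_LEN reference-mic samples, newest first */
LMSModel model;

void setup(void) {
    /* Normalised LMS: the step is scaled by input power, so it stays
       stable across signal levels without hand-tuning the step size */
    lms_init_nlms(&model, FILTER_LEN, 0.5f, 1e-6f, weights);
}

void loop(void) {
    /* Shift the tap-delay line and insert the newest noise-reference sample */
    for (int i = FILTER_LEN - 1; i > 0; --i)
        ref_taps[i] = ref_taps[i - 1];
    ref_taps[0] = read_reference_mic();

    float primary = read_primary_mic();   /* wanted signal + noise (hypothetical helper) */

    /* The filter estimates the noise in the primary channel; subtract it */
    float noise_est = lms_predict(&model, ref_taps);
    float clean     = primary - noise_est;

    /* Adapt toward the primary channel; only its noise part is predictable
       from the reference, so the wanted signal survives in `clean` */
    lms_update(&model, ref_taps, primary);

    output_audio(clean);
}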
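The normalized LMS update divides the step by the input energy, which is what makes it stable without hand-tuning. In a noise canceller, x holds recent reference-microphone samples, d is the primary-microphone sample, and the a-priori error e = d - w·x is the cleaned signal. The sketch below uses hypothetical names (nlms_output, nlms_step) and the textbook NLMS rule; it is not embml's source.

```c
#include <stddef.h>

/* Filter output y = w . x (the noise estimate in a canceller). */
static float nlms_output(const float *w, const float *x, size_t n)
{
    float y = 0.0f;
    for (size_t i = 0; i < n; ++i)
        y += w[i] * x[i];
    return y;
}

/* One normalized-LMS step: w += mu * e * x / (eps + x . x).
   Returns the a-priori error e = d - w . x. */
static float nlms_step(float *w, const float *x, size_t n,
                       float d, float mu, float eps)
{
    float energy = 0.0f;
    for (size_t i = 0; i < n; ++i)
        energy += x[i] * x[i];

    float e = d - nlms_output(w, x, n);
    float g = mu * e / (eps + energy);    /* step scaled by input power */
    for (size_t i = 0; i < n; ++i)
        w[i] += g * x[i];
    return e;
}
```

For NLMS, stability requires 0 < mu < 2 regardless of signal level, which is why a fixed value such as 0.5 can work without per-board tuning; eps only guards against division by zero during silence.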

RLS: fast-converging system identification

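RLS converges much faster than SGD and has no learning rate to tune; its knobs are the forgetting factor λ and the initial covariance scale δ. Here it identifies, in real time, the coefficients of an unknown system (for example, a motor's transfer function) in ARX form.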

#include "embml.h"

#define N 5

float weights[N], P[N * N], k_scratch[N];
RLSModel model;

void setup(void) {
    /* lambda = 0.98 : moderate forgetting; effective memory of roughly
       1 / (1 - 0.98) = 50 samples, suited to a slowly drifting system */
    /* delta = 1000  : weak prior, so the estimate trusts the data quickly */
    rls_init(&model, N, 0.98f, 1000.0f, weights, P);
}

void loop(void) {
    float x[N] = { u_delayed(1), u_delayed(2),
                   y_delayed(1), y_delayed(2), 1.0f };
    float y_now = read_plant_output();

    rls_update(&model, x, y_now, k_scratch);

    /* weights[] now approximate the ARX model coefficients */
    float y_pred   = rls_predict(&model, x);
    float residual = y_now - y_pred;   /* a growing residual can flag a change in the plant */
}
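For reference, one exponentially weighted RLS step (gain vector, a-priori error, covariance update) can be sketched as follows. The names rls_sketch_init and rls_sketch_step and the fixed order RLS_N are hypothetical; embml's rls_update() takes a k_scratch buffer, which presumably plays the role of the gain vector k here. Each step costs O(N²) operations, and the N × N matrix P dominates memory.

```c
#include <stddef.h>

#define RLS_N 3   /* model order for this sketch (hypothetical) */

/* Start from w = 0 and P = delta * I (P is RLS_N x RLS_N, row-major). */
static void rls_sketch_init(float *P, float *w, float delta)
{
    for (size_t i = 0; i < RLS_N; ++i) {
        w[i] = 0.0f;
        for (size_t j = 0; j < RLS_N; ++j)
            P[i * RLS_N + j] = (i == j) ? delta : 0.0f;
    }
}

/* One exponentially weighted RLS step; returns the a-priori error. */
static float rls_sketch_step(float *w, float *P, const float *x,
                             float y, float lambda)
{
    float pi[RLS_N], k[RLS_N];
    float denom = lambda, y_hat = 0.0f;

    for (size_t i = 0; i < RLS_N; ++i) {
        pi[i] = 0.0f;                          /* pi = P x */
        for (size_t j = 0; j < RLS_N; ++j)
            pi[i] += P[i * RLS_N + j] * x[j];
        denom += x[i] * pi[i];                 /* lambda + x' P x */
        y_hat += w[i] * x[i];
    }
    for (size_t i = 0; i < RLS_N; ++i)
        k[i] = pi[i] / denom;                  /* gain vector */

    float e = y - y_hat;
    for (size_t i = 0; i < RLS_N; ++i)
        w[i] += k[i] * e;

    /* P <- (P - k pi') / lambda; valid because P is symmetric */
    for (size_t i = 0; i < RLS_N; ++i)
        for (size_t j = 0; j < RLS_N; ++j)
            P[i * RLS_N + j] = (P[i * RLS_N + j] - k[i] * pi[j]) / lambda;
    return e;
}
```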

Incremental QR: numerically robust least squares

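When the input data is ill-conditioned (for example, when features are highly correlated), RLS can lose numerical stability: rounding errors can push its covariance matrix P away from being symmetric positive definite. Incremental QR with Givens rotations never forms the covariance matrix explicitly. It updates a triangular factor R instead, which avoids squaring the condition number and sidesteps the problem.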

#include "embml.h"

#define N 6

float R[N * N], f[N];
float w[N], scratch[2 * N];
IQRModel model;

void setup(void) {
    /* ridge = 1e-4 : small regularisation until enough samples arrive */
    iqr_init(&model, N, 0.99f, 1e-4f, R, f);
}

void loop(void) {
    float x[N] = { feature_1(), feature_2(), feature_3(),
                   feature_4(), feature_5(), feature_6() };
    float y   = measurement();

    iqr_update(&model, x, y, w, scratch);
    float y_est = iqr_predict(&model, x);
}

Getting started

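Clone the repository:

git clone https://github.com/hejhdiss/embml.git

Copy the modules you need (their *.c and *.h files) into your project's source tree.
Include embml.h in every file that uses the API.
Configure each model (learning rate, forgetting factor, and so on) as shown in the examples.
Build and flash. The code compiles with any C99-compliant toolchain (ARM-GCC, ESP-IDF, Arduino, etc.).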

License

embml is released under the MIT License: you are free to use, modify, and distribute it, including in commercial products.

Feed-forward MLP: a small neural network trained on the device

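A three-layer network (4 inputs, 8 hidden neurons, 1 output), that is, two weight layers. Weights use Xavier initialization, and training runs on the MCU with backpropagation and gradient clipping.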

#include "embml.h"

#define L0 4
#define L1 8
#define L2 1

float W0[L1*L0], b0[L1], a1[L1], d1[L1];
float W1[L2*L1], b1_[L2], a2[L2], d2[L2];
float input_buf[L0];

NNLayer layers[2] = {
    { W0, b0,  a1, d1, L0, L1, EMBML_ACT_RELU    },
    { W1, b1_, a2, d2, L1, L2, EMBML_ACT_SIGMOID },
};
NNModel net;

void setup(void) {
    nn_init(&net, layers, 2, input_buf, L0, 0.01f, 1.0f);
}

void loop(void) {
    float x[L0]      = { s1(), s2(), s3(), s4() };
    float target[L2] = { ground_truth() };

    nn_train_sample(&net, x, target);

    /* Or just inference: */
    const embml_float_t *out = nn_forward(&net, x);
    float prediction = out[0];
}

GRU: time-series inference

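A gated recurrent unit (GRU) cell processes sequential sensor data one step at a time. The weights are trained offline on a host and loaded from flash, and the hidden state persists between time steps. No training happens on the device in this example.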

#include "embml.h"

#define X_SZ 4
#define H_SZ 8

/* Weights trained offline, stored as const arrays in flash */
#include "gru_weights.h"   /* defines Wz, Wr, Wn, Uz, Ur, Un, bz, br, bn */

float h_state[H_SZ];
float scratch[3 * H_SZ];
GRUCell cell;

void setup(void) {
    gru_init(&cell, X_SZ, H_SZ,
             Wz, Wr, Wn, Uz, Ur, Un,
             bz, br, bn, h_state, scratch);
}

void loop(void) {
    float x_t[X_SZ] = { accel_x(), accel_y(), accel_z(), gyro_z() };

    gru_step(&cell, x_t);

    /* Hidden state in cell.h[] — pass to a classifier or threshold */
    float anomaly_score = cell.h[0];
    if (anomaly_score > 0.8f)
        flag_anomaly();
}

Echo state network: on-device training without backpropagation

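The reservoir (random weights) is fixed and stored in flash. Only the linear readout is trained, by recursive least squares (RLS), one sample at a time. For embedded time-series learning this strikes a good balance between adaptability and compute cost: the per-sample cost is dominated by the RLS update, which is O(H²) in the reservoir size H and needs an H × H matrix P (4 KB of floats for H = 32).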

#include "embml.h"

#define X_SZ  4
#define H_SZ 32
#define Y_SZ  1

/* Fixed reservoir in flash: const W_in[H_SZ * X_SZ], const W_res[H_SZ * H_SZ].
   Included after the size macros in case the header uses them. */
#include "esn_reservoir.h"

float W_out[Y_SZ * H_SZ];
float state[H_SZ], scratch[H_SZ];
float P[H_SZ * H_SZ], k[H_SZ];

ESNModel esn;
RLSModel rls;

void setup(void) {
    esn_init(&esn, X_SZ, H_SZ, Y_SZ,
             W_in, W_res, 0.9f,
             state, scratch, W_out);
    esn_rls_init(&esn, &rls, 0.98f, 1000.0f, P, k);
}

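void loop(void) {
    float x[X_SZ] = { s1(), s2(), s3(), s4() };

    /* Advance the reservoir exactly once per time step */
    esn_update_state(&esn, x);

    /* Inference: read out from the current reservoir state */
    float y_out[Y_SZ];
    esn_predict(&esn, y_out);

    /* Training, only when a target is available (target_available() is a
       hypothetical helper): RLS update of the readout weights W_out */
    if (target_available()) {
        float y[Y_SZ] = { read_target() };
        esn_rls_update(&esn, y);
    }
}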

Why this repository exists

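It is an example: a proof of concept showing that these algorithms can be embedded cleanly in C, that the API is usable by firmware engineers without a machine-learning background, and that on-device learning on mid-range MCUs is not science fiction.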

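If you are building something with it, adapting it, or just reading the source to see how RLS or Givens rotations actually work on flat C arrays, that is exactly what it is for.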


MIT License · Author: @hejhdiss · Generated with Claude Sonnet 4.5