Because in a Life-Threatening Situation, Every Millisecond Counts

Published: June 11, 2026 at 11:38 PM EDT
4 min read
Source: Dev.to

Removing expf() from a fire detector: one header, 1.95x faster, zero accuracy loss

A smoke detector is not a demo project. When it fires, someone either evacuates in time or doesn't. The firmware running on that microcontroller has one job, and it needs to do it without hesitation, without bloat, and without dependencies that can fail in unexpected ways.

On May 28 I published a bare-metal fire detection system built with Hasaki 刃先, a neural network trainer that exports standalone C headers with no runtime, no Python, and no TensorFlow. The model is a 12-8-4-1 MLP trained on 28,596 sensor readings. It fits in 3.8 kB of Flash and achieves 99.93% accuracy on held-out data, with a single missed fire event out of 3,599.

But there was something in that header that bothered me:

```c
static inline float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }
```

expf(). Right there in a life-safety application, on a microcontroller that may not have a hardware FPU.

On processors with a hardware FPU, such as a Cortex-M4F or the ESP32-S3, expf() is fast. But the moment you deploy to an ATmega328P, an ATtiny85, or any Cortex-M0 target, that call becomes software floating-point: the compiler's support library emulates every floating-point operation with integer instructions, cycle by cycle. It works, but it carries hidden costs: unpredictable latency, a dependency on math.h, and a transcendental function sitting in the critical path of every single inference.

For a smoke detector running at 1 Hz this might seem irrelevant. But inference latency adds up with sensor reads, normalization, and communication overhead. More importantly, if you're deploying to a truly constrained target, expf() might be the difference between fitting in Flash and not.

kigu-quant (coming soon) is a new tool in the Rosito Bench ecosystem. It generates ready-to-include C headers for evaluating mathematical functions on microcontrollers, with no FPU, no libm, and no dependencies. One command:

```sh
kigu-quant --method lut --func sigmoid --size 256 --fmt q15 -o lut_sigmoid.h
```
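kigu-quant is not public yet, so its internals are unknown. To illustrate what a command like the one above has to compute, here is a hypothetical host-side generator in C. It samples the sigmoid at 256 evenly spaced points on [-6, 6] (assuming both endpoints are included), rounds each value to Q1.15 (value × 32768, capped at 32767), and prints a header. The names `build_sigmoid_q15`, `emit_header`, and `lut_sigmoid_q15` are made up for this sketch. The reference sigmoid uses a short Taylor series only so the example builds without libm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parameters mirroring the CLI flags in the post
   (--func sigmoid --size 256 --fmt q15) and the stated [-6, 6] range. */
#define LUT_SIZE  256
#define LUT_X_MIN (-6.0)
#define LUT_X_MAX (6.0)

/* e^x for x >= 0 via a plain Taylor series. All terms are positive, so it
   is accurate to near double precision for |x| <= 6 and needs no libm.
   A real host-side generator could simply call exp(). */
static double exp_pos(double x)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 100; ++k) {
        term *= x / k;
        sum += term;
        if (term < 1e-17 * sum) break;
    }
    return sum;
}

/* Reference sigmoid; the branch keeps the exponent argument non-negative. */
static double sigmoid_ref(double x)
{
    if (x >= 0.0) {
        double e = exp_pos(x);          /* sigma(x) = e^x / (e^x + 1) */
        return e / (e + 1.0);
    }
    return 1.0 / (1.0 + exp_pos(-x));   /* sigma(x) = 1 / (1 + e^-x) */
}

/* Fill table[i] with sigmoid(x_i) in Q1.15, where the x_i are evenly
   spaced over [LUT_X_MIN, LUT_X_MAX] with both endpoints included. */
static void build_sigmoid_q15(int16_t table[LUT_SIZE])
{
    const double step = (LUT_X_MAX - LUT_X_MIN) / (LUT_SIZE - 1);
    for (int i = 0; i < LUT_SIZE; ++i) {
        double y = sigmoid_ref(LUT_X_MIN + i * step) * 32768.0;
        long q = (long)(y + 0.5);   /* round to nearest (y is positive) */
        if (q > 32767) q = 32767;   /* Q1.15 cannot represent 1.0 */
        table[i] = (int16_t)q;
    }
}

/* Print the table as a C header that firmware could #include. */
static void emit_header(FILE *out, const int16_t table[LUT_SIZE])
{
    fprintf(out, "/* sigmoid, %d entries, Q1.15, x in [%.1f, %.1f] */\n",
            LUT_SIZE, LUT_X_MIN, LUT_X_MAX);
    fprintf(out, "#include <stdint.h>\n");
    fprintf(out, "static const int16_t lut_sigmoid_q15[%d] = {", LUT_SIZE);
    for (int i = 0; i < LUT_SIZE; ++i)
        fprintf(out, "%s%d,", (i % 12 == 0) ? "\n    " : " ", (int)table[i]);
    fprintf(out, "\n};\n");
}
```

On a host you would call `build_sigmoid_q15()` and then `emit_header(stdout, table)`, redirecting the output to a file. On AVR, a plain `const` array is copied into SRAM at startup, so a header meant for an ATmega328P would normally tag the table with `PROGMEM` (from `<avr/pgmspace.h>`) and read entries with `pgm_read_word()` to keep the 512 bytes in Flash.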

One change in the model header:

```c
// Before
#include <math.h>
static inline float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

// After
#include "lut_sigmoid.h"
// sigmoid is now lut_sigmoid_lookup(), called directly in predict()
```
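The post doesn't show the body of `lut_sigmoid_lookup()`. A minimal sketch of an interpolated Q1.15 lookup could look like the following, assuming the same layout (256 entries over [-6, 6], endpoints included) and saturation to the end entries outside that range. `lut_sigmoid_sketch`, `lut_init`, and `lut_q15` are hypothetical names, and the table is filled at startup here only so the example runs on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 256 Q1.15 entries over [-6, 6], endpoints included,
   so the grid step is 12/255 and its inverse is exactly 21.25. */
#define LUT_SIZE     256
#define LUT_X_MIN    (-6.0f)
#define LUT_X_MAX    (6.0f)
#define LUT_INV_STEP ((LUT_SIZE - 1) / (LUT_X_MAX - LUT_X_MIN))

/* In firmware this would be the const array from the generated header;
   here lut_init() fills it so the sketch is self-contained. */
static int16_t lut_q15[LUT_SIZE];

/* Host-side reference only: e^x for x >= 0 via a Taylor series. */
static double exp_pos(double x)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 100 && term >= 1e-17 * sum; ++k) {
        term *= x / k;
        sum += term;
    }
    return sum;
}

static double sigmoid_ref(double x)
{
    if (x >= 0.0) {
        double e = exp_pos(x);
        return e / (e + 1.0);
    }
    return 1.0 / (1.0 + exp_pos(-x));
}

static void lut_init(void)
{
    for (int i = 0; i < LUT_SIZE; ++i) {
        double x = LUT_X_MIN + i / (double)LUT_INV_STEP;
        long q = (long)(sigmoid_ref(x) * 32768.0 + 0.5);
        lut_q15[i] = (int16_t)(q > 32767 ? 32767 : q);
    }
}

/* Stand-in for the generated lookup: clamp, find the grid segment, and
   interpolate linearly between two neighbouring Q1.15 entries. */
static inline float lut_sigmoid_sketch(float x)
{
    if (x <= LUT_X_MIN) return lut_q15[0] / 32768.0f;
    if (x >= LUT_X_MAX) return lut_q15[LUT_SIZE - 1] / 32768.0f;

    float pos = (x - LUT_X_MIN) * LUT_INV_STEP; /* fractional index, 0..255 */
    int i = (int)pos;                            /* segment start */
    if (i > LUT_SIZE - 2) i = LUT_SIZE - 2;      /* guard against rounding */
    float frac = pos - (float)i;                 /* position inside segment */

    int32_t a = lut_q15[i];
    int32_t b = lut_q15[i + 1];
    return ((float)a + frac * (float)(b - a)) / 32768.0f;
}
```

Why 256 points are enough: with grid step h = 12/255 ≈ 0.047, linear interpolation is off by at most h²/8 · max|σ''| = h²/8 · √3/18 ≈ 2.7e-5, and rounding each entry to Q1.15 adds at most 2⁻¹⁶ ≈ 1.5e-5. So this layout stays below about 4.2e-5 on [-6, 6], which is consistent with the 0.000021 maximum error reported for the real header. This sketch still interpolates in float (soft-float on AVR, but only a few multiplies and adds instead of an expf()); a fully fixed-point variant would also compute the fraction in Q15.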

The generated header contains a 256-entry Q1.15 lookup table covering the range [-6, 6], an inline interpolated lookup function, and nothing else. No math.h. No expf(). No heap allocation. The table takes 512 bytes of Flash (256 entries × 2 bytes). Measured with micros() on an Arduino Nano (ATmega328P) over 1000 evaluations, using an anti-optimization accumulator:

| Method | Time for 1000 evaluations | Per evaluation | Speedup |
|---|---|---|---|
| expf() | 227,292 µs | ~227 µs | baseline |
| lut_sigmoid_lookup() | 116,512 µs | ~117 µs | 1.95x faster |

Max error vs expf(): 0.000021

These are honest numbers from real hardware, not simulations. The sigmoid sits at the output layer: one evaluation per inference, converting the final logit to a probability. The LUT covers [-6, 6] with 256 points and linear interpolation. For this model, the pre-activation values at the output layer fall well within that range during normal operation.

Validation on 7,150 held-out samples, never seen during training:

```
[3547    4]  ← TN  FP
[   1 3598]  ← FN  TP
```

Accuracy: 99.93%
FN: 1 / 3,599 fire events
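For a smoke detector, the headline accuracy matters less than the two error rates inside it: missed fires and false alarms. A small sketch that derives all three from the confusion matrix above (the `confusion_t` type and the function names are illustrative, not part of Hasaki):

```c
#include <assert.h>

/* Binary confusion matrix, laid out as in the post:
   [TN FP]
   [FN TP] */
typedef struct {
    long tn, fp, fn, tp;
} confusion_t;

/* Share of all samples classified correctly. */
static double accuracy(confusion_t m)
{
    long total = m.tn + m.fp + m.fn + m.tp;
    return (double)(m.tn + m.tp) / (double)total;
}

/* Share of real fire events the model missed (false negative rate). */
static double miss_rate(confusion_t m)
{
    return (double)m.fn / (double)(m.fn + m.tp);
}

/* Share of non-fire samples flagged as fire (false alarm rate). */
static double false_alarm_rate(confusion_t m)
{
    return (double)m.fp / (double)(m.fp + m.tn);
}
```

For this run, `confusion_t m = {3547, 4, 1, 3598};` gives 7,145 / 7,150 ≈ 99.93% accuracy, a miss rate of 1 / 3,599 ≈ 0.028%, and a false-alarm rate of 4 / 3,551 ≈ 0.11%.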

Identical to the float32 baseline: the LUT approximation introduces no measurable degradation at the model level.

This project is the first demonstration of Hasaki and kigu-quant working together:

- Hasaki 刃先 trains the model and exports a standalone C header: weights, biases, and activation functions in plain C.
- kigu-quant generates the fixed-point math headers that replace the expensive activations.

The integration is intentionally minimal. kigu-quant doesn't touch the model, and it doesn't rewrite the header. You drop in one #include and replace one function call. Everything else stays the same.

```
Train         → Hasaki     → smoke-detector-model-float.h
                    ↓
Generate      → kigu-quant → lut_sigmoid.h
                    ↓
#include both → flash      → done
```

The full project (modified model header, generated LUT, and Arduino sketch) is available here:

```
hasaki-smoke-detector-v2
├── smoke-detector-model-float.h     Hasaki model, sigmoid replaced
├── lut_sigmoid.h                    kigu-quant Q1.15 LUT
└── hasaki_kigu_smoke_detector.ino   Arduino sketch
```
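The Arduino sketch itself isn't reproduced in the post. As a stand-in built on assumptions, here is a portable C version of the measurement pattern it describes: time N evaluations and fold every result into an accumulator so the compiler can't drop the calls. It uses `clock()` from `<time.h>` instead of Arduino's `micros()`, and `hard_sigmoid` is only a cheap placeholder to time, not the LUT from the post.

```c
#include <assert.h>
#include <time.h>

typedef float (*activation_fn)(float);

/* Cheap placeholder activation for the demo: clip(0.25x + 0.5, 0, 1).
   Swap in any float(float) function to compare implementations. */
static float hard_sigmoid(float x)
{
    float y = 0.25f * x + 0.5f;
    return y < 0.0f ? 0.0f : (y > 1.0f ? 1.0f : y);
}

/* The final sum is stored here so the compiler must keep every call. */
static volatile float bench_sink;

/* Time n evaluations of f and return the elapsed time in microseconds. */
static double time_evaluations_us(activation_fn f, int n)
{
    float acc = 0.0f;
    clock_t start = clock();
    for (int i = 0; i < n; ++i) {
        /* Sweep inputs over [-6, 6) so no single result can be reused. */
        float x = -6.0f + 12.0f * (float)(i % 1000) / 1000.0f;
        acc += f(x);
    }
    clock_t end = clock();
    bench_sink = acc;  /* result escapes: the loop cannot be optimized away */
    return (double)(end - start) * 1e6 / (double)CLOCKS_PER_SEC;
}
```

To compare two implementations, time each one with the same `n` and divide. With the post's Nano numbers that is 227,292 µs / 116,512 µs ≈ 1.95. Calling through a function pointer can prevent inlining, so host timings are best read as relative, not absolute.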

The 1.95x speedup on the ATmega328P is real and measured: about 111 µs saved per sigmoid evaluation. On targets where this matters even more (an AVR running at 8 MHz, a Cortex-M0 with no FPU, low-power MCUs in battery-operated systems), the time saved weighs more. On the same AVR core at 8 MHz instead of the Nano's 16 MHz, for example, every cycle takes twice as long, so the saving grows to roughly 220 µs per evaluation.

A fire detector doesn't need to be fast to be useful. But it should never be slower than it has to be. Every millisecond you give back to the scheduler is a millisecond available for sensor reads, communication, or simply a faster response to the next sample. In a life-threatening situation, every millisecond matters.

expf() was a dependency this model never needed.

Built with Hasaki 刃先 and kigu-quant (a member of the Kigu 器具 family, coming soon).
Rosito Bench
