[Paper] Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing
Source: arXiv - 2601.22151v1
Overview
A new paper from the Technical University of Darmstadt shows how to transform neural networks into pure control‑flow logic so they run faster on typical edge CPUs (e.g., RISC‑V cores). By replacing most multiply‑accumulate (MAC) work with simple if/else branches, the authors achieve up to 15 % latency reduction without sacrificing model accuracy, an attractive win for battery‑powered IoT devices that lack GPUs.
Key Contributions
- Decision‑tree conversion pipeline that maps any feed‑forward neural network to an equivalent decision tree.
- Path selection & compression technique that extracts decision paths ending in constant leaves and collapses them into compact logic flows (nested `if/else` statements).
- Hybrid execution model that keeps only the essential MAC operations, dramatically reducing the arithmetic workload on the CPU.
- Open‑source implementation (`NN2Logic`) that integrates with popular frameworks and targets RISC‑V simulators.
- Empirical validation showing up to 14.9 % latency improvement on a simulated edge CPU with zero loss in classification accuracy.
Methodology
- Model → Decision Tree
  - Each neuron’s activation is expressed as a linear inequality.
  - By recursively applying these inequalities, the whole network is unfolded into a binary decision tree where each leaf corresponds to a specific output class (or regression value).
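To make the unfolding concrete, here is a minimal pure‑Python sketch (a hypothetical two‑neuron toy model, not the authors' code): each hidden ReLU neuron contributes the inequality w·x + b > 0, and fixing the sign pattern of all hidden neurons collapses the network to a single affine function per tree leaf.

```python
# Hypothetical tiny network: 2 inputs -> 2 hidden ReLU units -> 1 output.
W1 = [[1.0, -1.0], [0.5, 2.0]]   # hidden weights (one row per neuron)
b1 = [0.0, -1.0]                 # hidden biases
W2 = [2.0, -3.0]                 # output weights
b2 = 0.5                         # output bias

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(x):
    # Ordinary dense inference: ReLU hidden layer, linear output.
    hidden = [max(dot(w, x) + b, 0.0) for w, b in zip(W1, b1)]
    return dot(W2, hidden) + b2

def pattern(x):
    # Sign pattern of the pre-activations: the path x takes through the tree.
    return tuple(int(dot(w, x) + b > 0) for w, b in zip(W1, b1))

def leaf_affine(p):
    # Within a fixed sign pattern the network collapses to y = a.x + c.
    a = [sum(W2[i] * p[i] * W1[i][j] for i in range(2)) for j in range(2)]
    c = sum(W2[i] * p[i] * b1[i] for i in range(2)) + b2
    return a, c

x = [0.3, -0.7]
a, c = leaf_affine(pattern(x))
assert abs(forward(x) - (dot(a, x) + c)) < 1e-12  # leaf reproduces the net
```

Each of the 2² sign patterns is one root-to-leaf path; deeper networks unfold recursively, which is why tree size can grow quickly (see Limitations below).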
- Path Pruning
  - Many leaf nodes produce constant predictions regardless of the input region (e.g., saturated ReLU zones).
  - The algorithm identifies such leaves and discards the associated MAC computations, keeping only the logical conditions that lead to them.
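Continuing the same hypothetical two‑neuron toy (an illustrative sketch, not the authors' implementation), path pruning can be pictured as scanning all sign patterns and keeping only a constant for leaves whose effective linear term vanishes, such as the all‑inactive, saturated‑ReLU region:

```python
import itertools

# Same hypothetical toy network as above.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [2.0, -3.0]
b2 = 0.5

def leaf_affine(p):
    # Affine function y = a.x + c valid inside sign pattern p.
    a = [sum(W2[i] * p[i] * W1[i][j] for i in range(2)) for j in range(2)]
    c = sum(W2[i] * p[i] * b1[i] for i in range(2)) + b2
    return a, c

constant_leaves = {}
for p in itertools.product([0, 1], repeat=2):
    a, c = leaf_affine(p)
    if all(w == 0.0 for w in a):      # no input dependence -> constant leaf
        constant_leaves[p] = c        # keep the constant, drop the MACs

# The all-inactive (saturated) pattern needs no arithmetic at all:
assert constant_leaves == {(0, 0): b2}
```

For these constant leaves only the guarding inequality tests survive in the generated code; every MAC on that path is eliminated.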
- Logic Flow Generation
  - Remaining decision paths are merged into a compact series of nested `if/else` blocks, forming a logic flow that can be compiled directly into C/C++ or assembly for the target CPU.
  - A small set of residual MACs (e.g., for non‑linear regions) is retained, but the overall arithmetic count drops dramatically.
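A hand‑written logic flow for the same hypothetical two‑neuron toy illustrates the shape of the generated code: two inequality tests route the input to one of four regions, three of which keep a small residual affine computation while the saturated region is a bare constant. (This mirrors the structure of the emitted C; the actual NN2Logic code generator is not shown in the summary.)

```python
# Same hypothetical toy network as above.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [2.0, -3.0]
b2 = 0.5

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(x):
    hidden = [max(dot(w, x) + b, 0.0) for w, b in zip(W1, b1)]
    return dot(W2, hidden) + b2

def logic_flow(x):
    # Two inequality tests select one of four regions; the leaf bodies
    # below are the pre-combined affine functions for each region.
    n0 = dot(W1[0], x) + b1[0] > 0
    n1 = dot(W1[1], x) + b1[1] > 0
    if n0:
        if n1:
            return 0.5 * x[0] - 8.0 * x[1] + 3.5   # residual affine leaf
        return 2.0 * x[0] - 2.0 * x[1] + 0.5
    if n1:
        return -1.5 * x[0] - 6.0 * x[1] + 3.5
    return b2                                      # constant leaf: no MACs

for x in ([0.3, -0.7], [2.0, 1.5], [-1.0, 2.0], [-1.0, -1.0]):
    assert abs(forward(x) - logic_flow(x)) < 1e-12
```

The dense forward pass needs every weight on every call; the logic flow performs two comparisons plus at most one small affine evaluation, which is where the cycle savings come from.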
- Implementation & Evaluation
  - The pipeline is built on top of PyTorch and covers the tree‑export, pruning, and code‑generation steps.
  - Benchmarks are run on a RISC‑V ISA simulator (RocketChip) using standard image‑classification models (e.g., MNIST, CIFAR‑10).
Results & Findings
| Benchmark | Baseline CPU Latency | NN2Logic Latency | Latency Reduction | Accuracy Δ |
|---|---|---|---|---|
| MNIST (MLP) | 1.23 ms | 1.04 ms | 15 % | 0 % |
| CIFAR‑10 (CNN) | 3.87 ms | 3.30 ms | 14.9 % | 0 % |
| TinyML (Speech) | 2.45 ms | 2.12 ms | 13.5 % | 0 % |
- Latency reduction stems mainly from eliminating thousands of MACs that would otherwise dominate CPU cycles.
- Model size in the generated code is comparable to the original, because the decision tree representation is compact after pruning.
- No accuracy loss: the logic flow is mathematically equivalent to the original network for all inputs, thanks to exact inequality handling.
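The equivalence claim is easy to sanity‑check on a toy model (a sketch under the same hypothetical two‑neuron setup, not the paper's test harness): sample many random inputs and confirm that routing to a leaf and evaluating its affine function matches the dense forward pass everywhere.

```python
import random

# Same hypothetical toy network as above.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [2.0, -3.0]
b2 = 0.5

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(x):
    hidden = [max(dot(w, x) + b, 0.0) for w, b in zip(W1, b1)]
    return dot(W2, hidden) + b2

def tree_eval(x):
    # Route x to its leaf, then evaluate that leaf's affine function.
    p = [int(dot(w, x) + b > 0) for w, b in zip(W1, b1)]
    a = [sum(W2[i] * p[i] * W1[i][j] for i in range(2)) for j in range(2)]
    c = sum(W2[i] * p[i] * b1[i] for i in range(2)) + b2
    return dot(a, x) + c

random.seed(0)
worst = max(abs(forward(x) - tree_eval(x))
            for x in ([random.uniform(-5, 5) for _ in range(2)]
                      for _ in range(10_000)))
assert worst < 1e-9  # agreement on every sampled input (up to float rounding)
```

Because the inequality tests reproduce the exact ReLU activation regions, agreement is not statistical: it holds for all inputs, with only floating‑point rounding separating the two evaluation orders.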
Practical Implications
- Edge AI devices (wearables, sensors, micro‑drones) can now run the same neural models on low‑power CPUs without needing a dedicated accelerator.
- Energy savings: fewer arithmetic operations translate directly into lower dynamic power consumption—critical for battery‑operated nodes.
- Simplified hardware stacks: manufacturers can ship a single CPU core for both control software and inference, reducing BOM cost and design complexity.
- Rapid prototyping: developers can keep their existing PyTorch training pipelines, then run `nn2logic convert model.pt` to generate a drop‑in C library for the target platform.
- Security & determinism: pure control‑flow code is easier to audit and verify, which is valuable for safety‑critical applications (automotive, medical).
Limitations & Future Work
- Model scope: The current approach works best for relatively shallow, fully‑connected or modest CNN architectures; very deep networks with many non‑linearities may produce excessively large decision trees.
- Memory footprint: While latency improves, the generated `if/else` cascade can increase code size, which may be problematic on ultra‑constrained flash memories.
- Dynamic inputs: Real‑time adaptation (e.g., online learning) would require regenerating the logic flow, limiting use cases to static inference models.
- Future directions suggested by the authors include:
- Hierarchical tree compression to keep code size bounded.
- Extending the pipeline to support quantized and binarized networks.
- Hardware‑aware pruning that jointly optimizes tree depth and residual MAC count for specific CPU micro‑architectures.
The authors have released the full conversion toolchain under an open‑source license at github.com/TUDa-HWAI/NN2Logic, making it easy for developers to experiment with logic‑flow inference on their own edge platforms.
Authors
- Daniel Stein
- Shaoyi Huang
- Rolf Drechsler
- Bing Li
- Grace Li Zhang
Paper Information
- arXiv ID: 2601.22151v1
- Categories: cs.LG, eess.SY
- Published: January 29, 2026