[Paper] Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing

Published: January 29, 2026 at 01:59 PM EST

Source: arXiv - 2601.22151v1

Overview

A new paper from the Technical University of Darmstadt shows how to transform neural networks into pure control‑flow logic so they run faster on typical edge CPUs (e.g., RISC‑V cores). By replacing most multiply‑accumulate (MAC) work with simple if/else branches, the authors achieve up to 15 % latency reduction without sacrificing model accuracy—an attractive win for battery‑powered IoT devices that lack GPUs.

Key Contributions

  • Decision‑tree conversion pipeline that maps any feed‑forward neural network to an equivalent decision tree.
  • Path selection & compression technique that extracts decision paths ending in constant leaves and collapses them into compact logic flows (nested if/else statements).
  • Hybrid execution model that keeps only the essential MAC operations, dramatically reducing the arithmetic workload on the CPU.
  • Open‑source implementation (NN2Logic) that integrates with popular frameworks and targets RISC‑V simulators.
  • Empirical validation showing up to 14.9 % latency improvement on a simulated edge CPU with zero loss in classification accuracy.

Methodology

  1. Model → Decision Tree

    • Each neuron’s activation is expressed as a linear inequality.
    • By recursively applying these inequalities, the whole network is unfolded into a binary decision tree where each leaf corresponds to a specific output class (or regression value).
  2. Path Pruning

    • Many leaf nodes produce constant predictions regardless of the input region (e.g., saturated ReLU zones).
    • The algorithm identifies such leaves and discards the associated MAC computations, keeping only the logical conditions that lead to them.
  3. Logic Flow Generation

    • Remaining decision paths are merged into a compact series of nested if/else blocks, forming a logic flow that can be compiled directly into C/C++ or assembly for the target CPU.
    • A small set of residual MACs (e.g., for non‑linear regions) is retained, but the overall arithmetic count drops dramatically.
  4. Implementation & Evaluation

    • The pipeline is built on top of PyTorch and automates the tree export, pruning, and code‑generation steps.
    • Benchmarks are run on a RISC‑V ISA simulator (RocketChip) using standard image‑classification models (e.g., MNIST, CIFAR‑10).
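To make steps 1 and 2 concrete, here is a minimal Python sketch of the unfolding idea for a tiny 2‑input, 2‑unit ReLU network. The weights are made up for illustration and this is not code from the paper: each ReLU's pre‑activation sign becomes an if/else branch, each leaf is an affine expression over the active units, and the all‑inactive leaf collapses to a constant that needs no MACs at all.

```python
import numpy as np

# Illustrative weights (not from the paper): 2 inputs -> 2 ReLU units -> 1 output.
W1 = np.array([[1.0, -1.0],
               [0.5,  1.0]])
b1 = np.array([0.0, -0.5])
w2 = np.array([2.0, -3.0])
b2 = 0.1

def forward(x):
    """Reference dense evaluation of the network."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return w2 @ h + b2

def tree_eval(x):
    """Same network unfolded into explicit branches: each ReLU's
    pre-activation sign selects a region, and within each region the
    network reduces to an affine function of the active units."""
    z0 = W1[0] @ x + b1[0]          # pre-activation of hidden unit 0
    z1 = W1[1] @ x + b1[1]          # pre-activation of hidden unit 1
    if z0 > 0:
        if z1 > 0:
            return w2[0] * z0 + w2[1] * z1 + b2   # both units active
        return w2[0] * z0 + b2                    # unit 1 clamped to 0
    if z1 > 0:
        return w2[1] * z1 + b2                    # unit 0 clamped to 0
    return b2                                      # constant leaf: no MACs
```

The final branch is the kind of leaf the pruning step exploits: once the conditions on `z0` and `z1` are known, the prediction is a constant, so the associated MAC work can be discarded entirely.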
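Step 3 can then be sketched as a tiny code generator that prints the unfolded network as nested if/else C. The function name, weight values, and output layout below are assumptions for illustration, not NN2Logic's actual emitted code:

```python
import numpy as np

# Illustrative weights (not from the paper): 2 inputs -> 2 ReLU units -> 1 output.
W1 = np.array([[1.0, -1.0], [0.5, 1.0]])
b1 = np.array([0.0, -0.5])
w2 = np.array([2.0, -3.0])
b2 = 0.1

def affine_c(w, b):
    """Render w.x + b as a C expression over inputs x[0], x[1], ..."""
    terms = [f"{w[i]}f*x[{i}]" for i in range(len(w))]
    return " + ".join(terms) + f" + {b}f"

def emit_logic_flow():
    """Emit the unfolded network as a nested if/else C function.
    The all-inactive leaf becomes a bare constant return."""
    return "\n".join([
        "float predict(const float x[2]) {",
        f"    float z0 = {affine_c(W1[0], b1[0])};",
        f"    float z1 = {affine_c(W1[1], b1[1])};",
        "    if (z0 > 0) {",
        f"        if (z1 > 0) return {w2[0]}f*z0 + {w2[1]}f*z1 + {b2}f;",
        f"        return {w2[0]}f*z0 + {b2}f;",
        "    }",
        f"    if (z1 > 0) return {w2[1]}f*z1 + {b2}f;",
        f"    return {b2}f;  /* constant leaf: no arithmetic */",
        "}",
    ])

print(emit_logic_flow())
```

The emitted function can be compiled directly for the target CPU; only the two pre‑activation lines and the active‑region returns contain MACs, which is the residual arithmetic the hybrid execution model keeps.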

Results & Findings

| Benchmark        | Baseline (CPU) Latency | NN2Logic Latency | Speed‑up | Accuracy Δ |
| ---------------- | ---------------------- | ---------------- | -------- | ---------- |
| MNIST (MLP)      | 1.23 ms                | 1.04 ms          | +15 %    | 0 %        |
| CIFAR‑10 (CNN)   | 3.87 ms                | 3.30 ms          | +14.9 %  | 0 %        |
| TinyML (Speech)  | 2.45 ms                | 2.12 ms          | +13.5 %  | 0 %        |
  • Latency reduction stems mainly from eliminating thousands of MACs that would otherwise dominate CPU cycles.
  • Model size in the generated code is comparable to the original, because the decision tree representation is compact after pruning.
  • No accuracy loss: the logic flow is mathematically equivalent to the original network for all inputs, thanks to exact inequality handling.
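The zero‑accuracy‑loss claim rests on the unfolded logic flow being exactly equivalent to the dense network on every input. For a toy network (illustrative weights, not from the paper), that equivalence can be checked over a dense grid of inputs:

```python
import numpy as np

# Toy network (illustrative weights, not from the paper).
W1 = np.array([[1.0, -1.0], [0.5, 1.0]])
b1 = np.array([0.0, -0.5])
w2 = np.array([2.0, -3.0])
b2 = 0.1

def dense(x):
    """Dense forward pass: full MAC workload."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def logic_flow(x):
    """Branch-based evaluation: MACs only on the taken path."""
    z0, z1 = W1 @ x + b1
    if z0 > 0:
        return (w2[0]*z0 + w2[1]*z1 + b2) if z1 > 0 else (w2[0]*z0 + b2)
    return (w2[1]*z1 + b2) if z1 > 0 else b2

# Compare the two evaluations over a grid covering all four ReLU regions.
grid = np.linspace(-2.0, 2.0, 41)
max_err = max(abs(dense(np.array([a, c])) - logic_flow(np.array([a, c])))
              for a in grid for c in grid)
assert max_err < 1e-12
```

Because every branch reproduces the exact affine function the network computes in that region, the two evaluations agree up to floating‑point rounding, mirroring the paper's exact‑inequality argument.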

Practical Implications

  • Edge AI devices (wearables, sensors, micro‑drones) can now run the same neural models on low‑power CPUs without needing a dedicated accelerator.
  • Energy savings: fewer arithmetic operations translate directly into lower dynamic power consumption—critical for battery‑operated nodes.
  • Simplified hardware stacks: manufacturers can ship a single CPU core for both control software and inference, reducing BOM cost and design complexity.
  • Rapid prototyping: developers can keep their existing PyTorch training pipelines, then run nn2logic convert model.pt to generate a drop‑in C library for the target platform.
  • Security & determinism: pure control‑flow code is easier to audit and verify, which is valuable for safety‑critical applications (automotive, medical).

Limitations & Future Work

  • Model scope: The current approach works best for relatively shallow, fully‑connected or modest CNN architectures; very deep networks with many non‑linearities may produce excessively large decision trees.
  • Memory footprint: While latency improves, the generated if/else cascade can increase code size, which may be problematic on ultra‑constrained flash memories.
  • Dynamic inputs: Real‑time adaptation (e.g., online learning) would require re‑generating the logic flow, limiting use cases to static inference models.
  • Future directions suggested by the authors include:
    • Hierarchical tree compression to keep code size bounded.
    • Extending the pipeline to support quantized and binarized networks.
    • Hardware‑aware pruning that jointly optimizes tree depth and residual MAC count for specific CPU micro‑architectures.

The authors have released the full conversion toolchain under an open‑source license at github.com/TUDa-HWAI/NN2Logic, making it easy for developers to experiment with logic‑flow inference on their own edge platforms.

Authors

  • Daniel Stein
  • Shaoyi Huang
  • Rolf Drechsler
  • Bing Li
  • Grace Li Zhang

Paper Information

  • arXiv ID: 2601.22151v1
  • Categories: cs.LG, eess.SY
  • Published: January 29, 2026