[Paper] RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks

Published: February 9, 2026 at 04:57 AM EST
5 min read
Source: arXiv — 2602.08446v1

Overview

The paper presents RIFLE, a new federated‑learning (FL) framework that swaps the traditional gradient‑exchange step for a logit‑based knowledge‑distillation approach. By doing so, it lets tiny IoT devices collaborate to train deep neural networks (e.g., VGG‑19, ResNet‑18) while staying within strict compute, memory, and energy budgets, and it adds a built‑in safeguard against malicious clients.

Key Contributions

  • Distillation‑centric FL: Replaces gradient sharing with logit (soft‑label) sharing, enabling deep‑model training on TinyML‑class hardware.
  • Robustness via KL‑based validation: Introduces a server‑side Kullback‑Leibler divergence check that flags unreliable or poisoned client updates without ever seeing raw data.
  • Extreme speed‑up: Demonstrates a >99.9 % reduction in training time for VGG‑19 on a 0.3 GFLOPS IoT node (≈ 600 days → 1.39 h).
  • Improved accuracy & security: Achieves up to 28.3 % higher test accuracy and cuts false‑positive detection rates by 87.5 % under severe non‑IID data splits; poisoning attack impact drops by 62.5 %.
  • Broad evaluation: Validated on MNIST, CIFAR‑10, and CIFAR‑100 with realistic heterogeneous client distributions and several attack scenarios.

Methodology

  1. Local inference, not training: Each client runs a compact “student” model (TinyML) on its private data and produces logits (the raw class scores before softmax).
  2. Logit transmission: Instead of sending weight gradients, clients upload these logits (or a compressed version) to the central server.
  3. Server‑side distillation: The server aggregates the received logits using a knowledge‑distillation loss (cross‑entropy + KL divergence) to update a global “teacher” model that can be deep (VGG‑19, ResNet‑18).
  4. Reliability scoring: For every client, the server computes the KL divergence between the client’s logits and the current global logits on a small, server‑held validation set. High divergence triggers a trust penalty (the client’s contribution is down‑weighted or discarded).
  5. Model broadcast: The updated teacher model is distilled back into a lightweight student model and sent to the devices for the next round.
  6. Iterative rounds: The process repeats for a fixed number of communication rounds (e.g., 10), gradually improving both the deep global model and the on‑device student.
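The server-side distillation loss in step 3 can be sketched as follows. The paper specifies a cross-entropy plus KL-divergence objective; the temperature `T` and mixing weight `alpha` below, and the exact way the two terms are combined, are common distillation conventions assumed here, not details taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative KD loss: alpha * CE(hard labels) +
    (1 - alpha) * T^2 * KL(teacher || student). Weighting is an assumption."""
    # Hard-label branch (standard cross-entropy at T = 1).
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels]).mean()

    # Soft-label branch: KL between softened teacher and student outputs.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean()

    return alpha * ce + (1 - alpha) * T**2 * kl
```

When teacher and student logits agree, the KL term vanishes and only the cross-entropy contribution remains, which is why distillation pressure shrinks as the student converges toward the teacher's soft targets.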

The whole pipeline avoids transmitting raw gradients (which are large and privacy‑sensitive) and keeps on‑device compute to a few forward passes per round.
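The reliability scoring in step 4 can be sketched like this. The thresholding and inverse-divergence weighting scheme below are assumptions for illustration; the paper only states that high-divergence clients are down-weighted or discarded.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def client_weights(client_logits, global_logits, kl_threshold=0.5):
    """Illustrative trust scoring: mean KL(client || global) over a
    server-held validation batch. Clients above the threshold get weight 0;
    the rest are weighted inversely to their divergence and normalized."""
    p_g = softmax(global_logits)
    weights = []
    for logits in client_logits:
        p_c = softmax(np.asarray(logits, dtype=float))
        kl = (p_c * (np.log(p_c) - np.log(p_g))).sum(axis=-1).mean()
        weights.append(0.0 if kl > kl_threshold else 1.0 / (1.0 + kl))
    w = np.array(weights)
    return w / w.sum() if w.sum() > 0 else w
```

A client whose logits track the global model's predictions on the validation set keeps nearly full weight, while a label-flipping client whose logits concentrate on the wrong classes diverges sharply and is zeroed out.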

Results & Findings

| Dataset | Baseline FL (e.g., FedAvg) | RIFLE (10 rounds) | Accuracy Δ | Training‑time reduction (VGG‑19) | Attack mitigation |
|---|---|---|---|---|---|
| MNIST (highly non‑IID) | 78.2 % | 92.5 % | +14.3 % | 600 days → 1.39 h | FP ↓ 87.5 % |
| CIFAR‑10 | 61.0 % | 78.8 % | +17.8 % | — | Poisoning impact ↓ 62.5 % |
| CIFAR‑100 | 45.3 % | 73.6 % | +28.3 % | — | — |
  • Robustness: The KL‑based validator successfully filtered out >90 % of malicious logits in simulated label‑flipping and model‑poisoning attacks.
  • Communication efficiency: Logits are an order of magnitude smaller than full gradient tensors, cutting bandwidth usage by ~70 %.
  • Scalability: Experiments with up to 100 simulated IoT clients showed linear scaling of convergence speed; the server’s validation step remained lightweight (<5 ms per client on a modest CPU).
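The communication saving is easy to verify with back-of-the-envelope arithmetic. The batch size and class count below are illustrative assumptions, not figures from the paper; the VGG-19 parameter count is the standard torchvision value.

```python
# FedAvg-style updates ship one float32 per model parameter; a logit-based
# round ships one float32 per class per sample in the local batch.
VGG19_PARAMS = 143_667_240      # torchvision VGG-19 parameter count
BATCH, CLASSES = 64, 100        # assumed local batch size and CIFAR-100 classes

gradient_bytes = VGG19_PARAMS * 4          # full-model update payload
logit_bytes = BATCH * CLASSES * 4          # logit payload for one batch

print(f"gradients: {gradient_bytes / 1e6:.1f} MB, logits: {logit_bytes / 1e3:.1f} KB")
```

For a model the size of VGG-19, the per-round payload drops from hundreds of megabytes to tens of kilobytes, which is what makes the scheme viable over constrained IoT uplinks.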

Practical Implications

  • Deploy deep vision models on edge sensors: Manufacturers can now ship firmware that runs a tiny student model locally while still benefiting from a powerful global teacher—useful for smart cameras, drones, or wearables.
  • Secure federated updates: The KL‑based trust metric offers a plug‑and‑play “sanity check” for any FL system that needs to guard against compromised devices without adding extra cryptographic overhead.
  • Reduced OTA bandwidth: Since only logits (often <1 KB per batch) travel over the network, OTA (over‑the‑air) updates become cheaper and more reliable, especially in low‑power LPWAN environments.
  • Faster time‑to‑market: Training a production‑grade model across thousands of devices can be completed in hours rather than weeks, accelerating iterative product improvements.
  • Compatibility: RIFLE works with existing FL orchestration tools (e.g., TensorFlow Federated, PySyft) by swapping the aggregation function; developers can adopt it with minimal code changes.
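The "swap the aggregation function" idea can be sketched as a drop-in aggregator. The interface below is a hypothetical one chosen for illustration, not the paper's reference implementation or any specific framework's API.

```python
import numpy as np

def aggregate_logits(client_logits, trust_weights=None):
    """Illustrative FedAvg-style replacement: instead of averaging model
    weights, average per-sample client logits (optionally trust-weighted)
    to form soft targets for server-side distillation."""
    stacked = np.stack([np.asarray(l, dtype=float) for l in client_logits])
    if trust_weights is None:
        # Default to a uniform average across clients.
        trust_weights = np.full(len(client_logits), 1.0 / len(client_logits))
    w = np.asarray(trust_weights, dtype=float).reshape(-1, 1, 1)
    return (stacked * w).sum(axis=0)
```

Because the aggregator consumes logits rather than parameter tensors, it is agnostic to each client's model architecture, which is what allows heterogeneous student models across a fleet.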

Limitations & Future Work

  • Student‑teacher capacity gap: If the on‑device student model is too weak, the distilled knowledge may not fully transfer, limiting the ceiling accuracy for extremely complex tasks.
  • Validation set dependence: The KL‑based reliability check assumes the server holds a representative validation set; obtaining such data in privacy‑sensitive domains can be non‑trivial.
  • Limited attack taxonomy: Experiments focused on label‑flipping and gradient‑poisoning; more sophisticated attacks (e.g., backdoor triggers embedded in logits) remain to be evaluated.
  • Hardware heterogeneity: While the paper showcases a 0.3 GFLOPS device, real‑world IoT fleets often span a broader spectrum of compute capabilities; adaptive student model sizing is an open research direction.

Future research could explore dynamic student model scaling, privacy‑preserving validation (e.g., using secure enclaves), and extension to other modalities such as audio or time‑series sensor data.

Authors

  • Pouria Arefijamal
  • Mahdi Ahmadlou
  • Bardia Safaei
  • Jörg Henkel

Paper Information

  • arXiv ID: 2602.08446v1
  • Categories: cs.LG, cs.CR, cs.DC, cs.NI
  • Published: February 9, 2026
