[Paper] BlazeAIoT: A Modular Multi-Layer Platform for Real-Time Distributed Robotics Across Edge, Fog, and Cloud Infrastructures

Published: January 9, 2026 at 05:47 PM EST

Source: arXiv - 2601.06344v1

Overview

BlazeAIoT is a new open‑source platform that lets developers stitch together edge devices, fog nodes, and cloud clusters into a single, real‑time robotics system. By abstracting away the plumbing of data transport and service orchestration, it promises to cut the engineering effort needed to build scalable, latency‑sensitive robot fleets for factories, warehouses, or smart‑city deployments.

Key Contributions

Modular multi‑layer architecture that spans edge → fog → cloud while keeping a unified programming model.
Dynamic data‑bridging layer supporting DDS, Kafka, Redis, and ROS 2, with automatic format conversion and adaptive rate‑limiting.
Kubernetes‑driven service deployment that can place compute‑intensive AI modules on the cloud and low‑latency control loops on the edge without manual re‑configuration.
Hierarchical monitoring & health‑checking (per‑node, per‑service, and system‑wide) that triggers self‑healing actions when a node drops out.
Language‑agnostic APIs (C++, Python, Java) enabling existing robotics codebases to plug into BlazeAIoT with minimal changes.
Cost‑aware scheduler that balances performance against cloud usage fees, automatically scaling services up or down based on workload.

Methodology

The authors built BlazeAIoT as a set of Docker containers orchestrated by Kubernetes. Each layer (edge, fog, cloud) runs its own lightweight K8s cluster that registers with a global configuration service. This service publishes a topology graph that describes where sensors, actuators, and compute resources live.

A data‑distribution engine sits on top of the broker stack (DDS ↔ Kafka ↔ Redis ↔ ROS 2). When a robot publishes a message, the engine consults the topology graph and decides:

Where the message should be forwarded (e.g., raw lidar to edge for SLAM, compressed map to fog for aggregation).
How to transport it (binary DDS for low latency, Kafka for reliable batch processing).
Whether to apply rate limiting or message chunking (important for large AI inference payloads).
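
The three decisions above can be illustrated with a minimal Python sketch. All names here (`Route`, `ROUTES`, `plan`) are hypothetical and do not come from the paper; a static lookup table stands in for the topology graph, and the chunk size is an arbitrary assumption.

```python
from dataclasses import dataclass


# Hypothetical types and names; the paper's actual engine is not shown here.
@dataclass(frozen=True)
class Route:
    target_layer: str  # "edge", "fog", or "cloud"
    transport: str     # "dds" for low latency, "kafka" for reliable batches
    chunk_size: int    # bytes per chunk; 0 means "send the payload whole"


# Assumed routing table keyed by message kind, standing in for the topology graph.
ROUTES = {
    "lidar_raw": Route("edge", "dds", 0),
    "map_compressed": Route("fog", "kafka", 0),
    "camera_frame": Route("cloud", "kafka", 64 * 1024),
}


def plan(kind: str, payload: bytes) -> tuple[Route, list[bytes]]:
    """Return the route for a message kind and the payload split into chunks."""
    route = ROUTES[kind]
    size = route.chunk_size
    if size <= 0 or len(payload) <= size:
        return route, [payload]
    # Slice the payload into consecutive pieces of at most `size` bytes.
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)]
    return route, chunks
```

Calling `plan("camera_frame", frame_bytes)` on a 150 KiB frame would yield three chunks bound for the cloud over Kafka, while a small `lidar_raw` message stays whole and goes to the edge over DDS.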

Developers describe services (e.g., “path planner”, “object detector”) in a YAML manifest that includes resource constraints, preferred execution layer, and fallback nodes. The scheduler then deploys the service containers accordingly, monitors their health, and can migrate them if a node fails or becomes overloaded.
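
As a rough illustration of that placement and failover logic, the sketch below mirrors the manifest fields named in the text as a Python dict. The field names, node names, and the `place` function are assumptions for illustration only, not the platform's actual schema or scheduler.

```python
# Assumed manifest shape: a Python dict mirroring the YAML fields named in the
# text (resource constraints, preferred execution layer, fallback nodes).
manifest = {
    "name": "path-planner",
    "resources": {"cpu": 2.0, "memory_mb": 512},
    "preferred_layer": "edge",
    "fallback_nodes": ["fog-1", "cloud-1"],
}

# Hypothetical cluster state: free capacity and health per node.
nodes = {
    "edge-1": {"layer": "edge", "cpu": 4.0, "memory_mb": 1024, "healthy": True},
    "fog-1": {"layer": "fog", "cpu": 8.0, "memory_mb": 8192, "healthy": True},
    "cloud-1": {"layer": "cloud", "cpu": 64.0, "memory_mb": 65536, "healthy": True},
}


def fits(node, resources):
    """True if the node is healthy and has enough free CPU and memory."""
    return (
        node["healthy"]
        and node["cpu"] >= resources["cpu"]
        and node["memory_mb"] >= resources["memory_mb"]
    )


def place(manifest, nodes):
    """Pick a node: preferred layer first, then fallbacks in listed order."""
    need = manifest["resources"]
    preferred = [
        name for name, state in nodes.items()
        if state["layer"] == manifest["preferred_layer"]
    ]
    for name in preferred + manifest["fallback_nodes"]:
        if name in nodes and fits(nodes[name], need):
            return name
    return None  # nothing suitable; a real scheduler would queue or alert
```

Re-running `place` after marking `edge-1` unhealthy returns `fog-1`, which is the kind of migration the text describes.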

The platform was evaluated on two realistic robotics scenarios:

Autonomous navigation in a warehouse with multiple AGVs (Automated Guided Vehicles) requiring sub‑100 ms control loops.
AI‑driven perception where high‑resolution camera streams are processed by a deep‑learning model hosted in the cloud, with results streamed back to edge controllers.

Performance metrics (latency, throughput, CPU/memory usage) were collected using the built‑in monitoring stack and compared against two baselines in which all services run purely on the edge or purely in the cloud.

Results & Findings

Metric	Edge‑Only Baseline	Cloud‑Only Baseline	BlazeAIoT (Hybrid)
End‑to‑end control latency (ms)	78	212	62
Per‑frame AI inference latency (ms)	N/A (no AI)	145	98
Network bandwidth (Mbps)	12 (local)	68 (cloud upload)	34
Service downtime (seconds)	12 (node loss)	4 (cloud outage)	1.2
Cloud cost (USD/hr)	0	3.8	1.6

Latency: By keeping time‑critical loops on the edge and off‑loading heavy AI to fog/cloud, BlazeAIoT cut end‑to‑end control latency from 78 ms to 62 ms, about 20 % below the edge‑only setup and about 71 % below the cloud‑only setup (212 ms).
Bandwidth: Adaptive data bridging compressed large sensor payloads before sending them upstream, halving the required bandwidth relative to the cloud‑only baseline (34 vs. 68 Mbps).
Resilience: Automatic failover moved a navigation service from a failed edge node to a nearby fog node in roughly 1 s (the table reports 1.2 s of service downtime), keeping the robot operational.
Cost: The cost‑aware scheduler trimmed cloud spend by ~58 % relative to the cloud‑only baseline (1.6 vs. 3.8 USD/hr) while lowering per‑frame AI inference latency from 145 ms to 98 ms.
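
The percentages quoted above follow directly from the table. A short Python check, using only the reported figures (the helper name `reduction` is ours):

```python
def reduction(baseline: float, hybrid: float) -> float:
    """Relative reduction of the hybrid value versus a baseline, in percent."""
    return 100.0 * (baseline - hybrid) / baseline


# Figures taken from the results table.
latency_vs_edge = reduction(78, 62)       # control latency vs. edge-only
latency_vs_cloud = reduction(212, 62)     # control latency vs. cloud-only
inference_vs_cloud = reduction(145, 98)   # per-frame inference vs. cloud-only
bandwidth_vs_cloud = reduction(68, 34)    # bandwidth vs. cloud-only
cost_vs_cloud = reduction(3.8, 1.6)       # cloud cost vs. cloud-only

for label, value in [
    ("control latency vs edge-only", latency_vs_edge),
    ("control latency vs cloud-only", latency_vs_cloud),
    ("inference latency vs cloud-only", inference_vs_cloud),
    ("bandwidth vs cloud-only", bandwidth_vs_cloud),
    ("cloud cost vs cloud-only", cost_vs_cloud),
]:
    print(f"{label}: {value:.1f} % lower")
```

Note that the hybrid setup still uses more bandwidth than the edge‑only baseline (34 vs. 12 Mbps), so the bandwidth saving applies only to the cloud‑only comparison.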

Overall, the platform met the sub‑100 ms control‑loop target in the evaluated scenarios while dynamically adapting to topology changes and workload spikes.

Practical Implications

Faster time‑to‑market: Robotics teams can reuse existing ROS 2 nodes and simply add a BlazeAIoT manifest to gain edge/fog/cloud elasticity, with no need to rewrite communication code.
Scalable fleet management: Operators of hundreds of robots can centrally monitor health, push OTA updates, and let the scheduler balance compute across on‑premise fog nodes and public cloud bursts.
Cost optimization: The built‑in cost model lets DevOps set budget caps; the platform will automatically shift non‑critical workloads to cheaper edge resources when possible.
Cross‑domain reuse: Because the data‑distribution layer is broker‑agnostic, the same stack can be applied to smart‑city sensor networks, industrial IoT gateways, or even AR/VR edge streaming pipelines.
Security posture: Integrated TLS for all broker channels and per‑service RBAC simplify compliance with industry standards (e.g., IEC 62443 for industrial automation).
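
To make the budget‑cap idea concrete, here is a minimal greedy sketch in Python. It is not the paper's cost model: the workload fields, the `enforce_budget` name, and the "costliest non‑critical first" policy are all assumptions chosen for illustration.

```python
def enforce_budget(workloads, cap_usd_per_hr):
    """Choose non-critical cloud workloads to move off the cloud.

    Workloads are considered costliest first, and selection stops as soon as
    the projected hourly cloud spend is at or under the cap.
    Returns (names_to_move, projected_cloud_spend).
    """
    on_cloud = [w for w in workloads if w["layer"] == "cloud"]
    spend = sum(w["usd_per_hr"] for w in on_cloud)
    movable = sorted(
        (w for w in on_cloud if not w["critical"]),
        key=lambda w: w["usd_per_hr"],
        reverse=True,
    )
    moved = []
    for w in movable:
        if spend <= cap_usd_per_hr:
            break
        spend -= w["usd_per_hr"]
        moved.append(w["name"])
    return moved, spend


# Hypothetical fleet state; names and prices are made up for the example.
workloads = [
    {"name": "detector", "layer": "cloud", "usd_per_hr": 1.5, "critical": True},
    {"name": "analytics", "layer": "cloud", "usd_per_hr": 1.2, "critical": False},
    {"name": "logging", "layer": "cloud", "usd_per_hr": 0.8, "critical": False},
    {"name": "planner", "layer": "edge", "usd_per_hr": 0.0, "critical": True},
]
```

Critical workloads are never moved in this sketch, so a cap below their combined cost cannot be met; a real scheduler would need a policy for that case.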

For developers, the most immediate benefit is a single API surface (blaze.publish(), blaze.subscribe()) that abstracts away whether a message travels over DDS, Kafka, or ROS 2, letting you focus on algorithmic innovation rather than infrastructure plumbing.
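
The summary names `blaze.publish()` and `blaze.subscribe()` but gives no signatures, so the following Python sketch only illustrates the idea of a transport‑agnostic facade. The `Blaze` class, its constructor arguments, and `InMemoryTransport` are hypothetical stand‑ins, not the platform's real API.

```python
from collections import defaultdict


class InMemoryTransport:
    """Stand-in for a DDS/Kafka/ROS 2 adapter; delivers messages synchronously."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def send(self, topic, message):
        # Copy the handler list so handlers may subscribe during delivery.
        for handler in list(self._handlers[topic]):
            handler(message)

    def listen(self, topic, handler):
        self._handlers[topic].append(handler)


class Blaze:
    """Hypothetical facade: callers name a topic, never a transport."""

    def __init__(self, default, overrides=None):
        self._default = default
        self._overrides = overrides or {}  # topic -> transport

    def _transport_for(self, topic):
        return self._overrides.get(topic, self._default)

    def publish(self, topic, message):
        self._transport_for(topic).send(topic, message)

    def subscribe(self, topic, handler):
        self._transport_for(topic).listen(topic, handler)


# Usage: the caller never mentions which transport carries "cmd_vel".
fast, bulk = InMemoryTransport(), InMemoryTransport()
blaze = Blaze(default=bulk, overrides={"cmd_vel": fast})
received = []
blaze.subscribe("cmd_vel", received.append)
blaze.publish("cmd_vel", {"v": 0.5})
```

Swapping the transport behind a topic changes only the `overrides` mapping, which is the property the single API surface is meant to provide.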

Limitations & Future Work

Topology discovery overhead: In highly dynamic environments (e.g., drones joining/leaving mid‑mission), the configuration service can become a bottleneck; the authors suggest a decentralized gossip protocol as a next step.
Hardware heterogeneity: While the platform supports Docker containers, it does not yet handle bare‑metal or FPGA‑accelerated workloads out‑of‑the‑box.
Security trade‑offs: TLS termination at the broker adds latency; future work will explore lightweight session keys for ultra‑low‑latency loops.
Extensibility to non‑ROS ecosystems: The current adapters focus on ROS 2; adding native support for MQTT or OPC‑UA would extend the platform to more IoT domains.

The paper lays a solid foundation, and the open‑source release (still in beta) invites the community to address these gaps and push the platform toward production‑grade deployments.

Authors

Cedric Melancon
Julien Gascon‑Samson
Maarouf Saad
Kuljeet Kaur
Simon Savard

Paper Information

arXiv ID: 2601.06344v1
Categories: cs.RO, cs.DC
Published: January 9, 2026
