[Paper] BlazeAIoT: A Modular Multi-Layer Platform for Real-Time Distributed Robotics Across Edge, Fog, and Cloud Infrastructures

Published: January 9, 2026 at 05:47 PM EST
Source: arXiv - 2601.06344v1

Overview

BlazeAIoT is a new open‑source platform that lets developers stitch together edge devices, fog nodes, and cloud clusters into a single, real‑time robotics system. By abstracting away the plumbing of data transport and service orchestration, it promises to cut the engineering effort needed to build scalable, latency‑sensitive robot fleets for factories, warehouses, or smart‑city deployments.

Key Contributions

  • Modular multi‑layer architecture that spans edge → fog → cloud while keeping a unified programming model.
  • Dynamic data‑bridging layer supporting DDS, Kafka, Redis, and ROS 2, with automatic format conversion and adaptive rate‑limiting.
  • Kubernetes‑driven service deployment that can place compute‑intensive AI modules on the cloud and low‑latency control loops on the edge without manual re‑configuration.
  • Hierarchical monitoring & health‑checking (per‑node, per‑service, and system‑wide) that triggers self‑healing actions when a node drops out.
  • Language‑agnostic APIs (C++, Python, Java) enabling existing robotics codebases to plug into BlazeAIoT with minimal changes.
  • Cost‑aware scheduler that balances performance against cloud usage fees, automatically scaling services up or down based on workload.

Methodology

The authors built BlazeAIoT as a set of Docker containers orchestrated by Kubernetes. Each layer (edge, fog, cloud) runs its own lightweight K8s cluster that registers with a global configuration service. This service publishes a topology graph that describes where sensors, actuators, and compute resources live.

A data‑distribution engine sits on top of the broker stack (DDS ↔ Kafka ↔ Redis ↔ ROS 2). When a robot publishes a message, the engine consults the topology graph and decides:

  1. Where the message should be forwarded (e.g., raw lidar to edge for SLAM, compressed map to fog for aggregation).
  2. How to transport it (binary DDS for low latency, Kafka for reliable batch processing).
  3. Whether to apply rate limiting or message chunking (important for large AI inference payloads).
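
The three decisions above can be sketched as a small routing function. This is a minimal illustration, not the authors' implementation: the `Route` type, the `TOPOLOGY` table, the `camera_frame` entry, and the 64 KiB chunk size are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    target_layer: str  # "edge", "fog", or "cloud"
    transport: str     # "dds" or "kafka"
    chunk_count: int   # how many chunks the payload is split into


# Assumed stand-in for the topology graph: message kind -> consuming layer.
TOPOLOGY = {
    "lidar_raw": "edge",       # raw lidar stays on the edge for SLAM
    "map_compressed": "fog",   # compressed maps are aggregated on the fog
    "camera_frame": "cloud",   # assumption: frames feed a cloud-hosted model
}

MAX_CHUNK_BYTES = 64 * 1024  # assumed chunk size, not taken from the paper


def plan_route(kind: str, payload_size: int, latency_critical: bool) -> Route:
    """Decide where a message goes, how it travels, and how it is chunked."""
    target = TOPOLOGY[kind]
    # Binary DDS for low latency, Kafka for reliable batch processing.
    transport = "dds" if latency_critical else "kafka"
    # Ceiling division; a payload always occupies at least one chunk.
    chunks = max(1, -(-payload_size // MAX_CHUNK_BYTES))
    return Route(target, transport, chunks)
```

For example, `plan_route("lidar_raw", 1000, True)` keeps the message on the edge over DDS in a single chunk, while a 200 000-byte `camera_frame` sent without a latency requirement is routed to the cloud over Kafka in four chunks.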

Developers describe services (e.g., “path planner”, “object detector”) in a YAML manifest that includes resource constraints, preferred execution layer, and fallback nodes. The scheduler then deploys the service containers accordingly, monitors their health, and can migrate them if a node fails or becomes overloaded.
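
To make the manifest idea concrete, the sketch below models the fields mentioned above (resource constraints, preferred layer, fallbacks) as Python dataclasses and picks a node accordingly. The field names, the `Node` type, and the "most free CPU wins" tie-break are assumptions for illustration; the real manifest schema and scheduler policy are not given in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceManifest:
    name: str
    cpu_millicores: int                  # assumed resource constraint
    preferred_layer: str                 # "edge", "fog", or "cloud"
    fallback_layers: list[str] = field(default_factory=list)


@dataclass
class Node:
    name: str
    layer: str
    free_cpu_millicores: int
    healthy: bool = True


def place(manifest: ServiceManifest, nodes: list[Node]) -> Node | None:
    """Try the preferred layer first, then each fallback layer in order."""
    for layer in [manifest.preferred_layer, *manifest.fallback_layers]:
        candidates = [
            n for n in nodes
            if n.layer == layer
            and n.healthy
            and n.free_cpu_millicores >= manifest.cpu_millicores
        ]
        if candidates:
            # Assumed tie-break: pick the node with the most free CPU.
            return max(candidates, key=lambda n: n.free_cpu_millicores)
    return None  # nothing suitable; a real scheduler would queue or alert


nodes = [Node("edge-1", "edge", 500), Node("fog-1", "fog", 2000)]
planner = ServiceManifest("path-planner", 400, "edge", ["fog"])
first_choice = place(planner, nodes)   # lands on edge-1
nodes[0].healthy = False               # simulate an edge node failure
after_failure = place(planner, nodes)  # migrates to fog-1
```

Calling `place` again after a health change is the simplest way to model migration: the same manifest yields a different node once the preferred one is marked unhealthy.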

The platform was evaluated on two realistic robotics scenarios:

  • Autonomous navigation in a warehouse with multiple AGVs (Automated Guided Vehicles) requiring sub‑100 ms control loops.
  • AI‑driven perception where high‑resolution camera streams are processed by a deep‑learning model hosted in the cloud, with results streamed back to edge controllers.

Performance metrics (latency, throughput, CPU/memory usage) were collected using the built‑in monitoring stack and compared against two baselines: one where all services run purely on the edge, and one where they run purely in the cloud.

Results & Findings

| Metric | Edge‑Only Baseline | Cloud‑Only Baseline | BlazeAIoT (Hybrid) |
|---|---|---|---|
| End‑to‑end control latency (ms) | 78 | 212 | 62 |
| Per‑frame AI inference latency (ms) | N/A (no AI) | 145 | 98 |
| Network bandwidth (Mbps) | 12 (local) | 68 (cloud upload) | 34 |
| Service downtime (seconds) | 12 (node loss) | 4 (cloud outage) | 1.2 |
| Cloud cost (USD/hr) | 0 | 3.8 | 1.6 |

  • Latency: By keeping time‑critical loops on the edge and off‑loading heavy AI to fog/cloud, BlazeAIoT reduced control latency by ~20 % compared with an edge‑only setup (78 ms → 62 ms).
  • Bandwidth: Adaptive data bridging compressed large sensor payloads before sending them upstream, halving the bandwidth required by the cloud‑only baseline (68 → 34 Mbps).
  • Resilience: Automatic failover moved a navigation service from a failed edge node to a nearby fog node within about 1 s, keeping the robot operational (the table reports 1.2 s of total service downtime).
  • Cost: The cost‑aware scheduler trimmed cloud spend by ~58 % relative to the cloud‑only baseline (3.8 → 1.6 USD/hr) while lowering per‑frame inference latency from 145 ms to 98 ms.
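
The percentages quoted above follow directly from the table. A short check, using only the table's values (the helper name `percent_reduction` is just for this example):

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Values taken from the results table.
control_latency = percent_reduction(78, 62)     # hybrid vs edge-only, ms
bandwidth = percent_reduction(68, 34)           # hybrid vs cloud-only, Mbps
cloud_cost = percent_reduction(3.8, 1.6)        # hybrid vs cloud-only, USD/hr
inference_latency = percent_reduction(145, 98)  # hybrid vs cloud-only, ms

print(f"control latency:   {control_latency:.1f} %")    # 20.5 %
print(f"bandwidth:         {bandwidth:.1f} %")          # 50.0 %
print(f"cloud cost:        {cloud_cost:.1f} %")         # 57.9 %
print(f"inference latency: {inference_latency:.1f} %")  # 32.4 %
```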

Overall, the platform met the sub‑100 ms control‑loop target in the evaluated scenarios while adapting dynamically to topology changes and workload spikes.

Practical Implications

  • Faster time‑to‑market: Robotics teams can reuse existing ROS 2 nodes and simply add a BlazeAIoT manifest to gain edge/fog/cloud elasticity—no need to rewrite communication code.
  • Scalable fleet management: Operators of hundreds of robots can centrally monitor health, push OTA updates, and let the scheduler balance compute across on‑premise fog nodes and public cloud bursts.
  • Cost optimization: The built‑in cost model lets DevOps set budget caps; the platform will automatically shift non‑critical workloads to cheaper edge resources when possible.
  • Cross‑domain reuse: Because the data‑distribution layer is broker‑agnostic, the same stack can be applied to smart‑city sensor networks, industrial IoT gateways, or even AR/VR edge streaming pipelines.
  • Security posture: Integrated TLS for all broker channels and per‑service RBAC simplify compliance with industry standards (e.g., IEC 62443 for industrial automation).
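
The budget‑cap behaviour mentioned under cost optimization can be illustrated with a simple greedy policy. This is a sketch under stated assumptions, not the platform's cost model: the workload tuples, the "most expensive non‑critical workload first" rule, and the function name `enforce_budget` are all hypothetical.

```python
from typing import List, Tuple

# (name, cloud cost in USD/hr, is_critical) -- hypothetical workload record
Workload = Tuple[str, float, bool]


def enforce_budget(
    workloads: List[Workload], budget_usd_per_hr: float
) -> Tuple[List[str], List[str], float]:
    """Shift non-critical workloads off the cloud until spend fits the cap.

    Returns (names kept in cloud, names shifted to edge, remaining cloud spend).
    Critical workloads are never moved, so the cap may still be exceeded.
    """
    in_cloud = list(workloads)
    shifted: List[str] = []
    spend = sum(cost for _, cost, _ in in_cloud)

    # Assumed policy: move the most expensive non-critical workloads first.
    movable = sorted(
        (w for w in workloads if not w[2]), key=lambda w: w[1], reverse=True
    )
    for workload in movable:
        if spend <= budget_usd_per_hr:
            break
        in_cloud.remove(workload)
        shifted.append(workload[0])
        spend -= workload[1]

    return [name for name, _, _ in in_cloud], shifted, spend


fleet = [
    ("object-detector", 2.0, True),
    ("log-analytics", 1.0, False),
    ("map-archive", 0.5, False),
]
kept, moved, spend = enforce_budget(fleet, budget_usd_per_hr=2.5)
```

With a 2.5 USD/hr cap, only `log-analytics` is shifted: removing it brings spend from 3.5 to 2.5 USD/hr, so `map-archive` stays in the cloud.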

For developers, the most immediate benefit is a single API surface (blaze.publish(), blaze.subscribe()) that abstracts away whether a message travels over DDS, Kafka, or ROS 2, letting you focus on algorithmic innovation rather than infrastructure plumbing.
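
The idea of a single publish/subscribe surface can be sketched with a small facade over interchangeable transports. Only the names `publish` and `subscribe` come from the text above; the signatures, the `Blaze` class, the routing table, and the in‑memory transport standing in for real DDS/Kafka/ROS 2 adapters are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class InMemoryTransport:
    """Stand-in for a real broker adapter; delivers messages synchronously."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: Any) -> None:
        for handler in self._handlers.get(topic, []):
            handler(message)


class Blaze:
    """Hypothetical facade: callers name a topic, never a transport."""

    def __init__(
        self,
        transports: Dict[str, InMemoryTransport],
        routes: Dict[str, str],
        default: str,
    ) -> None:
        self._transports = transports
        self._routes = routes      # topic -> transport name
        self._default = default    # transport used for unlisted topics

    def transport_name(self, topic: str) -> str:
        return self._routes.get(topic, self._default)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._transports[self.transport_name(topic)].subscribe(topic, handler)

    def publish(self, topic: str, message: Any) -> None:
        self._transports[self.transport_name(topic)].publish(topic, message)


blaze = Blaze(
    transports={"dds": InMemoryTransport(), "kafka": InMemoryTransport()},
    routes={"cmd_vel": "dds"},  # latency-critical topic pinned to DDS
    default="kafka",
)
received: List[Any] = []
blaze.subscribe("cmd_vel", received.append)
blaze.publish("cmd_vel", {"linear": 0.5})
```

The calling code never changes if `cmd_vel` is later re‑routed to another transport; only the routing table does.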

Limitations & Future Work

  • Topology discovery overhead: In highly dynamic environments (e.g., drones joining/leaving mid‑mission), the configuration service can become a bottleneck; the authors suggest a decentralized gossip protocol as a next step.
  • Hardware heterogeneity: While the platform supports Docker containers, it does not yet handle bare‑metal or FPGA‑accelerated workloads out‑of‑the‑box.
  • Security trade‑offs: TLS termination at the broker adds latency; future work will explore lightweight session keys for ultra‑low‑latency loops.
  • Extensibility to non‑ROS ecosystems: The current adapters focus on ROS 2; adding native support for MQTT or OPC‑UA would extend the platform to other IoT domains.

The paper lays a solid foundation, and the open‑source release (still in beta) invites the community to address these gaps and push the platform toward production‑grade deployments.

Authors

  • Cedric Melancon
  • Julien Gascon‑Samson
  • Maarouf Saad
  • Kuljeet Kaur
  • Simon Savard

Paper Information

  • arXiv ID: 2601.06344v1
  • Categories: cs.RO, cs.DC
  • Published: January 9, 2026