Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters
Source: Meta Engineering
We’re sharing details of the role backend aggregation (BAG) plays in building Meta’s gigawatt-scale AI clusters like Prometheus. BAG lets us connect thousands of GPUs across multiple data centers and regions. Our BAG implementation connects two different network fabrics: Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF). Once it’s complete, our AI …