Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters
We’re sharing details of the role backend aggregation (BAG) plays in building Meta’s gigawatt-scale AI clusters like Prometheus. BAG allows us to seamlessly connect thousands of GPUs across multiple data centers and regions. Our BAG implementation is connecti…
Once its complete our AI cluster,Prometheus, will deliver 1-gigawatt of capacity to enhance and enable new and existing AI experiences across Meta products. Prometheus infrastructure will span severa… [+5110 chars]