FPT Kubernetes Engine with GPU

Using GPU telemetry

FPT Cloud uses NVIDIA GPU Telemetry, integrated with kube-prometheus-stack, as the monitoring and observability toolset for GPU-based systems on Kubernetes. The monitoring stack has three layers: a collector that gathers metrics, a time-series database that stores them, and a visualization layer. It is built on the popular open-source applications Prometheus, which stores and queries the metrics, and Grafana, which visualizes them. The stack also includes Alertmanager, which manages the alerts that Prometheus generates. Prometheus is deployed together with kube-state-metrics, which exposes cluster-level metrics for Kubernetes API objects, and node_exporter, which exposes node-level metrics such as CPU, memory, and disk usage. GPU metrics such as GPU utilization come from the NVIDIA GPU telemetry collector (NVIDIA DCGM Exporter), not from node_exporter.
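To show how the collector and time-series layers fit together, the following Python sketch builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and parses the response into per-node GPU utilization. It assumes the collector is NVIDIA DCGM Exporter, whose `DCGM_FI_DEV_GPU_UTIL` gauge reports GPU utilization in percent, and that each sample carries a `Hostname` label. The address `http://prometheus.example:9090` is a hypothetical placeholder, not an FPT Cloud endpoint. Treat this as a minimal sketch under those assumptions, not an FPT Cloud API.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace it with your own Prometheus service URL.
PROMETHEUS_URL = "http://prometheus.example:9090"

# Assumption: DCGM Exporter is the GPU collector. DCGM_FI_DEV_GPU_UTIL is its
# GPU utilization gauge (percent); averaged here per node via the Hostname label.
GPU_UTIL_QUERY = "avg by (Hostname) (DCGM_FI_DEV_GPU_UTIL)"


def build_query_url(base_url: str, promql: str) -> str:
    """Return a Prometheus HTTP API instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str, label: str) -> dict[str, float]:
    """Map the given label's value to the sample value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    if payload["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    result: dict[str, float] = {}
    for sample in payload["data"]["result"]:
        # Each sample's value is a [unix_timestamp, "value_as_string"] pair.
        _, value = sample["value"]
        result[sample["metric"].get(label, "")] = float(value)
    return result
```

In a cluster you could fetch the body with `urllib.request.urlopen(build_query_url(PROMETHEUS_URL, GPU_UTIL_QUERY))` and pass the decoded response to `parse_vector(body, "Hostname")`, which yields a mapping from node name to average GPU utilization. Grafana dashboards and Prometheus alerting rules typically use the same kind of PromQL expression.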