📄️ GPU Service Overview on FPT Cloud Kubernetes
FPT Cloud provides Kubernetes with NVIDIA GPU support, offering the following key features:
📄️ Installing and initializing a GPU Kubernetes cluster
FPT Cloud supports the following GPU cards:
📄️ Modifying a GPU worker group
Prerequisites:
📄️ Deploying GPU applications on Kubernetes
Kubernetes manages GPU resources much like CPU resources: declare the GPU resources your application needs according to the GPU configuration selected for the worker group.
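As a minimal sketch, the Python snippet below builds a Pod manifest that requests whole GPUs. It assumes the NVIDIA device plugin is running on the worker group and advertises GPUs as the extended resource `nvidia.com/gpu`. The function name, Pod name, and image are hypothetical placeholders. The printed JSON can be saved to a file and applied with `kubectl apply -f`.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpu_count: int) -> dict:
    """Build a minimal Pod manifest that requests whole NVIDIA GPUs.

    Assumption: GPUs are exposed as the extended resource "nvidia.com/gpu"
    by the NVIDIA device plugin. Name and image are placeholders.
    """
    if not isinstance(gpu_count, int) or gpu_count < 1:
        raise ValueError("gpu_count must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs go under limits; Kubernetes uses the limit as the
                    # request when no request is given.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; replace with your own CUDA workload.
    manifest = gpu_pod_manifest("cuda-test", "registry.example.com/cuda-app:latest", 1)
    print(json.dumps(manifest, indent=2))
```

GPU quantities must be whole numbers: a container cannot request a fraction of `nvidia.com/gpu`, and if both a request and a limit are given they must be equal.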
📄️ Using GPU telemetry
FPT Cloud uses NVIDIA GPU Telemetry integrated with kube-prometheus-stack as the monitoring and observability toolset for GPU-based systems on Kubernetes. The monitoring stack includes a collector, a time-series database for storing metrics, and a visualization layer. It is built on the open-source tools Prometheus and Grafana, and Prometheus includes Alertmanager for creating and managing alerts. Prometheus is deployed together with kube-state-metrics and node_exporter to expose cluster-level metrics for Kubernetes API objects and node-level metrics such as CPU utilization.
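To illustrate how the collected metrics can be read back, the sketch below builds an instant query for the Prometheus HTTP API (`GET /api/v1/query`) and parses the JSON response. It assumes GPU metrics are produced by NVIDIA DCGM Exporter, which reports per-GPU utilization as `DCGM_FI_DEV_GPU_UTIL`. The Prometheus address and helper names are hypothetical; substitute the endpoint reachable in your cluster (for example, through `kubectl port-forward`).

```python
import json
from urllib.parse import urlencode

# Hypothetical address; replace with the Prometheus service in your cluster.
PROMETHEUS_URL = "http://prometheus.example.local:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Return a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> list:
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    results = []
    for series in payload["data"]["result"]:
        # Each sample is [unix_timestamp, "value as string"].
        _, value = series["value"]
        results.append((series["metric"], float(value)))
    return results
```

Against a live cluster, `urllib.request.urlopen(instant_query_url(PROMETHEUS_URL, "DCGM_FI_DEV_GPU_UTIL")).read().decode()` returns a body that can be passed to `parse_vector`.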
📄️ Using Autoscaler with GPU
Container-level autoscaling
📄️ Using GPU sharing modes
GPU sharing modes allow a physical GPU to be shared among multiple containers to improve GPU utilization. The following GPU sharing strategies are supported:
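As one illustration, assuming time-slicing is among the supported strategies, NVIDIA's Kubernetes device plugin configures it through a `sharing.timeSlicing` block. The sketch below builds that configuration as a Python dict and computes how many `nvidia.com/gpu` units a node would then advertise. Whether FPT Cloud exposes time-slicing in exactly this form is an assumption, and the helper names are hypothetical.

```python
import json


def time_slicing_config(replicas: int) -> dict:
    """Hypothetical NVIDIA device plugin config that enables time-slicing.

    Each physical GPU is advertised as `replicas` schedulable nvidia.com/gpu
    units. Pods sharing a GPU take turns on it; there is no memory isolation.
    """
    if replicas < 2:
        # This sketch rejects values below 2, since 1 means no sharing.
        raise ValueError("time-slicing needs at least 2 replicas per GPU")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": "nvidia.com/gpu", "replicas": replicas}]
            }
        },
    }


def advertised_gpus(physical_gpus: int, config: dict) -> int:
    """nvidia.com/gpu capacity a node would report under the config above."""
    replicas = config["sharing"]["timeSlicing"]["resources"][0]["replicas"]
    return physical_gpus * replicas


if __name__ == "__main__":
    cfg = time_slicing_config(4)
    print(json.dumps(cfg, indent=2))
    print(advertised_gpus(2, cfg))  # 2 physical GPUs x 4 replicas
```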
📄️ Adding a GPU worker group
Prerequisites:
📄️ GPU driver installation guide on Kubernetes
Users can install their preferred GPU driver themselves on an FPT Kubernetes Engine cluster with GPU integration.
📄️ Configuring autoscaling using GPU custom metrics
Kubernetes can scale workloads automatically based on custom metrics, such as GPU metrics, through integration with Prometheus. This guide explains how to configure autoscaling for GPU-based applications running on the FPT Kubernetes Engine platform.
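The Horizontal Pod Autoscaler computes its target as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips the change when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces this arithmetic for a GPU metric such as average GPU utilization; the function name, metric values, and min/max bounds are illustrative assumptions, not FPT Cloud defaults.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     tolerance: float = 0.1, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count the HPA algorithm would choose for one metric."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # Within tolerance of 1.0: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical min/max values).
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas averaging 90% GPU utilization against a 60% target give a ratio of 1.5, so the autoscaler would move to `ceil(2 * 1.5) = 3` replicas.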
📄️ FPT Kubernetes Engine with GPU
📄️ Configuring autoscaling using KEDA and Prometheus
Prerequisites: