# FAQ

## Supported Device Vendors and Specific Models
| GPU Vendor | GPU Model | Granularity | Multi-GPU Support |
|---|---|---|---|
| NVIDIA | Almost all mainstream consumer and data-center GPUs | Core: 1%, Memory: 1 MB | Supported; multiple GPUs can each still be split and shared via virtualization. |
| Ascend | 910A, 910B2, 910B3, 310P | Minimum granularity depends on the card-type template; refer to the official templates. | Supported, but splitting is not available when `npu > 1`; each card is allocated exclusively. |
| Hygon | Z100, Z100L, K100-AI | Core: 1%, Memory: 1 MB | Supported, but splitting is not available when `dcu > 1`; each card is allocated exclusively. |
| Cambricon | 370, 590 | Core: 1%, Memory: 256 MB | Supported, but splitting is not available when `mlu > 1`; each card is allocated exclusively. |
| Iluvatar | All | Core: 1%, Memory: 256 MB | Supported, but splitting is not available when `gpu > 1`; each card is allocated exclusively. |
| Mthreads | MTT S4000 | Core: 1 core group, Memory: 512 MB | Supported, but splitting is not available when `gpu > 1`; each card is allocated exclusively. |
| Metax | MXC500 | Splitting is not supported; only whole-card allocation is possible. | Supported; all allocations are whole cards. |
## What is vGPU? Why can't I allocate two vGPUs on the same card despite seeing 10 vGPUs?

**TL;DR**
vGPU increases GPU utilization by letting multiple tasks share one GPU through logical splitting. A `deviceSplitCount: 10` setting means the GPU can serve up to 10 tasks simultaneously, but it does not allow a single task to use multiple vGPUs from the same GPU.
### Concept of vGPU

A vGPU is a logical instance of a physical GPU created through virtualization, allowing multiple tasks to share the same physical GPU. For example, setting `deviceSplitCount: 10` means one physical GPU can allocate resources to up to 10 tasks. This allocation does not increase physical resources; it only defines logical visibility.
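For reference, here is a minimal sketch of how the split count might be set through the HAMi Helm chart's values. The field names below are assumed from the chart's `values.yaml` layout and should be verified against your chart version.

```yaml
# Sketch of HAMi Helm values (field names assumed; check your
# chart version's values.yaml before applying).
devicePlugin:
  deviceSplitCount: 10      # each physical GPU can serve up to 10 tasks
  deviceMemoryScaling: 1    # 1 = no device-memory overcommit
```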
### Why can't I allocate two vGPUs on the same card?
- **Significance of vGPU**: A vGPU represents a different task's view of the same physical GPU; it is not a separate partition of physical resources. When a task requests `nvidia.com/gpu: 2`, it is interpreted as requiring two physical GPUs, not two vGPUs from the same GPU.
- **Resource Allocation Mechanism**: vGPU is designed to let multiple tasks share one GPU, not to bind multiple vGPUs on the same GPU to a single task. A `deviceSplitCount: 10` configuration enables up to 10 tasks to use the same GPU concurrently but does not permit one task to use multiple vGPUs.
- **Consistency Between Container and Node Views**: The GPU UUID inside the container matches the physical node's UUID, reflecting the same GPU. Although 10 vGPUs may be visible, these are logical overcommit views, not additional independent resources.
- **Design Intent**: vGPU aims to let one GPU be shared by multiple tasks, rather than letting one task occupy multiple vGPUs on the same GPU. The purpose of vGPU overcommitment is to improve GPU utilization, not to increase the resources allocated to an individual task.
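To make this concrete, a pod that requests one vGPU slice might look like the following sketch. The resource names `nvidia.com/gpumem` and `nvidia.com/gpucores` follow the HAMi README; the image and the specific values are illustrative placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1        # one vGPU slice; 2 would mean two physical GPUs
        nvidia.com/gpumem: 3000  # MB of device memory visible to this task
        nvidia.com/gpucores: 30  # 30% of the GPU's compute
```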
## HAMi's `nvidia.com/priority` field only supports two levels. How can we implement multi-level, user-defined priority-based scheduling for a queue of jobs, especially when cluster resources are limited?

**TL;DR**
HAMi's built-in two-level priority is for runtime preemption on a single GPU (e.g., an urgent task pausing a less critical one on the same card). For scheduling a queue of jobs based on multiple user-defined priorities, integrate HAMi with a scheduler like Volcano, which supports multi-level queue priorities for job allocation and preemption.
HAMi's native `nvidia.com/priority` field (0 for high, 1 for low, the default) is designed specifically for runtime preemption on a single GPU. The typical scenario is a low-priority task (e.g., training) already running when a high-priority task (e.g., inference) needs immediate access to the same GPU. The high-priority task causes the low-priority task to pause, ceding compute resources; once the high-priority task completes, the low-priority task resumes. This mechanism addresses immediate resource contention on a specific device, not the sorting of a queue of many pending jobs across multiple priority levels for initial scheduling.
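A minimal sketch of a high-priority pod follows, assuming the priority value is supplied as a pod annotation; verify the exact mechanism and key against the HAMi documentation for your version. The image name is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: urgent-inference
  annotations:
    nvidia.com/priority: "0"  # 0 = high; a low-priority task on the same GPU pauses
spec:
  containers:
  - name: infer
    image: my-inference:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```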
When resources are insufficient and *n* pending jobs must be ordered for scheduling by multiple user-submitted priorities, HAMi's two-level system is not intended for that broader requirement.
However, achieving multi-level priority scheduling is feasible. The recommended approach is to integrate HAMi with a more comprehensive scheduler like Volcano:
- Volcano for Multi-Level Scheduling Priority:
- Volcano allows you to define multiple queues with different priority levels.
- It uses these queue priorities to determine the order in which jobs are allocated resources (including HAMi-managed vGPUs) and can manage preemption between jobs based on these wider scheduling priorities. This directly addresses the need for sorting the job queue based on multiple priority levels.
- HAMi for GPU Sharing & Its Runtime Priority:
- HAMi integrates with Volcano via the volcano-vgpu-device-plugin.
- It continues to manage the vGPU sharing and its own two-level runtime priority for tasks contending on the same physical GPU, as described earlier.
In summary, while HAMi's own priority serves a different, device-specific purpose (runtime preemption on a single card), implementing multi-level job scheduling priority is achievable by using Volcano in conjunction with HAMi. Volcano would handle which job from the queue is prioritized for resource allocation based on multiple priority levels, and HAMi would manage the GPU sharing and its specific on-device preemption.
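The Volcano side of this division of labor can be sketched as follows: a standard Kubernetes PriorityClass ordering pending jobs, and a Volcano Job bound to it via `priorityClassName`. The `volcano.sh/vgpu-*` resource names follow the volcano-vgpu-device-plugin; the PriorityClass value, names, and image are illustrative.

```yaml
# Kubernetes PriorityClass that Volcano uses to order pending jobs.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: research-high
value: 1000
---
# Volcano Job bound to that priority; HAMi handles the on-GPU sharing.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: train-job
spec:
  schedulerName: volcano
  priorityClassName: research-high
  minAvailable: 1
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        containers:
        - name: cuda
          image: nvidia/cuda:12.4.0-base-ubuntu22.04
          resources:
            limits:
              volcano.sh/vgpu-number: 1
              volcano.sh/vgpu-memory: 3000
```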
## Integration with Other Open-Source Tools

**Currently Supported:**
- **Volcano**: HAMi can be integrated with Volcano by using the `volcano-vgpu-device-plugin` under the HAMi project for GPU resource scheduling and management.
- **Koordinator**: HAMi can also be integrated with Koordinator to provide end-to-end GPU sharing solutions. By deploying HAMi-core on nodes and configuring the appropriate labels and resource requests in Pods, Koordinator can leverage HAMi's GPU isolation capabilities, allowing multiple Pods to share the same GPU and significantly improving GPU resource utilization. For detailed configuration and usage instructions, refer to the Koordinator documentation: Device Scheduling - GPU Share With HAMi.
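As a rough sketch of what such a Pod might look like: the isolation-provider label and `koordinator.sh/gpu-*` resource names below are recalled from the Koordinator documentation and should be verified against your Koordinator version before use; the image is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-with-hami
  labels:
    # Tells Koordinator to use HAMi-core for GPU isolation (name assumed;
    # verify against the Koordinator docs).
    koordinator.sh/gpu-isolation-provider: HAMi-core
spec:
  schedulerName: koord-scheduler
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    resources:
      limits:
        koordinator.sh/gpu-shared: 1        # share one GPU
        koordinator.sh/gpu-core: 50         # 50% of compute
        koordinator.sh/gpu-memory-ratio: 50 # 50% of device memory
```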
**Currently Not Supported:**

- **KubeVirt & Kata Containers**: Incompatible due to their reliance on virtualization for resource isolation, whereas HAMi's GPU Device Plugin depends on mounting the GPU directly into containers. Supporting them would require adapting the device allocation logic and balancing performance overhead against implementation complexity. HAMi prioritizes high-performance scenarios with direct GPU mounting and therefore does not currently support these virtualization solutions.