# Enable Iluvatar GPU sharing
## Introduction
HAMi supports `iluvatar.ai/gpu` devices (e.g. MR-V100, BI-V150, BI-V100) by implementing most of the device-sharing features available for NVIDIA GPUs, including:

- **GPU sharing**: Each task can allocate a portion of a GPU instead of a whole card, so a GPU can be shared among multiple tasks.
- **Device memory control**: GPUs can be allocated a specific amount of device memory, and the limit is enforced so a task cannot exceed it.
- **Device core control**: GPUs can be allocated a limited number of compute cores, and the limit is enforced so a task cannot exceed it.
- **Device UUID selection**: You can specify which GPU devices to use or exclude using annotations.
- **Easy to use**: You don't need to modify your task YAML to use the HAMi scheduler. All your GPU jobs are automatically supported after installation.
## Prerequisites
- Iluvatar gpu-manager (please consult your device provider)
- Driver version > 3.1.0
## Enabling GPU-sharing Support
- Deploy gpu-manager on the Iluvatar nodes (please consult your device provider to acquire its package and documentation).

  **NOTICE:** Install only gpu-manager; do not install the gpu-admission package.

- Identify the resource names for core and memory usage (e.g. `iluvatar.ai/vcuda-core`, `iluvatar.ai/vcuda-memory`).

- Set the `iluvatarResourceMem` and `iluvatarResourceCore` parameters when installing HAMi:

  ```bash
  helm install hami hami-charts/hami \
    --set scheduler.kubeScheduler.imageTag={your kubernetes version} \
    --set iluvatarResourceMem=iluvatar.ai/vcuda-memory \
    --set iluvatarResourceCore=iluvatar.ai/vcuda-core \
    -n kube-system
  ```
The default resource names are:

- `iluvatar.ai/vgpu` for GPU count
- `iluvatar.ai/vcuda-memory` for memory allocation
- `iluvatar.ai/vcuda-core` for core allocation

You can customize these names using the parameters above.
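To illustrate how these resource names are used, here is a minimal pod sketch assuming the default resource names above. The pod name, image, and requested values are placeholders, not part of the HAMi documentation; adjust the resource names if you customized them at install time:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod            # placeholder name
spec:
  containers:
    - name: worker
      image: ubuntu:20.04   # placeholder image
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          iluvatar.ai/vgpu: 1           # one shared GPU
          iluvatar.ai/vcuda-core: 50    # 50 of 100 units: half the GPU's cores
          iluvatar.ai/vcuda-memory: 64  # 64 units x 256MB = 16GB device memory
```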
## Device Granularity
HAMi divides each Iluvatar GPU into 100 units for resource allocation. When you request a portion of a GPU, you're actually requesting a certain number of these units.
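The 100-unit granularity means a compute request is simply a percentage of one GPU. As a tiny illustrative sketch (this helper is not part of HAMi), converting a desired GPU share into `iluvatar.ai/vcuda-core` units looks like:

```python
def core_units(fraction: float) -> int:
    """Convert a desired share of one GPU in (0, 1] to vcuda-core units out of 100."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round to the nearest unit, requesting at least one unit.
    return max(1, round(fraction * 100))

print(core_units(0.5))   # half a GPU -> 50 units
print(core_units(0.25))  # a quarter of a GPU -> 25 units
```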
### Memory Allocation
- Each unit of `iluvatar.ai/vcuda-memory` represents 256MB of device memory.
- If you don't specify a memory request, the system defaults to allocating 100% of the available memory.
- Memory allocation is enforced with hard limits to ensure tasks don't exceed their allocated memory.
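Since each unit is 256MB, a memory request in MB must be rounded up to whole units. A minimal sketch of that arithmetic (the helper function is illustrative, not a HAMi API):

```python
import math

MB_PER_UNIT = 256  # each iluvatar.ai/vcuda-memory unit represents 256MB

def memory_units(requested_mb: int) -> int:
    """Round a device-memory request in MB up to whole vcuda-memory units."""
    return math.ceil(requested_mb / MB_PER_UNIT)

print(memory_units(16384))  # 16GB -> 64 units
print(memory_units(1000))   # 1000MB rounds up to 4 units (4 x 256MB = 1024MB)
```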