Enable Enflame GCU sharing
Introduction
HAMi now supports sharing of enflame.com/gcu devices (i.e. S60), implementing most of the device-sharing features it provides for NVIDIA GPUs, including:
GCU sharing: Each task can allocate a portion of a GCU instead of a whole GCU card, so a GCU can be shared among multiple tasks.
Device Memory and Core Control: A GCU can be allocated with a certain percentage of its device memory and cores, and HAMi ensures that a task does not exceed that boundary.
Device UUID Selection: You can specify which GCU devices to use or exclude using annotations.
Very easy to use: You don't need to modify your task YAML to use the HAMi scheduler. All your GCU jobs will be automatically supported after installation.
Prerequisites
- Enflame gcushare-device-plugin >= 2.1.6 (please consult your device provider; gcushare has two components, gcushare-scheduler-plugin and gcushare-device-plugin, and only gcushare-device-plugin is needed here)
- Driver version >= 1.2.3.14
- Kubernetes >= 1.24
- enflame-container-toolkit >= 2.0.50
Enabling GCU-sharing Support
- Deploy gcushare-device-plugin on Enflame nodes (please consult your device provider to obtain the package and its documentation)
Install only gcushare-device-plugin; do not install the gcushare-scheduler-plugin package.
The default resource names are:
  - enflame.com/vgcu for the GCU count; only 1 is supported for now.
  - enflame.com/vgcu-percentage for the percentage of memory and cores in a GCU slice.
You can customize these names by modifying the hami-scheduler-device ConfigMap.
- Set 'devices.enflame.enabled=true' when deploying HAMi
helm install hami hami-charts/hami --set devices.enflame.enabled=true -n kube-system
Device Granularity
HAMi divides each Enflame GCU into 100 units for resource allocation. When you request a portion of a GCU, you are actually requesting a certain number of these units.
GCU Slice Allocation
- Each unit of enflame.com/vgcu-percentage represents 1% device memory and 1% core
- If you don't specify a memory request, the system will default to using 100% of the available memory
- Memory allocation is enforced with hard limits to ensure tasks don't exceed their allocated memory
- Core allocation is enforced with hard limits to ensure tasks don't exceed their allocated cores
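As a rough illustration of the unit arithmetic, the two resource fragments below show an explicit slice and a request that relies on the default. They are sketches meant to sit under a container's spec, not complete manifests, and the values 30 and 100 are chosen only for the example.

```yaml
# Fragment A: an explicit slice of 30 units out of 100.
resources:
  limits:
    enflame.com/vgcu: 1
    enflame.com/vgcu-percentage: 30   # 30% of device memory and 30% of core
---
# Fragment B: no percentage given. Per the default described above,
# this is treated as a request for 100% of the available memory.
resources:
  limits:
    enflame.com/vgcu: 1
```

Because both memory and core are enforced with hard limits, a task given Fragment A should not be able to use more than its 30 units even when the rest of the card is idle.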
Running Enflame jobs
A container can now request Enflame GCUs using the enflame.com/vgcu and enflame.com/vgcu-percentage resource types:
apiVersion: v1
kind: Pod
metadata:
  name: gcushare-pod-2
  namespace: kube-system
spec:
  terminationGracePeriodSeconds: 0
  containers:
    - name: pod-gcu-example1
      image: ubuntu:18.04
      imagePullPolicy: IfNotPresent
      command:
        - sleep
      args:
        - '100000'
      resources:
        limits:
          enflame.com/vgcu: 1
          enflame.com/vgcu-percentage: 22
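To show what sharing can look like, here is a minimal sketch of two pods whose requests add up to the 100 units of a single GCU (40 + 60). The pod and container names are hypothetical, and whether the scheduler actually places both pods on the same card depends on your cluster state and scheduling policy, so treat this as an illustration rather than a guaranteed placement.

```yaml
# Hypothetical pod A: requests 40 of the 100 units of one GCU.
apiVersion: v1
kind: Pod
metadata:
  name: gcushare-pod-a
spec:
  containers:
    - name: worker
      image: ubuntu:18.04
      command: ["sleep", "100000"]
      resources:
        limits:
          enflame.com/vgcu: 1
          enflame.com/vgcu-percentage: 40   # 40% of device memory and core
---
# Hypothetical pod B: requests the remaining 60 units.
apiVersion: v1
kind: Pod
metadata:
  name: gcushare-pod-b
spec:
  containers:
    - name: worker
      image: ubuntu:18.04
      command: ["sleep", "100000"]
      resources:
        limits:
          enflame.com/vgcu: 1
          enflame.com/vgcu-percentage: 60   # 60% of device memory and core
```

Each pod keeps enflame.com/vgcu at 1, since that is the only count supported for now; only the percentage changes between the two.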
You can find more examples in the examples folder.