public class NvidiaGPUPluginForRuntimeV2 extends Object implements DevicePlugin, DevicePluginScheduler
| Modifier and Type | Class and Description |
|---|---|
| static class | NvidiaGPUPluginForRuntimeV2.DeviceLinkType: The different types of link. |
| class | NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor: A shell wrapper class that makes testing easier. |
| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |
| static String | NV_RESOURCE_NAME |
| static String | TOPOLOGY_POLICY_ENV_KEY: The container can set this environment variable. |
| static String | TOPOLOGY_POLICY_PACK: Scheduling policy that prefers faster GPU-GPU communication. |
| static String | TOPOLOGY_POLICY_SPREAD: Scheduling policy that prefers faster CPU-GPU communication. |
| Constructor and Description |
|---|
| NvidiaGPUPluginForRuntimeV2() |
| Modifier and Type | Method and Description |
|---|---|
| Set<Device> | allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs): Called when allocating devices. |
| void | basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices) |
| int | computeCostOfDevices(Device[] devices): The cost function used to calculate the cost of a subset of devices. |
| Map<Integer,List<Map.Entry<Set<Device>,Integer>>> | getCostTable() |
| Map<String,Integer> | getDevicePairToWeight() |
| Set<Device> | getDevices(): Called when the node resource is updated. |
| DeviceRegisterRequest | getRegisterRequestInfo(): Called first when the device plugin framework wants to register. |
| void | initCostTable() |
| boolean | isTopoInitialized() |
| DeviceRuntimeSpec | onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime): Asks how these devices should be prepared or used before or when the container launches. |
| void | onDevicesReleased(Set<Device> releasedDevices): Called after devices are released. |
| void | parseTopo(String topo, Map<String,Integer> deviceLinkToWeight): A typical sample topo output:<br>GPU0 GPU1 GPU2 GPU3 CPU Affinity<br>GPU0 X PHB SOC SOC 0-31<br>GPU1 PHB X SOC SOC 0-31<br>GPU2 SOC SOC X PHB 0-31<br>GPU3 SOC SOC PHB X 0-31<br>Legend:<br>X = Self<br>SOC = Connection traversing PCIe as well as the SMP link between CPU sockets(e.g. |
| void | setPathOfGpuBinary(String pOfGpuBinary) |
| void | setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor) |
| void | topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable): Topology-aware scheduling algorithm. |
public static final org.slf4j.Logger LOG
public static final String NV_RESOURCE_NAME
public static final String TOPOLOGY_POLICY_ENV_KEY
public static final String TOPOLOGY_POLICY_PACK
public static final String TOPOLOGY_POLICY_SPREAD
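public DeviceRegisterRequest getRegisterRequestInfo() throws Exception
Description copied from interface: DevicePlugin
Called first when the device plugin framework wants to register.
Specified by: getRegisterRequestInfo in interface DevicePlugin
Returns: DeviceRegisterRequest
Throws: Exception

public Set<Device> getDevices() throws Exception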
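Description copied from interface: DevicePlugin
Called when the node resource is updated.
Specified by: getDevices in interface DevicePlugin
Returns: a set of Device; a TreeSet is recommended
Throws: Exception

public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime) throws Exception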
Description copied from interface: DevicePlugin
Asks how these devices should be prepared or used before or when the container launches. A VolumeSpec, for example, lets the framework create a volume before running the container.
Specified by: onDevicesAllocated in interface DevicePlugin
Parameters:
allocatedDevices - A set of allocated Device.
yarnRuntime - Indicates which runtime YARN will use. Could be RUNTIME_DEFAULT or RUNTIME_DOCKER in DeviceRuntimeSpec constants. The default means YARN's non-docker container runtime is used. The docker means YARN's docker container runtime is used.
Returns: DeviceRuntimeSpec description about environment, VolumeSpec, MountVolumeSpec, etc.
Throws: Exception

public void onDevicesReleased(Set<Device> releasedDevices) throws Exception
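Description copied from interface: DevicePlugin
Called after devices are released.
Specified by: onDevicesReleased in interface DevicePlugin
Parameters:
releasedDevices - A set of released devices
Throws: Exception

public Set<Device> allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs)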
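Description copied from interface: DevicePluginScheduler
Called when allocating devices.
Specified by: allocateDevices in interface DevicePluginScheduler
Parameters:
availableDevices - Devices allowed to be chosen from.
count - Number of devices to be allocated.
envs - Environment variables of the container.
Returns: a set of allocated Device

public void initCostTable() throws IOException
Throws: IOException

public int computeCostOfDevices(Device[] devices)
The cost function used to calculate the cost of a subset of devices.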
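
This page does not show how the cost is computed. The following is a minimal sketch of one plausible cost function, assuming the cost of a subset is the sum of the link weights of every device pair in it, looked up in a map shaped like the one returned by getDevicePairToWeight(). The class name DeviceCostSketch, the integer device IDs, the "lowId-highId" key format, and the weights are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and key format are assumptions,
// not the plugin's actual code.
final class DeviceCostSketch {

  // Hypothetical key format "lowId-highId" for an unordered device pair.
  static String pairKey(int a, int b) {
    return Math.min(a, b) + "-" + Math.max(a, b);
  }

  // Adds up the link weight of every unordered pair in the subset.
  static int computeCost(int[] deviceIds, Map<String, Integer> pairToWeight) {
    int cost = 0;
    for (int i = 0; i < deviceIds.length; i++) {
      for (int j = i + 1; j < deviceIds.length; j++) {
        String key = pairKey(deviceIds[i], deviceIds[j]);
        Integer weight = pairToWeight.get(key);
        if (weight == null) {
          throw new IllegalArgumentException("No weight known for pair " + key);
        }
        cost += weight;
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    Map<String, Integer> weights = new HashMap<>();
    weights.put(pairKey(0, 1), 2); // hypothetical weight for a close link
    weights.put(pairKey(0, 2), 5); // hypothetical weight for a distant link
    weights.put(pairKey(1, 2), 5);
    System.out.println(computeCost(new int[] {0, 1, 2}, weights));
  }
}
```

Under this assumption a single device has a cost of zero, and a subset whose members share close links scores lower than one that spans distant links.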
public void topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable)
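
The selection step can be sketched as follows. This is a hedged illustration, not the plugin's algorithm: it assumes the cost table maps a requested device count to candidate combinations with their costs (the shape of the cTable parameter), that a lower cost means faster GPU-GPU links, that the pack policy picks the cheapest available combination, and that the spread policy picks the most expensive one. The class TopologyScheduleSketch, the Policy enum, the entry helper, and the integer device IDs are hypothetical.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; integer IDs stand in for Device objects.
final class TopologyScheduleSketch {

  // Hypothetical stand-in for the two topology policies.
  enum Policy { PACK, SPREAD }

  // Builds one cost-table entry: a device combination and its cost.
  static Map.Entry<Set<Integer>, Integer> entry(int cost, Integer... ids) {
    Set<Integer> combination = new TreeSet<>(Arrays.asList(ids));
    return new AbstractMap.SimpleEntry<>(combination, cost);
  }

  // Returns the best fully available combination of the requested size,
  // or an empty set when no combination fits.
  static Set<Integer> schedule(int count, Policy policy, Set<Integer> available,
      Map<Integer, List<Map.Entry<Set<Integer>, Integer>>> costTable) {
    Map.Entry<Set<Integer>, Integer> best = null;
    for (Map.Entry<Set<Integer>, Integer> candidate
        : costTable.getOrDefault(count, Collections.emptyList())) {
      if (!available.containsAll(candidate.getKey())) {
        continue; // some device in this combination is not available
      }
      boolean better = best == null
          || (policy == Policy.PACK
              ? candidate.getValue() < best.getValue()
              : candidate.getValue() > best.getValue());
      if (better) {
        best = candidate;
      }
    }
    return best == null ? Collections.<Integer>emptySet() : best.getKey();
  }

  public static void main(String[] args) {
    Map<Integer, List<Map.Entry<Set<Integer>, Integer>>> table = new HashMap<>();
    // Hypothetical costs for two-device combinations.
    table.put(2, Arrays.asList(entry(2, 0, 1), entry(5, 0, 2), entry(2, 2, 3)));
    Set<Integer> available = new TreeSet<>(Arrays.asList(0, 2, 3));
    System.out.println(schedule(2, Policy.PACK, available, table));
    System.out.println(schedule(2, Policy.SPREAD, available, table));
  }
}
```

Ties are resolved in favor of the first candidate encountered. A real scheduler would also need a fallback for when no combination fits.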
public void basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices)
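
A topology-unaware fallback with this signature could look like the sketch below. It assumes, without confirmation from this page, that basic scheduling simply takes devices from the available set in iteration order until the requested count is reached. The class name BasicScheduleSketch and the generic element type are hypothetical stand-ins for the plugin and its Device class.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; not the plugin's actual implementation.
final class BasicScheduleSketch {

  // Fills `allocation` from `availableDevices` until it holds `count` devices.
  static <T> void basicSchedule(Set<T> allocation, int count, Set<T> availableDevices) {
    if (count > availableDevices.size()) {
      throw new IllegalArgumentException("Requested " + count
          + " devices but only " + availableDevices.size() + " are available");
    }
    for (T device : availableDevices) {
      if (allocation.size() >= count) {
        break; // enough devices have been chosen
      }
      allocation.add(device);
    }
  }

  public static void main(String[] args) {
    Set<Integer> available = new TreeSet<>(Arrays.asList(0, 1, 2, 3));
    Set<Integer> allocation = new TreeSet<>();
    basicSchedule(allocation, 2, available);
    System.out.println(allocation);
  }
}
```

Passing a sorted set such as a TreeSet makes the result deterministic, since the devices are taken in the set's iteration order.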
public void parseTopo(String topo, Map<String,Integer> deviceLinkToWeight)
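
To show how a topology matrix like the sample in the method summary might be turned into pair weights, here is a minimal sketch. It is an assumption-laden illustration rather than the plugin's parser: it expects a header line of GPU column names followed by one row per GPU, stops at the legend, and returns a new map instead of filling a caller-supplied one. The class TopoParseSketch, the "row-col" key format, and the numeric weights for PHB and SOC are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; key format and weights are assumptions.
final class TopoParseSketch {

  // Maps each GPU pair "i-j" (i < j) to the weight of its link type.
  static Map<String, Integer> parse(String topo, Map<String, Integer> linkTypeToWeight) {
    Map<String, Integer> pairToWeight = new HashMap<>();
    int gpuCount = -1; // unknown until the header line is seen
    int row = 0;
    for (String raw : topo.split("\\r?\\n")) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      String[] tokens = line.split("\\s+");
      if (gpuCount < 0) {
        // Header: count the leading column names that start with "GPU".
        int n = 0;
        while (n < tokens.length && tokens[n].startsWith("GPU")) {
          n++;
        }
        if (n > 0) {
          gpuCount = n;
        }
        continue;
      }
      if (row >= gpuCount) {
        break; // all matrix rows consumed; the legend follows
      }
      if (tokens.length < gpuCount + 1) {
        throw new IllegalArgumentException("Malformed row: " + line);
      }
      // tokens[0] is the row label; only the upper triangle is read.
      for (int col = row + 1; col < gpuCount; col++) {
        String link = tokens[col + 1];
        Integer weight = linkTypeToWeight.get(link);
        if (weight == null) {
          throw new IllegalArgumentException("Unknown link type: " + link);
        }
        pairToWeight.put(row + "-" + col, weight);
      }
      row++;
    }
    return pairToWeight;
  }

  public static void main(String[] args) {
    Map<String, Integer> weights = new HashMap<>();
    weights.put("PHB", 4); // hypothetical weight
    weights.put("SOC", 8); // hypothetical weight
    String topo = "GPU0 GPU1 GPU2 GPU3 CPU Affinity\n"
        + "GPU0 X PHB SOC SOC 0-31\n"
        + "GPU1 PHB X SOC SOC 0-31\n"
        + "GPU2 SOC SOC X PHB 0-31\n"
        + "GPU3 SOC SOC PHB X 0-31\n"
        + "Legend:\n";
    System.out.println(parse(topo, weights));
  }
}
```

Only the upper triangle of the matrix is read because the sample is symmetric. An asymmetric matrix would need both halves.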
public void setPathOfGpuBinary(String pOfGpuBinary)
public void setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor)
public boolean isTopoInitialized()
Copyright © 2008–2023 Apache Software Foundation. All rights reserved.