nvitop.select module

CUDA visible devices selection tool.

Command line usage:

# List all devices, sorted
nvisel       # or use `python3 -m nvitop.select`

# A simple example to select 4 devices
nvisel -n 4  # or use `python3 -m nvitop.select -n 4`

# Select available devices that satisfy the given constraints
nvisel --min-count 2 --max-count 3 --min-free-memory 5GiB --max-gpu-utilization 60

# Set `CUDA_VISIBLE_DEVICES` environment variable using `nvisel`
export CUDA_DEVICE_ORDER="PCI_BUS_ID" CUDA_VISIBLE_DEVICES="$(nvisel -c 1 -f 10GiB)"

# Use UUID strings in `CUDA_VISIBLE_DEVICES` environment variable
export CUDA_VISIBLE_DEVICES="$(nvisel -O uuid -c 2 -f 5000M)"

# Pipe output to other shell utilities
nvisel -0 -O uuid -c 2 -f 4GiB | xargs -0 -I {} nvidia-smi --id={} --query-gpu=index,memory.free --format=csv

# Normalize the `CUDA_VISIBLE_DEVICES` environment variable (e.g. convert UUIDs to indices or get full UUIDs for an abbreviated form)
nvisel -i -S

Python API:

# Put this at the top of the Python script
import os
from nvitop import select_devices

os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(
    select_devices(format='uuid', min_count=4, min_free_memory='8GiB')
)

nvitop.select.select_devices(devices: Iterable[Device] | None, *, format: Literal['index'], force_index: bool, min_count: int, max_count: int | None, min_free_memory: int | str | None, min_total_memory: int | str | None, max_gpu_utilization: int | None, max_memory_utilization: int | None, tolerance: int, free_accounts: list[str] | None, sort: bool, **kwargs: Any) → list[int] | list[tuple[int, int]]
nvitop.select.select_devices(devices: Iterable[Device] | None, *, format: Literal['uuid'], force_index: bool, min_count: int, max_count: int | None, min_free_memory: int | str | None, min_total_memory: int | str | None, max_gpu_utilization: int | None, max_memory_utilization: int | None, tolerance: int, free_accounts: list[str] | None, sort: bool, **kwargs: Any) → list[str]
nvitop.select.select_devices(devices: Iterable[Device] | None, *, format: Literal['device'], force_index: bool, min_count: int, max_count: int | None, min_free_memory: int | str | None, min_total_memory: int | str | None, max_gpu_utilization: int | None, max_memory_utilization: int | None, tolerance: int, free_accounts: list[str] | None, sort: bool, **kwargs: Any) → list[Device]

Select a subset of devices satisfying the specified criteria.

Note

The min count constraint may not be satisfied if not enough devices are available. This constraint is only enforced when both MIG and non-MIG devices are present.
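
Because the minimum count is not guaranteed, a caller may want to check the size of the result before exporting it. The following is a minimal sketch of such a check. `require_min_count` is a hypothetical helper, not part of nvitop, and the UUID strings are placeholders standing in for a real return value.

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')


def require_min_count(selected: Sequence[T], min_count: int) -> List[T]:
    """Return the selection as a list, or raise if it has fewer than min_count items.

    Hypothetical helper for illustration; it is not provided by nvitop.
    """
    if len(selected) < min_count:
        raise RuntimeError(
            f'Requested at least {min_count} device(s), but only {len(selected)} matched.'
        )
    return list(selected)


# Placeholder standing in for select_devices(format='uuid', min_count=4, ...)
selected = ['GPU-aaaa', 'GPU-bbbb']
try:
    require_min_count(selected, min_count=4)
except RuntimeError as exc:
    print(exc)
```

Failing early like this avoids silently starting a job on fewer devices than it was written for.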

Examples

Put the following lines at the top of your script:

import os
from nvitop import select_devices

os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(
    select_devices(format='uuid', min_count=4, min_free_memory='8GiB')
)

Parameters:
  • devices (Iterable[Device]) – The device superset to select from. If not specified, use all devices as the superset.

  • format (str) – The format of the output. One of 'index', 'uuid', or 'device'. If any MIG device is encountered while the format is 'index', the output falls back to the 'uuid' format.

  • force_index (bool) – If True, always use the device index as the output format, even when a MIG device is encountered.

  • min_count (int) – The minimum number of devices to select.

  • max_count (Optional[int]) – The maximum number of devices to select.

  • min_free_memory (Optional[Union[int, str]]) – The minimum free memory (an int in bytes or a str in human readable form) of the selected devices.

  • min_total_memory (Optional[Union[int, str]]) – The minimum total memory (an int in bytes or a str in human readable form) of the selected devices.

  • max_gpu_utilization (Optional[int]) – The maximum GPU utilization rate (in percentage) of the selected devices.

  • max_memory_utilization (Optional[int]) – The maximum memory bandwidth utilization rate (in percentage) of the selected devices.

  • tolerance (int) – The tolerance rate (in percentage) by which to loosen the constraints.

  • free_accounts (Optional[List[str]]) – A list of accounts whose used GPU memory should be counted as free memory.

  • sort (bool) – If True, sort the selected devices by memory usage and GPU utilization.
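
To illustrate how the threshold parameters could interact, here is a minimal sketch of a filter over in-memory records. It is not nvitop's implementation: `FakeDevice` and `filter_devices` are hypothetical names, the way `tolerance` loosens each bound is an assumption, and the sort key (less memory used first, then lower GPU utilization) is one plausible reading of the `sort` description.

```python
from dataclasses import dataclass
from typing import List, Optional

GiB = 1024 ** 3


@dataclass
class FakeDevice:
    """Hypothetical device snapshot; not nvitop's Device class."""
    index: int
    memory_free: int      # bytes
    memory_total: int     # bytes
    gpu_utilization: int  # percent


def filter_devices(
    devices: List[FakeDevice],
    min_free_memory: Optional[int] = None,
    max_gpu_utilization: Optional[int] = None,
    tolerance: int = 0,
    sort: bool = True,
) -> List[FakeDevice]:
    """Keep the devices that satisfy the (loosened) constraints."""
    result = list(devices)
    if min_free_memory is not None:
        # Assumption: tolerance lowers the memory bound by that percentage.
        threshold = min_free_memory * (1.0 - tolerance / 100.0)
        result = [d for d in result if d.memory_free >= threshold]
    if max_gpu_utilization is not None:
        # Assumption: tolerance raises the utilization bound by that many points.
        limit = max_gpu_utilization + tolerance
        result = [d for d in result if d.gpu_utilization <= limit]
    if sort:
        # Assumption: least memory used first, then least busy.
        result.sort(key=lambda d: (d.memory_total - d.memory_free, d.gpu_utilization))
    return result


devices = [
    FakeDevice(0, 4 * GiB, 16 * GiB, 10),
    FakeDevice(1, 12 * GiB, 16 * GiB, 70),
    FakeDevice(2, 10 * GiB, 16 * GiB, 20),
]
strict = filter_devices(devices, min_free_memory=8 * GiB, max_gpu_utilization=60)
loose = filter_devices(devices, min_free_memory=8 * GiB, max_gpu_utilization=60, tolerance=20)
print([d.index for d in strict])  # [2]
print([d.index for d in loose])   # [1, 2]
```

With a tolerance of 20, device 1 passes because its 70% utilization falls under the loosened limit of 80%, while device 0 still fails the loosened memory bound.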

Returns:

A list of the device identifiers.
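
Since the element type of the returned list depends on `format`, joining the result with `','` only works directly when the elements are strings. The sketch below shows one way to build a `CUDA_VISIBLE_DEVICES` value from either integer indices or UUID strings. `to_visible_devices` is a hypothetical helper, not part of nvitop, and it deliberately rejects other element types (such as MIG index tuples) rather than guess at their encoding.

```python
from typing import Iterable, Union


def to_visible_devices(identifiers: Iterable[Union[int, str]]) -> str:
    """Join integer indices or UUID strings into a comma-separated string.

    Hypothetical helper for illustration; it is not provided by nvitop.
    """
    parts = []
    for ident in identifiers:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(ident, bool) or not isinstance(ident, (int, str)):
            raise TypeError(f'Unsupported device identifier: {ident!r}')
        parts.append(str(ident))
    return ','.join(parts)


print(to_visible_devices([0, 2]))                  # 0,2
print(to_visible_devices(['GPU-aaaa', 'GPU-bbbb']))  # GPU-aaaa,GPU-bbbb
```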