Welcome to nvitop’s documentation!
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

The CLI from nvitop.
Installation
It is highly recommended to install nvitop in an isolated virtual environment. To try it without installing, run it via pipx:
pipx run nvitop
Install from PyPI:
pip3 install --upgrade nvitop
Note
Python 3.7+ is required; Python versions lower than 3.7 are not supported.
Install from conda-forge:
conda install -c conda-forge nvitop
Install the latest version from GitHub:
pip3 install --upgrade pip setuptools
pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
Or, clone this repo and install manually:
git clone --depth=1 https://github.com/XuehaiPan/nvitop.git
cd nvitop
pip3 install .
If this repo is useful to you, please star ⭐️ it to let more people know 🤗.
Quick Start
A minimal script to monitor the GPU devices based on APIs from nvitop:
from nvitop import Device

devices = Device.all()  # or Device.cuda.all()
for device in devices:
    processes = device.processes()  # type: Dict[int, GpuProcess]
    sorted_pids = sorted(processes)

    print(device)
    print(f'  - Fan speed:       {device.fan_speed()}%')
    print(f'  - Temperature:     {device.temperature()}C')
    print(f'  - GPU utilization: {device.gpu_utilization()}%')
    print(f'  - Total memory:    {device.memory_total_human()}')
    print(f'  - Used memory:     {device.memory_used_human()}')
    print(f'  - Free memory:     {device.memory_free_human()}')
    print(f'  - Processes ({len(processes)}): {sorted_pids}')
    for pid in sorted_pids:
        print(f'    - {processes[pid]}')
    print('-' * 120)
Another more advanced approach with coloring:
import time

from nvitop import Device, GpuProcess, NA, colored

print(colored(time.strftime('%a %b %d %H:%M:%S %Y'), color='red', attrs=('bold',)))

devices = Device.cuda.all()  # or `Device.all()` to use NVML ordinal instead
separator = False
for device in devices:
    processes = device.processes()  # type: Dict[int, GpuProcess]

    print(colored(str(device), color='green', attrs=('bold',)))
    print(colored('  - Fan speed:       ', color='blue', attrs=('bold',)) + f'{device.fan_speed()}%')
    print(colored('  - Temperature:     ', color='blue', attrs=('bold',)) + f'{device.temperature()}C')
    print(colored('  - GPU utilization: ', color='blue', attrs=('bold',)) + f'{device.gpu_utilization()}%')
    print(colored('  - Total memory:    ', color='blue', attrs=('bold',)) + f'{device.memory_total_human()}')
    print(colored('  - Used memory:     ', color='blue', attrs=('bold',)) + f'{device.memory_used_human()}')
    print(colored('  - Free memory:     ', color='blue', attrs=('bold',)) + f'{device.memory_free_human()}')
    if len(processes) > 0:
        processes = GpuProcess.take_snapshots(processes.values(), failsafe=True)
        processes.sort(key=lambda process: (process.username, process.pid))

        print(colored(f'  - Processes ({len(processes)}):', color='blue', attrs=('bold',)))
        fmt = '    {pid:<5} {username:<8} {cpu:>5} {host_memory:>8} {time:>8} {gpu_memory:>8} {sm:>3} {command:<}'.format
        print(colored(fmt(pid='PID', username='USERNAME',
                          cpu='CPU%', host_memory='HOST-MEM', time='TIME',
                          gpu_memory='GPU-MEM', sm='SM%',
                          command='COMMAND'),
                      attrs=('bold',)))
        for snapshot in processes:
            print(fmt(pid=snapshot.pid,
                      username=snapshot.username[:7] + ('+' if len(snapshot.username) > 8 else snapshot.username[7:8]),
                      cpu=snapshot.cpu_percent, host_memory=snapshot.host_memory_human,
                      time=snapshot.running_time_human,
                      gpu_memory=(snapshot.gpu_memory_human if snapshot.gpu_memory_human is not NA else 'WDDM:N/A'),
                      sm=snapshot.gpu_sm_utilization,
                      command=snapshot.command))
    else:
        print(colored('  - No Running Processes', attrs=('bold',)))

    if separator:
        print('-' * 120)
    separator = True

An example monitoring script built with APIs from nvitop.
Please refer to the section More than a Monitor in the README for more examples.
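As a taste of the device-selection APIs listed below (such as Device.from_cuda_visible_devices() and parse_cuda_visible_devices()), here is a minimal, illustrative sketch of how a CUDA_VISIBLE_DEVICES-style string of integer ordinals can be parsed. This is not nvitop's actual implementation, and the helper name parse_visible_devices is hypothetical; nvitop's real parser additionally handles UUIDs and MIG device specifiers.

```python
# Hypothetical sketch: turn a ``CUDA_VISIBLE_DEVICES``-style string of integer
# ordinals into a list of device indices. Mirrors (but does not reproduce)
# the CUDA runtime behavior of stopping at the first invalid entry.
def parse_visible_devices(value: str) -> list[int]:
    indices: list[int] = []
    for token in value.split(','):
        token = token.strip()
        try:
            index = int(token)
        except ValueError:
            break  # everything from the first invalid token on is ignored
        if index < 0:
            break  # a negative ordinal also terminates parsing
        if index not in indices:
            indices.append(index)  # drop duplicates, preserve order
    return indices


print(parse_visible_devices('0,2,1'))    # → [0, 2, 1]
print(parse_visible_devices('1,abc,2'))  # → [1] (stops at the invalid token)
```

For the full semantics (UUID prefixes, MIG devices, caching), use nvitop's parse_cuda_visible_devices() from the nvitop.device module instead.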
API Reference
- nvitop.device module
Device
Device.UUID_PATTERN
Device.GPU_PROCESS_CLASS
Device.cuda
Device.is_available()
Device.driver_version()
Device.cuda_driver_version()
Device.max_cuda_version()
Device.cuda_runtime_version()
Device.cudart_version()
Device.count()
Device.all()
Device.from_indices()
Device.from_cuda_visible_devices()
Device.from_cuda_indices()
Device.parse_cuda_visible_devices()
Device.normalize_cuda_visible_devices()
Device.__new__()
Device.__init__()
Device.__repr__()
Device.__eq__()
Device.__hash__()
Device.__getattr__()
Device.__reduce__()
Device.index
Device.nvml_index
Device.physical_index
Device.handle
Device.cuda_index
Device.name()
Device.uuid()
Device.bus_id()
Device.serial()
Device.memory_info()
Device.memory_total()
Device.memory_used()
Device.memory_free()
Device.memory_total_human()
Device.memory_used_human()
Device.memory_free_human()
Device.memory_percent()
Device.memory_usage()
Device.bar1_memory_info()
Device.bar1_memory_total()
Device.bar1_memory_used()
Device.bar1_memory_free()
Device.bar1_memory_total_human()
Device.bar1_memory_used_human()
Device.bar1_memory_free_human()
Device.bar1_memory_percent()
Device.bar1_memory_usage()
Device.utilization_rates()
Device.gpu_utilization()
Device.gpu_percent()
Device.memory_utilization()
Device.encoder_utilization()
Device.decoder_utilization()
Device.clock_infos()
Device.clocks()
Device.max_clock_infos()
Device.max_clocks()
Device.clock_speed_infos()
Device.graphics_clock()
Device.sm_clock()
Device.memory_clock()
Device.video_clock()
Device.max_graphics_clock()
Device.max_sm_clock()
Device.max_memory_clock()
Device.max_video_clock()
Device.fan_speed()
Device.temperature()
Device.power_usage()
Device.power_draw()
Device.power_limit()
Device.power_status()
Device.pcie_throughput()
Device.pcie_tx_throughput()
Device.pcie_rx_throughput()
Device.pcie_tx_throughput_human()
Device.pcie_rx_throughput_human()
Device.nvlink_link_count()
Device.nvlink_throughput()
Device.nvlink_mean_throughput()
Device.nvlink_tx_throughput()
Device.nvlink_mean_tx_throughput()
Device.nvlink_rx_throughput()
Device.nvlink_mean_rx_throughput()
Device.nvlink_tx_throughput_human()
Device.nvlink_mean_tx_throughput_human()
Device.nvlink_rx_throughput_human()
Device.nvlink_mean_rx_throughput_human()
Device.display_active()
Device.display_mode()
Device.current_driver_model()
Device.driver_model()
Device.persistence_mode()
Device.performance_state()
Device.total_volatile_uncorrected_ecc_errors()
Device.compute_mode()
Device.cuda_compute_capability()
Device.is_mig_device()
Device.mig_mode()
Device.is_mig_mode_enabled()
Device.max_mig_device_count()
Device.mig_devices()
Device.is_leaf_device()
Device.to_leaf_devices()
Device.processes()
Device.as_snapshot()
Device.SNAPSHOT_KEYS
Device.oneshot()
PhysicalDevice
MigDevice
CudaDevice
CudaMigDevice
parse_cuda_visible_devices()
normalize_cuda_visible_devices()
- nvitop.process module
HostProcess
HostProcess.INSTANCE_LOCK
HostProcess.INSTANCES
HostProcess.__new__()
HostProcess.__init__()
HostProcess.__repr__()
HostProcess.__reduce__()
HostProcess.username()
HostProcess.cmdline()
HostProcess.command()
HostProcess.running_time()
HostProcess.running_time_human()
HostProcess.running_time_in_seconds()
HostProcess.elapsed_time()
HostProcess.elapsed_time_human()
HostProcess.elapsed_time_in_seconds()
HostProcess.rss_memory()
HostProcess.parent()
HostProcess.children()
HostProcess.oneshot()
HostProcess.as_snapshot()
HostProcess.as_dict()
HostProcess.connections()
HostProcess.cpu_affinity()
HostProcess.cpu_num()
HostProcess.cpu_percent()
HostProcess.cpu_times()
HostProcess.create_time()
HostProcess.cwd()
HostProcess.environ()
HostProcess.exe()
HostProcess.gids()
HostProcess.io_counters()
HostProcess.ionice()
HostProcess.is_running()
HostProcess.kill()
HostProcess.memory_full_info()
HostProcess.memory_info()
HostProcess.memory_info_ex()
HostProcess.memory_maps()
HostProcess.memory_percent()
HostProcess.name()
HostProcess.nice()
HostProcess.num_ctx_switches()
HostProcess.num_fds()
HostProcess.num_threads()
HostProcess.open_files()
HostProcess.parents()
HostProcess.pid
HostProcess.ppid()
HostProcess.resume()
HostProcess.rlimit()
HostProcess.send_signal()
HostProcess.status()
HostProcess.suspend()
HostProcess.terminal()
HostProcess.terminate()
HostProcess.threads()
HostProcess.uids()
HostProcess.wait()
GpuProcess
GpuProcess.INSTANCE_LOCK
GpuProcess.INSTANCES
GpuProcess.__new__()
GpuProcess.__init__()
GpuProcess.__repr__()
GpuProcess.__eq__()
GpuProcess.__hash__()
GpuProcess.__getattr__()
GpuProcess.pid
GpuProcess.host
GpuProcess.device
GpuProcess.gpu_instance_id()
GpuProcess.compute_instance_id()
GpuProcess.gpu_memory()
GpuProcess.gpu_memory_human()
GpuProcess.gpu_memory_percent()
GpuProcess.gpu_sm_utilization()
GpuProcess.gpu_memory_utilization()
GpuProcess.gpu_encoder_utilization()
GpuProcess.gpu_decoder_utilization()
GpuProcess.set_gpu_memory()
GpuProcess.set_gpu_utilization()
GpuProcess.update_gpu_status()
GpuProcess.type
GpuProcess.is_running()
GpuProcess.status()
GpuProcess.create_time()
GpuProcess.running_time()
GpuProcess.running_time_human()
GpuProcess.running_time_in_seconds()
GpuProcess.elapsed_time()
GpuProcess.elapsed_time_human()
GpuProcess.elapsed_time_in_seconds()
GpuProcess.username()
GpuProcess.name()
GpuProcess.cpu_percent()
GpuProcess.memory_percent()
GpuProcess.host_memory_percent()
GpuProcess.host_memory()
GpuProcess.host_memory_human()
GpuProcess.rss_memory()
GpuProcess.cmdline()
GpuProcess.command()
GpuProcess.host_snapshot()
GpuProcess.as_snapshot()
GpuProcess.take_snapshots()
GpuProcess.failsafe()
command_join()
- nvitop.host module
PsutilError
NoSuchProcess
ZombieProcess
AccessDenied
TimeoutExpired
Process
Process.pid
Process.oneshot()
Process.as_dict()
Process.parent()
Process.parents()
Process.is_running()
Process.ppid()
Process.name()
Process.exe()
Process.cmdline()
Process.status()
Process.username()
Process.create_time()
Process.cwd()
Process.nice()
Process.uids()
Process.gids()
Process.terminal()
Process.num_fds()
Process.io_counters()
Process.ionice()
Process.rlimit()
Process.cpu_affinity()
Process.cpu_num()
Process.environ()
Process.num_ctx_switches()
Process.num_threads()
Process.threads()
Process.children()
Process.cpu_percent()
Process.cpu_times()
Process.memory_info()
Process.memory_info_ex()
Process.memory_full_info()
Process.memory_percent()
Process.memory_maps()
Process.open_files()
Process.connections()
Process.send_signal()
Process.suspend()
Process.resume()
Process.terminate()
Process.kill()
Process.wait()
Popen
pid_exists()
pids()
process_iter()
wait_procs()
virtual_memory()
swap_memory()
cpu_times()
cpu_percent()
cpu_times_percent()
cpu_count()
cpu_stats()
net_io_counters()
net_connections()
net_if_addrs()
net_if_stats()
disk_io_counters()
disk_partitions()
disk_usage()
users()
boot_time()
cpu_freq()
getloadavg()
sensors_temperatures()
sensors_fans()
sensors_battery()
load_average()
uptime()
memory_percent()
swap_percent()
ppid_map()
reverse_ppid_map()
WINDOWS_SUBSYSTEM_FOR_LINUX
- nvitop.collector module
take_snapshots()
collect_in_background()
ResourceMetricCollector
ResourceMetricCollector.DEVICE_METRICS
ResourceMetricCollector.PROCESS_METRICS
ResourceMetricCollector.__init__()
ResourceMetricCollector.activate()
ResourceMetricCollector.start()
ResourceMetricCollector.deactivate()
ResourceMetricCollector.stop()
ResourceMetricCollector.context()
ResourceMetricCollector.__call__()
ResourceMetricCollector.clear()
ResourceMetricCollector.collect()
ResourceMetricCollector.daemonize()
ResourceMetricCollector.__del__()
ResourceMetricCollector.take_snapshots()
- nvitop.libnvml module
- Constants
NVML_ERROR_UNINITIALIZED
NVML_ERROR_INVALID_ARGUMENT
NVML_ERROR_NOT_SUPPORTED
NVML_ERROR_NO_PERMISSION
NVML_ERROR_ALREADY_INITIALIZED
NVML_ERROR_NOT_FOUND
NVML_ERROR_INSUFFICIENT_SIZE
NVML_ERROR_INSUFFICIENT_POWER
NVML_ERROR_DRIVER_NOT_LOADED
NVML_ERROR_TIMEOUT
NVML_ERROR_IRQ_ISSUE
NVML_ERROR_LIBRARY_NOT_FOUND
NVML_ERROR_FUNCTION_NOT_FOUND
NVML_ERROR_CORRUPTED_INFOROM
NVML_ERROR_GPU_IS_LOST
NVML_ERROR_RESET_REQUIRED
NVML_ERROR_OPERATING_SYSTEM
NVML_ERROR_LIB_RM_VERSION_MISMATCH
NVML_ERROR_IN_USE
NVML_ERROR_MEMORY
NVML_ERROR_NO_DATA
NVML_ERROR_VGPU_ECC_NOT_SUPPORTED
NVML_ERROR_INSUFFICIENT_RESOURCES
NVML_ERROR_FREQ_NOT_SUPPORTED
NVML_ERROR_ARGUMENT_VERSION_MISMATCH
NVML_ERROR_DEPRECATED
NVML_ERROR_NOT_READY
NVML_ERROR_UNKNOWN
NVML_FEATURE_DISABLED
NVML_FEATURE_ENABLED
NVML_BRAND_UNKNOWN
NVML_BRAND_QUADRO
NVML_BRAND_TESLA
NVML_BRAND_NVS
NVML_BRAND_GRID
NVML_BRAND_GEFORCE
NVML_BRAND_TITAN
NVML_BRAND_NVIDIA_VAPPS
NVML_BRAND_NVIDIA_VPC
NVML_BRAND_NVIDIA_VCS
NVML_BRAND_NVIDIA_VWS
NVML_BRAND_NVIDIA_CLOUD_GAMING
NVML_BRAND_NVIDIA_VGAMING
NVML_BRAND_QUADRO_RTX
NVML_BRAND_NVIDIA_RTX
NVML_BRAND_NVIDIA
NVML_BRAND_GEFORCE_RTX
NVML_BRAND_TITAN_RTX
NVML_BRAND_COUNT
NVML_TEMPERATURE_THRESHOLD_SHUTDOWN
NVML_TEMPERATURE_THRESHOLD_SLOWDOWN
NVML_TEMPERATURE_THRESHOLD_MEM_MAX
NVML_TEMPERATURE_THRESHOLD_GPU_MAX
NVML_TEMPERATURE_THRESHOLD_ACOUSTIC_MIN
NVML_TEMPERATURE_THRESHOLD_ACOUSTIC_CURR
NVML_TEMPERATURE_THRESHOLD_ACOUSTIC_MAX
NVML_TEMPERATURE_THRESHOLD_COUNT
NVML_TEMPERATURE_GPU
NVML_TEMPERATURE_COUNT
NVML_COMPUTEMODE_DEFAULT
NVML_COMPUTEMODE_EXCLUSIVE_THREAD
NVML_COMPUTEMODE_PROHIBITED
NVML_COMPUTEMODE_EXCLUSIVE_PROCESS
NVML_COMPUTEMODE_COUNT
NVML_MEMORY_LOCATION_L1_CACHE
NVML_MEMORY_LOCATION_L2_CACHE
NVML_MEMORY_LOCATION_DEVICE_MEMORY
NVML_MEMORY_LOCATION_DRAM
NVML_MEMORY_LOCATION_REGISTER_FILE
NVML_MEMORY_LOCATION_TEXTURE_MEMORY
NVML_MEMORY_LOCATION_TEXTURE_SHM
NVML_MEMORY_LOCATION_CBU
NVML_MEMORY_LOCATION_SRAM
NVML_MEMORY_LOCATION_COUNT
NVML_NVLINK_MAX_LINKS
NVML_NVLINK_MAX_LANES
NVML_NVLINK_ERROR_DL_REPLAY
NVML_NVLINK_ERROR_DL_RECOVERY
NVML_NVLINK_ERROR_DL_CRC_FLIT
NVML_NVLINK_ERROR_DL_CRC_DATA
NVML_NVLINK_ERROR_DL_ECC_DATA
NVML_NVLINK_ERROR_COUNT
NVML_NVLINK_ERROR_DL_ECC_LANE0
NVML_NVLINK_ERROR_DL_ECC_LANE1
NVML_NVLINK_ERROR_DL_ECC_LANE2
NVML_NVLINK_ERROR_DL_ECC_LANE3
NVML_NVLINK_ERROR_DL_ECC_COUNT
NVML_NVLINK_CAP_P2P_SUPPORTED
NVML_NVLINK_CAP_SYSMEM_ACCESS
NVML_NVLINK_CAP_P2P_ATOMICS
NVML_NVLINK_CAP_SYSMEM_ATOMICS
NVML_NVLINK_CAP_SLI_BRIDGE
NVML_NVLINK_CAP_VALID
NVML_NVLINK_CAP_COUNT
NVML_NVLINK_COUNTER_PKTFILTER_NOP
NVML_NVLINK_COUNTER_PKTFILTER_READ
NVML_NVLINK_COUNTER_PKTFILTER_WRITE
NVML_NVLINK_COUNTER_PKTFILTER_RATOM
NVML_NVLINK_COUNTER_PKTFILTER_NRATOM
NVML_NVLINK_COUNTER_PKTFILTER_FLUSH
NVML_NVLINK_COUNTER_PKTFILTER_RESPDATA
NVML_NVLINK_COUNTER_PKTFILTER_RESPNODATA
NVML_NVLINK_COUNTER_PKTFILTER_ALL
NVML_NVLINK_COUNTER_UNIT_CYCLES
NVML_NVLINK_COUNTER_UNIT_PACKETS
NVML_NVLINK_COUNTER_UNIT_BYTES
NVML_NVLINK_COUNTER_UNIT_RESERVED
NVML_NVLINK_COUNTER_UNIT_COUNT
NVML_NVLINK_DEVICE_TYPE_GPU
NVML_NVLINK_DEVICE_TYPE_IBMNPU
NVML_NVLINK_DEVICE_TYPE_SWITCH
NVML_NVLINK_DEVICE_TYPE_UNKNOWN
NVML_SINGLE_BIT_ECC
NVML_DOUBLE_BIT_ECC
NVML_ECC_ERROR_TYPE_COUNT
NVML_VOLATILE_ECC
NVML_AGGREGATE_ECC
NVML_ECC_COUNTER_TYPE_COUNT
NVML_MEMORY_ERROR_TYPE_CORRECTED
NVML_MEMORY_ERROR_TYPE_UNCORRECTED
NVML_MEMORY_ERROR_TYPE_COUNT
NVML_CLOCK_GRAPHICS
NVML_CLOCK_SM
NVML_CLOCK_MEM
NVML_CLOCK_VIDEO
NVML_CLOCK_COUNT
NVML_CLOCK_ID_CURRENT
NVML_CLOCK_ID_APP_CLOCK_TARGET
NVML_CLOCK_ID_APP_CLOCK_DEFAULT
NVML_CLOCK_ID_CUSTOMER_BOOST_MAX
NVML_CLOCK_ID_COUNT
NVML_DRIVER_WDDM
NVML_DRIVER_WDM
NVML_DRIVER_MCDM
NVML_MAX_GPU_PERF_PSTATES
NVML_PSTATE_0
NVML_PSTATE_1
NVML_PSTATE_2
NVML_PSTATE_3
NVML_PSTATE_4
NVML_PSTATE_5
NVML_PSTATE_6
NVML_PSTATE_7
NVML_PSTATE_8
NVML_PSTATE_9
NVML_PSTATE_10
NVML_PSTATE_11
NVML_PSTATE_12
NVML_PSTATE_13
NVML_PSTATE_14
NVML_PSTATE_15
NVML_PSTATE_UNKNOWN
NVML_INFOROM_OEM
NVML_INFOROM_ECC
NVML_INFOROM_POWER
NVML_INFOROM_COUNT
NVML_SUCCESS
NVML_FAN_NORMAL
NVML_FAN_FAILED
NVML_FAN_POLICY_TEMPERATURE_CONTINOUS_SW
NVML_FAN_POLICY_MANUAL
NVML_LED_COLOR_GREEN
NVML_LED_COLOR_AMBER
NVML_GOM_ALL_ON
NVML_GOM_COMPUTE
NVML_GOM_LOW_DP
NVML_PAGE_RETIREMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS
NVML_PAGE_RETIREMENT_CAUSE_DOUBLE_BIT_ECC_ERROR
NVML_PAGE_RETIREMENT_CAUSE_COUNT
NVML_RESTRICTED_API_SET_APPLICATION_CLOCKS
NVML_RESTRICTED_API_SET_AUTO_BOOSTED_CLOCKS
NVML_RESTRICTED_API_COUNT
NVML_BRIDGE_CHIP_PLX
NVML_BRIDGE_CHIP_BRO4
NVML_MAX_PHYSICAL_BRIDGE
NVML_VALUE_TYPE_DOUBLE
NVML_VALUE_TYPE_UNSIGNED_INT
NVML_VALUE_TYPE_UNSIGNED_LONG
NVML_VALUE_TYPE_UNSIGNED_LONG_LONG
NVML_VALUE_TYPE_SIGNED_LONG_LONG
NVML_VALUE_TYPE_SIGNED_INT
NVML_VALUE_TYPE_COUNT
NVML_PERF_POLICY_POWER
NVML_PERF_POLICY_THERMAL
NVML_PERF_POLICY_SYNC_BOOST
NVML_PERF_POLICY_BOARD_LIMIT
NVML_PERF_POLICY_LOW_UTILIZATION
NVML_PERF_POLICY_RELIABILITY
NVML_PERF_POLICY_TOTAL_APP_CLOCKS
NVML_PERF_POLICY_TOTAL_BASE_CLOCKS
NVML_PERF_POLICY_COUNT
NVML_ENCODER_QUERY_H264
NVML_ENCODER_QUERY_HEVC
NVML_ENCODER_QUERY_AV1
NVML_ENCODER_QUERY_UNKNOWN
NVML_FBC_SESSION_TYPE_UNKNOWN
NVML_FBC_SESSION_TYPE_TOSYS
NVML_FBC_SESSION_TYPE_CUDA
NVML_FBC_SESSION_TYPE_VID
NVML_FBC_SESSION_TYPE_HWENC
NVML_DETACH_GPU_KEEP
NVML_DETACH_GPU_REMOVE
NVML_PCIE_LINK_KEEP
NVML_PCIE_LINK_SHUT_DOWN
NVML_TOTAL_POWER_SAMPLES
NVML_GPU_UTILIZATION_SAMPLES
NVML_MEMORY_UTILIZATION_SAMPLES
NVML_ENC_UTILIZATION_SAMPLES
NVML_DEC_UTILIZATION_SAMPLES
NVML_PROCESSOR_CLK_SAMPLES
NVML_MEMORY_CLK_SAMPLES
NVML_MODULE_POWER_SAMPLES
NVML_SAMPLINGTYPE_COUNT
NVML_PCIE_UTIL_TX_BYTES
NVML_PCIE_UTIL_RX_BYTES
NVML_PCIE_UTIL_COUNT
NVML_TOPOLOGY_INTERNAL
NVML_TOPOLOGY_SINGLE
NVML_TOPOLOGY_MULTIPLE
NVML_TOPOLOGY_HOSTBRIDGE
NVML_TOPOLOGY_NODE
NVML_TOPOLOGY_CPU
NVML_TOPOLOGY_SYSTEM
NVML_P2P_CAPS_INDEX_READ
NVML_P2P_CAPS_INDEX_WRITE
NVML_P2P_CAPS_INDEX_NVLINK
NVML_P2P_CAPS_INDEX_ATOMICS
NVML_P2P_CAPS_INDEX_PROP
NVML_P2P_CAPS_INDEX_LOOPBACK
NVML_P2P_CAPS_INDEX_UNKNOWN
NVML_P2P_STATUS_OK
NVML_P2P_STATUS_CHIPSET_NOT_SUPPORED
NVML_P2P_STATUS_CHIPSET_NOT_SUPPORTED
NVML_P2P_STATUS_GPU_NOT_SUPPORTED
NVML_P2P_STATUS_IOH_TOPOLOGY_NOT_SUPPORTED
NVML_P2P_STATUS_DISABLED_BY_REGKEY
NVML_P2P_STATUS_NOT_SUPPORTED
NVML_P2P_STATUS_UNKNOWN
NVML_DEVICE_ARCH_KEPLER
NVML_DEVICE_ARCH_MAXWELL
NVML_DEVICE_ARCH_PASCAL
NVML_DEVICE_ARCH_VOLTA
NVML_DEVICE_ARCH_TURING
NVML_DEVICE_ARCH_AMPERE
NVML_DEVICE_ARCH_ADA
NVML_DEVICE_ARCH_HOPPER
NVML_DEVICE_ARCH_UNKNOWN
NVML_BUS_TYPE_UNKNOWN
NVML_BUS_TYPE_PCI
NVML_BUS_TYPE_PCIE
NVML_BUS_TYPE_FPCI
NVML_BUS_TYPE_AGP
NVML_POWER_SOURCE_AC
NVML_POWER_SOURCE_BATTERY
NVML_POWER_SOURCE_UNDERSIZED
NVML_ADAPTIVE_CLOCKING_INFO_STATUS_DISABLED
NVML_ADAPTIVE_CLOCKING_INFO_STATUS_ENABLED
NVML_CLOCK_LIMIT_ID_RANGE_START
NVML_CLOCK_LIMIT_ID_TDP
NVML_CLOCK_LIMIT_ID_UNLIMITED
NVML_PCIE_LINK_MAX_SPEED_INVALID
NVML_PCIE_LINK_MAX_SPEED_2500MBPS
NVML_PCIE_LINK_MAX_SPEED_5000MBPS
NVML_PCIE_LINK_MAX_SPEED_8000MBPS
NVML_PCIE_LINK_MAX_SPEED_16000MBPS
NVML_PCIE_LINK_MAX_SPEED_32000MBPS
NVML_PCIE_LINK_MAX_SPEED_64000MBPS
NVML_AFFINITY_SCOPE_NODE
NVML_AFFINITY_SCOPE_SOCKET
NVML_INIT_FLAG_NO_GPUS
NVML_INIT_FLAG_NO_ATTACH
NVML_MAX_GPC_COUNT
NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE
NVML_DEVICE_UUID_BUFFER_SIZE
NVML_DEVICE_UUID_V2_BUFFER_SIZE
NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE
NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE
NVML_DEVICE_NAME_BUFFER_SIZE
NVML_DEVICE_NAME_V2_BUFFER_SIZE
NVML_DEVICE_SERIAL_BUFFER_SIZE
NVML_DEVICE_PART_NUMBER_BUFFER_SIZE
NVML_DEVICE_GPU_PART_NUMBER_BUFFER_SIZE
NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE
NVML_DEVICE_PCI_BUS_ID_BUFFER_SIZE
NVML_DEVICE_PCI_BUS_ID_BUFFER_V2_SIZE
NVML_GRID_LICENSE_BUFFER_SIZE
NVML_VGPU_NAME_BUFFER_SIZE
NVML_GRID_LICENSE_FEATURE_MAX_COUNT
NVML_VGPU_METADATA_OPAQUE_DATA_SIZE
NVML_VGPU_PGPU_METADATA_OPAQUE_DATA_SIZE
NVML_DEVICE_GPU_FRU_PART_NUMBER_BUFFER_SIZE
NVML_DEVICE_PCI_BUS_ID_LEGACY_FMT
NVML_DEVICE_PCI_BUS_ID_FMT
NVML_VALUE_NOT_AVAILABLE_ulonglong
NVML_VALUE_NOT_AVAILABLE_uint
NVML_FI_DEV_ECC_CURRENT
NVML_FI_DEV_ECC_PENDING
NVML_FI_DEV_ECC_SBE_VOL_TOTAL
NVML_FI_DEV_ECC_DBE_VOL_TOTAL
NVML_FI_DEV_ECC_SBE_AGG_TOTAL
NVML_FI_DEV_ECC_DBE_AGG_TOTAL
NVML_FI_DEV_ECC_SBE_VOL_L1
NVML_FI_DEV_ECC_DBE_VOL_L1
NVML_FI_DEV_ECC_SBE_VOL_L2
NVML_FI_DEV_ECC_DBE_VOL_L2
NVML_FI_DEV_ECC_SBE_VOL_DEV
NVML_FI_DEV_ECC_DBE_VOL_DEV
NVML_FI_DEV_ECC_SBE_VOL_REG
NVML_FI_DEV_ECC_DBE_VOL_REG
NVML_FI_DEV_ECC_SBE_VOL_TEX
NVML_FI_DEV_ECC_DBE_VOL_TEX
NVML_FI_DEV_ECC_DBE_VOL_CBU
NVML_FI_DEV_ECC_SBE_AGG_L1
NVML_FI_DEV_ECC_DBE_AGG_L1
NVML_FI_DEV_ECC_SBE_AGG_L2
NVML_FI_DEV_ECC_DBE_AGG_L2
NVML_FI_DEV_ECC_SBE_AGG_DEV
NVML_FI_DEV_ECC_DBE_AGG_DEV
NVML_FI_DEV_ECC_SBE_AGG_REG
NVML_FI_DEV_ECC_DBE_AGG_REG
NVML_FI_DEV_ECC_SBE_AGG_TEX
NVML_FI_DEV_ECC_DBE_AGG_TEX
NVML_FI_DEV_ECC_DBE_AGG_CBU
NVML_FI_DEV_RETIRED_SBE
NVML_FI_DEV_RETIRED_DBE
NVML_FI_DEV_RETIRED_PENDING
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L0
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L1
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L2
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L3
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L4
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L5
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_TOTAL
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L0
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L1
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L2
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L3
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L4
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L5
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_TOTAL
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L0
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L1
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L2
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L3
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L4
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L5
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_TOTAL
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L0
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L1
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L2
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L3
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L4
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L5
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_TOTAL
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L0
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L1
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L2
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L3
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L4
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L5
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_TOTAL
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L0
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L1
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L2
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L3
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L4
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L5
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_TOTAL
NVML_FI_DEV_PERF_POLICY_POWER
NVML_FI_DEV_PERF_POLICY_THERMAL
NVML_FI_DEV_PERF_POLICY_SYNC_BOOST
NVML_FI_DEV_PERF_POLICY_BOARD_LIMIT
NVML_FI_DEV_PERF_POLICY_LOW_UTILIZATION
NVML_FI_DEV_PERF_POLICY_RELIABILITY
NVML_FI_DEV_PERF_POLICY_TOTAL_APP_CLOCKS
NVML_FI_DEV_PERF_POLICY_TOTAL_BASE_CLOCKS
NVML_FI_DEV_MEMORY_TEMP
NVML_FI_DEV_TOTAL_ENERGY_CONSUMPTION
NVML_FI_DEV_NVLINK_SPEED_MBPS_L0
NVML_FI_DEV_NVLINK_SPEED_MBPS_L1
NVML_FI_DEV_NVLINK_SPEED_MBPS_L2
NVML_FI_DEV_NVLINK_SPEED_MBPS_L3
NVML_FI_DEV_NVLINK_SPEED_MBPS_L4
NVML_FI_DEV_NVLINK_SPEED_MBPS_L5
NVML_FI_DEV_NVLINK_SPEED_MBPS_COMMON
NVML_FI_DEV_NVLINK_LINK_COUNT
NVML_FI_DEV_RETIRED_PENDING_SBE
NVML_FI_DEV_RETIRED_PENDING_DBE
NVML_FI_DEV_PCIE_REPLAY_COUNTER
NVML_FI_DEV_PCIE_REPLAY_ROLLOVER_COUNTER
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L6
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L7
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L8
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L9
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L10
NVML_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_L11
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L6
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L7
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L8
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L9
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L10
NVML_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_L11
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L6
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L7
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L8
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L9
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L10
NVML_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_L11
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L6
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L7
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L8
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L9
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L10
NVML_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_L11
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L6
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L7
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L8
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L9
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L10
NVML_FI_DEV_NVLINK_BANDWIDTH_C0_L11
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L6
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L7
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L8
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L9
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L10
NVML_FI_DEV_NVLINK_BANDWIDTH_C1_L11
NVML_FI_DEV_NVLINK_SPEED_MBPS_L6
NVML_FI_DEV_NVLINK_SPEED_MBPS_L7
NVML_FI_DEV_NVLINK_SPEED_MBPS_L8
NVML_FI_DEV_NVLINK_SPEED_MBPS_L9
NVML_FI_DEV_NVLINK_SPEED_MBPS_L10
NVML_FI_DEV_NVLINK_SPEED_MBPS_L11
NVML_FI_DEV_NVLINK_THROUGHPUT_DATA_TX
NVML_FI_DEV_NVLINK_THROUGHPUT_DATA_RX
NVML_FI_DEV_NVLINK_THROUGHPUT_RAW_TX
NVML_FI_DEV_NVLINK_THROUGHPUT_RAW_RX
NVML_FI_DEV_REMAPPED_COR
NVML_FI_DEV_REMAPPED_UNC
NVML_FI_DEV_REMAPPED_PENDING
NVML_FI_DEV_REMAPPED_FAILURE
NVML_FI_DEV_NVLINK_REMOTE_NVLINK_ID
NVML_FI_DEV_NVSWITCH_CONNECTED_LINK_COUNT
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L0
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L1
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L2
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L3
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L4
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L5
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L6
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L7
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L8
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L9
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L10
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_L11
NVML_FI_DEV_NVLINK_ECC_DATA_ERROR_COUNT_TOTAL
NVML_FI_DEV_NVLINK_ERROR_DL_REPLAY
NVML_FI_DEV_NVLINK_ERROR_DL_RECOVERY
NVML_FI_DEV_NVLINK_ERROR_DL_CRC
NVML_FI_DEV_NVLINK_GET_SPEED
NVML_FI_DEV_NVLINK_GET_STATE
NVML_FI_DEV_NVLINK_GET_VERSION
NVML_FI_DEV_NVLINK_GET_POWER_STATE
NVML_FI_DEV_NVLINK_GET_POWER_THRESHOLD
NVML_FI_DEV_PCIE_L0_TO_RECOVERY_COUNTER
NVML_FI_DEV_C2C_LINK_COUNT
NVML_FI_DEV_C2C_LINK_GET_STATUS
NVML_FI_DEV_C2C_LINK_GET_MAX_BW
NVML_FI_DEV_PCIE_COUNT_CORRECTABLE_ERRORS
NVML_FI_DEV_PCIE_COUNT_NAKS_RECEIVED
NVML_FI_DEV_PCIE_COUNT_RECEIVER_ERROR
NVML_FI_DEV_PCIE_COUNT_BAD_TLP
NVML_FI_DEV_PCIE_COUNT_NAKS_SENT
NVML_FI_DEV_PCIE_COUNT_BAD_DLLP
NVML_FI_DEV_PCIE_COUNT_NON_FATAL_ERROR
NVML_FI_DEV_PCIE_COUNT_FATAL_ERROR
NVML_FI_DEV_PCIE_COUNT_UNSUPPORTED_REQ
NVML_FI_DEV_PCIE_COUNT_LCRC_ERROR
NVML_FI_DEV_PCIE_COUNT_LANE_ERROR
NVML_FI_DEV_IS_RESETLESS_MIG_SUPPORTED
NVML_FI_DEV_POWER_AVERAGE
NVML_FI_DEV_POWER_INSTANT
NVML_FI_DEV_POWER_MIN_LIMIT
NVML_FI_DEV_POWER_MAX_LIMIT
NVML_FI_DEV_POWER_DEFAULT_LIMIT
NVML_FI_DEV_POWER_CURRENT_LIMIT
NVML_FI_DEV_ENERGY
NVML_FI_DEV_POWER_REQUESTED_LIMIT
NVML_FI_DEV_TEMPERATURE_SHUTDOWN_TLIMIT
NVML_FI_DEV_TEMPERATURE_SLOWDOWN_TLIMIT
NVML_FI_DEV_TEMPERATURE_MEM_MAX_TLIMIT
NVML_FI_DEV_TEMPERATURE_GPU_MAX_TLIMIT
NVML_FI_MAX
NVML_GPU_VIRTUALIZATION_MODE_NONE
NVML_GPU_VIRTUALIZATION_MODE_PASSTHROUGH
NVML_GPU_VIRTUALIZATION_MODE_VGPU
NVML_GPU_VIRTUALIZATION_MODE_HOST_VGPU
NVML_GPU_VIRTUALIZATION_MODE_HOST_VSGA
NVML_VGPU_VM_ID_DOMAIN_ID
NVML_VGPU_VM_ID_UUID
NVML_GRID_LICENSE_FEATURE_CODE_UNKNOWN
NVML_GRID_LICENSE_FEATURE_CODE_VGPU
NVML_GRID_LICENSE_FEATURE_CODE_NVIDIA_RTX
NVML_GRID_LICENSE_FEATURE_CODE_VWORKSTATION
NVML_GRID_LICENSE_FEATURE_CODE_GAMING
NVML_GRID_LICENSE_FEATURE_CODE_COMPUTE
NVML_GRID_LICENSE_EXPIRY_NOT_AVAILABLE
NVML_GRID_LICENSE_EXPIRY_INVALID
NVML_GRID_LICENSE_EXPIRY_VALID
NVML_GRID_LICENSE_EXPIRY_NOT_APPLICABLE
NVML_GRID_LICENSE_EXPIRY_PERMANENT
NVML_VGPU_CAP_NVLINK_P2P
NVML_VGPU_CAP_GPUDIRECT
NVML_VGPU_CAP_MULTI_VGPU_EXCLUSIVE
NVML_VGPU_CAP_EXCLUSIVE_TYPE
NVML_VGPU_CAP_EXCLUSIVE_SIZE
NVML_VGPU_CAP_COUNT
NVML_VGPU_DRIVER_CAP_HETEROGENEOUS_MULTI_VGPU
NVML_VGPU_DRIVER_CAP_COUNT
NVML_DEVICE_VGPU_CAP_FRACTIONAL_MULTI_VGPU
NVML_DEVICE_VGPU_CAP_HETEROGENEOUS_TIMESLICE_PROFILES
NVML_DEVICE_VGPU_CAP_HETEROGENEOUS_TIMESLICE_SIZES
NVML_DEVICE_VGPU_CAP_READ_DEVICE_BUFFER_BW
NVML_DEVICE_VGPU_CAP_WRITE_DEVICE_BUFFER_BW
NVML_DEVICE_VGPU_CAP_COUNT
NVML_VGPU_INSTANCE_GUEST_INFO_STATE_UNINITIALIZED
NVML_VGPU_INSTANCE_GUEST_INFO_STATE_INITIALIZED
NVML_VGPU_VM_COMPATIBILITY_NONE
NVML_VGPU_VM_COMPATIBILITY_COLD
NVML_VGPU_VM_COMPATIBILITY_HIBERNATE
NVML_VGPU_VM_COMPATIBILITY_SLEEP
NVML_VGPU_VM_COMPATIBILITY_LIVE
NVML_VGPU_COMPATIBILITY_LIMIT_NONE
NVML_VGPU_COMPATIBILITY_LIMIT_HOST_DRIVER
NVML_VGPU_COMPATIBILITY_LIMIT_GUEST_DRIVER
NVML_VGPU_COMPATIBILITY_LIMIT_GPU
NVML_VGPU_COMPATIBILITY_LIMIT_OTHER
NVML_HOST_VGPU_MODE_NON_SRIOV
NVML_HOST_VGPU_MODE_SRIOV
NVML_CC_ACCEPTING_CLIENT_REQUESTS_FALSE
NVML_CC_ACCEPTING_CLIENT_REQUESTS_TRUE
NVML_CC_SYSTEM_GPUS_CC_NOT_CAPABLE
NVML_CC_SYSTEM_GPUS_CC_CAPABLE
NVML_CC_SYSTEM_CPU_CAPS_NONE
NVML_CC_SYSTEM_CPU_CAPS_AMD_SEV
NVML_CC_SYSTEM_CPU_CAPS_INTEL_TDX
NVML_CC_SYSTEM_DEVTOOLS_MODE_OFF
NVML_CC_SYSTEM_DEVTOOLS_MODE_ON
NVML_CC_SYSTEM_ENVIRONMENT_UNAVAILABLE
NVML_CC_SYSTEM_ENVIRONMENT_SIM
NVML_CC_SYSTEM_ENVIRONMENT_PROD
NVML_CC_SYSTEM_FEATURE_DISABLED
NVML_CC_SYSTEM_FEATURE_ENABLED
NVML_GSP_FIRMWARE_VERSION_BUF_SIZE
NVML_PROCESS_MODE_COMPUTE
NVML_PROCESS_MODE_GRAPHICS
NVML_PROCESS_MODE_MPS
NVML_GRID_LICENSE_STATE_UNKNOWN
NVML_GRID_LICENSE_STATE_UNINITIALIZED
NVML_GRID_LICENSE_STATE_UNLICENSED_UNRESTRICTED
NVML_GRID_LICENSE_STATE_UNLICENSED_RESTRICTED
NVML_GRID_LICENSE_STATE_UNLICENSED
NVML_GRID_LICENSE_STATE_LICENSED
NVML_VGPU_SCHEDULER_POLICY_UNKNOWN
NVML_VGPU_SCHEDULER_POLICY_BEST_EFFORT
NVML_VGPU_SCHEDULER_POLICY_EQUAL_SHARE
NVML_VGPU_SCHEDULER_POLICY_FIXED_SHARE
NVML_SUPPORTED_VGPU_SCHEDULER_POLICY_COUNT
NVML_SCHEDULER_SW_MAX_LOG_ENTRIES
NVML_VGPU_SCHEDULER_ARR_DEFAULT
NVML_VGPU_SCHEDULER_ARR_DISABLE
NVML_VGPU_SCHEDULER_ARR_ENABLE
NVML_DEVICE_MIG_DISABLE
NVML_DEVICE_MIG_ENABLE
NVML_GPU_INSTANCE_PROFILE_1_SLICE
NVML_GPU_INSTANCE_PROFILE_2_SLICE
NVML_GPU_INSTANCE_PROFILE_3_SLICE
NVML_GPU_INSTANCE_PROFILE_4_SLICE
NVML_GPU_INSTANCE_PROFILE_7_SLICE
NVML_GPU_INSTANCE_PROFILE_8_SLICE
NVML_GPU_INSTANCE_PROFILE_6_SLICE
NVML_GPU_INSTANCE_PROFILE_1_SLICE_REV1
NVML_GPU_INSTANCE_PROFILE_2_SLICE_REV1
NVML_GPU_INSTANCE_PROFILE_1_SLICE_REV2
NVML_GPU_INSTANCE_PROFILE_COUNT
NVML_COMPUTE_INSTANCE_PROFILE_1_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_2_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_3_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_4_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_7_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_8_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_6_SLICE
NVML_COMPUTE_INSTANCE_PROFILE_1_SLICE_REV1
NVML_COMPUTE_INSTANCE_PROFILE_COUNT
NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_SHARED
NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_COUNT
NVML_MAX_GPU_UTILIZATIONS
NVML_GPU_UTILIZATION_DOMAIN_GPU
NVML_GPU_UTILIZATION_DOMAIN_FB
NVML_GPU_UTILIZATION_DOMAIN_VID
NVML_GPU_UTILIZATION_DOMAIN_BUS
NVML_MAX_THERMAL_SENSORS_PER_GPU
NVML_THERMAL_TARGET_NONE
NVML_THERMAL_TARGET_GPU
NVML_THERMAL_TARGET_MEMORY
NVML_THERMAL_TARGET_POWER_SUPPLY
NVML_THERMAL_TARGET_BOARD
NVML_THERMAL_TARGET_VCD_BOARD
NVML_THERMAL_TARGET_VCD_INLET
NVML_THERMAL_TARGET_VCD_OUTLET
NVML_THERMAL_TARGET_ALL
NVML_THERMAL_TARGET_UNKNOWN
NVML_THERMAL_CONTROLLER_NONE
NVML_THERMAL_CONTROLLER_GPU_INTERNAL
NVML_THERMAL_CONTROLLER_ADM1032
NVML_THERMAL_CONTROLLER_ADT7461
NVML_THERMAL_CONTROLLER_MAX6649
NVML_THERMAL_CONTROLLER_MAX1617
NVML_THERMAL_CONTROLLER_LM99
NVML_THERMAL_CONTROLLER_LM89
NVML_THERMAL_CONTROLLER_LM64
NVML_THERMAL_CONTROLLER_G781
NVML_THERMAL_CONTROLLER_ADT7473
NVML_THERMAL_CONTROLLER_SBMAX6649
NVML_THERMAL_CONTROLLER_VBIOSEVT
NVML_THERMAL_CONTROLLER_OS
NVML_THERMAL_CONTROLLER_NVSYSCON_CANOAS
NVML_THERMAL_CONTROLLER_NVSYSCON_E551
NVML_THERMAL_CONTROLLER_MAX6649R
NVML_THERMAL_CONTROLLER_ADT7473S
NVML_THERMAL_CONTROLLER_UNKNOWN
NVML_GPU_CERT_CHAIN_SIZE
NVML_GPU_ATTESTATION_CERT_CHAIN_SIZE
NVML_CC_GPU_CEC_NONCE_SIZE
NVML_CC_GPU_ATTESTATION_REPORT_SIZE
NVML_CC_GPU_CEC_ATTESTATION_REPORT_SIZE
NVML_CC_CEC_ATTESTATION_REPORT_NOT_PRESENT
NVML_CC_CEC_ATTESTATION_REPORT_PRESENT
NVML_GPM_METRIC_GRAPHICS_UTIL
NVML_GPM_METRIC_SM_UTIL
NVML_GPM_METRIC_SM_OCCUPANCY
NVML_GPM_METRIC_INTEGER_UTIL
NVML_GPM_METRIC_ANY_TENSOR_UTIL
NVML_GPM_METRIC_DFMA_TENSOR_UTIL
NVML_GPM_METRIC_HMMA_TENSOR_UTIL
NVML_GPM_METRIC_IMMA_TENSOR_UTIL
NVML_GPM_METRIC_DRAM_BW_UTIL
NVML_GPM_METRIC_FP64_UTIL
NVML_GPM_METRIC_FP32_UTIL
NVML_GPM_METRIC_FP16_UTIL
NVML_GPM_METRIC_PCIE_TX_PER_SEC
NVML_GPM_METRIC_PCIE_RX_PER_SEC
NVML_GPM_METRIC_NVDEC_0_UTIL
NVML_GPM_METRIC_NVDEC_1_UTIL
NVML_GPM_METRIC_NVDEC_2_UTIL
NVML_GPM_METRIC_NVDEC_3_UTIL
NVML_GPM_METRIC_NVDEC_4_UTIL
NVML_GPM_METRIC_NVDEC_5_UTIL
NVML_GPM_METRIC_NVDEC_6_UTIL
NVML_GPM_METRIC_NVDEC_7_UTIL
NVML_GPM_METRIC_NVJPG_0_UTIL
NVML_GPM_METRIC_NVJPG_1_UTIL
NVML_GPM_METRIC_NVJPG_2_UTIL
NVML_GPM_METRIC_NVJPG_3_UTIL
NVML_GPM_METRIC_NVJPG_4_UTIL
NVML_GPM_METRIC_NVJPG_5_UTIL
NVML_GPM_METRIC_NVJPG_6_UTIL
NVML_GPM_METRIC_NVJPG_7_UTIL
NVML_GPM_METRIC_NVOFA_0_UTIL
NVML_GPM_METRIC_NVLINK_TOTAL_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_TOTAL_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L0_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L0_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L1_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L1_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L2_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L2_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L3_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L3_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L4_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L4_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L5_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L5_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L6_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L6_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L7_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L7_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L8_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L8_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L9_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L9_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L10_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L10_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L11_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L11_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L12_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L12_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L13_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L13_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L14_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L14_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L15_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L15_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L16_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L16_TX_PER_SEC
NVML_GPM_METRIC_NVLINK_L17_RX_PER_SEC
NVML_GPM_METRIC_NVLINK_L17_TX_PER_SEC
NVML_GPM_METRIC_MAX
NVML_GPM_METRICS_GET_VERSION
NVML_GPM_SUPPORT_VERSION
NVML_GPU_FABRIC_STATE_NOT_SUPPORTED
NVML_GPU_FABRIC_STATE_NOT_STARTED
NVML_GPU_FABRIC_STATE_IN_PROGRESS
NVML_GPU_FABRIC_STATE_COMPLETED
NVML_GPU_NVLINK_BW_MODE_FULL
NVML_GPU_NVLINK_BW_MODE_OFF
NVML_GPU_NVLINK_BW_MODE_MIN
NVML_GPU_NVLINK_BW_MODE_HALF
NVML_GPU_NVLINK_BW_MODE_3QUARTER
NVML_GPU_NVLINK_BW_MODE_COUNT
NVML_POWER_SCOPE_GPU
NVML_POWER_SCOPE_MODULE
- Functions and Exceptions
nvmlCheckReturn()
nvmlQuery()
nvmlQueryFieldValues()
nvmlInit()
nvmlInitWithFlags()
nvmlShutdown()
NVMLError
NVMLError_AlreadyInitialized
NVMLError_ArgumentVersionMismatch
NVMLError_CorruptedInforom
NVMLError_Deprecated
NVMLError_DriverNotLoaded
NVMLError_FreqNotSupported
NVMLError_FunctionNotFound
NVMLError_GpuIsLost
NVMLError_InsufficientPower
NVMLError_InsufficientResources
NVMLError_InsufficientSize
NVMLError_InvalidArgument
NVMLError_InUse
NVMLError_IrqIssue
NVMLError_LibraryNotFound
NVMLError_LibRmVersionMismatch
NVMLError_Memory
NVMLError_NotFound
NVMLError_NotReady
NVMLError_NotSupported
NVMLError_NoData
NVMLError_NoPermission
NVMLError_OperatingSystem
NVMLError_ResetRequired
NVMLError_Timeout
NVMLError_Uninitialized
NVMLError_Unknown
NVMLError_VgpuEccNotSupported
nvmlExceptionClass()
nvmlStructToFriendlyObject()
nvmlFriendlyObjectToStruct()
nvmlErrorString()
nvmlSystemGetNVMLVersion()
nvmlSystemGetCudaDriverVersion()
nvmlSystemGetCudaDriverVersion_v2()
nvmlSystemGetProcessName()
nvmlSystemGetDriverVersion()
nvmlSystemGetHicVersion()
nvmlUnitGetCount()
nvmlUnitGetHandleByIndex()
nvmlUnitGetUnitInfo()
nvmlUnitGetLedState()
nvmlUnitGetPsuInfo()
nvmlUnitGetTemperature()
nvmlUnitGetFanSpeedInfo()
nvmlUnitGetDeviceCount()
nvmlUnitGetDevices()
nvmlDeviceGetCount()
nvmlDeviceGetHandleByIndex()
nvmlDeviceGetHandleBySerial()
nvmlDeviceGetHandleByUUID()
nvmlDeviceGetHandleByPciBusId()
nvmlDeviceGetName()
nvmlDeviceGetBoardId()
nvmlDeviceGetMultiGpuBoard()
nvmlDeviceGetBrand()
nvmlDeviceGetBoardPartNumber()
nvmlDeviceGetSerial()
nvmlDeviceGetModuleId()
nvmlDeviceGetMemoryAffinity()
nvmlDeviceGetCpuAffinityWithinScope()
nvmlDeviceGetCpuAffinity()
nvmlDeviceSetCpuAffinity()
nvmlDeviceClearCpuAffinity()
nvmlDeviceGetMinorNumber()
nvmlDeviceGetUUID()
nvmlDeviceGetInforomVersion()
nvmlDeviceGetInforomImageVersion()
nvmlDeviceGetInforomConfigurationChecksum()
nvmlDeviceValidateInforom()
nvmlDeviceGetLastBBXFlushTime()
nvmlDeviceGetDisplayMode()
nvmlDeviceGetDisplayActive()
nvmlDeviceGetPersistenceMode()
nvmlDeviceGetPciInfo_v3()
nvmlDeviceGetPciInfo()
nvmlDeviceGetClockInfo()
nvmlDeviceGetMaxClockInfo()
nvmlDeviceGetApplicationsClock()
nvmlDeviceGetMaxCustomerBoostClock()
nvmlDeviceGetClock()
nvmlDeviceGetDefaultApplicationsClock()
nvmlDeviceGetSupportedMemoryClocks()
nvmlDeviceGetSupportedGraphicsClocks()
nvmlDeviceGetFanSpeed()
nvmlDeviceGetFanSpeed_v2()
nvmlDeviceGetTargetFanSpeed()
nvmlDeviceGetNumFans()
nvmlDeviceSetDefaultFanSpeed_v2()
nvmlDeviceGetMinMaxFanSpeed()
nvmlDeviceGetFanControlPolicy_v2()
nvmlDeviceSetFanControlPolicy()
nvmlDeviceGetTemperature()
nvmlDeviceGetTemperatureThreshold()
nvmlDeviceSetTemperatureThreshold()
nvmlDeviceGetPowerState()
nvmlDeviceGetPerformanceState()
nvmlDeviceGetPowerManagementMode()
nvmlDeviceGetPowerManagementLimit()
nvmlDeviceGetPowerManagementLimitConstraints()
nvmlDeviceGetPowerManagementDefaultLimit()
nvmlDeviceGetEnforcedPowerLimit()
nvmlDeviceGetPowerUsage()
nvmlDeviceGetTotalEnergyConsumption()
nvmlDeviceGetGpuOperationMode()
nvmlDeviceGetCurrentGpuOperationMode()
nvmlDeviceGetPendingGpuOperationMode()
nvmlDeviceGetMemoryInfo()
nvmlDeviceGetBAR1MemoryInfo()
nvmlDeviceGetComputeMode()
nvmlDeviceGetCudaComputeCapability()
nvmlDeviceGetEccMode()
nvmlDeviceGetCurrentEccMode()
nvmlDeviceGetPendingEccMode()
nvmlDeviceGetDefaultEccMode()
nvmlDeviceGetTotalEccErrors()
nvmlDeviceGetDetailedEccErrors()
nvmlDeviceGetMemoryErrorCounter()
nvmlDeviceGetUtilizationRates()
nvmlDeviceGetEncoderUtilization()
nvmlDeviceGetDecoderUtilization()
nvmlDeviceGetJpgUtilization()
nvmlDeviceGetOfaUtilization()
nvmlDeviceGetPcieReplayCounter()
nvmlDeviceGetDriverModel()
nvmlDeviceGetCurrentDriverModel()
nvmlDeviceGetPendingDriverModel()
nvmlDeviceGetVbiosVersion()
nvmlDeviceGetComputeRunningProcesses_v3()
nvmlDeviceGetComputeRunningProcesses()
nvmlDeviceGetGraphicsRunningProcesses_v3()
nvmlDeviceGetGraphicsRunningProcesses()
nvmlDeviceGetMPSComputeRunningProcesses()
nvmlDeviceGetMPSComputeRunningProcesses_v3()
nvmlDeviceGetRunningProcessDetailList()
nvmlDeviceGetAutoBoostedClocksEnabled()
nvmlUnitSetLedState()
nvmlDeviceSetPersistenceMode()
nvmlDeviceSetComputeMode()
nvmlDeviceSetEccMode()
nvmlDeviceClearEccErrorCounts()
nvmlDeviceSetDriverModel()
nvmlDeviceSetAutoBoostedClocksEnabled()
nvmlDeviceSetDefaultAutoBoostedClocksEnabled()
nvmlDeviceSetGpuLockedClocks()
nvmlDeviceResetGpuLockedClocks()
nvmlDeviceSetMemoryLockedClocks()
nvmlDeviceResetMemoryLockedClocks()
nvmlDeviceGetClkMonStatus()
nvmlDeviceSetApplicationsClocks()
nvmlDeviceResetApplicationsClocks()
nvmlDeviceSetPowerManagementLimit()
nvmlDeviceSetGpuOperationMode()
nvmlEventSetCreate()
nvmlDeviceRegisterEvents()
nvmlDeviceGetSupportedEventTypes()
nvmlEventSetWait_v2()
nvmlEventSetWait()
nvmlEventSetFree()
nvmlDeviceOnSameBoard()
nvmlDeviceGetCurrPcieLinkGeneration()
nvmlDeviceGetMaxPcieLinkGeneration()
nvmlDeviceGetCurrPcieLinkWidth()
nvmlDeviceGetMaxPcieLinkWidth()
nvmlDeviceGetGpuMaxPcieLinkGeneration()
nvmlDeviceGetSupportedClocksThrottleReasons()
nvmlDeviceGetSupportedClocksEventReasons()
nvmlDeviceGetCurrentClocksThrottleReasons()
nvmlDeviceGetCurrentClocksEventReasons()
nvmlDeviceGetIndex()
nvmlDeviceGetAccountingMode()
nvmlDeviceSetAccountingMode()
nvmlDeviceClearAccountingPids()
nvmlDeviceGetAccountingStats()
nvmlDeviceGetAccountingPids()
nvmlDeviceGetAccountingBufferSize()
nvmlDeviceGetRetiredPages()
nvmlDeviceGetRetiredPages_v2()
nvmlDeviceGetRetiredPagesPendingStatus()
nvmlDeviceGetAPIRestriction()
nvmlDeviceSetAPIRestriction()
nvmlDeviceGetBridgeChipInfo()
nvmlDeviceGetSamples()
nvmlDeviceGetViolationStatus()
nvmlDeviceGetPcieThroughput()
nvmlSystemGetTopologyGpuSet()
nvmlDeviceGetTopologyNearestGpus()
nvmlDeviceGetTopologyCommonAncestor()
nvmlDeviceGetNvLinkUtilizationCounter()
nvmlDeviceFreezeNvLinkUtilizationCounter()
nvmlDeviceResetNvLinkUtilizationCounter()
nvmlDeviceSetNvLinkUtilizationControl()
nvmlDeviceGetNvLinkUtilizationControl()
nvmlDeviceGetNvLinkCapability()
nvmlDeviceGetNvLinkErrorCounter()
nvmlDeviceResetNvLinkErrorCounters()
nvmlDeviceGetNvLinkRemotePciInfo()
nvmlDeviceGetNvLinkRemoteDeviceType()
nvmlDeviceGetNvLinkState()
nvmlDeviceGetNvLinkVersion()
nvmlDeviceModifyDrainState()
nvmlDeviceQueryDrainState()
nvmlDeviceRemoveGpu()
nvmlDeviceDiscoverGpus()
nvmlDeviceGetFieldValues()
nvmlDeviceClearFieldValues()
nvmlDeviceGetVirtualizationMode()
nvmlDeviceSetVirtualizationMode()
nvmlGetVgpuDriverCapabilities()
nvmlDeviceGetVgpuCapabilities()
nvmlDeviceGetSupportedVgpus()
nvmlDeviceGetCreatableVgpus()
nvmlVgpuTypeGetGpuInstanceProfileId()
nvmlVgpuTypeGetClass()
nvmlVgpuTypeGetName()
nvmlVgpuTypeGetDeviceID()
nvmlVgpuTypeGetFramebufferSize()
nvmlVgpuTypeGetNumDisplayHeads()
nvmlVgpuTypeGetResolution()
nvmlVgpuTypeGetLicense()
nvmlVgpuTypeGetFrameRateLimit()
nvmlVgpuTypeGetMaxInstances()
nvmlVgpuTypeGetMaxInstancesPerVm()
nvmlDeviceGetActiveVgpus()
nvmlVgpuInstanceGetVmID()
nvmlVgpuInstanceGetUUID()
nvmlVgpuInstanceGetMdevUUID()
nvmlVgpuInstanceGetVmDriverVersion()
nvmlVgpuInstanceGetLicenseStatus()
nvmlVgpuInstanceGetLicenseInfo_v2()
nvmlVgpuInstanceGetLicenseInfo()
nvmlVgpuInstanceGetFrameRateLimit()
nvmlVgpuInstanceGetEccMode()
nvmlVgpuInstanceGetType()
nvmlVgpuInstanceGetEncoderCapacity()
nvmlVgpuInstanceSetEncoderCapacity()
nvmlVgpuInstanceGetFbUsage()
nvmlVgpuTypeGetCapabilities()
nvmlVgpuInstanceGetGpuInstanceId()
nvmlVgpuInstanceGetGpuPciId()
nvmlDeviceGetVgpuUtilization()
nvmlDeviceGetP2PStatus()
nvmlDeviceGetGridLicensableFeatures_v4()
nvmlDeviceGetGridLicensableFeatures()
nvmlDeviceGetGspFirmwareVersion()
nvmlDeviceGetGspFirmwareMode()
nvmlDeviceGetEncoderCapacity()
nvmlDeviceGetVgpuProcessUtilization()
nvmlDeviceGetEncoderStats()
nvmlDeviceGetEncoderSessions()
nvmlDeviceGetFBCStats()
nvmlDeviceGetFBCSessions()
nvmlVgpuInstanceGetEncoderStats()
nvmlVgpuInstanceGetEncoderSessions()
nvmlVgpuInstanceGetFBCStats()
nvmlVgpuInstanceGetFBCSessions()
nvmlDeviceGetProcessUtilization()
nvmlVgpuInstanceGetMetadata()
nvmlDeviceGetVgpuMetadata()
nvmlGetVgpuCompatibility()
nvmlDeviceGetPgpuMetadataString()
nvmlDeviceGetVgpuSchedulerLog()
nvmlDeviceGetVgpuSchedulerState()
nvmlDeviceGetVgpuSchedulerCapabilities()
nvmlDeviceSetVgpuSchedulerState()
nvmlSetVgpuVersion()
nvmlGetVgpuVersion()
nvmlVgpuInstanceGetAccountingMode()
nvmlVgpuInstanceGetAccountingPids()
nvmlVgpuInstanceGetAccountingStats()
nvmlVgpuInstanceClearAccountingPids()
nvmlGetExcludedDeviceCount()
nvmlGetExcludedDeviceInfoByIndex()
nvmlDeviceGetHostVgpuMode()
nvmlDeviceSetMigMode()
nvmlDeviceGetMigMode()
nvmlDeviceGetGpuInstanceProfileInfo()
nvmlDeviceGetGpuInstanceProfileInfoV()
nvmlDeviceGetGpuInstanceRemainingCapacity()
nvmlDeviceGetGpuInstancePossiblePlacements()
nvmlDeviceCreateGpuInstance()
nvmlDeviceCreateGpuInstanceWithPlacement()
nvmlGpuInstanceDestroy()
nvmlDeviceGetGpuInstances()
nvmlDeviceGetGpuInstanceById()
nvmlGpuInstanceGetInfo()
nvmlGpuInstanceGetComputeInstanceProfileInfo()
nvmlGpuInstanceGetComputeInstanceProfileInfoV()
nvmlGpuInstanceGetComputeInstanceRemainingCapacity()
nvmlGpuInstanceGetComputeInstancePossiblePlacements()
nvmlGpuInstanceCreateComputeInstance()
nvmlGpuInstanceCreateComputeInstanceWithPlacement()
nvmlComputeInstanceDestroy()
nvmlGpuInstanceGetComputeInstances()
nvmlGpuInstanceGetComputeInstanceById()
nvmlComputeInstanceGetInfo_v2()
nvmlComputeInstanceGetInfo()
nvmlDeviceIsMigDeviceHandle()
nvmlDeviceGetGpuInstanceId()
nvmlDeviceGetComputeInstanceId()
nvmlDeviceGetMaxMigDeviceCount()
nvmlDeviceGetMigDeviceHandleByIndex()
nvmlDeviceGetDeviceHandleFromMigDeviceHandle()
nvmlDeviceGetAttributes_v2()
nvmlDeviceGetAttributes()
nvmlDeviceGetRemappedRows()
nvmlDeviceGetRowRemapperHistogram()
nvmlDeviceGetArchitecture()
nvmlDeviceGetBusType()
nvmlDeviceGetIrqNum()
nvmlDeviceGetNumGpuCores()
nvmlDeviceGetPowerSource()
nvmlDeviceGetMemoryBusWidth()
nvmlDeviceGetPcieLinkMaxSpeed()
nvmlDeviceGetAdaptiveClockInfoStatus()
nvmlDeviceGetPcieSpeed()
nvmlDeviceGetDynamicPstatesInfo()
nvmlDeviceSetFanSpeed_v2()
nvmlDeviceGetThermalSettings()
nvmlDeviceGetMinMaxClockOfPState()
nvmlDeviceGetSupportedPerformanceStates()
nvmlDeviceGetGpcClkVfOffset()
nvmlDeviceSetGpcClkVfOffset()
nvmlDeviceGetGpcClkMinMaxVfOffset()
nvmlDeviceGetMemClkVfOffset()
nvmlDeviceSetMemClkVfOffset()
nvmlDeviceGetMemClkMinMaxVfOffset()
nvmlSystemSetConfComputeGpusReadyState()
nvmlSystemGetConfComputeGpusReadyState()
nvmlSystemGetConfComputeCapabilities()
nvmlSystemGetConfComputeState()
nvmlDeviceSetConfComputeUnprotectedMemSize()
nvmlDeviceGetConfComputeMemSizeInfo()
nvmlDeviceGetConfComputeProtectedMemoryUsage()
nvmlDeviceGetConfComputeGpuCertificate()
nvmlDeviceGetConfComputeGpuAttestationReport()
nvmlGpmMetricsGet()
nvmlGpmSampleFree()
nvmlGpmSampleAlloc()
nvmlGpmSampleGet()
nvmlGpmMigSampleGet()
nvmlGpmQueryDeviceSupport()
nvmlGpmSetStreamingEnabled()
nvmlGpmQueryIfStreamingEnabled()
nvmlDeviceSetNvLinkDeviceLowPowerThreshold()
nvmlDeviceGetGpuFabricInfo()
nvmlSystemSetNvlinkBwMode()
nvmlSystemGetNvlinkBwMode()
nvmlDeviceSetPowerManagementLimit_v2()
- Constants
- nvitop.libcuda module
CUDA_SUCCESS
CUDA_ERROR_INVALID_VALUE
CUDA_ERROR_OUT_OF_MEMORY
CUDA_ERROR_NOT_INITIALIZED
CUDA_ERROR_DEINITIALIZED
CUDA_ERROR_PROFILER_DISABLED
CUDA_ERROR_STUB_LIBRARY
CUDA_ERROR_DEVICE_UNAVAILABLE
CUDA_ERROR_NO_DEVICE
CUDA_ERROR_INVALID_DEVICE
CUDA_ERROR_DEVICE_NOT_LICENSED
CUDA_ERROR_INVALID_IMAGE
CUDA_ERROR_INVALID_CONTEXT
CUDA_ERROR_MAP_FAILED
CUDA_ERROR_UNMAP_FAILED
CUDA_ERROR_ARRAY_IS_MAPPED
CUDA_ERROR_ALREADY_MAPPED
CUDA_ERROR_NO_BINARY_FOR_GPU
CUDA_ERROR_ALREADY_ACQUIRED
CUDA_ERROR_NOT_MAPPED
CUDA_ERROR_NOT_MAPPED_AS_ARRAY
CUDA_ERROR_NOT_MAPPED_AS_POINTER
CUDA_ERROR_ECC_UNCORRECTABLE
CUDA_ERROR_UNSUPPORTED_LIMIT
CUDA_ERROR_CONTEXT_ALREADY_IN_USE
CUDA_ERROR_PEER_ACCESS_UNSUPPORTED
CUDA_ERROR_INVALID_PTX
CUDA_ERROR_INVALID_GRAPHICS_CONTEXT
CUDA_ERROR_NVLINK_UNCORRECTABLE
CUDA_ERROR_JIT_COMPILER_NOT_FOUND
CUDA_ERROR_UNSUPPORTED_PTX_VERSION
CUDA_ERROR_JIT_COMPILATION_DISABLED
CUDA_ERROR_UNSUPPORTED_EXEC_AFFINITY
CUDA_ERROR_INVALID_SOURCE
CUDA_ERROR_FILE_NOT_FOUND
CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND
CUDA_ERROR_SHARED_OBJECT_INIT_FAILED
CUDA_ERROR_OPERATING_SYSTEM
CUDA_ERROR_INVALID_HANDLE
CUDA_ERROR_ILLEGAL_STATE
CUDA_ERROR_NOT_FOUND
CUDA_ERROR_NOT_READY
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
CUDA_ERROR_LAUNCH_TIMEOUT
CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING
CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED
CUDA_ERROR_PEER_ACCESS_NOT_ENABLED
CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE
CUDA_ERROR_CONTEXT_IS_DESTROYED
CUDA_ERROR_ASSERT
CUDA_ERROR_TOO_MANY_PEERS
CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED
CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED
CUDA_ERROR_HARDWARE_STACK_ERROR
CUDA_ERROR_ILLEGAL_INSTRUCTION
CUDA_ERROR_MISALIGNED_ADDRESS
CUDA_ERROR_INVALID_ADDRESS_SPACE
CUDA_ERROR_INVALID_PC
CUDA_ERROR_LAUNCH_FAILED
CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE
CUDA_ERROR_NOT_PERMITTED
CUDA_ERROR_NOT_SUPPORTED
CUDA_ERROR_SYSTEM_NOT_READY
CUDA_ERROR_SYSTEM_DRIVER_MISMATCH
CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE
CUDA_ERROR_MPS_CONNECTION_FAILED
CUDA_ERROR_MPS_RPC_FAILURE
CUDA_ERROR_MPS_SERVER_NOT_READY
CUDA_ERROR_MPS_MAX_CLIENTS_REACHED
CUDA_ERROR_MPS_MAX_CONNECTIONS_REACHED
CUDA_ERROR_STREAM_CAPTURE_UNSUPPORTED
CUDA_ERROR_STREAM_CAPTURE_INVALIDATED
CUDA_ERROR_STREAM_CAPTURE_MERGE
CUDA_ERROR_STREAM_CAPTURE_UNMATCHED
CUDA_ERROR_STREAM_CAPTURE_UNJOINED
CUDA_ERROR_STREAM_CAPTURE_ISOLATION
CUDA_ERROR_STREAM_CAPTURE_IMPLICIT
CUDA_ERROR_CAPTURED_EVENT
CUDA_ERROR_STREAM_CAPTURE_WRONG_THREAD
CUDA_ERROR_TIMEOUT
CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE
CUDA_ERROR_EXTERNAL_DEVICE
CUDA_ERROR_UNKNOWN
- nvitop.libcudart module
cudaSuccess
cudaErrorInvalidValue
cudaErrorMemoryAllocation
cudaErrorInitializationError
cudaErrorCudartUnloading
cudaErrorProfilerDisabled
cudaErrorInvalidConfiguration
cudaErrorInvalidPitchValue
cudaErrorInvalidSymbol
cudaErrorInvalidTexture
cudaErrorInvalidTextureBinding
cudaErrorInvalidChannelDescriptor
cudaErrorInvalidMemcpyDirection
cudaErrorInvalidFilterSetting
cudaErrorInvalidNormSetting
cudaErrorStubLibrary
cudaErrorInsufficientDriver
cudaErrorCallRequiresNewerDriver
cudaErrorInvalidSurface
cudaErrorDuplicateVariableName
cudaErrorDuplicateTextureName
cudaErrorDuplicateSurfaceName
cudaErrorDevicesUnavailable
cudaErrorIncompatibleDriverContext
cudaErrorMissingConfiguration
cudaErrorLaunchMaxDepthExceeded
cudaErrorLaunchFileScopedTex
cudaErrorLaunchFileScopedSurf
cudaErrorSyncDepthExceeded
cudaErrorLaunchPendingCountExceeded
cudaErrorInvalidDeviceFunction
cudaErrorNoDevice
cudaErrorInvalidDevice
cudaErrorDeviceNotLicensed
cudaErrorSoftwareValidityNotEstablished
cudaErrorStartupFailure
cudaErrorInvalidKernelImage
cudaErrorDeviceUninitialized
cudaErrorMapBufferObjectFailed
cudaErrorUnmapBufferObjectFailed
cudaErrorArrayIsMapped
cudaErrorAlreadyMapped
cudaErrorNoKernelImageForDevice
cudaErrorAlreadyAcquired
cudaErrorNotMapped
cudaErrorNotMappedAsArray
cudaErrorNotMappedAsPointer
cudaErrorECCUncorrectable
cudaErrorUnsupportedLimit
cudaErrorDeviceAlreadyInUse
cudaErrorPeerAccessUnsupported
cudaErrorInvalidPtx
cudaErrorInvalidGraphicsContext
cudaErrorNvlinkUncorrectable
cudaErrorJitCompilerNotFound
cudaErrorUnsupportedPtxVersion
cudaErrorJitCompilationDisabled
cudaErrorUnsupportedExecAffinity
cudaErrorInvalidSource
cudaErrorFileNotFound
cudaErrorSharedObjectSymbolNotFound
cudaErrorSharedObjectInitFailed
cudaErrorOperatingSystem
cudaErrorInvalidResourceHandle
cudaErrorIllegalState
cudaErrorSymbolNotFound
cudaErrorNotReady
cudaErrorIllegalAddress
cudaErrorLaunchOutOfResources
cudaErrorLaunchTimeout
cudaErrorLaunchIncompatibleTexturing
cudaErrorPeerAccessAlreadyEnabled
cudaErrorPeerAccessNotEnabled
cudaErrorSetOnActiveProcess
cudaErrorContextIsDestroyed
cudaErrorAssert
cudaErrorTooManyPeers
cudaErrorHostMemoryAlreadyRegistered
cudaErrorHostMemoryNotRegistered
cudaErrorHardwareStackError
cudaErrorIllegalInstruction
cudaErrorMisalignedAddress
cudaErrorInvalidAddressSpace
cudaErrorInvalidPc
cudaErrorLaunchFailure
cudaErrorCooperativeLaunchTooLarge
cudaErrorNotPermitted
cudaErrorNotSupported
cudaErrorSystemNotReady
cudaErrorSystemDriverMismatch
cudaErrorCompatNotSupportedOnDevice
cudaErrorMpsConnectionFailed
cudaErrorMpsRpcFailure
cudaErrorMpsServerNotReady
cudaErrorMpsMaxClientsReached
cudaErrorMpsMaxConnectionsReached
cudaErrorMpsClientTerminated
cudaErrorCdpNotSupported
cudaErrorCdpVersionMismatch
cudaErrorStreamCaptureUnsupported
cudaErrorStreamCaptureInvalidated
cudaErrorStreamCaptureMerge
cudaErrorStreamCaptureUnmatched
cudaErrorStreamCaptureUnjoined
cudaErrorStreamCaptureIsolation
cudaErrorStreamCaptureImplicit
cudaErrorCapturedEvent
cudaErrorStreamCaptureWrongThread
cudaErrorTimeout
cudaErrorGraphExecUpdateFailure
cudaErrorExternalDevice
cudaErrorInvalidClusterSize
cudaErrorUnknown
- nvitop.utils module
- nvitop.select module
- nvitop.callbacks package
Module Contents
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
- nvitop.version.PYNVML_VERSION_CANDIDATES = ('11.450.51', '11.450.129', '11.460.79', '11.470.66', '11.495.46', '11.510.69', '11.515.48', '11.515.75', '11.525.84', '11.525.112', '11.525.131', '12.535.77', '12.535.108', '12.535.133')
The list of supported nvidia-ml-py versions. See also: nvidia-ml-py’s Release History.
To install nvitop with a specific version of nvidia-ml-py, use nvitop[pynvml-xx.yyy.zzz], for example:
pip3 install 'nvitop[pynvml-11.450.51]'
or
pip3 install nvitop nvidia-ml-py==11.450.51
Note
The package nvidia-ml-py is not backward compatible over releases. This may cause problems such as “Function Not Found” errors with old versions of NVIDIA drivers (e.g. the NVIDIA R430 driver on Ubuntu 16.04 LTS). The ideal solution is to let the user install the best-fit version of nvidia-ml-py. See also: nvidia-ml-py’s Release History.
nvidia-ml-py==11.450.51 is the last version that supports the NVIDIA R430 driver (CUDA 10.x). Since nvidia-ml-py>=11.450.129, the definition of struct nvmlProcessInfo_t has introduced two new fields gpuInstanceId and computeInstanceId (GI ID and CI ID in newer nvidia-smi) which are incompatible with some old NVIDIA drivers. nvitop may not display the processes correctly due to this incompatibility.
- class nvitop.NaType[source]
Bases:
str
A singleton (str: 'N/A') class that represents a not-applicable value.
The NA instance behaves like a str instance ('N/A') when doing string manipulation (e.g. concatenation). For arithmetic operations, for example NA / 1024 / 1024, it acts like math.nan.
Examples
>>> NA 'N/A'
>>> 'memory usage: {}'.format(NA) # NA is an instance of `str` 'memory usage: N/A' >>> NA.lower() # NA is an instance of `str` 'n/a' >>> NA.ljust(5) # NA is an instance of `str` 'N/A ' >>> NA + ' str' # string contamination if the operand is a string 'N/A str'
>>> float(NA) # explicit conversion to float (`math.nan`) nan >>> NA + 1 # auto-casting to float if the operand is a number nan >>> NA * 1024 # auto-casting to float if the operand is a number nan >>> NA / (1024 * 1024) # auto-casting to float if the operand is a number nan
- __float__() float [source]
Convert
NA
tofloat
and returnmath.nan
.>>> float(NA) nan >>> float(NA) is math.nan True
- __add__(other: object) str | float [source]
Return math.nan if the operand is a number, or use string concatenation if the operand is a string (NA + other).
A special case is when the operand is nvitop.NA itself: the result is math.nan instead of 'N/AN/A'.
>>> NA + ' str' 'N/A str' >>> NA + NA nan >>> NA + 1 nan >>> NA + 1.0 nan
- __radd__(other: object) str | float [source]
Return
math.nan
if the operand is a number or uses string concatenation if the operand is a string (other + NA
).>>> 'str' + NA 'strN/A' >>> 1 + NA nan >>> 1.0 + NA nan
- __sub__(other: object) float [source]
Return math.nan if the operand is a number (NA - other).
>>> NA - 'str' TypeError: unsupported operand type(s) for -: 'NaType' and 'str' >>> NA - 1 nan >>> NA - 1.0 nan
- __rsub__(other: object) float [source]
Return
math.nan
if the operand is a number (other - NA
).>>> 'str' - NA TypeError: unsupported operand type(s) for -: 'str' and 'NaType' >>> 1 - NA nan >>> 1.0 - NA nan
- __mul__(other: object) float [source]
Return
math.nan
if the operand is a number (NA * other
).A special case is when the operand is
nvitop.NA
itself, the result is alsomath.nan
.>>> NA * 1024 nan >>> NA * 1024.0 nan >>> NA * NA nan
- __rmul__(other: object) float [source]
Return
math.nan
if the operand is a number (other * NA
).>>> 1024 * NA nan >>> 1024.0 * NA nan
- __truediv__(other: object) float [source]
Return
math.nan
if the operand is a number (NA / other
).>>> NA / 1024 nan >>> NA / 1024.0 nan >>> NA / 0 ZeroDivisionError: float division by zero >>> NA / 0.0 ZeroDivisionError: float division by zero >>> NA / NA nan
- __rtruediv__(other: object) float [source]
Return
math.nan
if the operand is a number (other / NA
).>>> 1024 / NA nan >>> 1024.0 / NA nan
- __floordiv__(other: object) float [source]
Return math.nan if the operand is a number (NA // other).
>>> NA // 1024 nan >>> NA // 1024.0 nan >>> NA // 0 ZeroDivisionError: float floor division by zero >>> NA // 0.0 ZeroDivisionError: float floor division by zero >>> NA // NA nan
- __rfloordiv__(other: object) float [source]
Return
math.nan
if the operand is a number (other // NA
).>>> 1024 // NA nan >>> 1024.0 // NA nan
- __mod__(other: object) float [source]
Return
math.nan
if the operand is a number (NA % other
).>>> NA % 1024 nan >>> NA % 1024.0 nan >>> NA % 0 ZeroDivisionError: float modulo >>> NA % 0.0 ZeroDivisionError: float modulo
- __rmod__(other: object) float [source]
Return
math.nan
if the operand is a number (other % NA
).>>> 1024 % NA nan >>> 1024.0 % NA nan
- __divmod__(other: object) tuple[float, float] [source]
The pair
(NA // other, NA % other)
(divmod(NA, other)
).>>> divmod(NA, 1024) (nan, nan) >>> divmod(NA, 1024.0) (nan, nan) >>> divmod(NA, 0) ZeroDivisionError: float floor division by zero >>> divmod(NA, 0.0) ZeroDivisionError: float floor division by zero
- __rdivmod__(other: object) tuple[float, float] [source]
The pair
(other // NA, other % NA)
(divmod(other, NA)
).>>> divmod(1024, NA) (nan, nan) >>> divmod(1024.0, NA) (nan, nan)
- __round__(ndigits: int | None = None) int | float [source]
Round nvitop.NA to ndigits decimal places, defaulting to 0.
If ndigits is omitted or None, returns 0; otherwise returns math.nan.
>>> round(NA) 0 >>> round(NA, 0) nan >>> round(NA, 1) nan
- __lt__(x: object) bool [source]
The nvitop.NA is always greater than any number, or uses dictionary order for strings.
- __le__(x: object) bool [source]
The nvitop.NA is always greater than any number, or uses dictionary order for strings.
- __gt__(x: object) bool [source]
The nvitop.NA is always greater than any number, or uses dictionary order for strings.
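The string/number dual behavior documented above can be sketched with a minimal str subclass. This is an illustrative stand-in under the documented semantics (singleton 'N/A', arithmetic degrades to math.nan); the class name NotApplicable is hypothetical and this is not nvitop's actual NaType source:

```python
import math

class NotApplicable(str):
    """Sketch of the NaType pattern: a singleton 'N/A' string whose
    arithmetic degrades to math.nan (hypothetical stand-in class)."""

    _instance = None

    def __new__(cls):
        # Singleton: always hand back the same 'N/A' string instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls, 'N/A')
        return cls._instance

    def __float__(self):
        # Explicit float conversion yields math.nan.
        return math.nan

    def __add__(self, other):
        # Numbers (and NA itself, per the documented special case) -> nan;
        # other strings -> plain concatenation.
        if isinstance(other, (int, float)) or other is self:
            return math.nan
        return str.__add__(self, other)

    def __radd__(self, other):
        # Reflected form covers `1 + NA`; strings never reach here
        # because str.__add__ already handles `'prefix' + NA`.
        if isinstance(other, (int, float)):
            return math.nan
        return NotImplemented

NA = NotApplicable()
```

With this sketch, `NA + ' str'` concatenates while `NA + 1` and `float(NA)` both produce `nan`, mirroring the examples in the docstrings above.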
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
- exception nvitop.NVMLError(value)[source]
Bases:
Exception
Base exception class for NVML query errors.
- static __new__(typ, value)[source]
Map value to a proper subclass of
NVMLError
.
- nvitop.nvmlCheckReturn(retval: _Any, types: type | tuple[type, ...] | None = None) bool [source]
Check whether the return value is not
nvitop.NA
and is one of the given types.
- class nvitop.Device(index: int | tuple[int, int] | str | None = None, *, uuid: str | None = None, bus_id: str | None = None)[source]
Bases:
object
Live class of the GPU devices, different from the device snapshots.
Device.__new__()
returns different types depending on the given arguments.- (index: int) -> PhysicalDevice - (index: (int, int)) -> MigDevice - (uuid: str) -> Union[PhysicalDevice, MigDevice] # depending on the UUID value - (bus_id: str) -> PhysicalDevice
Examples
>>> Device.driver_version() # version of the installed NVIDIA display driver '470.129.06'
>>> Device.count() # number of NVIDIA GPUs in the system 10
>>> Device.all() # all physical devices in the system [ PhysicalDevice(index=0, ...), PhysicalDevice(index=1, ...), ... ]
>>> nvidia0 = Device(index=0) # -> PhysicalDevice >>> mig10 = Device(index=(1, 0)) # -> MigDevice >>> nvidia2 = Device(uuid='GPU-xxxxxx') # -> PhysicalDevice >>> mig30 = Device(uuid='MIG-xxxxxx') # -> MigDevice
>>> nvidia0.memory_free() # total free memory in bytes 11550654464 >>> nvidia0.memory_free_human() # total free memory in human readable format '11016MiB'
>>> nvidia2.as_snapshot() # takes a one-time snapshot of the device PhysicalDeviceSnapshot( real=PhysicalDevice(index=2, ...), ... )
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
libnvml.NVMLError_InvalidArgument – If the device index is out of range.
TypeError – If the number of non-None arguments is not exactly 1.
TypeError – If the given index is a tuple but does not consist of two integers.
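The constructor dispatch rules listed above (int index, (int, int) tuple, UUID prefix, bus ID) can be summarized in a small standalone helper. This is an illustrative sketch only: the helper and its returned class-name strings are hypothetical, and the real Device.__new__ also accepts string indices and validates identifiers through NVML:

```python
def resolve_device_kind(index=None, uuid=None, bus_id=None):
    """Map a single identifier to the device class Device.__new__ would pick
    (simplified sketch of the dispatch table quoted above)."""
    given = [arg for arg in (index, uuid, bus_id) if arg is not None]
    if len(given) != 1:
        raise TypeError('exactly one of index/uuid/bus_id is required')
    if index is not None:
        if isinstance(index, tuple):
            # A tuple index must be (gpu_index, mig_index), both integers.
            if len(index) != 2 or not all(isinstance(i, int) for i in index):
                raise TypeError('a tuple index must consist of two integers')
            return 'MigDevice'
        return 'PhysicalDevice'  # plain int index -> physical device
    if uuid is not None:
        # The UUID prefix decides between MIG and physical devices.
        return 'MigDevice' if uuid.startswith('MIG-') else 'PhysicalDevice'
    return 'PhysicalDevice'  # a PCI bus ID always names a physical GPU
```

For instance, `resolve_device_kind(index=(1, 0))` picks the MIG path while `resolve_device_kind(uuid='GPU-xxxxxx')` picks the physical path, matching the dispatch comments in the class description.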
- UUID_PATTERN: re.Pattern = re.compile('^ # full match\n (?:(?P<MigMode>MIG)-)? # prefix for MIG UUID\n (?:(?P<GpuUuid>GPU)-)? # prefix for GPU UUID\n (?, re.VERBOSE)
- GPU_PROCESS_CLASS
alias of
GpuProcess
- cuda
alias of
CudaDevice
- classmethod is_available() bool [source]
Test whether there are any devices and the NVML library is successfully loaded.
- static driver_version() str | NaType [source]
The version of the installed NVIDIA display driver. This is an alphanumeric string.
Command line equivalent:
nvidia-smi --id=0 --format=csv,noheader,nounits --query-gpu=driver_version
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
- static cuda_driver_version() str | NaType [source]
The maximum CUDA version supported by the NVIDIA display driver. This is an alphanumeric string.
This can be different from the version of the CUDA Runtime. See also
cuda_runtime_version()
.- Returns: Union[str, NaType]
The maximum CUDA version supported by the NVIDIA display driver.
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
- static max_cuda_version() str | NaType
The maximum CUDA version supported by the NVIDIA display driver. This is an alphanumeric string.
This can be different from the version of the CUDA Runtime. See also
cuda_runtime_version()
.- Returns: Union[str, NaType]
The maximum CUDA version supported by the NVIDIA display driver.
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
- static cuda_runtime_version() str | NaType [source]
The CUDA Runtime version. This is an alphanumeric string.
This can be different from the CUDA driver version. See also
cuda_driver_version()
.- Returns: Union[str, NaType]
The CUDA Runtime version, or
nvitop.NA
when no CUDA Runtime is available or no CUDA-capable devices are present.
- static cudart_version() str | NaType
The CUDA Runtime version. This is an alphanumeric string.
This can be different from the CUDA driver version. See also
cuda_driver_version()
.- Returns: Union[str, NaType]
The CUDA Runtime version, or
nvitop.NA
when no CUDA Runtime is available or no CUDA-capable devices are present.
- classmethod count() int [source]
The number of NVIDIA GPUs in the system.
Command line equivalent:
nvidia-smi --id=0 --format=csv,noheader,nounits --query-gpu=count
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
- classmethod all() list[PhysicalDevice] [source]
Return a list of all physical devices in the system.
- classmethod from_indices(indices: int | Iterable[int | tuple[int, int]] | None = None) list[PhysicalDevice | MigDevice] [source]
Return a list of devices of the given indices.
- Parameters:
indices (Iterable[Union[int, Tuple[int, int]]]) – Indices of the devices. For each index, get
PhysicalDevice
for single int andMigDevice
for tuple (int, int). That is: - (int) -> PhysicalDevice - ((int, int)) -> MigDevice
- Returns: List[Union[PhysicalDevice, MigDevice]]
A list of
PhysicalDevice
and/orMigDevice
instances of the given indices.
- Raises:
libnvml.NVMLError_LibraryNotFound – If cannot find the NVML library, usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade for NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
libnvml.NVMLError_InvalidArgument – If the device index is out of range.
- static from_cuda_visible_devices() list[CudaDevice] [source]
Return a list of all CUDA visible devices.
The CUDA ordinal will be enumerated from the
CUDA_VISIBLE_DEVICES
environment variable.
Note
The result could be empty if the
CUDA_VISIBLE_DEVICES
environment variable is invalid.
- See also: CUDA Device Enumeration.
- Returns: List[CudaDevice]
A list of
CudaDevice
instances.
- static from_cuda_indices(cuda_indices: int | Iterable[int] | None = None) list[CudaDevice] [source]
Return a list of CUDA devices of the given CUDA indices.
The CUDA ordinal will be enumerated from the
CUDA_VISIBLE_DEVICES
environment variable.
- See also: CUDA Device Enumeration.
- Parameters:
cuda_indices (Iterable[int]) – The indices of the GPU in CUDA ordinal. If not given, returns all visible CUDA devices.
- Returns: List[CudaDevice]
A list of
CudaDevice
of the given CUDA indices.
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found, usually because the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade of the NVIDIA driver without reloading the kernel module.
RuntimeError – If the index is out of range for the given
CUDA_VISIBLE_DEVICES
environment variable.
- static parse_cuda_visible_devices(cuda_visible_devices: str | None = <VALUE OMITTED>) list[int] | list[tuple[int, int]] [source]
Parse the given
CUDA_VISIBLE_DEVICES
value into a list of NVML device indices.
This is an alias of
parse_cuda_visible_devices()
.
Note
The result could be empty if the
CUDA_VISIBLE_DEVICES
environment variable is invalid.
- See also: CUDA Device Enumeration.
- Parameters:
cuda_visible_devices (Optional[str]) – The value of the
CUDA_VISIBLE_DEVICES
variable. If not given, the value from the environment will be used. If explicitly given as None
, the CUDA_VISIBLE_DEVICES
environment variable will be unset before parsing.
- Returns: Union[List[int], List[Tuple[int, int]]]
A list of int (physical device) or a list of tuple of two integers (MIG device) for the corresponding real device indices.
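For illustration, the enumeration rule such parsing follows can be sketched in pure Python. This is a simplified sketch handling only integer entries; the real parser also accepts GPU/MIG UUID strings, and the function below is ours, not part of nvitop:

```python
def parse_visible_devices(value: str, device_count: int) -> list[int]:
    """Simplified sketch of the CUDA_VISIBLE_DEVICES enumeration rule.

    Entries are consumed left to right; enumeration stops silently at the
    first invalid entry (a non-integer here, or an out-of-range index).
    """
    indices: list[int] = []
    for entry in value.split(','):
        try:
            index = int(entry.strip())
        except ValueError:
            break  # first invalid entry ends enumeration
        if not 0 <= index < device_count:
            break  # out-of-range indices also end enumeration
        indices.append(index)
    return indices
```

For example, with 4 physical devices, `parse_visible_devices('1,0', 4)` yields `[1, 0]`, while `'0,not-an-index,2'` yields only `[0]`.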
- static normalize_cuda_visible_devices(cuda_visible_devices: str | None = <VALUE OMITTED>) str [source]
Parse the given
CUDA_VISIBLE_DEVICES
value and convert it into a comma-separated string of UUIDs.
This is an alias of
normalize_cuda_visible_devices()
.
Note
The result could be an empty string if the
CUDA_VISIBLE_DEVICES
environment variable is invalid.
- See also: CUDA Device Enumeration.
- Parameters:
cuda_visible_devices (Optional[str]) – The value of the
CUDA_VISIBLE_DEVICES
variable. If not given, the value from the environment will be used. If explicitly given as None
, the CUDA_VISIBLE_DEVICES
environment variable will be unset before parsing.
- Returns: str
The comma-separated string (GPU UUIDs) of the
CUDA_VISIBLE_DEVICES
environment variable.
- static __new__(cls, index: int | tuple[int, int] | str | None = None, *, uuid: str | None = None, bus_id: str | None = None) Self [source]
Create a new instance of Device.
The type of the result is determined by the given argument.
- (index: int) -> PhysicalDevice
- (index: (int, int)) -> MigDevice
- (uuid: str) -> Union[PhysicalDevice, MigDevice]  # depending on the UUID value
- (bus_id: str) -> PhysicalDevice
Note: This method takes exactly one non-None argument.
- Returns: Union[PhysicalDevice, MigDevice]
A
PhysicalDevice
instance or aMigDevice
instance.
- __init__(index: int | str | None = None, *, uuid: str | None = None, bus_id: str | None = None) None [source]
Initialize the instance created by
__new__()
.
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found, usually because the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after an upgrade of the NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
libnvml.NVMLError_InvalidArgument – If the device index is out of range.
- __getattr__(name: str) Any | Callable[..., Any] [source]
Get the object attribute.
If the attribute is not defined, make a method from
pynvml.nvmlDeviceGet<AttributeName>(handle)
. The attribute name will be converted to a PascalCase string.
- Raises:
AttributeError – If the attribute is not defined in
pynvml.py
.
Examples
>>> device = Device(0)
>>> # Method `cuda_compute_capability` is not implemented in the class definition
>>> PhysicalDevice.cuda_compute_capability
AttributeError: type object 'Device' has no attribute 'cuda_compute_capability'
>>> # Dynamically create a new method from `pynvml.nvmlDeviceGetCudaComputeCapability(device.handle, *args, **kwargs)`
>>> device.cuda_compute_capability
<function PhysicalDevice.cuda_compute_capability at 0x7fbfddf5d9d0>
>>> device.cuda_compute_capability()
(8, 6)
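The snake_case-to-PascalCase mapping described above can be sketched as follows (an illustrative helper of ours, not a function nvitop exports):

```python
def nvml_getter_name(attribute: str) -> str:
    """Map a snake_case attribute name to the corresponding pynvml getter
    name, e.g. 'cuda_compute_capability' -> 'nvmlDeviceGetCudaComputeCapability'.
    """
    pascal = ''.join(part.title() for part in attribute.split('_'))
    return f'nvmlDeviceGet{pascal}'
```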
- __reduce__() tuple[type[Device], tuple[int | tuple[int, int]]] [source]
Return state information for pickling.
- property index: int | tuple[int, int]
The NVML index of the device.
- Returns: Union[int, Tuple[int, int]]
Returns an int for a physical device and a tuple of two integers for a MIG device.
- property nvml_index: int | tuple[int, int]
The NVML index of the device.
- Returns: Union[int, Tuple[int, int]]
Returns an int for a physical device and a tuple of two integers for a MIG device.
- property physical_index: int
The index of the physical device.
- Returns: int
An int for the physical device index. For MIG devices, returns the index of the parent physical device.
- property handle: LP_struct_c_nvmlDevice_t
The NVML device handle.
- property cuda_index: int
The CUDA device index.
The value will be evaluated on the first call.
- Raises:
RuntimeError – If the current device is not visible to CUDA applications (i.e. not listed in the
CUDA_VISIBLE_DEVICES
environment variable or the environment variable is invalid).
- name() str | NaType [source]
The official product name of the GPU. This is an alphanumeric string. For all products.
- Returns: Union[str, NaType]
The official product name, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=name
- uuid() str | NaType [source]
This value is the globally unique immutable alphanumeric identifier of the GPU.
It does not correspond to any physical label on the board.
- Returns: Union[str, NaType]
The UUID of the device, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=uuid
- bus_id() str | NaType [source]
PCI bus ID as “domain:bus:device.function”, in hex.
- Returns: Union[str, NaType]
The PCI bus ID of the device, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=pci.bus_id
- serial() str | NaType [source]
This number matches the serial number physically printed on each board.
It is a globally unique immutable alphanumeric value.
- Returns: Union[str, NaType]
The serial number of the device, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=serial
- memory_info() MemoryInfo [source]
Return a named tuple with memory information (in bytes) for the device.
- Returns: MemoryInfo(total, free, used)
A named tuple with memory information, the item could be
nvitop.NA
when not applicable.
- memory_total() int | NaType [source]
Total installed GPU memory in bytes.
- Returns: Union[int, NaType]
Total installed GPU memory in bytes, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=memory.total
- memory_used() int | NaType [source]
Total memory allocated by active contexts in bytes.
- Returns: Union[int, NaType]
Total memory allocated by active contexts in bytes, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=memory.used
- memory_free() int | NaType [source]
Total free memory in bytes.
- Returns: Union[int, NaType]
Total free memory in bytes, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=memory.free
- memory_total_human() str | NaType [source]
Total installed GPU memory in human readable format.
- Returns: Union[str, NaType]
Total installed GPU memory in human readable format, or
nvitop.NA
when not applicable.
- memory_used_human() str | NaType [source]
Total memory allocated by active contexts in human readable format.
- Returns: Union[str, NaType]
Total memory allocated by active contexts in human readable format, or
nvitop.NA
when not applicable.
- memory_free_human() str | NaType [source]
Total free memory in human readable format.
- Returns: Union[str, NaType]
Total free memory in human readable format, or
nvitop.NA
when not applicable.
- memory_percent() float | NaType [source]
The percentage of used memory over total memory (
0 <= p <= 100
).- Returns: Union[float, NaType]
The percentage of used memory over total memory, or
nvitop.NA
when not applicable.
- memory_usage() str [source]
The used memory over total memory in human readable format.
- Returns: str
The used memory over total memory in human readable format, or
'N/A / N/A'
when not applicable.
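The relationship between the raw byte counters and the derived percent/usage values can be sketched in pure Python. These helpers are illustrative only; nvitop ships its own `bytes2human`, and the exact rounding and formatting may differ:

```python
def bytes2human(size: float) -> str:
    # Illustrative binary-prefix formatting ('1.00GiB' etc.).
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if size < 1024.0:
            return f'{size:.2f}{unit}'
        size /= 1024.0
    return f'{size:.2f}TiB'


def memory_percent(used: int, total: int) -> float:
    # Percentage of used memory over total memory (0 <= p <= 100).
    return 100.0 * used / total


def memory_usage(used: int, total: int) -> str:
    # 'used / total' in human readable format.
    return f'{bytes2human(used)} / {bytes2human(total)}'
```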
- bar1_memory_info() MemoryInfo [source]
Return a named tuple with BAR1 memory information (in bytes) for the device.
- Returns: MemoryInfo(total, free, used)
A named tuple with BAR1 memory information, the item could be
nvitop.NA
when not applicable.
- bar1_memory_total() int | NaType [source]
Total BAR1 memory in bytes.
- Returns: Union[int, NaType]
Total BAR1 memory in bytes, or
nvitop.NA
when not applicable.
- bar1_memory_used() int | NaType [source]
Total used BAR1 memory in bytes.
- Returns: Union[int, NaType]
Total used BAR1 memory in bytes, or
nvitop.NA
when not applicable.
- bar1_memory_free() int | NaType [source]
Total free BAR1 memory in bytes.
- Returns: Union[int, NaType]
Total free BAR1 memory in bytes, or
nvitop.NA
when not applicable.
- bar1_memory_total_human() str | NaType [source]
Total BAR1 memory in human readable format.
- Returns: Union[str, NaType]
Total BAR1 memory in human readable format, or
nvitop.NA
when not applicable.
- bar1_memory_used_human() str | NaType [source]
Total used BAR1 memory in human readable format.
- Returns: Union[str, NaType]
Total used BAR1 memory in human readable format, or
nvitop.NA
when not applicable.
- bar1_memory_free_human() str | NaType [source]
Total free BAR1 memory in human readable format.
- Returns: Union[str, NaType]
Total free BAR1 memory in human readable format, or
nvitop.NA
when not applicable.
- bar1_memory_percent() float | NaType [source]
The percentage of used BAR1 memory over total BAR1 memory (0 <= p <= 100).
- Returns: Union[float, NaType]
The percentage of used BAR1 memory over total BAR1 memory, or
nvitop.NA
when not applicable.
- bar1_memory_usage() str [source]
The used BAR1 memory over total BAR1 memory in human readable format.
- Returns: str
The used BAR1 memory over total BAR1 memory in human readable format, or
'N/A / N/A'
when not applicable.
- utilization_rates() UtilizationRates [source]
Return a named tuple with GPU utilization rates (in percentage) for the device.
- Returns: UtilizationRates(gpu, memory, encoder, decoder)
A named tuple with GPU utilization rates (in percentage) for the device, the item could be
nvitop.NA
when not applicable.
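The shape of the returned value can be illustrated with a stdlib named tuple. This is a sketch: nvitop defines its own UtilizationRates type, and nvitop.NA compares like the string 'N/A':

```python
from collections import namedtuple

UtilizationRates = namedtuple('UtilizationRates', ('gpu', 'memory', 'encoder', 'decoder'))
NA = 'N/A'  # stand-in for nvitop.NA in this sketch

# Any field may be NA when the query is not applicable on the device.
rates = UtilizationRates(gpu=75, memory=40, encoder=NA, decoder=NA)
gpu_busy = rates.gpu if rates.gpu != NA else 0  # guard against NA before arithmetic
```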
- gpu_utilization() int | NaType [source]
Percent of time over the past sample period during which one or more kernels was executing on the GPU.
The sample period may be between 1 second and 1/6 second depending on the product.
- Returns: Union[int, NaType]
The GPU utilization rate in percentage, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=utilization.gpu
- gpu_percent() int | NaType
Percent of time over the past sample period during which one or more kernels was executing on the GPU.
The sample period may be between 1 second and 1/6 second depending on the product.
- Returns: Union[int, NaType]
The GPU utilization rate in percentage, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=utilization.gpu
- memory_utilization() int | NaType [source]
Percent of time over the past sample period during which global (device) memory was being read or written.
The sample period may be between 1 second and 1/6 second depending on the product.
- Returns: Union[int, NaType]
The memory bandwidth utilization rate of the GPU in percentage, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=utilization.memory
- encoder_utilization() int | NaType [source]
The encoder utilization rate in percentage.
- Returns: Union[int, NaType]
The encoder utilization rate in percentage, or
nvitop.NA
when not applicable.
- decoder_utilization() int | NaType [source]
The decoder utilization rate in percentage.
- Returns: Union[int, NaType]
The decoder utilization rate in percentage, or
nvitop.NA
when not applicable.
- clock_infos() ClockInfos [source]
Return a named tuple with current clock speeds (in MHz) for the device.
- Returns: ClockInfos(graphics, sm, memory, video)
A named tuple with current clock speeds (in MHz) for the device, the item could be
nvitop.NA
when not applicable.
- clocks() ClockInfos
Return a named tuple with current clock speeds (in MHz) for the device.
- Returns: ClockInfos(graphics, sm, memory, video)
A named tuple with current clock speeds (in MHz) for the device, the item could be
nvitop.NA
when not applicable.
- max_clock_infos() ClockInfos [source]
Return a named tuple with maximum clock speeds (in MHz) for the device.
- Returns: ClockInfos(graphics, sm, memory, video)
A named tuple with maximum clock speeds (in MHz) for the device, the item could be
nvitop.NA
when not applicable.
- max_clocks() ClockInfos
Return a named tuple with maximum clock speeds (in MHz) for the device.
- Returns: ClockInfos(graphics, sm, memory, video)
A named tuple with maximum clock speeds (in MHz) for the device, the item could be
nvitop.NA
when not applicable.
- clock_speed_infos() ClockSpeedInfos [source]
Return a named tuple with the current and the maximum clock speeds (in MHz) for the device.
- Returns: ClockSpeedInfos(current, max)
A named tuple with the current and the maximum clock speeds (in MHz) for the device.
- graphics_clock() int | NaType [source]
Current frequency of graphics (shader) clock in MHz.
- Returns: Union[int, NaType]
The current frequency of graphics (shader) clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.current.graphics
- sm_clock() int | NaType [source]
Current frequency of SM (Streaming Multiprocessor) clock in MHz.
- Returns: Union[int, NaType]
The current frequency of SM (Streaming Multiprocessor) clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.current.sm
- memory_clock() int | NaType [source]
Current frequency of memory clock in MHz.
- Returns: Union[int, NaType]
The current frequency of memory clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.current.memory
- video_clock() int | NaType [source]
Current frequency of video encoder/decoder clock in MHz.
- Returns: Union[int, NaType]
The current frequency of video encoder/decoder clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.current.video
- max_graphics_clock() int | NaType [source]
Maximum frequency of graphics (shader) clock in MHz.
- Returns: Union[int, NaType]
The maximum frequency of graphics (shader) clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.max.graphics
- max_sm_clock() int | NaType [source]
Maximum frequency of SM (Streaming Multiprocessor) clock in MHz.
- Returns: Union[int, NaType]
The maximum frequency of SM (Streaming Multiprocessor) clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.max.sm
- max_memory_clock() int | NaType [source]
Maximum frequency of memory clock in MHz.
- Returns: Union[int, NaType]
The maximum frequency of memory clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.max.memory
- max_video_clock() int | NaType [source]
Maximum frequency of video encoder/decoder clock in MHz.
- Returns: Union[int, NaType]
The maximum frequency of video encoder/decoder clock in MHz, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=clocks.max.video
- fan_speed() int | NaType [source]
The fan speed value is the percent of the product’s maximum noise tolerance fan speed that the device’s fan is currently intended to run at.
This value may exceed 100% in certain cases. Note: The reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, this output will not match the actual fan speed. Many parts do not report fan speeds because they rely on cooling via fans in the surrounding enclosure.
- Returns: Union[int, NaType]
The fan speed value in percentage, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=fan.speed
- temperature() int | NaType [source]
Core GPU temperature in degrees C.
- Returns: Union[int, NaType]
The core GPU temperature in Celsius degrees, or
nvitop.NA
when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=temperature.gpu
- power_usage() int | NaType [source]
The last measured power draw for the entire board in milliwatts.
- Returns: Union[int, NaType]
The power draw for the entire board in milliwatts, or
nvitop.NA
when not applicable.
Command line equivalent:
$(( "$(nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=power.draw)" * 1000 ))
- power_draw() int | NaType
The last measured power draw for the entire board in milliwatts.
- Returns: Union[int, NaType]
The power draw for the entire board in milliwatts, or
nvitop.NA
when not applicable.
Command line equivalent:
$(( "$(nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=power.draw)" * 1000 ))
- power_limit() int | NaType [source]
The software power limit in milliwatts.
Set by software like nvidia-smi.
- Returns: Union[int, NaType]
The software power limit in milliwatts, or
nvitop.NA
when not applicable.
Command line equivalent:
$(( "$(nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=power.limit)" * 1000 ))
- power_status() str [source]
The string of power usage over power limit in watts.
- Returns: str
The string of power usage over power limit in watts, or
'N/A / N/A'
when not applicable.
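The milliwatt-to-watt formatting behind such a status string can be sketched as follows (illustrative only; nvitop's own rounding and formatting may differ, and the string 'N/A' stands in for nvitop.NA):

```python
def power_status(power_usage_mw, power_limit_mw) -> str:
    # Format 'usage / limit' in watts from milliwatt readings, e.g. '35W / 250W'.
    def mw2human(value):
        return 'N/A' if value == 'N/A' else f'{round(value / 1000)}W'
    return f'{mw2human(power_usage_mw)} / {mw2human(power_limit_mw)}'
```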
- pcie_throughput() ThroughputInfo [source]
The current PCIe throughput in KiB/s.
This function queries a byte counter over a 20 ms interval and thus reports the PCIe throughput over that interval.
- Returns: ThroughputInfo(tx, rx)
A named tuple with current PCIe throughput in KiB/s, the item could be
nvitop.NA
when not applicable.
- pcie_tx_throughput() int | NaType [source]
The current PCIe transmit throughput in KiB/s.
This function queries a byte counter over a 20 ms interval and thus reports the PCIe throughput over that interval.
- Returns: Union[int, NaType]
The current PCIe transmit throughput in KiB/s, or
nvitop.NA
when not applicable.
- pcie_rx_throughput() int | NaType [source]
The current PCIe receive throughput in KiB/s.
This function queries a byte counter over a 20 ms interval and thus reports the PCIe throughput over that interval.
- Returns: Union[int, NaType]
The current PCIe receive throughput in KiB/s, or
nvitop.NA
when not applicable.
- pcie_tx_throughput_human() str | NaType [source]
The current PCIe transmit throughput in human readable format.
This function queries a byte counter over a 20 ms interval and thus reports the PCIe throughput over that interval.
- Returns: Union[str, NaType]
The current PCIe transmit throughput in human readable format, or
nvitop.NA
when not applicable.
- pcie_rx_throughput_human() str | NaType [source]
The current PCIe receive throughput in human readable format.
This function queries a byte counter over a 20 ms interval and thus reports the PCIe throughput over that interval.
- Returns: Union[str, NaType]
The current PCIe receive throughput in human readable format, or
nvitop.NA
when not applicable.
- nvlink_link_count() int [source]
The number of NVLinks that the GPU has.
- Returns: int
The number of NVLinks that the GPU has.
- nvlink_throughput(interval: float | None = None) list[ThroughputInfo] [source]
The current NVLink throughput for each NVLink in KiB/s.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: List[ThroughputInfo(tx, rx)]
A list of named tuples with current NVLink throughput for each NVLink in KiB/s, the item could be
nvitop.NA
when not applicable.
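The counter-delta scheme these throughput methods describe can be sketched in pure Python. The `read_counter` callable is a hypothetical stand-in for the NVML data-counter query; this is a sketch of the sampling logic, not nvitop's implementation:

```python
import time


class ThroughputSampler:
    """Compute throughput from a monotonically increasing byte counter by
    differencing two reads, as described for the NVLink methods."""

    def __init__(self, read_counter):
        self._read_counter = read_counter  # callable returning cumulative KiB
        self._last_value = read_counter()
        self._last_time = time.monotonic()

    def throughput(self, interval=None):
        """KiB/s since the last call (non-blocking), or over `interval`
        seconds (blocking) when a positive interval is given."""
        if interval is not None and interval > 0.0:
            self._last_value = self._read_counter()
            self._last_time = time.monotonic()
            time.sleep(interval)
        value, now = self._read_counter(), time.monotonic()
        elapsed = max(now - self._last_time, 1e-9)
        rate = (value - self._last_value) / elapsed
        self._last_value, self._last_time = value, now
        return rate
```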
- nvlink_mean_throughput(interval: float | None = None) ThroughputInfo [source]
The mean NVLink throughput for all NVLinks in KiB/s.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: ThroughputInfo(tx, rx)
A named tuple with the mean NVLink throughput for all NVLinks in KiB/s, the item could be
nvitop.NA
when not applicable.
- nvlink_tx_throughput(interval: float | None = None) list[int | NaType] [source]
The current NVLink transmit data throughput in KiB/s for each NVLink.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: List[Union[int, NaType]]
The current NVLink transmit data throughput in KiB/s for each NVLink, or
nvitop.NA
when not applicable.
- nvlink_mean_tx_throughput(interval: float | None = None) int | NaType [source]
The mean NVLink transmit data throughput for all NVLinks in KiB/s.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: Union[int, NaType]
The mean NVLink transmit data throughput for all NVLinks in KiB/s, or
nvitop.NA
when not applicable.
- nvlink_rx_throughput(interval: float | None = None) list[int | NaType] [source]
The current NVLink receive data throughput for each NVLink in KiB/s.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: List[Union[int, NaType]]
The current NVLink receive data throughput for each NVLink in KiB/s, or
nvitop.NA
when not applicable.
- nvlink_mean_rx_throughput(interval: float | None = None) int | NaType [source]
The mean NVLink receive data throughput for all NVLinks in KiB/s.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: Union[int, NaType]
The mean NVLink receive data throughput for all NVLinks in KiB/s, or
nvitop.NA
when not applicable.
- nvlink_tx_throughput_human(interval: float | None = None) list[str | NaType] [source]
The current NVLink transmit data throughput for each NVLink in human readable format.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: List[Union[str, NaType]]
The current NVLink transmit data throughput for each NVLink in human readable format, or
nvitop.NA
when not applicable.
- nvlink_mean_tx_throughput_human(interval: float | None = None) str | NaType [source]
The mean NVLink transmit data throughput for all NVLinks in human readable format.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: Union[str, NaType]
The mean NVLink transmit data throughput for all NVLinks in human readable format, or
nvitop.NA
when not applicable.
- nvlink_rx_throughput_human(interval: float | None = None) list[str | NaType] [source]
The current NVLink receive data throughput for each NVLink in human readable format.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: List[Union[str, NaType]]
The current NVLink receive data throughput for each NVLink in human readable format, or
nvitop.NA
when not applicable.
- nvlink_mean_rx_throughput_human(interval: float | None = None) str | NaType [source]
The mean NVLink receive data throughput for all NVLinks in human readable format.
This function queries data counters between method calls and thus reports the NVLink throughput over that interval. On the first call, the function blocks for 20 ms to get the initial data counters.
- Parameters:
interval (Optional[float]) – The interval in seconds between two calls to get the NVLink throughput. If
interval
is a positive number, compares throughput counters before and after the interval (blocking). If interval
is 0.0 or None
, compares throughput counters since the last call, returning immediately (non-blocking).
- Returns: Union[str, NaType]
The mean NVLink receive data throughput for all NVLinks in human readable format, or
nvitop.NA
when not applicable.
- display_active() str | NaType [source]
A flag that indicates whether a display is initialized on the GPU (e.g. memory is allocated on the device for display).
Display can be active even when no monitor is physically attached. “Enabled” indicates an active display. “Disabled” indicates otherwise.
- Returns: Union[str, NaType]
'Disabled'
: if not an active display device.
'Enabled'
: if an active display device.
nvitop.NA
: if not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=display_active
- display_mode() str | NaType [source]
A flag that indicates whether a physical display (e.g. monitor) is currently connected to any of the GPU’s connectors.
“Enabled” indicates an attached display. “Disabled” indicates otherwise.
- Returns: Union[str, NaType]
'Disabled'
: if the display mode is disabled.
'Enabled'
: if the display mode is enabled.
nvitop.NA
: if not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=display_mode
- current_driver_model() str | NaType [source]
The driver model currently in use.
Always “N/A” on Linux. On Windows, the TCC (WDM) and WDDM driver models are supported. The TCC driver model is optimized for compute applications, i.e. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models and will always have the value of “N/A”.
- Returns: Union[str, NaType]
'WDDM'
: for WDDM driver model on Windows.
'WDM'
: for TCC (WDM) driver model on Windows.
nvitop.NA
: if not applicable, e.g. on Linux.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=driver_model.current
- driver_model() str | NaType
The driver model currently in use.
Always “N/A” on Linux. On Windows, the TCC (WDM) and WDDM driver models are supported. The TCC driver model is optimized for compute applications, i.e. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models and will always have the value of “N/A”.
- Returns: Union[str, NaType]
'WDDM'
: for WDDM driver model on Windows.'WDM'
: for TTC (WDM) driver model on Windows.nvitop.NA
: if not applicable, e.g. on Linux.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=driver_model.current
- persistence_mode() str | NaType [source]
A flag that indicates whether persistence mode is enabled for the GPU. Value is either “Enabled” or “Disabled”.
When persistence mode is enabled the NVIDIA driver remains loaded even when no active clients, such as X11 or nvidia-smi, exist. This minimizes the driver load latency associated with running dependent apps, such as CUDA programs. Linux only.
- Returns: Union[str, NaType]
'Disabled': if the persistence mode is disabled.
'Enabled': if the persistence mode is enabled.
nvitop.NA: if not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=persistence_mode
- performance_state() str | NaType [source]
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).
- Returns: Union[str, NaType]
The current performance state in format P<int>, or nvitop.NA when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=pstate
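Since the state is reported as a string in the form P<int>, the numeric level can be recovered with a small helper. This is a sketch, not part of nvitop — the name parse_pstate is an assumption:

```python
from typing import Optional


def parse_pstate(pstate: str) -> Optional[int]:
    """Extract the numeric level from a performance state string like 'P0'.

    Returns None for 'N/A' or any value that does not match the P<int> format.
    """
    if pstate.startswith('P') and pstate[1:].isdigit():
        return int(pstate[1:])
    return None  # covers 'N/A' and malformed values
```

Lower levels mean higher performance, so a result of 0 indicates the maximum-performance state P0.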
- total_volatile_uncorrected_ecc_errors() int | NaType [source]
Total errors detected across the entire chip.
- Returns: Union[int, NaType]
The total number of uncorrected errors in volatile ECC memory, or nvitop.NA when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=ecc.errors.uncorrected.volatile.total
- compute_mode() str | NaType [source]
The compute mode flag indicates whether individual or multiple compute applications may run on the GPU.
- Returns: Union[str, NaType]
'Default': means multiple contexts are allowed per device.
'Exclusive Thread': deprecated, use Exclusive Process instead.
'Prohibited': means no contexts are allowed per device (no compute apps).
'Exclusive Process': means only one context is allowed per device, usable from multiple threads at a time.
nvitop.NA: if not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=compute_mode
- cuda_compute_capability() tuple[int, int] | NaType [source]
The CUDA compute capability for the device.
- Returns: Union[Tuple[int, int], NaType]
The CUDA compute capability version in format (major, minor), or nvitop.NA when not applicable.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=compute_cap
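The (major, minor) tuple compares naturally with Python's tuple ordering, which makes capability checks one-liners. The helpers below are a sketch (sm_arch and supports are not nvitop APIs); the 'sm_<major><minor>' tag format matches the convention used by nvcc's -arch flags:

```python
from typing import Tuple


def sm_arch(capability: Tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as an 'sm_<major><minor>' tag.

    E.g. (8, 6) -> 'sm_86'.
    """
    major, minor = capability
    return f'sm_{major}{minor}'


def supports(capability: Tuple[int, int], required: Tuple[int, int]) -> bool:
    """Tuple comparison: (8, 6) >= (7, 0) means the device meets the requirement."""
    return capability >= required
```

With a live device, the tuple would come from Device(0).cuda_compute_capability().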
- mig_mode() str | NaType [source]
The MIG mode that the GPU is currently operating under.
- Returns: Union[str, NaType]
'Disabled': if the MIG mode is disabled.
'Enabled': if the MIG mode is enabled.
nvitop.NA: if not applicable, e.g. the GPU does not support MIG mode.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=mig.mode.current
- is_mig_mode_enabled() bool [source]
Test whether the MIG mode is enabled on the device.
Return False if MIG mode is disabled or the device does not support MIG mode.
- max_mig_device_count() int [source]
Return the maximum number of MIG instances the device supports.
This method will return 0 if the device does not support MIG mode.
- mig_devices() list[MigDevice] [source]
Return a list of child MIG devices of the current device.
This method will return an empty list if the MIG mode is disabled or the device does not support MIG mode.
- is_leaf_device() bool [source]
Test whether the device is a physical device with MIG mode disabled or a MIG device.
Return True if the device is a physical device with MIG mode disabled or a MIG device. Otherwise, return False if the device is a physical device with MIG mode enabled.
- to_leaf_devices() list[PhysicalDevice] | list[MigDevice] | list[CudaDevice] | list[CudaMigDevice] [source]
Return a list of leaf devices.
Note that a CUDA device is always a leaf device.
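The flattening performed by to_leaf_devices() can be sketched in plain Python. This is a toy analogue, not nvitop's implementation — it works on a hypothetical mapping of physical index to MIG child count instead of real device objects:

```python
from typing import Dict, List, Tuple, Union

LeafIndex = Union[int, Tuple[int, int]]


def leaf_indices(mig_counts: Dict[int, int]) -> List[LeafIndex]:
    """Flatten physical devices into leaf-device indices.

    A GPU with no MIG children contributes its own index; a GPU with MIG mode
    enabled contributes one (physical_index, mig_index) tuple per child.
    """
    leaves: List[LeafIndex] = []
    for physical_index, mig_count in sorted(mig_counts.items()):
        if mig_count == 0:
            leaves.append(physical_index)
        else:
            leaves.extend((physical_index, mig_index) for mig_index in range(mig_count))
    return leaves
```

For example, one plain GPU plus one GPU split into two MIG instances yields three leaf indices.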
- processes() dict[int, GpuProcess] [source]
Return a dictionary of processes running on the GPU.
- Returns: Dict[int, GpuProcess]
A dictionary mapping PID to GPU process instance.
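A common task on top of this mapping is ranking processes by GPU memory. The ranking itself is pure dictionary work; the helper name top_pids_by_memory is an assumption, and the commented nvitop usage is a sketch that requires an NVIDIA GPU and driver:

```python
from typing import Dict, List


def top_pids_by_memory(gpu_memory_by_pid: Dict[int, int], n: int = 3) -> List[int]:
    """Return the PIDs of the n processes using the most GPU memory."""
    return sorted(gpu_memory_by_pid, key=gpu_memory_by_pid.get, reverse=True)[:n]


# With nvitop on a machine with an NVIDIA GPU (sketch):
#     from nvitop import Device
#     processes = Device(0).processes()                        # Dict[int, GpuProcess]
#     usage = {pid: p.gpu_memory() for pid, p in processes.items()}
#     print(top_pids_by_memory(usage))
```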
- as_snapshot() Snapshot [source]
Return a one-time snapshot of the device.
The attributes are defined in SNAPSHOT_KEYS.
- SNAPSHOT_KEYS: ClassVar[list[str]] = ['name', 'uuid', 'bus_id', 'memory_info', 'memory_used', 'memory_free', 'memory_total', 'memory_used_human', 'memory_free_human', 'memory_total_human', 'memory_percent', 'memory_usage', 'utilization_rates', 'gpu_utilization', 'memory_utilization', 'encoder_utilization', 'decoder_utilization', 'clock_infos', 'max_clock_infos', 'clock_speed_infos', 'sm_clock', 'memory_clock', 'fan_speed', 'temperature', 'power_usage', 'power_limit', 'power_status', 'pcie_throughput', 'pcie_tx_throughput', 'pcie_rx_throughput', 'pcie_tx_throughput_human', 'pcie_rx_throughput_human', 'display_active', 'display_mode', 'current_driver_model', 'persistence_mode', 'performance_state', 'total_volatile_uncorrected_ecc_errors', 'compute_mode', 'cuda_compute_capability', 'mig_mode']
- oneshot() Generator[None, None, None] [source]
A utility context manager which considerably speeds up the retrieval of multiple pieces of device information at the same time.
Internally, different device info (e.g. memory_info, utilization_rates, …) may be fetched by the same routine, but only one piece of information is returned and the others are discarded. When using this context manager, the internal routine is executed once (in the example below on memory_info()) and the other values are cached.
The cache is cleared when exiting the context manager block. The advice is to use this every time you retrieve more than one piece of information about the device.
Examples
>>> from nvitop import Device
>>> device = Device(0)
>>> with device.oneshot():
...     device.memory_info()        # collect multiple info
...     device.memory_used()        # return cached value
...     device.memory_free_human()  # return cached value
...     device.memory_percent()     # return cached value
- class nvitop.PhysicalDevice(index: int | tuple[int, int] | str | None = None, *, uuid: str | None = None, bus_id: str | None = None)[source]
Bases:
Device
Class for physical devices.
This is the real GPU installed in the system.
- property physical_index: int
Zero-based index of the GPU. Can change at each boot.
Command line equivalent:
nvidia-smi --id=<IDENTIFIER> --format=csv,noheader,nounits --query-gpu=index
- max_mig_device_count() int [source]
Return the maximum number of MIG instances the device supports.
This method will return 0 if the device does not support MIG mode.
- mig_device(mig_index: int) MigDevice [source]
Return a child MIG device of the given index.
- Raises:
libnvml.NVMLError – If the device does not support MIG mode or the given MIG device does not exist.
- class nvitop.MigDevice(index: int | tuple[int, int] | str | None = None, *, uuid: str | None = None, bus_id: str | None = None)[source]
Bases:
Device
Class for MIG devices.
- classmethod count() int [source]
The total number of MIG devices aggregated over all physical devices.
- classmethod all() list[MigDevice] [source]
Return a list of MIG devices aggregated over all physical devices.
- classmethod from_indices(indices: Iterable[tuple[int, int]]) list[MigDevice] [source]
Return a list of MIG devices of the given indices.
- Parameters:
indices (Iterable[Tuple[int, int]]) – Indices of the MIG devices. Each index is a tuple of two integers.
- Returns: List[MigDevice]
A list of
MigDevice
instances of the given indices.
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found; usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after upgrading the NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
- __init__(index: tuple[int, int] | str | None = None, *, uuid: str | None = None) None [source]
Initialize the instance created by __new__().
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found; usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after upgrading the NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
- property physical_index: int
The index of the parent physical device.
- property mig_index: int
The index of the MIG device among all MIG devices of the parent device.
- property parent: PhysicalDevice
The parent physical device.
- gpu_instance_id() int | NaType [source]
The GPU instance ID of the MIG device.
- Returns: Union[int, NaType]
The GPU instance ID of the MIG device, or
nvitop.NA
when not applicable.
- compute_instance_id() int | NaType [source]
The compute instance ID of the MIG device.
- Returns: Union[int, NaType]
The compute instance ID of the MIG device, or
nvitop.NA
when not applicable.
- as_snapshot() Snapshot [source]
Return a one-time snapshot of the device.
The attributes are defined in SNAPSHOT_KEYS.
- SNAPSHOT_KEYS: ClassVar[list[str]] = ['name', 'uuid', 'bus_id', 'memory_info', 'memory_used', 'memory_free', 'memory_total', 'memory_used_human', 'memory_free_human', 'memory_total_human', 'memory_percent', 'memory_usage', 'utilization_rates', 'gpu_utilization', 'memory_utilization', 'encoder_utilization', 'decoder_utilization', 'clock_infos', 'max_clock_infos', 'clock_speed_infos', 'sm_clock', 'memory_clock', 'fan_speed', 'temperature', 'power_usage', 'power_limit', 'power_status', 'pcie_throughput', 'pcie_tx_throughput', 'pcie_rx_throughput', 'pcie_tx_throughput_human', 'pcie_rx_throughput_human', 'display_active', 'display_mode', 'current_driver_model', 'persistence_mode', 'performance_state', 'total_volatile_uncorrected_ecc_errors', 'compute_mode', 'cuda_compute_capability', 'mig_mode', 'gpu_instance_id', 'compute_instance_id']
- class nvitop.CudaDevice(cuda_index: int | None = None, *, nvml_index: int | tuple[int, int] | None = None, uuid: str | None = None)[source]
Bases:
Device
Class for devices enumerated over the CUDA ordinal.
The order can vary for different CUDA_VISIBLE_DEVICES environment variable values.
- See also for CUDA Device Enumeration:
CudaDevice.__new__() returns different types depending on the given arguments.
- (cuda_index: int) -> Union[CudaDevice, CudaMigDevice]  # depending on `CUDA_VISIBLE_DEVICES`
- (uuid: str) -> Union[CudaDevice, CudaMigDevice]        # depending on `CUDA_VISIBLE_DEVICES`
- (nvml_index: int) -> CudaDevice
- (nvml_index: (int, int)) -> CudaMigDevice
Examples
>>> import os
>>> os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '3,2,1,0'

>>> CudaDevice.count()  # number of NVIDIA GPUs visible to CUDA applications
4
>>> Device.cuda.count()  # use alias in class `Device`
4

>>> CudaDevice.all()  # all CUDA visible devices (or `Device.cuda.all()`)
[
    CudaDevice(cuda_index=0, nvml_index=3, ...),
    CudaDevice(cuda_index=1, nvml_index=2, ...),
    ...
]

>>> cuda0 = CudaDevice(cuda_index=0)       # use CUDA ordinal (or `Device.cuda(0)`)
>>> cuda1 = CudaDevice(nvml_index=2)       # use NVML ordinal
>>> cuda2 = CudaDevice(uuid='GPU-xxxxxx')  # use UUID string

>>> cuda0.memory_free()  # total free memory in bytes
11550654464
>>> cuda0.memory_free_human()  # total free memory in human readable format
'11016MiB'

>>> cuda1.as_snapshot()  # takes a one-time snapshot of the device
CudaDeviceSnapshot(
    real=CudaDevice(cuda_index=1, nvml_index=2, ...),
    ...
)
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found; usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after upgrading the NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
libnvml.NVMLError_InvalidArgument – If the NVML index is out of range.
TypeError – If the number of non-None arguments is not exactly 1.
TypeError – If the given NVML index is a tuple but does not consist of two integers.
RuntimeError – If the index is out of range for the given
CUDA_VISIBLE_DEVICES
environment variable.
- classmethod all() list[CudaDevice] [source]
All CUDA visible devices.
Note
The result could be empty if the
CUDA_VISIBLE_DEVICES
environment variable is invalid.
- classmethod from_indices(indices: int | Iterable[int] | None = None) list[CudaDevice] [source]
Return a list of CUDA devices of the given CUDA indices.
The CUDA ordinal will be enumerated from the CUDA_VISIBLE_DEVICES environment variable.
- See also for CUDA Device Enumeration:
- Parameters:
indices (Iterable[int]) – The indices of the GPUs in CUDA ordinal. If not given, all visible CUDA devices are returned.
- Returns: List[CudaDevice]
A list of
CudaDevice
of the given CUDA indices.
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found; usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after upgrading the NVIDIA driver without reloading the kernel module.
RuntimeError – If the index is out of range for the given
CUDA_VISIBLE_DEVICES
environment variable.
- static __new__(cls, cuda_index: int | None = None, *, nvml_index: int | tuple[int, int] | None = None, uuid: str | None = None) Self [source]
Create a new instance of CudaDevice.
The type of the result is determined by the given argument.
- (cuda_index: int) -> Union[CudaDevice, CudaMigDevice]  # depending on `CUDA_VISIBLE_DEVICES`
- (uuid: str) -> Union[CudaDevice, CudaMigDevice]        # depending on `CUDA_VISIBLE_DEVICES`
- (nvml_index: int) -> CudaDevice
- (nvml_index: (int, int)) -> CudaMigDevice
Note: This method takes exactly one non-None argument.
- Returns: Union[CudaDevice, CudaMigDevice]
A
CudaDevice
instance or aCudaMigDevice
instance.
- Raises:
TypeError – If the number of non-None arguments is not exactly 1.
TypeError – If the given NVML index is a tuple but does not consist of two integers.
RuntimeError – If the index is out of range for the given
CUDA_VISIBLE_DEVICES
environment variable.
- __init__(cuda_index: int | None = None, *, nvml_index: int | tuple[int, int] | None = None, uuid: str | None = None) None [source]
Initialize the instance created by __new__().
- Raises:
libnvml.NVMLError_LibraryNotFound – If the NVML library cannot be found; usually the NVIDIA driver is not installed.
libnvml.NVMLError_DriverNotLoaded – If the NVIDIA driver is not loaded.
libnvml.NVMLError_LibRmVersionMismatch – If RM detects a driver/library version mismatch, usually after upgrading the NVIDIA driver without reloading the kernel module.
libnvml.NVMLError_NotFound – If the device is not found for the given NVML identifier.
libnvml.NVMLError_InvalidArgument – If the NVML index is out of range.
RuntimeError – If the given device is not visible to CUDA applications (i.e. not listed in the
CUDA_VISIBLE_DEVICES
environment variable or the environment variable is invalid).
- class nvitop.CudaMigDevice(cuda_index: int | None = None, *, nvml_index: int | tuple[int, int] | None = None, uuid: str | None = None)[source]
Bases:
CudaDevice, MigDevice
Class for CUDA devices that are MIG devices.
- nvitop.parse_cuda_visible_devices(cuda_visible_devices: str | None = <VALUE OMITTED>) list[int] | list[tuple[int, int]] [source]
Parse the given CUDA_VISIBLE_DEVICES value into a list of NVML device indices.
This function is aliased by Device.parse_cuda_visible_devices().
Note
The result could be empty if the CUDA_VISIBLE_DEVICES environment variable is invalid.
- See also for CUDA Device Enumeration:
- Parameters:
cuda_visible_devices (Optional[str]) – The value of the CUDA_VISIBLE_DEVICES variable. If not given, the value from the environment will be used. If explicitly given as None, the CUDA_VISIBLE_DEVICES environment variable will be unset before parsing.
- Returns: Union[List[int], List[Tuple[int, int]]]
A list of int (physical device) or a list of tuple of two integers (MIG device) for the corresponding real device indices.
Examples
>>> import os
>>> os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '6,5'
>>> parse_cuda_visible_devices()  # parse the `CUDA_VISIBLE_DEVICES` environment variable to NVML indices
[6, 5]

>>> parse_cuda_visible_devices('0,4')  # pass the `CUDA_VISIBLE_DEVICES` value explicitly
[0, 4]

>>> parse_cuda_visible_devices('GPU-18ef14e9,GPU-849d5a8d')  # accept abbreviated UUIDs
[5, 6]

>>> parse_cuda_visible_devices(None)  # get all devices when the `CUDA_VISIBLE_DEVICES` environment variable is unset
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> parse_cuda_visible_devices('MIG-d184f67c-c95f-5ef2-a935-195bd0094fbd')  # MIG device support (MIG UUID)
[(0, 0)]
>>> parse_cuda_visible_devices('MIG-GPU-3eb79704-1571-707c-aee8-f43ce747313d/13/0')  # MIG device support (GPU UUID)
[(0, 1)]
>>> parse_cuda_visible_devices('MIG-GPU-3eb79704/13/0')  # MIG device support (abbreviated GPU UUID)
[(0, 1)]

>>> parse_cuda_visible_devices('')  # empty string
[]
>>> parse_cuda_visible_devices('0,0')  # invalid `CUDA_VISIBLE_DEVICES` (duplicate device ordinal)
[]
>>> parse_cuda_visible_devices('16')  # invalid `CUDA_VISIBLE_DEVICES` (device ordinal out of range)
[]
- nvitop.normalize_cuda_visible_devices(cuda_visible_devices: str | None = <VALUE OMITTED>) str [source]
Parse the given CUDA_VISIBLE_DEVICES value and convert it into a comma-separated string of UUIDs.
This function is aliased by Device.normalize_cuda_visible_devices().
Note
The result could be an empty string if the CUDA_VISIBLE_DEVICES environment variable is invalid.
- See also for CUDA Device Enumeration:
- Parameters:
cuda_visible_devices (Optional[str]) – The value of the CUDA_VISIBLE_DEVICES variable. If not given, the value from the environment will be used. If explicitly given as None, the CUDA_VISIBLE_DEVICES environment variable will be unset before parsing.
- Returns: str
The comma-separated string (GPU UUIDs) of the
CUDA_VISIBLE_DEVICES
environment variable.
Examples
>>> import os
>>> os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '6,5'
>>> normalize_cuda_visible_devices()  # normalize the `CUDA_VISIBLE_DEVICES` environment variable to UUID strings
'GPU-849d5a8d-610e-eeea-1fd4-81ff44a23794,GPU-18ef14e9-dec6-1d7e-1284-3010c6ce98b1'

>>> normalize_cuda_visible_devices('4')  # pass the `CUDA_VISIBLE_DEVICES` value explicitly
'GPU-96de99c9-d68f-84c8-424c-7c75e59cc0a0'

>>> normalize_cuda_visible_devices('GPU-18ef14e9,GPU-849d5a8d')  # normalize abbreviated UUIDs
'GPU-18ef14e9-dec6-1d7e-1284-3010c6ce98b1,GPU-849d5a8d-610e-eeea-1fd4-81ff44a23794'

>>> normalize_cuda_visible_devices(None)  # get all devices when the `CUDA_VISIBLE_DEVICES` environment variable is unset
'GPU-<GPU0-UUID>,GPU-<GPU1-UUID>,...'  # all GPU UUIDs

>>> normalize_cuda_visible_devices('MIG-d184f67c-c95f-5ef2-a935-195bd0094fbd')  # MIG device support (MIG UUID)
'MIG-d184f67c-c95f-5ef2-a935-195bd0094fbd'
>>> normalize_cuda_visible_devices('MIG-GPU-3eb79704-1571-707c-aee8-f43ce747313d/13/0')  # MIG device support (GPU UUID)
'MIG-37b51284-1df4-5451-979d-3231ccb0822e'
>>> normalize_cuda_visible_devices('MIG-GPU-3eb79704/13/0')  # MIG device support (abbreviated GPU UUID)
'MIG-37b51284-1df4-5451-979d-3231ccb0822e'

>>> normalize_cuda_visible_devices('')  # empty string
''
>>> normalize_cuda_visible_devices('0,0')  # invalid `CUDA_VISIBLE_DEVICES` (duplicate device ordinal)
''
>>> normalize_cuda_visible_devices('16')  # invalid `CUDA_VISIBLE_DEVICES` (device ordinal out of range)
''
- class nvitop.HostProcess(pid: int | None = None)[source]
Bases:
Process
Represent an OS process with the given PID.
If PID is omitted, the current process PID (os.getpid()) is used. The instance will be cached during the lifetime of the process.
Examples
>>> HostProcess()  # the current process
HostProcess(pid=12345, name='python3', status='running', started='00:55:43')

>>> p1 = HostProcess(12345)
>>> p2 = HostProcess(12345)
>>> p1 is p2  # the same instance
True

>>> import copy
>>> copy.deepcopy(p1) is p1  # the same instance
True

>>> p = HostProcess(pid=12345)
>>> p.cmdline()
['python3', '-c', 'import IPython; IPython.terminal.ipapp.launch_new_instance()']
>>> p.command()  # the result is in shell-escaped format
'python3 -c "import IPython; IPython.terminal.ipapp.launch_new_instance()"'

>>> p.as_snapshot()
HostProcessSnapshot(
    real=HostProcess(pid=12345, name='python3', status='running', started='00:55:43'),
    cmdline=['python3', '-c', 'import IPython; IPython.terminal.ipapp.launch_new_instance()'],
    command='python3 -c "import IPython; IPython.terminal.ipapp.launch_new_instance()"',
    connections=[],
    cpu_percent=0.3,
    cpu_times=pcputimes(user=2.180019456, system=0.18424464, children_user=0.0, children_system=0.0),
    create_time=1656608143.31,
    cwd='/home/panxuehai',
    environ={...},
    ...
)
- INSTANCE_LOCK: threading.RLock = <unlocked _thread.RLock object owner=0 count=0>
- INSTANCES: WeakValueDictionary[int, HostProcess] = <WeakValueDictionary>
- static __new__(cls, pid: int | None = None) Self [source]
Return the cached instance of
HostProcess
.
- username() str [source]
The name of the user that owns the process.
On UNIX this is calculated by using the real process uid.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- cmdline() list[str] [source]
The command line this process has been called with.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- command() str [source]
Return a shell-escaped string from command line arguments.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- running_time() timedelta [source]
The elapsed time this process has been running in datetime.timedelta.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- running_time_human() str [source]
The elapsed time this process has been running in human readable format.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- running_time_in_seconds() float [source]
The elapsed time this process has been running in seconds.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- elapsed_time() timedelta
The elapsed time this process has been running in datetime.timedelta.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- elapsed_time_human() str
The elapsed time this process has been running in human readable format.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- elapsed_time_in_seconds() float
The elapsed time this process has been running in seconds.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- rss_memory() int [source]
The used resident set size (RSS) memory of the process in bytes.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- parent() HostProcess | None [source]
Return the parent process as a HostProcess instance or None if there is no parent.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- children(recursive: bool = False) list[HostProcess] [source]
Return the children of this process as a list of HostProcess instances.
If recursive is True, return all the descendants.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- oneshot() Generator[None, None, None] [source]
A utility context manager which considerably speeds up the retrieval of multiple pieces of process information at the same time.
Internally, different process info (e.g. name, ppid, uids, gids, …) may be fetched by the same routine, but only one piece of information is returned and the others are discarded. When using this context manager, the internal routine is executed once (in the example below on name()) and the other values are cached.
The cache is cleared when exiting the context manager block. The advice is to use this every time you retrieve more than one piece of information about the process.
Examples
>>> from nvitop import HostProcess
>>> p = HostProcess()
>>> with p.oneshot():
...     p.name()         # collect multiple info
...     p.cpu_times()    # return cached value
...     p.cpu_percent()  # return cached value
...     p.create_time()  # return cached value
- class nvitop.GpuProcess(pid: int | None, device: Device, *, gpu_memory: int | NaType | None = None, gpu_instance_id: int | NaType | None = None, compute_instance_id: int | NaType | None = None, type: str | NaType | None = None)[source]
Bases:
object
Represent a process with the given PID running on the given GPU device.
The instance will be cached during the lifetime of the process.
The same host process can use multiple GPU devices. The GpuProcess instances representing the same PID on the host but different GPU devices are different.
- INSTANCE_LOCK: threading.RLock = <unlocked _thread.RLock object owner=0 count=0>
- INSTANCES: WeakValueDictionary[tuple[int, Device], GpuProcess] = <WeakValueDictionary>
- static __new__(cls, pid: int | None, device: Device, *, gpu_memory: int | NaType | None = None, gpu_instance_id: int | NaType | None = None, compute_instance_id: int | NaType | None = None, type: str | NaType | None = None) Self [source]
Return the cached instance of
GpuProcess
.
- __init__(pid: int | None, device: Device, *, gpu_memory: int | NaType | None = None, gpu_instance_id: int | NaType | None = None, compute_instance_id: int | NaType | None = None, type: str | NaType | None = None) None [source]
Initialize the instance returned by __new__().
- __getattr__(name: str) Any | Callable[..., Any] [source]
Get a member from the instance, or fall back to the host process instance if missing.
- Raises:
AttributeError – If the attribute is defined in neither GpuProcess nor HostProcess.
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
- property pid: int
The process PID.
- property host: HostProcess
The process instance running on the host.
- property device: Device
The GPU device the process is running on.
The same host process can use multiple GPU devices. The GpuProcess instances representing the same PID on the host but different GPU devices are different.
- gpu_instance_id() int | NaType [source]
The GPU instance ID of the MIG device, or
nvitop.NA
if not applicable.
- compute_instance_id() int | NaType [source]
The compute instance ID of the MIG device, or
nvitop.NA
if not applicable.
- gpu_memory_human() str | NaType [source]
The used GPU memory in human readable format, or
nvitop.NA
if not applicable.
- gpu_memory_percent() float | NaType [source]
The percentage of used GPU memory by the process, or
nvitop.NA
if not applicable.
- gpu_sm_utilization() int | NaType [source]
The utilization rate of SM (Streaming Multiprocessor), or
nvitop.NA
if not applicable.
- gpu_memory_utilization() int | NaType [source]
The utilization rate of GPU memory bandwidth, or
nvitop.NA
if not applicable.
- gpu_encoder_utilization() int | NaType [source]
The utilization rate of the encoder, or
nvitop.NA
if not applicable.
- gpu_decoder_utilization() int | NaType [source]
The utilization rate of the decoder, or
nvitop.NA
if not applicable.
- set_gpu_utilization(gpu_sm_utilization: int | NaType | None = None, gpu_memory_utilization: int | NaType | None = None, gpu_encoder_utilization: int | NaType | None = None, gpu_decoder_utilization: int | NaType | None = None) None [source]
Set the GPU utilization rates.
- property type: str | NaType
The type of the GPU context.
- The type is one of the following:
'C': compute context
'G': graphics context
'C+G': both compute context and graphics context
'N/A': not applicable
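The type flags above can be turned into human-readable labels with a small lookup table. This helper is hypothetical (not part of nvitop), shown only to illustrate the flag values:

```python
# Hypothetical mapping of the `type` flag values documented above.
GPU_CONTEXT_TYPES = {
    'C': 'compute context',
    'G': 'graphics context',
    'C+G': 'both compute context and graphics context',
    'N/A': 'not applicable',
}


def describe_context_type(type_flag: str) -> str:
    """Look up a human-readable description for a GPU context type flag."""
    return GPU_CONTEXT_TYPES.get(type_flag, 'unknown context type')
```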
- status() str [source]
The process current status.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
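The fallback pattern can be sketched in plain Python. This is a hand-rolled analogue, not nvitop's GpuProcess.failsafe() context manager (which should be preferred when working with nvitop); the name call_with_fallback and the 'N/A' default are assumptions:

```python
from typing import Any, Callable


def call_with_fallback(fn: Callable[[], Any], fallback: Any = 'N/A') -> Any:
    """Call fn() and return its result, or the fallback value on any error.

    Mirrors the idea described above: instead of letting NoSuchProcess or
    AccessDenied propagate, a placeholder value is returned.
    """
    try:
        return fn()
    except Exception:  # e.g. the process is gone or access is denied
        return fallback
```

For instance, call_with_fallback(process.status) would yield 'N/A' instead of raising if the process has already exited.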
- create_time() float | NaType [source]
The process creation time as a floating point number expressed in seconds since the epoch.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- running_time() datetime.timedelta | NaType [source]
The elapsed time this process has been running in datetime.timedelta.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- running_time_human() str | NaType [source]
The elapsed time this process has been running in human readable format.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- running_time_in_seconds() float | NaType [source]
The elapsed time this process has been running in seconds.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- elapsed_time() datetime.timedelta | NaType
The elapsed time this process has been running in datetime.timedelta.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- elapsed_time_human() str | NaType
The elapsed time this process has been running in human readable format.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager GpuProcess.failsafe(). See also take_snapshots() and failsafe().
- elapsed_time_in_seconds() float | NaType
The elapsed time this process has been running in seconds.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See alsotake_snapshots()
andfailsafe()
.
- username() str | NaType [source]
The name of the user that owns the process.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- name() str | NaType [source]
The process name.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- cpu_percent() float | NaType [source]
Return a float representing the current process CPU utilization as a percentage.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- memory_percent() float | NaType [source]
Compare process RSS memory to total physical system memory and calculate process memory utilization as a percentage.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- host_memory_percent() float | NaType
Compare process RSS memory to total physical system memory and calculate process memory utilization as a percentage.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- host_memory() int | NaType [source]
The used resident set size (RSS) memory of the process in bytes.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- host_memory_human() str | NaType [source]
The used resident set size (RSS) memory of the process in human readable format.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- rss_memory() int | NaType
The used resident set size (RSS) memory of the process in bytes.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- cmdline() list[str] [source]
The command line this process has been called with.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- command() str [source]
Return a shell-escaped string from command line arguments.
- Raises:
host.NoSuchProcess – If the process is gone.
host.AccessDenied – If the user does not have read privilege to the process’ status file.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. See also take_snapshots() and failsafe().
- as_snapshot(*, host_process_snapshot_cache: dict[int, Snapshot] | None = None) Snapshot [source]
Return a one-time snapshot of the process on the GPU device.
Note
To return the fallback value rather than raise an exception, please use the context manager
GpuProcess.failsafe()
. Also, consider using the batched version GpuProcess.take_snapshots() to take snapshots, which caches the results and reduces redundant queries. See also take_snapshots() and failsafe().
- classmethod take_snapshots(gpu_processes: Iterable[GpuProcess], *, failsafe: bool = False) list[Snapshot] [source]
Take snapshots for a list of GpuProcess instances. If failsafe is True, then if any method fails, the fallback value in auto_garbage_clean() will be used.
- classmethod failsafe() Generator[None, None, None] [source]
A context manager that enables fallback values for methods that fail.
Examples
>>> p = GpuProcess(pid=10000, device=Device(0))  # process does not exist
>>> p
GpuProcess(pid=10000, gpu_memory=N/A, type=N/A,
           device=PhysicalDevice(index=0, name="NVIDIA GeForce RTX 3070", total_memory=8192MiB),
           host=HostProcess(pid=10000, status='terminated'))
>>> p.cpu_percent()
Traceback (most recent call last):
    ...
NoSuchProcess: process no longer exists (pid=10000)
>>> # Failsafe to the fallback value instead of raising exceptions
>>> with GpuProcess.failsafe():
...     print('fallback:              {!r}'.format(p.cpu_percent()))
...     print('fallback (float cast): {!r}'.format(float(p.cpu_percent())))  # `nvitop.NA` can be cast to float or int
...     print('fallback (int cast):   {!r}'.format(int(p.cpu_percent())))    # `nvitop.NA` can be cast to float or int
fallback:              'N/A'
fallback (float cast): nan
fallback (int cast):   0
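The failsafe pattern above can be sketched in a few lines of plain Python. The following is an illustrative, self-contained sketch only, not nvitop's implementation: a thread-local flag is set inside the context manager, and instrumented methods return a string sentinel (castable to float or int) instead of raising while the flag is set. `ProcessLookupError` stands in for host.NoSuchProcess, and all names here are hypothetical.

```python
import contextlib
import threading

_failsafe_state = threading.local()

class NaType(str):
    """A sentinel string 'N/A' that can still be cast to float or int."""
    def __float__(self):
        return float('nan')
    def __int__(self):
        return 0

NA = NaType('N/A')

@contextlib.contextmanager
def failsafe():
    """Within this context, instrumented methods return NA instead of raising."""
    _failsafe_state.enabled = True
    try:
        yield
    finally:
        _failsafe_state.enabled = False

def fallback_to_na(func):
    """Decorator: suppress lookup errors and return NA while failsafe() is active."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ProcessLookupError:
            if getattr(_failsafe_state, 'enabled', False):
                return NA
            raise
    return wrapper

@fallback_to_na
def cpu_percent():
    # Simulates querying a process that no longer exists.
    raise ProcessLookupError('process no longer exists')
```

Outside the context manager the exception propagates unchanged, so existing error handling is unaffected.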
- nvitop.command_join(cmdline: list[str]) str [source]
Return a shell-escaped string from command line arguments.
- nvitop.take_snapshots(devices: Device | Iterable[Device] | None = None, *, gpu_processes: bool | GpuProcess | Iterable[GpuProcess] | None = None) SnapshotResult [source]
Retrieve status of the requested devices and GPU processes.
- Parameters:
devices (Optional[Union[Device, Iterable[Device]]]) – Requested devices for snapshots. If not given, the devices will be determined from GPU processes: (1) all devices, if no GPU processes are given; (2) the devices used by the given GPU processes.
gpu_processes (Optional[Union[bool, GpuProcess, Iterable[GpuProcess]]]) – Requested GPU processes snapshots. If not given, all GPU processes running on the requested devices will be returned. The GPU process snapshots can be suppressed by specifying gpu_processes=False.
- Returns: SnapshotResult
A named tuple containing two lists of snapshots.
Note
If no arguments are specified, all devices and all GPU processes will be returned.
Examples
>>> from nvitop import take_snapshots, Device
>>> import os
>>> os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '1,0'
>>> take_snapshots()  # equivalent to `take_snapshots(Device.all())`
SnapshotResult(
    devices=[
        PhysicalDeviceSnapshot(
            real=PhysicalDevice(index=0, ...),
            ...
        ),
        ...
    ],
    gpu_processes=[
        GpuProcessSnapshot(
            real=GpuProcess(pid=xxxxxx, device=PhysicalDevice(index=0, ...), ...),
            ...
        ),
        ...
    ]
)
>>> device_snapshots, gpu_process_snapshots = take_snapshots(Device.all()) # type: Tuple[List[DeviceSnapshot], List[GpuProcessSnapshot]]
>>> device_snapshots, _ = take_snapshots(gpu_processes=False) # ignore process snapshots
>>> take_snapshots(Device.cuda.all())  # use CUDA device enumeration
SnapshotResult(
    devices=[
        CudaDeviceSnapshot(
            real=CudaDevice(cuda_index=0, physical_index=1, ...),
            ...
        ),
        CudaDeviceSnapshot(
            real=CudaDevice(cuda_index=1, physical_index=0, ...),
            ...
        ),
    ],
    gpu_processes=[
        GpuProcessSnapshot(
            real=GpuProcess(pid=xxxxxx, device=CudaDevice(cuda_index=0, ...), ...),
            ...
        ),
        ...
    ]
)
>>> take_snapshots(Device.cuda(1))  # <CUDA 1> only
SnapshotResult(
    devices=[
        CudaDeviceSnapshot(
            real=CudaDevice(cuda_index=1, physical_index=0, ...),
            ...
        )
    ],
    gpu_processes=[
        GpuProcessSnapshot(
            real=GpuProcess(pid=xxxxxx, device=CudaDevice(cuda_index=1, ...), ...),
            ...
        ),
        ...
    ]
)
- nvitop.collect_in_background(on_collect: Callable[[dict[str, float]], bool], collector: ResourceMetricCollector | None = None, interval: float | None = None, *, on_start: Callable[[ResourceMetricCollector], None] | None = None, on_stop: Callable[[ResourceMetricCollector], None] | None = None, tag: str = 'metrics-daemon', start: bool = True) threading.Thread [source]
Start a background daemon thread that collects metrics and calls the callback function periodically.
See also
ResourceMetricCollector.daemonize()
.
- Parameters:
on_collect (Callable[[Dict[str, float]], bool]) – A callback function that will be called periodically. It takes a dictionary containing the resource metrics and returns a boolean indicating whether to continue monitoring.
collector (Optional[ResourceMetricCollector]) – A
ResourceMetricCollector
instance to collect metrics. If not given, it will collect metrics for all GPUs and the subprocesses of the current process.
interval (Optional[float]) – The collect interval. If not given, use collector.interval.
on_start (Optional[Callable[[ResourceMetricCollector], None]]) – A function to initialize the daemon thread and collector.
on_stop (Optional[Callable[[ResourceMetricCollector], None]]) – A function that does some necessary cleanup after the daemon thread is stopped.
tag (str) – The tag prefix used for metrics results.
start (bool) – Whether to start the daemon thread on return.
- Returns: threading.Thread
A daemon thread object.
Examples
logger = ...

def on_collect(metrics):  # will be called periodically
    if logger.is_closed():  # closed manually by user
        return False
    logger.log(metrics)
    return True

def on_stop(collector):  # will be called only once at stop
    if not logger.is_closed():
        logger.close()  # cleanup

# Record metrics to the logger in the background every 5 seconds.
# It will collect 5-second mean/min/max for each metric.
collect_in_background(
    on_collect,
    ResourceMetricCollector(Device.cuda.all()),
    interval=5.0,
    on_stop=on_stop,
)
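The contract above (a daemon thread that repeatedly calls `on_collect(metrics)` until it returns False, then runs `on_stop`) can be sketched without nvitop. This is a minimal, self-contained illustration of the pattern, not nvitop's implementation; `collect` here is a hypothetical zero-argument callable standing in for the collector.

```python
import threading
import time

def collect_in_background_sketch(on_collect, collect, interval=1.0, on_stop=None):
    """Run `on_collect(collect())` every `interval` seconds in a daemon thread
    until the callback returns False, then invoke `on_stop` once."""
    def loop():
        try:
            while on_collect(collect()):
                time.sleep(interval)
        finally:
            if on_stop is not None:
                on_stop()

    thread = threading.Thread(target=loop, name='metrics-daemon', daemon=True)
    thread.start()
    return thread
```

Returning the (already started) thread lets the caller `join()` it for a clean shutdown, mirroring the `start=True` default of the real API.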
- class nvitop.ResourceMetricCollector(devices: Iterable[Device] | None = None, root_pids: Iterable[int] | None = None, interval: float = 1.0)[source]
Bases:
object
A class for collecting resource metrics.
- Parameters:
devices (Iterable[Device]) – Set of Device instances for logging. If not given, all physical devices on board will be used.
root_pids (Set[int]) – A set of PIDs; only the status of the descendant processes of these PIDs on the GPUs will be collected. If not given, the PID of the current process will be used.
interval (float) – The snapshot interval for background daemon thread.
Core methods:
collector.activate(tag='<tag>')  # alias: start
collector.deactivate()           # alias: stop
collector.reset(tag='<tag>')
collector.collect()

with collector(tag='<tag>'):
    ...

collector.daemonize(on_collect_fn)
Examples
>>> import os
>>> os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
>>> os.environ['CUDA_VISIBLE_DEVICES'] = '3,2,1,0'
>>> from nvitop import ResourceMetricCollector, Device
>>> collector = ResourceMetricCollector()                           # log all devices and descendant processes of the current process on the GPUs
>>> collector = ResourceMetricCollector(root_pids={1})              # log all devices and all GPU processes
>>> collector = ResourceMetricCollector(devices=Device.cuda.all())  # use the CUDA ordinal
>>> with collector(tag='<tag>'):
...     # Do something
...     collector.collect()  # -> Dict[str, float]
# key -> '<tag>/<scope>/<metric (unit)>/<mean/min/max>'
{
    '<tag>/host/cpu_percent (%)/mean': 8.967849777683456,
    '<tag>/host/cpu_percent (%)/min': 6.1,
    '<tag>/host/cpu_percent (%)/max': 28.1,
    ...,
    '<tag>/host/memory_percent (%)/mean': 21.5,
    '<tag>/host/swap_percent (%)/mean': 0.3,
    '<tag>/host/memory_used (GiB)/mean': 91.0136418208109,
    '<tag>/host/load_average (%) (1 min)/mean': 10.251427386878328,
    '<tag>/host/load_average (%) (5 min)/mean': 10.072539414569503,
    '<tag>/host/load_average (%) (15 min)/mean': 11.91126970422139,
    ...,
    '<tag>/cuda:0 (gpu:3)/memory_used (MiB)/mean': 3.875,
    '<tag>/cuda:0 (gpu:3)/memory_free (MiB)/mean': 11015.562499999998,
    '<tag>/cuda:0 (gpu:3)/memory_total (MiB)/mean': 11019.437500000002,
    '<tag>/cuda:0 (gpu:3)/memory_percent (%)/mean': 0.0,
    '<tag>/cuda:0 (gpu:3)/gpu_utilization (%)/mean': 0.0,
    '<tag>/cuda:0 (gpu:3)/memory_utilization (%)/mean': 0.0,
    '<tag>/cuda:0 (gpu:3)/fan_speed (%)/mean': 22.0,
    '<tag>/cuda:0 (gpu:3)/temperature (C)/mean': 25.0,
    '<tag>/cuda:0 (gpu:3)/power_usage (W)/mean': 19.11166264116916,
    ...,
    '<tag>/cuda:1 (gpu:2)/memory_used (MiB)/mean': 8878.875,
    ...,
    '<tag>/cuda:2 (gpu:1)/memory_used (MiB)/mean': 8182.875,
    ...,
    '<tag>/cuda:3 (gpu:0)/memory_used (MiB)/mean': 9286.875,
    ...,
    '<tag>/pid:12345/host/cpu_percent (%)/mean': 151.34342772112265,
    '<tag>/pid:12345/host/host_memory (MiB)/mean': 44749.72373447514,
    '<tag>/pid:12345/host/host_memory_percent (%)/mean': 8.675082352111717,
    '<tag>/pid:12345/host/running_time (min)': 336.23803206741576,
    '<tag>/pid:12345/cuda:1 (gpu:4)/gpu_memory (MiB)/mean': 8861.0,
    '<tag>/pid:12345/cuda:1 (gpu:4)/gpu_memory_percent (%)/mean': 80.4,
    '<tag>/pid:12345/cuda:1 (gpu:4)/gpu_memory_utilization (%)/mean': 6.711118172407917,
    '<tag>/pid:12345/cuda:1 (gpu:4)/gpu_sm_utilization (%)/mean': 48.23283397736476,
    ...,
    '<tag>/duration (s)': 7.247399162035435,
    '<tag>/timestamp': 1655909466.9981883
}
- DEVICE_METRICS: ClassVar[list[tuple[str, str, float | int]]] = [('memory_used', 'memory_used (MiB)', 1048576), ('memory_free', 'memory_free (MiB)', 1048576), ('memory_total', 'memory_total (MiB)', 1048576), ('memory_percent', 'memory_percent (%)', 1.0), ('gpu_utilization', 'gpu_utilization (%)', 1.0), ('memory_utilization', 'memory_utilization (%)', 1.0), ('fan_speed', 'fan_speed (%)', 1.0), ('temperature', 'temperature (C)', 1.0), ('power_usage', 'power_usage (W)', 1000.0)]
- PROCESS_METRICS: ClassVar[list[tuple[str, str | None, str, float | int]]] = [('cpu_percent', 'host', 'cpu_percent (%)', 1.0), ('host_memory', 'host', 'host_memory (MiB)', 1048576), ('host_memory_percent', 'host', 'host_memory_percent (%)', 1.0), ('running_time_in_seconds', 'host', 'running_time (min)', 60.0), ('gpu_memory', None, 'gpu_memory (MiB)', 1048576), ('gpu_memory_percent', None, 'gpu_memory_percent (%)', 1.0), ('gpu_memory_utilization', None, 'gpu_memory_utilization (%)', 1.0), ('gpu_sm_utilization', None, 'gpu_sm_utilization (%)', 1.0)]
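The metric tables above map an attribute name to a display label and a unit divisor (e.g. 1048576 to report bytes as MiB, 60.0 to report seconds as minutes). A small illustrative sketch of how such a table can be applied (not nvitop's implementation; `scale_metrics` and the trimmed table are hypothetical):

```python
# Each entry is (attribute, label, divisor), following the shape of
# DEVICE_METRICS above; raw values are divided by the divisor.
DEVICE_METRICS_SKETCH = [
    ('memory_used', 'memory_used (MiB)', 1 << 20),
    ('memory_total', 'memory_total (MiB)', 1 << 20),
    ('gpu_utilization', 'gpu_utilization (%)', 1.0),
]

def scale_metrics(raw, table=DEVICE_METRICS_SKETCH):
    """Map raw attribute values (e.g. bytes) to labeled, unit-scaled metrics."""
    return {label: raw[attr] / divisor for attr, label, divisor in table if attr in raw}
```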
- __init__(devices: Iterable[Device] | None = None, root_pids: Iterable[int] | None = None, interval: float = 1.0) None [source]
Initialize the resource metric collector.
- interval: float
- activate(tag: str) ResourceMetricCollector [source]
Start a new metric collection with the given tag.
- Parameters:
tag (str) – The name of the new metric collection. The tag will be used to identify the metric collection. It must be a unique string.
Examples
>>> collector = ResourceMetricCollector()
>>> collector.activate(tag='train')  # key prefix -> 'train'
>>> collector.activate(tag='batch')  # key prefix -> 'train/batch'
>>> collector.deactivate()           # key prefix -> 'train'
>>> collector.deactivate()           # the collector has been stopped
>>> collector.activate(tag='test')   # key prefix -> 'test'
- start(tag: str) ResourceMetricCollector
Start a new metric collection with the given tag.
- Parameters:
tag (str) – The name of the new metric collection. The tag will be used to identify the metric collection. It must be a unique string.
Examples
>>> collector = ResourceMetricCollector()
>>> collector.activate(tag='train')  # key prefix -> 'train'
>>> collector.activate(tag='batch')  # key prefix -> 'train/batch'
>>> collector.deactivate()           # key prefix -> 'train'
>>> collector.deactivate()           # the collector has been stopped
>>> collector.activate(tag='test')   # key prefix -> 'test'
- deactivate(tag: str | None = None) ResourceMetricCollector [source]
Stop the current collection with the given tag and remove all sub-tags.
If the tag is not specified, deactivate the current active collection. For nested collections, the sub-collections will be deactivated as well.
- stop(tag: str | None = None) ResourceMetricCollector
Stop the current collection with the given tag and remove all sub-tags.
If the tag is not specified, deactivate the current active collection. For nested collections, the sub-collections will be deactivated as well.
- context(tag: str) Generator[ResourceMetricCollector, None, None] [source]
A context manager for starting and stopping resource metric collection.
- Parameters:
tag (str) – The name of the new metric collection. The tag will be used to identify the metric collection. It must be a unique string.
Examples
>>> collector = ResourceMetricCollector()
>>> with collector.context(tag='train'):  # key prefix -> 'train'
...     # Do something
...     collector.collect()  # -> Dict[str, float]
- __call__(tag: str) Generator[ResourceMetricCollector, None, None]
A context manager for starting and stopping resource metric collection.
- Parameters:
tag (str) – The name of the new metric collection. The tag will be used to identify the metric collection. It must be a unique string.
Examples
>>> collector = ResourceMetricCollector()
>>> with collector.context(tag='train'):  # key prefix -> 'train'
...     # Do something
...     collector.collect()  # -> Dict[str, float]
- clear(tag: str | None = None) None [source]
Reset the metric collection with the given tag.
If the tag is not specified, reset the current active collection. For nested collections, the sub-collections will be reset as well.
- Parameters:
tag (Optional[str]) – The tag to reset. If None, the current active collection will be reset.
Examples
>>> collector = ResourceMetricCollector()
>>> with collector(tag='train'):  # key prefix -> 'train'
...     time.sleep(5.0)
...     collector.collect()  # metrics within the 5.0s interval
...
...     time.sleep(5.0)
...     collector.collect()  # metrics within the cumulative 10.0s interval
...
...     collector.reset()  # reset the active collection
...     time.sleep(5.0)
...     collector.collect()  # metrics within the 5.0s interval
...
...     with collector(tag='batch'):  # key prefix -> 'train/batch'
...         collector.reset(tag='train')  # reset both 'train' and 'train/batch'
- daemonize(on_collect: Callable[[dict[str, float]], bool], interval: float | None = None, *, on_start: Callable[[ResourceMetricCollector], None] | None = None, on_stop: Callable[[ResourceMetricCollector], None] | None = None, tag: str = 'metrics-daemon', start: bool = True) threading.Thread [source]
Start a background daemon thread that collects metrics and calls the callback function periodically.
See also
collect_in_background()
.
- Parameters:
on_collect (Callable[[Dict[str, float]], bool]) – A callback function that will be called periodically. It takes a dictionary containing the resource metrics and returns a boolean indicating whether to continue monitoring.
interval (Optional[float]) – The collect interval. If not given, use collector.interval.
on_start (Optional[Callable[[ResourceMetricCollector], None]]) – A function to initialize the daemon thread and collector.
on_stop (Optional[Callable[[ResourceMetricCollector], None]]) – A function that does some necessary cleanup after the daemon thread is stopped.
tag (str) – The tag prefix used for metrics results.
start (bool) – Whether to start the daemon thread on return.
- Returns: threading.Thread
A daemon thread object.
Examples
logger = ...

def on_collect(metrics):  # will be called periodically
    if logger.is_closed():  # closed manually by user
        return False
    logger.log(metrics)
    return True

def on_stop(collector):  # will be called only once at stop
    if not logger.is_closed():
        logger.close()  # cleanup

# Record metrics to the logger in the background every 5 seconds.
# It will collect 5-second mean/min/max for each metric.
ResourceMetricCollector(Device.cuda.all()).daemonize(
    on_collect,
    interval=5.0,
    on_stop=on_stop,
)
- take_snapshots() SnapshotResult [source]
Take snapshots of the current resource metrics and update the metric buffer.
- nvitop.bytes2human(b: int | float | NaType, *, min_unit: int = 1) str [source]
Convert bytes to a human readable string.
- nvitop.human2bytes(s: int | str) int [source]
Convert a human readable size string (case insensitive) to bytes.
- Raises:
ValueError – If the given size string cannot be converted.
Examples
>>> human2bytes('500B')
500
>>> human2bytes('10k')
10000
>>> human2bytes('10ki')
10240
>>> human2bytes('1M')
1000000
>>> human2bytes('1MiB')
1048576
>>> human2bytes('1.5GiB')
1610612736
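The conversion rule behind these examples is that plain prefixes (k, M, G, ...) are decimal powers of 1000, while 'i' prefixes (ki, MiB, ...) are binary powers of 1024, matched case-insensitively. A self-contained sketch of that rule, reproducing the documented examples (illustrative only, not nvitop's implementation):

```python
import re

def human2bytes_sketch(s):
    """Parse a human-readable size string: decimal for 'k'/'M'/..., binary for 'ki'/'MiB'/..."""
    match = re.fullmatch(r'([\d.]+)\s*([kmgtp]?)(i?)b?', s.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f'Cannot convert size string: {s!r}')
    number, prefix, binary = match.groups()
    base = 1024 if binary else 1000
    exponent = ' kmgtp'.index(prefix.lower()) if prefix else 0
    return int(float(number) * base ** exponent)
```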
- nvitop.timedelta2human(dt: int | float | datetime.timedelta | NaType, *, round: bool = False) str [source]
Convert a number in seconds or a datetime.timedelta instance to a human readable string.
- nvitop.utilization2string(utilization: int | float | NaType) str [source]
Convert a utilization rate to string.
- nvitop.colored(text: str, color: str | None = None, on_color: str | None = None, attrs: Iterable[str] | None = None) str [source]
Colorize text with ANSI color escape codes.
- Available text colors:
red, green, yellow, blue, magenta, cyan, white.
- Available text highlights:
on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, on_white.
- Available attributes:
bold, dark, underline, blink, reverse, concealed.
Examples
>>> colored('Hello, World!', 'red', 'on_grey', ['blue', 'blink'])
>>> colored('Hello, World!', 'green')
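Under the hood, functions like this emit ANSI SGR escape sequences: each color or attribute maps to a numeric code, the codes are joined with ';' inside '\033[...m', and '\033[0m' resets the style. A minimal sketch of the mechanism (illustrative only; `colored_sketch` and its trimmed code tables are hypothetical, not nvitop's implementation):

```python
COLORS = {'red': 31, 'green': 32, 'yellow': 33, 'blue': 34}
ATTRS = {'bold': 1, 'underline': 4, 'blink': 5}
RESET = '\033[0m'

def colored_sketch(text, color=None, attrs=()):
    """Wrap text in ANSI SGR codes for the given color and attributes."""
    codes = [str(ATTRS[a]) for a in attrs]
    if color is not None:
        codes.append(str(COLORS[color]))
    return '\033[' + ';'.join(codes) + 'm' + text + RESET if codes else text
```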
- nvitop.boolify(string: str, default: Any | None = None) bool [source]
Convert the given value, usually a string, to boolean.
- class nvitop.Snapshot(real: Any, **items: Any)[source]
Bases:
object
A dict-like object that holds the snapshot values.
The values can be accessed by snapshot.name or snapshot['name'] syntax. The Snapshot can also be converted to a dictionary by dict(snapshot) or {**snapshot}.
Missing attributes will be automatically fetched from the original object.
- __init__(real: Any, **items: Any) None [source]
Initialize a new Snapshot object with the given attributes.
- __getattr__(name: str) Any [source]
Get a member from the instance.
If the attribute is not defined, fetches from the original object and makes a function call.
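The Snapshot behavior described above (attribute and item access over the same data, dict conversion, and lazy fallback to the wrapped "real" object) can be sketched compactly. This is an illustrative, self-contained sketch only, not nvitop's implementation:

```python
class SnapshotSketch:
    """Dict-like snapshot with attribute/item access and fallback to `real`."""

    def __init__(self, real, **items):
        self.real = real
        self.__dict__.update(items)

    def __getitem__(self, name):
        return getattr(self, name)

    def keys(self):  # enables dict(snapshot) and {**snapshot}
        return [key for key in self.__dict__ if key != 'real']

    def __getattr__(self, name):
        # Only called for attributes missing from __dict__: fetch from the
        # real object by calling its method, then cache the result.
        value = getattr(self.real, name)()
        setattr(self, name, value)
        return value
```

Defining `keys()` plus `__getitem__` is what lets `dict(snapshot)` and `{**snapshot}` work without inheriting from `dict`.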