Monitoring GPU performance

Introduction

This document demonstrates several techniques for observing GPU utilization. GPU monitoring is critical to understanding how effectively your application uses its attached GPUs.

This example uses the Ubuntu 20.04 application bundle configured with a multi-instance GPU (MIG).

Launch Ubuntu application

For additional documentation on how to launch an application, refer to Running your first application.



Monitor from command-line

The screenshots below show an idle MIG that has no processes running.

nvidia-smi is a command that displays a snapshot of current GPU utilization.


To get continuous output of GPU utilization, include the -l option followed by a time interval in seconds.
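
If you prefer to start the loop from a script rather than typing it by hand, the following is a minimal Python sketch. The function names loop_command and watch_gpu and the 5-second default are assumptions made for illustration; only the -l option itself comes from nvidia-smi.

```python
import subprocess


def loop_command(interval_seconds):
    """Build the argument list for `nvidia-smi -l <seconds>`."""
    if interval_seconds < 1:
        raise ValueError("interval must be at least 1 second")
    # nvidia-smi expects a whole number of seconds after -l
    return ["nvidia-smi", "-l", str(int(interval_seconds))]


def watch_gpu(interval_seconds=5):
    """Print a fresh snapshot every interval until stopped with Ctrl+C."""
    try:
        subprocess.run(loop_command(interval_seconds), check=True)
    except KeyboardInterrupt:
        pass  # Ctrl+C is the expected way to end the loop
```

Calling watch_gpu(5) on a machine with the NVIDIA driver installed should behave like running nvidia-smi -l 5 in the terminal.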


Another command for continuously monitoring GPU utilization is nvidia-smi dmon.
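
The scrolling dmon output can also be read by a script. The sketch below assumes the usual layout, in which the first line starting with # names the columns and a second # line gives their units; the exact columns vary with driver version, so the parser takes the names from the output rather than hard-coding them. The function names parse_dmon and sample_dmon are hypothetical.

```python
import subprocess


def parse_dmon(text):
    """Parse `nvidia-smi dmon` output into a list of dicts keyed by column name."""
    header = None
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first '#' line names the columns; later '#' lines
            # (units, repeated headers) are skipped.
            if header is None:
                header = line.lstrip("#").split()
            continue
        if header is not None:
            # Values stay as strings because dmon prints '-' for unavailable metrics.
            samples.append(dict(zip(header, line.split())))
    return samples


def sample_dmon(count=5):
    """Collect `count` samples; -c makes dmon exit instead of running forever."""
    cmd = ["nvidia-smi", "dmon", "-c", str(count)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dmon(result.stdout)
```

Each returned dict represents one sample for one GPU, so a column such as sm can be read with sample["sm"] when that column is present in your driver's output.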


The output of nvidia-smi can be captured for logging. For additional documentation on nvidia-smi queries, refer to useful nvidia-smi queries.
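
As one possible way to capture the output for logging, the sketch below queries a few fields in CSV form and appends them to a file with a timestamp. The field selection in QUERY, the function names append_rows and log_once, and the log path in the usage note are assumptions for illustration, not part of the platform.

```python
import csv
import subprocess
from datetime import datetime, timezone

# Assumed field selection; any fields accepted by --query-gpu can be used.
QUERY = "index,utilization.gpu,memory.used"


def append_rows(log_path, csv_text, now=None):
    """Append each nvidia-smi CSV row to log_path, prefixed with a UTC timestamp."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(log_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for line in csv_text.strip().splitlines():
            # Values are kept as strings because nvidia-smi may print
            # non-numeric placeholders for fields it cannot report.
            writer.writerow([stamp] + [value.strip() for value in line.split(",")])


def log_once(log_path):
    """Query nvidia-smi once and append the result (requires the NVIDIA driver)."""
    cmd = ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    append_rows(log_path, result.stdout)
```

For example, calling log_once("gpu_log.csv") from a scheduled job or inside a loop with time.sleep builds up a history that can be opened in a spreadsheet or plotted later.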

