Monitoring GPU performance

Introduction

This document demonstrates several techniques for observing GPU utilization. GPU monitoring is critical to understanding how effectively your application uses its attached GPUs.

The examples use the Ubuntu 20.04 application bundle configured with a Multi-Instance GPU (MIG).

Launch Ubuntu application

For instructions on launching an application, refer to "Running your first application".



Monitor from command-line

The screenshots below show an idle MIG instance with no processes running.

nvidia-smi is a command that displays a snapshot of current GPU utilization.
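As a minimal sketch, the snapshot can be taken from a shell inside the running application. The helper name gpu_snapshot is hypothetical and only wraps the real nvidia-smi command; it assumes the NVIDIA driver tools are installed in the application image.

```shell
# gpu_snapshot is a hypothetical helper name, not part of nvidia-smi.
gpu_snapshot() {
    # Fail with a clear message if the driver tools are not installed.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH" >&2
        return 127
    fi
    # -L lists the GPUs (and MIG devices, when MIG is enabled) with their UUIDs.
    nvidia-smi -L
    # With no options, nvidia-smi prints one snapshot table and exits.
    nvidia-smi
}
```

Running gpu_snapshot prints the device list followed by the snapshot table. On an idle MIG instance the process list at the bottom of the table is expected to be empty.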


For continuous output of GPU utilization, add the -l option followed by a time interval in seconds.
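A short sketch of the looping form follows. The helper name watch_gpu and the default interval of 5 seconds are illustrative assumptions; only the -l option itself comes from nvidia-smi.

```shell
# watch_gpu is a hypothetical helper name; the interval defaults to 5 seconds.
watch_gpu() {
    interval="${1:-5}"
    # -l N re-prints the snapshot every N seconds until interrupted (Ctrl+C).
    nvidia-smi -l "$interval"
}
```

For example, watch_gpu 10 refreshes the snapshot every 10 seconds. The command keeps running until it is stopped with Ctrl+C.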


Another way to monitor GPU utilization continuously is nvidia-smi dmon, which prints scrolling rows of device statistics, one row per GPU per sample.
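The sketch below shows one way to bound a dmon run so that it exits on its own. The helper name sample_gpu and its defaults are assumptions made for illustration; the -s, -d, and -c options belong to nvidia-smi dmon. Depending on the driver version, some utilization columns may be reported as unavailable on MIG-enabled GPUs.

```shell
# sample_gpu is a hypothetical helper: COUNT samples, one every DELAY seconds.
sample_gpu() {
    count="${1:-10}"
    delay="${2:-1}"
    # -s um : show the utilization (u) and memory (m) column groups
    # -d    : delay between samples, in seconds
    # -c    : number of samples to print before exiting
    nvidia-smi dmon -s um -d "$delay" -c "$count"
}
```

For example, sample_gpu 30 2 prints 30 samples at 2-second intervals, which covers about one minute of activity.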


The output of nvidia-smi can be captured to a file for logging. For additional documentation on nvidia-smi queries, refer to "Useful nvidia-smi queries".
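One possible way to capture the output is to request specific fields in CSV form and append them to a file. The helper name log_gpu, the default file name gpu_usage.csv, and the chosen field list are illustrative assumptions, not a prescribed setup.

```shell
# log_gpu is a hypothetical helper; the file name and field list are examples only.
log_gpu() {
    logfile="${1:-gpu_usage.csv}"
    interval="${2:-5}"
    # --query-gpu selects the fields, --format=csv emits comma-separated rows,
    # and -l repeats the query every $interval seconds until interrupted.
    nvidia-smi \
        --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total \
        --format=csv \
        -l "$interval" >> "$logfile"
}
```

For example, log_gpu /tmp/gpu_usage.csv 10 appends one row per GPU every 10 seconds until stopped with Ctrl+C. Because the file is opened in append mode, the CSV header line is written again each time the command is restarted.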

