Storage platform overview

The Denvr Cloud is integrated with a high-performance storage platform made up of three separate storage systems. Together they deliver high throughput, parallel I/O, and high IOPS, along with fault-tolerant data protection.

User Storage

Features:

Petabyte-scale, RAID-protected persistent storage, optimized for low cost and parallel I/O
Up to 1 GB/s read throughput
Encrypted at rest and delivered to GPUs over tenant-private networking

Primary use:

Used for home directory files, application code and configuration, and datasets
Clients may optionally keep data in their own on-prem storage systems and use Denvr for caching
GPUs can process data directly from User Storage, but may benefit from Cache Storage depending on the workload

Performance Storage

Features:

Petabyte-scale network-attached Flash SSD with configurable data protection
Up to 5 GB/s read throughput
Encrypted at rest and delivered to GPUs over tenant-private networking

Primary use:

Working space for very large datasets that are copied in and out of local Cache
Application file systems, databases, and other large-capacity, high-IOPS workloads
A faster alternative to User Storage, typically used for pre-processing and for streaming large datasets in and out of Cache

Cache Storage

Features:

Up to 16 TB of direct-attached NVMe SSD per bare metal node
Lowest latency and up to 12 GB/s read throughput
Cache is non-persistent and is freed when applications are stopped
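
To compare the three tiers, it can help to estimate how long one full read of a dataset would take at each tier's quoted peak rate. The sketch below uses only the "up to" read throughput figures stated in this article, so the results are best-case lower bounds, not measured performance. The dataset size and the function name are hypothetical and chosen only for illustration.

```python
# Peak read throughput per tier in GB/s, as quoted in this article.
# These are "up to" figures, so computed times are lower bounds.
PEAK_READ_GBPS = {
    "User Storage": 1.0,
    "Performance Storage": 5.0,
    "Cache Storage": 12.0,
}


def min_read_seconds(dataset_gb: float, tier: str) -> float:
    """Return the minimum time in seconds to read dataset_gb once from a tier."""
    return dataset_gb / PEAK_READ_GBPS[tier]


if __name__ == "__main__":
    dataset_gb = 600.0  # hypothetical dataset size, for illustration only
    for tier in PEAK_READ_GBPS:
        seconds = min_read_seconds(dataset_gb, tier)
        print(f"{tier}: at least {seconds:.0f} s per full pass")
```

For a job that reads the same dataset on every epoch, the per-pass difference is multiplied by the number of epochs, which is the main argument for staging data into Cache Storage first.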

Primary use:

Volumes provide local storage for model training data, checkpoint files, and intermediate results
Data loaders should copy training data into Cache Storage so that GPUs stay fully utilized instead of waiting on slower network-attached storage
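
The staging step described above could be sketched as follows. This is a minimal illustration using only the Python standard library, not a Denvr-provided tool. The function name and both directory paths are hypothetical, and the actual mount points for User, Performance, and Cache Storage depend on your deployment.

```python
import shutil
from pathlib import Path


def stage_to_cache(source_dir: str, cache_dir: str) -> Path:
    """Copy a dataset tree from network storage into local cache.

    Files already present in the cache with the same size are skipped,
    so the function can be called again without recopying everything.
    """
    src = Path(source_dir)
    dst = Path(cache_dir)
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue  # already staged
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return dst


# Hypothetical usage: both paths are placeholders, not real Denvr mount points.
# local_data = stage_to_cache("/path/to/user-storage/dataset", "/path/to/cache/dataset")
# Point the data loader at local_data instead of the network path.
```

Because Cache is non-persistent, a staging step like this would need to run each time the application starts, and checkpoints written to Cache would need to be copied back to User or Performance Storage before the application stops.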

