Cluster hardware overview
Denvr Cloud clusters are hosted on Equinix Fabric, which provides internet-based access as well as secure private access over direct-connect and VPN.
MSC1 (Calgary)
The MSC1 environment (Modular SuperCluster 1) contains 500+ NVIDIA A100 GPUs and non-blocking InfiniBand for demanding HPC, machine learning, and scientific workloads. User application containers run directly on bare-metal hosts.
Features
- AMD EPYC 3rd Generation Zen3 processors (7003 series)
- Up to 8x NVIDIA A100 (40GB) Tensor Core GPUs
- Support for Multi-Instance GPU (MIG) with 5GB and 20GB partition sizes
- 800 Gbps non-blocking InfiniBand for multi-node training and distributed compute
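As a rough illustration of what the MIG partition sizes mean for capacity, the sketch below counts how many MIG instances a fully populated 8-GPU node could expose. It assumes the 5GB and 20GB sizes correspond to NVIDIA's `1g.5gb` and `3g.20gb` profiles on the A100 (40GB), which allow at most 7 and 2 instances per GPU respectively. The 5GB profile tops out at 7 rather than 8 because the A100 exposes 7 compute slices. The function and constant names are hypothetical, not part of any Denvr Cloud API.

```python
# Maximum MIG instances per A100 (40GB) GPU for the two partition sizes above.
# Assumption: "5GB" and "20GB" map to NVIDIA's 1g.5gb and 3g.20gb profiles.
MIG_MAX_INSTANCES_PER_GPU = {"1g.5gb": 7, "3g.20gb": 2}

GPUS_PER_NODE = 8  # "Up to 8x NVIDIA A100 (40GB)" per node


def mig_instances_per_node(profile: str, gpus: int = GPUS_PER_NODE) -> int:
    """Return how many MIG instances of one profile a node can expose."""
    return MIG_MAX_INSTANCES_PER_GPU[profile] * gpus


if __name__ == "__main__":
    for profile in MIG_MAX_INSTANCES_PER_GPU:
        print(profile, mig_instances_per_node(profile))
```

Under these assumptions, an 8-GPU node could be split into up to 56 small (5GB) instances or 16 larger (20GB) instances.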
On-Demand and Reserved nodes
| Node Types | GPUs | vCPUs | Memory (GB) | GPU-to-GPU Bandwidth | Network Bandwidth | Local storage |
|---|---|---|---|---|---|---|
| NVIDIA A100 (40GB) - NVLink | 8 | 128 | 1,024 | 600 GB/s | 800 Gbps | 15.4 TB NVMe SSD |
| NVIDIA A100 (40GB) - PCIe | 4 | 64 | 512 | 64 GB/s | 10 Gbps | 7.68 TB NVMe SSD |
| CPU Only | - | 256 | 1,024 | - | 10 Gbps | 7.68 TB NVMe SSD |
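To show how the MSC1 node table might be used when sizing a job, here is a minimal sketch that encodes the three node types as data and picks the smallest one that meets a set of requirements. The figures are copied from the table. The `NodeType` class, `MSC1_NODE_TYPES` list, and `pick_node_type` function are hypothetical names for illustration and do not reflect a Denvr Cloud API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    name: str
    gpus: int
    vcpus: int
    memory_gb: int
    network_gbps: int


# Values copied from the MSC1 node table; the structure itself is hypothetical.
MSC1_NODE_TYPES = [
    NodeType("NVIDIA A100 (40GB) - PCIe", 4, 64, 512, 10),
    NodeType("NVIDIA A100 (40GB) - NVLink", 8, 128, 1024, 800),
    NodeType("CPU Only", 0, 256, 1024, 10),
]


def pick_node_type(min_gpus: int, min_vcpus: int, min_memory_gb: int) -> Optional[NodeType]:
    """Return the fitting node type with the fewest GPUs, then the fewest vCPUs."""
    candidates = [
        n for n in MSC1_NODE_TYPES
        if n.gpus >= min_gpus and n.vcpus >= min_vcpus and n.memory_gb >= min_memory_gb
    ]
    # default=None covers requests that no single node type can satisfy.
    return min(candidates, key=lambda n: (n.gpus, n.vcpus), default=None)
```

For example, `pick_node_type(4, 32, 256)` selects the PCIe node, while a request for 8 GPUs falls through to the NVLink node. Preferring the fewest GPUs is one possible policy; a real scheduler might also weigh price or interconnect bandwidth.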
Storage tiers
- Local scratch - lowest latency for scratch, model checkpoints, and training data
- Performance - high-IOPS network-attached SSD for application storage, large working datasets, and model checkpoints
- Bulk/archive - Qumulo hybrid HDD/SSD for protected long-term file storage
LAB
The LAB environment is a small cluster for technology preview before general availability in Denvr Cloud MSC clusters.
Features
- AMD EPYC 3rd Generation Zen3 processors (7003 series)
- NVIDIA A100 (40GB), NVIDIA A40 (48GB), and AMD MI210 (64GB) GPUs
- Up to 800 Gbps non-blocking InfiniBand for multi-node training and distributed compute
Reserved nodes
| Node Types | GPUs | vCPUs | Memory (GB) | GPU-to-GPU Bandwidth | Network Bandwidth | Local storage |
|---|---|---|---|---|---|---|
| NVIDIA A100 (40GB) - NVLink | 8 | 128 | 1,024 | 600 GB/s | 800 Gbps | 15.4 TB NVMe SSD |
| NVIDIA A100 (40GB) - PCIe | 4 | 64 | 512 | 64 GB/s | 10 Gbps | 7.68 TB NVMe SSD |
| NVIDIA A40 (48GB) | 4 | 64 | 256 | 64 GB/s | 10 Gbps | 7.68 TB NVMe SSD |
| AMD MI210 (64GB) | 2 | 64 | 256 | 64 GB/s | 10 Gbps | 7.68 TB NVMe SSD |