To evaluate the Deep Learning training performance of the NVIDIA V100S PCIe 32GB and the A100™ PCIe 40GB, we installed one, two, three, and four GPUs in the HPC5000-XCLGPU4TS (PCIe Gen3) and the HPC5000-ERMGPU8R4S (PCIe Gen4) and ran nvidia/tensorflow:20.11-tf1-py3.
CPU: Intel(R) Xeon(R) Gold 6242, CPU clock: 2.8 GHz, CPU cores: 32, Memory: 192 GB, 2933 MT/s (2.35 GB/s), GPU interconnect: PCIe Gen3, GPU interconnect speed: 16 GB/s, NVIDIA Driver: 450.80.02, Framework: nvidia/tensorflow:20.11-tf1-py3
CPU: AMD EPYC 7742, CPU clock: 2.25 GHz, CPU cores: 128, Memory: DDR4 256 GB, 3200 MT/s (2.56 GB/s), GPU interconnect: PCIe Gen4, GPU interconnect speed: 32 GB/s, NVIDIA Driver: 450.80.02, Framework: nvidia/tensorflow:20.11-tf1-py3
Model: inception-v3, Precision: fp16
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-v3 | fp16 | 230 | --bind-to socket | 2651.79
Tesla A100 | 4 | inception-v3 | fp16 | 230 | --bind-to socket | 4366.79
Tesla V100S | 3 | inception-v3 | fp16 | 230 | --bind-to socket | 2112.24
Tesla A100 | 3 | inception-v3 | fp16 | 230 | --bind-to socket | 3296.77
Tesla V100S | 2 | inception-v3 | fp16 | 230 | --bind-to socket | 1427.38
Tesla A100 | 2 | inception-v3 | fp16 | 230 | --bind-to socket | 2250.57
Tesla V100S | 1 | inception-v3 | fp16 | 230 | --bind-to socket | 771.76
Tesla A100 | 1 | inception-v3 | fp16 | 230 | --bind-to socket | 1272.74
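The inception-v3 fp16 figures above can be turned into two derived metrics: the A100-to-V100S throughput ratio at each GPU count, and the multi-GPU scaling efficiency relative to a single GPU. The following is a minimal sketch; the function names `scaling_efficiency` and `speedup` are illustrative and not part of the benchmark itself. Note that the two GPUs were measured in different host systems, so the ratio reflects the whole platform, not only the GPU.

```python
# images/sec for inception-v3, fp16, batch size 230 per GPU
# (values copied from the results above)
throughput = {
    "V100S": {1: 771.76, 2: 1427.38, 3: 2112.24, 4: 2651.79},
    "A100": {1: 1272.74, 2: 2250.57, 3: 3296.77, 4: 4366.79},
}


def scaling_efficiency(per_gpu_count):
    """Throughput with n GPUs divided by n times the single-GPU throughput."""
    base = per_gpu_count[1]
    return {n: ips / (n * base) for n, ips in per_gpu_count.items()}


def speedup(new, old):
    """Ratio new/old at each GPU count."""
    return {n: new[n] / old[n] for n in sorted(new)}


a100_vs_v100s = speedup(throughput["A100"], throughput["V100S"])
efficiency = {gpu: scaling_efficiency(vals) for gpu, vals in throughput.items()}

for n in sorted(a100_vs_v100s):
    print(
        f"{n} GPU(s): A100/V100S = {a100_vs_v100s[n]:.2f}x, "
        f"V100S efficiency = {efficiency['V100S'][n]:.1%}, "
        f"A100 efficiency = {efficiency['A100'][n]:.1%}"
    )
```

The same two helpers can be applied to any of the other model and precision combinations by swapping in the corresponding images/sec values.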
Model: inception-v3, Precision: fp32
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-v3 | fp32 | 115 | --bind-to socket | 1066.91
Tesla A100 | 4 | inception-v3 | fp32 | 115 | --bind-to socket | 2095.08
Tesla V100S | 3 | inception-v3 | fp32 | 115 | --bind-to socket | 804.49
Tesla A100 | 3 | inception-v3 | fp32 | 115 | --bind-to socket | 1580.59
Tesla V100S | 2 | inception-v3 | fp32 | 115 | --bind-to socket | 542.67
Tesla A100 | 2 | inception-v3 | fp32 | 115 | --bind-to socket | 1080.17
Tesla V100S | 1 | inception-v3 | fp32 | 115 | --bind-to socket | 288.66
Tesla A100 | 1 | inception-v3 | fp32 | 115 | --bind-to socket | 603.47
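With both precisions measured for inception-v3, the throughput gain from fp16 over fp32 can be computed per GPU type and GPU count. The sketch below assumes nothing beyond the published images/sec values; keep in mind that the fp16 runs used a batch size of 230 per GPU and the fp32 runs used 115, so the ratio compares two complete run configurations and is not a like-for-like precision comparison.

```python
# images/sec for inception-v3 (values copied from the fp16 and fp32 results above)
fp16 = {
    "V100S": {1: 771.76, 2: 1427.38, 3: 2112.24, 4: 2651.79},
    "A100": {1: 1272.74, 2: 2250.57, 3: 3296.77, 4: 4366.79},
}
fp32 = {
    "V100S": {1: 288.66, 2: 542.67, 3: 804.49, 4: 1066.91},
    "A100": {1: 603.47, 2: 1080.17, 3: 1580.59, 4: 2095.08},
}

# fp16 throughput divided by fp32 throughput for every GPU type and GPU count
gain = {
    gpu: {n: fp16[gpu][n] / fp32[gpu][n] for n in sorted(fp16[gpu])}
    for gpu in fp16
}

for gpu, by_count in gain.items():
    for n, ratio in by_count.items():
        print(f"{gpu} x{n}: fp16 is {ratio:.2f}x the fp32 throughput")
```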
Model: inception-v4, Precision: fp16
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-v4 | fp16 | 120 | --bind-to socket | 1301.5
Tesla A100 | 4 | inception-v4 | fp16 | 120 | --bind-to socket | 2005.06
Tesla V100S | 3 | inception-v4 | fp16 | 120 | --bind-to socket | 991.36
Tesla A100 | 3 | inception-v4 | fp16 | 120 | --bind-to socket | 1529.07
Tesla V100S | 2 | inception-v4 | fp16 | 120 | --bind-to socket | 671.04
Tesla A100 | 2 | inception-v4 | fp16 | 120 | --bind-to socket | 1057.1
Tesla V100S | 1 | inception-v4 | fp16 | 120 | --bind-to socket | 377.97
Tesla A100 | 1 | inception-v4 | fp16 | 120 | --bind-to socket | 648.44
Model: inception-v4, Precision: fp32
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-v4 | fp32 | 60 | --bind-to socket | 468.36
Tesla A100 | 4 | inception-v4 | fp32 | 60 | --bind-to socket | 882.08
Tesla V100S | 3 | inception-v4 | fp32 | 60 | --bind-to socket | 354.46
Tesla A100 | 3 | inception-v4 | fp32 | 60 | --bind-to socket | 674.95
Tesla V100S | 2 | inception-v4 | fp32 | 60 | --bind-to socket | 238.46
Tesla A100 | 2 | inception-v4 | fp32 | 60 | --bind-to socket | 467.62
Tesla V100S | 1 | inception-v4 | fp32 | 60 | --bind-to socket | 130.47
Tesla A100 | 1 | inception-v4 | fp32 | 60 | --bind-to socket | 282.37
Model: inception-resnet-v2, Precision: fp16
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1463.21
Tesla A100 | 4 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 2518.65
Tesla V100S | 3 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1165.17
Tesla A100 | 3 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1913.94
Tesla V100S | 2 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 790.47
Tesla A100 | 2 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1318.09
Tesla V100S | 1 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 439.78
Tesla A100 | 1 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 786.83
Model: inception-resnet-v2, Precision: fp32
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 557.01
Tesla A100 | 4 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 1104.13
Tesla V100S | 3 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 420.72
Tesla A100 | 3 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 844.99
Tesla V100S | 2 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 281.03
Tesla A100 | 2 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 574.72
Tesla V100S | 1 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 153.28
Tesla A100 | 1 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 336.39
Model: resnet-101, Precision: fp16
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | resnet-101 | fp16 | 170 | --bind-to socket | 2476.54
Tesla A100 | 4 | resnet-101 | fp16 | 170 | --bind-to socket | 3772.25
Tesla V100S | 3 | resnet-101 | fp16 | 170 | --bind-to socket | 1883.83
Tesla A100 | 3 | resnet-101 | fp16 | 170 | --bind-to socket | 2866.09
Tesla V100S | 2 | resnet-101 | fp16 | 170 | --bind-to socket | 1289.79
Tesla A100 | 2 | resnet-101 | fp16 | 170 | --bind-to socket | 2029.69
Tesla V100S | 1 | resnet-101 | fp16 | 170 | --bind-to socket | 744.59
Tesla A100 | 1 | resnet-101 | fp16 | 170 | --bind-to socket | 1277.85
Model: resnet-101, Precision: fp32
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | resnet-101 | fp32 | 85 | --bind-to socket | 864.48
Tesla A100 | 4 | resnet-101 | fp32 | 85 | --bind-to socket | 1623.91
Tesla V100S | 3 | resnet-101 | fp32 | 85 | --bind-to socket | 649.86
Tesla A100 | 3 | resnet-101 | fp32 | 85 | --bind-to socket | 1240.8
Tesla V100S | 2 | resnet-101 | fp32 | 85 | --bind-to socket | 443.43
Tesla A100 | 2 | resnet-101 | fp32 | 85 | --bind-to socket | 880.53
Tesla V100S | 1 | resnet-101 | fp32 | 85 | --bind-to socket | 245.34
Tesla A100 | 1 | resnet-101 | fp32 | 85 | --bind-to socket | 547.69
Model: resnet-152, Precision: fp16
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | resnet-152 | fp16 | 120 | --bind-to socket | 1412.45
Tesla A100 | 4 | resnet-152 | fp16 | 120 | --bind-to socket | 2313.47
Tesla V100S | 3 | resnet-152 | fp16 | 120 | --bind-to socket | 1073.74
Tesla A100 | 3 | resnet-152 | fp16 | 120 | --bind-to socket | 1772.58
Tesla V100S | 2 | resnet-152 | fp16 | 120 | --bind-to socket | 741.07
Tesla A100 | 2 | resnet-152 | fp16 | 120 | --bind-to socket | 1260.54
Tesla V100S | 1 | resnet-152 | fp16 | 120 | --bind-to socket | 440.26
Tesla A100 | 1 | resnet-152 | fp16 | 120 | --bind-to socket | 849.27
Model: resnet-152, Precision: fp32
GPU | # of GPUs | model | precision | batch size/GPU | other option | images/sec
Tesla V100S | 4 | resnet-152 | fp32 | 60 | --bind-to socket | 564.35
Tesla A100 | 4 | resnet-152 | fp32 | 60 | --bind-to socket | 879.07
Tesla V100S | 3 | resnet-152 | fp32 | 60 | --bind-to socket | 427.98
Tesla A100 | 3 | resnet-152 | fp32 | 60 | --bind-to socket | 684.37
Tesla V100S | 2 | resnet-152 | fp32 | 60 | --bind-to socket | 289.93
Tesla A100 | 2 | resnet-152 | fp32 | 60 | --bind-to socket | 480.71
Tesla V100S | 1 | resnet-152 | fp32 | 60 | --bind-to socket | 165.83
Tesla A100 | 1 | resnet-152 | fp32 | 60 | --bind-to socket | 311.03
Specification | NVIDIA® V100S-PCIe | NVIDIA® A100™-PCIe
GPU Architecture | NVIDIA Volta | NVIDIA Ampere
NVIDIA Tensor Cores | 640 | 432
NVIDIA CUDA Cores | 5,120 | 6,912
Double-Precision Performance | 8.2 TFLOPS | 9.7 TFLOPS
Single-Precision Performance | 16.4 TFLOPS | 19.5 TFLOPS
Tensor Performance | 130 TFLOPS | 312 TFLOPS/624 TFLOPS*
GPU Memory | 32 GB HBM2 | 40 GB HBM2e
Memory Bandwidth | 1,134 GB/s | 1,555 GB/s
System Interface | PCIe Gen3.0 x16 | PCIe Gen4.0 x16
Thermal Solution | Passive | Passive
TDP | 250W | 250W
* Effective TOPS/TFLOPS using the new Sparsity feature (that is, for sparse matrices)
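To put the datasheet figures next to the measured results, the A100-to-V100S ratio of each headline specification can be compared with the measured single-GPU fp16 throughput ratios. This is only a rough sketch: the dictionary names are illustrative, the dense (non-sparse) tensor figure of 312 TFLOPS is used for the A100, and the measured ratios also include the effect of the different host systems.

```python
# Datasheet values as (V100S, A100), copied from the specification table above
specs = {
    "FP64 TFLOPS": (8.2, 9.7),
    "FP32 TFLOPS": (16.4, 19.5),
    "Tensor TFLOPS (dense)": (130.0, 312.0),
    "Memory bandwidth GB/s": (1134.0, 1555.0),
}

# Measured single-GPU fp16 images/sec as (V100S, A100), copied from the results above
single_gpu_fp16 = {
    "inception-v3": (771.76, 1272.74),
    "inception-v4": (377.97, 648.44),
    "inception-resnet-v2": (439.78, 786.83),
    "resnet-101": (744.59, 1277.85),
    "resnet-152": (440.26, 849.27),
}

spec_ratio = {name: a100 / v100s for name, (v100s, a100) in specs.items()}
measured_ratio = {model: a100 / v100s for model, (v100s, a100) in single_gpu_fp16.items()}

for name, ratio in spec_ratio.items():
    print(f"spec      {name:<24} A100/V100S = {ratio:.2f}x")
for model, ratio in measured_ratio.items():
    print(f"measured  {model:<24} A100/V100S = {ratio:.2f}x")
```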