A technical blog by the engineers of HPC Systems

Tech Blog

NVIDIA® V100S vs A100™ Deep Learning Benchmarks

To evaluate the Deep Learning training performance of the NVIDIA V100S PCIe 32GB and the A100™ PCIe 40GB, we installed one, two, three, and four GPUs in an HPC5000-XCLGPU4TS (PCIe Gen3) and an HPC5000-ERMGPU8R4S (PCIe Gen4) and ran nvidia/tensorflow:20.11-tf1-py3.

| Product | HPC5000-XCLGPU4TS | HPC5000-ERMGPU8R4S |
|---|---|---|
| CPU | Intel(R) Xeon(R) Gold 6242, 2.8 GHz | AMD EPYC 7742, 2.25 GHz |
| CPU cores | 32 | 128 |
| Memory | 192 GB, 2933 MT/s (about 23.5 GB/s per channel) | DDR4 256 GB, 3200 MT/s (25.6 GB/s per channel) |
| GPU interconnect | PCIe Gen3, 16 GB/s | PCIe Gen4, 32 GB/s |
| NVIDIA driver | 450.80.02 | 450.80.02 |
| Framework | nvidia/tensorflow:20.11-tf1-py3 | nvidia/tensorflow:20.11-tf1-py3 |
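As a rough cross-check of the memory and PCIe figures in the two configurations, peak theoretical bandwidth can be derived from the transfer rate. The sketch below assumes a 64-bit (8-byte) DDR4 channel and 128b/130b line encoding for PCIe Gen3/Gen4; the function names are illustrative and are not part of the benchmark tooling.

```python
def ddr4_channel_gbs(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth of a single DDR4 channel in GB/s.

    Assumption: 64-bit data bus, i.e. 8 bytes moved per transfer.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def pcie_gbs(giga_transfers_per_s: float, lanes: int = 16) -> float:
    """Peak one-direction PCIe bandwidth in GB/s.

    Assumption: 128b/130b encoding, which applies to Gen3 and Gen4.
    """
    return giga_transfers_per_s * lanes * (128 / 130) / 8


ddr4_2933 = ddr4_channel_gbs(2933)   # Xeon Gold 6242 system
ddr4_3200 = ddr4_channel_gbs(3200)   # EPYC 7742 system
pcie_gen3_x16 = pcie_gbs(8.0)        # Gen3 runs at 8 GT/s per lane
pcie_gen4_x16 = pcie_gbs(16.0)       # Gen4 runs at 16 GT/s per lane

print(f"DDR4-2933 per channel: {ddr4_2933:.2f} GB/s")
print(f"DDR4-3200 per channel: {ddr4_3200:.2f} GB/s")
print(f"PCIe Gen3 x16: {pcie_gen3_x16:.2f} GB/s")
print(f"PCIe Gen4 x16: {pcie_gen4_x16:.2f} GB/s")
```

The 16 GB/s and 32 GB/s quoted for Gen3 and Gen4 are the rounded raw rates; after encoding overhead the usable peak is slightly lower.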

Model: inception-v3, Precision: fp16

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-v3 | fp16 | 230 | --bind-to socket | 2651.79 |
| Tesla A100 | 4 | inception-v3 | fp16 | 230 | --bind-to socket | 4366.79 |
| Tesla V100S | 3 | inception-v3 | fp16 | 230 | --bind-to socket | 2112.24 |
| Tesla A100 | 3 | inception-v3 | fp16 | 230 | --bind-to socket | 3296.77 |
| Tesla V100S | 2 | inception-v3 | fp16 | 230 | --bind-to socket | 1427.38 |
| Tesla A100 | 2 | inception-v3 | fp16 | 230 | --bind-to socket | 2250.57 |
| Tesla V100S | 1 | inception-v3 | fp16 | 230 | --bind-to socket | 771.76 |
| Tesla A100 | 1 | inception-v3 | fp16 | 230 | --bind-to socket | 1272.74 |
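The relative performance of the two GPUs can be read off the inception-v3 fp16 results by dividing the A100 throughput by the V100S throughput at each GPU count. The snippet below is a minimal sketch of that arithmetic; the dictionary names are illustrative and the numbers are copied from the results above.

```python
# images/sec for inception-v3, fp16, batch size 230 per GPU,
# keyed by the number of GPUs (values copied from the results above).
v100s = {1: 771.76, 2: 1427.38, 3: 2112.24, 4: 2651.79}
a100 = {1: 1272.74, 2: 2250.57, 3: 3296.77, 4: 4366.79}

# Speedup of the A100 over the V100S at the same GPU count.
speedup = {n: a100[n] / v100s[n] for n in sorted(v100s)}

for n, s in speedup.items():
    print(f"{n} GPU(s): A100 / V100S = {s:.2f}x")
```

For this model the ratio stays at roughly 1.6x from one to four GPUs.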

Model: inception-v3, Precision: fp32

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-v3 | fp32 | 115 | --bind-to socket | 1066.91 |
| Tesla A100 | 4 | inception-v3 | fp32 | 115 | --bind-to socket | 2095.08 |
| Tesla V100S | 3 | inception-v3 | fp32 | 115 | --bind-to socket | 804.49 |
| Tesla A100 | 3 | inception-v3 | fp32 | 115 | --bind-to socket | 1580.59 |
| Tesla V100S | 2 | inception-v3 | fp32 | 115 | --bind-to socket | 542.67 |
| Tesla A100 | 2 | inception-v3 | fp32 | 115 | --bind-to socket | 1080.17 |
| Tesla V100S | 1 | inception-v3 | fp32 | 115 | --bind-to socket | 288.66 |
| Tesla A100 | 1 | inception-v3 | fp32 | 115 | --bind-to socket | 603.47 |
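With both inception-v3 result sets available, the gain from running in fp16 instead of fp32 can be estimated per GPU. The sketch below divides the fp16 throughput by the fp32 throughput; note that the fp16 runs used a batch size of 230 per GPU and the fp32 runs 115, so the ratio reflects both the precision and the batch size. Variable names are illustrative.

```python
# images/sec for inception-v3, keyed by GPU model and number of GPUs
# (values copied from the fp16 and fp32 results above).
fp16 = {
    "V100S": {1: 771.76, 2: 1427.38, 3: 2112.24, 4: 2651.79},
    "A100": {1: 1272.74, 2: 2250.57, 3: 3296.77, 4: 4366.79},
}
fp32 = {
    "V100S": {1: 288.66, 2: 542.67, 3: 804.49, 4: 1066.91},
    "A100": {1: 603.47, 2: 1080.17, 3: 1580.59, 4: 2095.08},
}

# fp16 throughput relative to fp32 throughput for the same GPU and GPU count.
gain = {
    gpu: {n: fp16[gpu][n] / fp32[gpu][n] for n in sorted(fp16[gpu])}
    for gpu in fp16
}

for gpu, by_count in gain.items():
    for n, g in by_count.items():
        print(f"{gpu} x{n}: fp16 / fp32 = {g:.2f}x")
```

In these results the V100S gains more from fp16 than the A100 does at every GPU count.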

 

Model: inception-v4, Precision: fp16

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-v4 | fp16 | 120 | --bind-to socket | 1301.5 |
| Tesla A100 | 4 | inception-v4 | fp16 | 120 | --bind-to socket | 2005.06 |
| Tesla V100S | 3 | inception-v4 | fp16 | 120 | --bind-to socket | 991.36 |
| Tesla A100 | 3 | inception-v4 | fp16 | 120 | --bind-to socket | 1529.07 |
| Tesla V100S | 2 | inception-v4 | fp16 | 120 | --bind-to socket | 671.04 |
| Tesla A100 | 2 | inception-v4 | fp16 | 120 | --bind-to socket | 1057.1 |
| Tesla V100S | 1 | inception-v4 | fp16 | 120 | --bind-to socket | 377.97 |
| Tesla A100 | 1 | inception-v4 | fp16 | 120 | --bind-to socket | 648.44 |
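Multi-GPU scaling can be summarized as an efficiency: the measured throughput on N GPUs divided by N times the single-GPU throughput. The following is a minimal sketch using the inception-v4 fp16 results above; `scaling_efficiency` is a hypothetical helper written for this illustration.

```python
def scaling_efficiency(throughput: dict) -> dict:
    """Return throughput[n] / (n * throughput[1]) for each GPU count n."""
    base = throughput[1]
    return {n: t / (n * base) for n, t in sorted(throughput.items())}


# images/sec for inception-v4, fp16, batch size 120 per GPU
# (values copied from the results above).
results = {
    "V100S": {1: 377.97, 2: 671.04, 3: 991.36, 4: 1301.5},
    "A100": {1: 648.44, 2: 1057.1, 3: 1529.07, 4: 2005.06},
}

efficiency = {gpu: scaling_efficiency(t) for gpu, t in results.items()}

for gpu, by_count in efficiency.items():
    for n, e in by_count.items():
        print(f"{gpu} x{n}: {e:.1%} of ideal linear scaling")
```

An efficiency of 100% would mean that N GPUs deliver exactly N times the single-GPU throughput.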

 

Model: inception-v4, Precision: fp32

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-v4 | fp32 | 60 | --bind-to socket | 468.36 |
| Tesla A100 | 4 | inception-v4 | fp32 | 60 | --bind-to socket | 882.08 |
| Tesla V100S | 3 | inception-v4 | fp32 | 60 | --bind-to socket | 354.46 |
| Tesla A100 | 3 | inception-v4 | fp32 | 60 | --bind-to socket | 674.95 |
| Tesla V100S | 2 | inception-v4 | fp32 | 60 | --bind-to socket | 238.46 |
| Tesla A100 | 2 | inception-v4 | fp32 | 60 | --bind-to socket | 467.62 |
| Tesla V100S | 1 | inception-v4 | fp32 | 60 | --bind-to socket | 130.47 |
| Tesla A100 | 1 | inception-v4 | fp32 | 60 | --bind-to socket | 282.37 |

 

Model: inception-resnet-v2, Precision: fp16

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1463.21 |
| Tesla A100 | 4 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 2518.65 |
| Tesla V100S | 3 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1165.17 |
| Tesla A100 | 3 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1913.94 |
| Tesla V100S | 2 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 790.47 |
| Tesla A100 | 2 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 1318.09 |
| Tesla V100S | 1 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 439.78 |
| Tesla A100 | 1 | inception-resnet-v2 | fp16 | 128 | --bind-to socket | 786.83 |
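Throughput in images/sec can also be converted into an approximate time per training step. The sketch below assumes synchronous data-parallel training, in which each step processes the per-GPU batch size times the number of GPUs; that assumption, and the helper name `step_time_ms`, are not stated in the original results.

```python
def step_time_ms(images_per_sec: float, batch_per_gpu: int, num_gpus: int) -> float:
    """Approximate wall-clock time of one training step in milliseconds.

    Assumption: every step consumes batch_per_gpu * num_gpus images.
    """
    global_batch = batch_per_gpu * num_gpus
    return 1000.0 * global_batch / images_per_sec


BATCH_PER_GPU = 128  # inception-resnet-v2, fp16

# images/sec copied from the results above.
v100s_1gpu = step_time_ms(439.78, BATCH_PER_GPU, 1)
v100s_4gpu = step_time_ms(1463.21, BATCH_PER_GPU, 4)
a100_1gpu = step_time_ms(786.83, BATCH_PER_GPU, 1)
a100_4gpu = step_time_ms(2518.65, BATCH_PER_GPU, 4)

print(f"V100S: {v100s_1gpu:.0f} ms/step on 1 GPU, {v100s_4gpu:.0f} ms/step on 4 GPUs")
print(f"A100:  {a100_1gpu:.0f} ms/step on 1 GPU, {a100_4gpu:.0f} ms/step on 4 GPUs")
```

Under this assumption the step time grows when going from one to four GPUs, which is another way of expressing the less-than-linear scaling in the table.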

 

Model: inception-resnet-v2, Precision: fp32

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 557.01 |
| Tesla A100 | 4 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 1104.13 |
| Tesla V100S | 3 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 420.72 |
| Tesla A100 | 3 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 844.99 |
| Tesla V100S | 2 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 281.03 |
| Tesla A100 | 2 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 574.72 |
| Tesla V100S | 1 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 153.28 |
| Tesla A100 | 1 | inception-resnet-v2 | fp32 | 64 | --bind-to socket | 336.39 |

 

Model: resnet-101, Precision: fp16

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | resnet-101 | fp16 | 170 | --bind-to socket | 2476.54 |
| Tesla A100 | 4 | resnet-101 | fp16 | 170 | --bind-to socket | 3772.25 |
| Tesla V100S | 3 | resnet-101 | fp16 | 170 | --bind-to socket | 1883.83 |
| Tesla A100 | 3 | resnet-101 | fp16 | 170 | --bind-to socket | 2866.09 |
| Tesla V100S | 2 | resnet-101 | fp16 | 170 | --bind-to socket | 1289.79 |
| Tesla A100 | 2 | resnet-101 | fp16 | 170 | --bind-to socket | 2029.69 |
| Tesla V100S | 1 | resnet-101 | fp16 | 170 | --bind-to socket | 744.59 |
| Tesla A100 | 1 | resnet-101 | fp16 | 170 | --bind-to socket | 1277.85 |

 

Model: resnet-101, Precision: fp32

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | resnet-101 | fp32 | 85 | --bind-to socket | 864.48 |
| Tesla A100 | 4 | resnet-101 | fp32 | 85 | --bind-to socket | 1623.91 |
| Tesla V100S | 3 | resnet-101 | fp32 | 85 | --bind-to socket | 649.86 |
| Tesla A100 | 3 | resnet-101 | fp32 | 85 | --bind-to socket | 1240.8 |
| Tesla V100S | 2 | resnet-101 | fp32 | 85 | --bind-to socket | 443.43 |
| Tesla A100 | 2 | resnet-101 | fp32 | 85 | --bind-to socket | 880.53 |
| Tesla V100S | 1 | resnet-101 | fp32 | 85 | --bind-to socket | 245.34 |
| Tesla A100 | 1 | resnet-101 | fp32 | 85 | --bind-to socket | 547.69 |

 

Model: resnet-152, Precision: fp16

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | resnet-152 | fp16 | 120 | --bind-to socket | 1412.45 |
| Tesla A100 | 4 | resnet-152 | fp16 | 120 | --bind-to socket | 2313.47 |
| Tesla V100S | 3 | resnet-152 | fp16 | 120 | --bind-to socket | 1073.74 |
| Tesla A100 | 3 | resnet-152 | fp16 | 120 | --bind-to socket | 1772.58 |
| Tesla V100S | 2 | resnet-152 | fp16 | 120 | --bind-to socket | 741.07 |
| Tesla A100 | 2 | resnet-152 | fp16 | 120 | --bind-to socket | 1260.54 |
| Tesla V100S | 1 | resnet-152 | fp16 | 120 | --bind-to socket | 440.26 |
| Tesla A100 | 1 | resnet-152 | fp16 | 120 | --bind-to socket | 849.27 |

 

Model: resnet-152, Precision: fp32

| GPU | # of GPUs | Model | Precision | Batch size/GPU | Other options | images/sec |
|---|---|---|---|---|---|---|
| Tesla V100S | 4 | resnet-152 | fp32 | 60 | --bind-to socket | 564.35 |
| Tesla A100 | 4 | resnet-152 | fp32 | 60 | --bind-to socket | 879.07 |
| Tesla V100S | 3 | resnet-152 | fp32 | 60 | --bind-to socket | 427.98 |
| Tesla A100 | 3 | resnet-152 | fp32 | 60 | --bind-to socket | 684.37 |
| Tesla V100S | 2 | resnet-152 | fp32 | 60 | --bind-to socket | 289.93 |
| Tesla A100 | 2 | resnet-152 | fp32 | 60 | --bind-to socket | 480.71 |
| Tesla V100S | 1 | resnet-152 | fp32 | 60 | --bind-to socket | 165.83 |
| Tesla A100 | 1 | resnet-152 | fp32 | 60 | --bind-to socket | 311.03 |
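To compare all ten configurations at a glance, the single-GPU results can be collected and the A100-over-V100S ratio computed for each model and precision. This is a minimal sketch; the data structure is illustrative and the numbers are the 1-GPU rows of the results above.

```python
# (V100S images/sec, A100 images/sec) on a single GPU,
# keyed by (model, precision); values copied from the results above.
single_gpu = {
    ("inception-v3", "fp16"): (771.76, 1272.74),
    ("inception-v3", "fp32"): (288.66, 603.47),
    ("inception-v4", "fp16"): (377.97, 648.44),
    ("inception-v4", "fp32"): (130.47, 282.37),
    ("inception-resnet-v2", "fp16"): (439.78, 786.83),
    ("inception-resnet-v2", "fp32"): (153.28, 336.39),
    ("resnet-101", "fp16"): (744.59, 1277.85),
    ("resnet-101", "fp32"): (245.34, 547.69),
    ("resnet-152", "fp16"): (440.26, 849.27),
    ("resnet-152", "fp32"): (165.83, 311.03),
}

# A100 throughput relative to V100S throughput for each configuration.
speedup = {key: a / v for key, (v, a) in single_gpu.items()}

best = max(speedup, key=speedup.get)
worst = min(speedup, key=speedup.get)

for (model, precision), s in speedup.items():
    print(f"{model:20s} {precision}: {s:.2f}x")
print(f"largest gain:  {best[0]} {best[1]}")
print(f"smallest gain: {worst[0]} {worst[1]}")
```

On a single GPU the A100 is between roughly 1.6x and 2.2x faster than the V100S across these runs.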

 

 

 

| Specification | NVIDIA® V100S-PCIe | NVIDIA® A100™-PCIe |
|---|---|---|
| GPU Architecture | NVIDIA Volta | NVIDIA Ampere |
| NVIDIA Tensor Cores | 640 | 432 |
| NVIDIA CUDA Cores | 5,120 | 6,912 |
| Double-Precision Performance | 8.2 TFLOPS | 9.7 TFLOPS |
| Single-Precision Performance | 16.4 TFLOPS | 19.5 TFLOPS |
| Tensor Performance | 130 TFLOPS | 312 TFLOPS / 624 TFLOPS* |
| GPU Memory | 32 GB HBM2 | 40 GB HBM2e |
| Memory Bandwidth | 1,134 GB/s | 1,555 GB/s |
| System Interface | PCIe Gen3.0 x16 | PCIe Gen4.0 x16 |
| Thermal Solution | Passive | Passive |
| TDP | 250 W | 250 W |

* Effective TOPS/TFLOPS using the new Sparsity feature (that is, for sparse matrices)
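The peak figures in the specification comparison can be turned into A100-over-V100S ratios and set against the measured training throughput. The sketch below uses the dense Tensor figure (312 TFLOPS, without sparsity); the dictionary keys are illustrative labels.

```python
# (V100S-PCIe, A100-PCIe) peak figures from the specification comparison above.
specs = {
    "FP64 TFLOPS": (8.2, 9.7),
    "FP32 TFLOPS": (16.4, 19.5),
    "Tensor TFLOPS (dense)": (130.0, 312.0),
    "Memory bandwidth GB/s": (1134.0, 1555.0),
}

# Ratio of the A100 peak to the V100S peak for each specification.
ratio = {name: a / v for name, (v, a) in specs.items()}

for name, r in ratio.items():
    print(f"{name:24s} A100 / V100S = {r:.2f}x")
```

The measured single-GPU speedups in the benchmark results (about 1.6x to 2.2x) fall between the memory-bandwidth ratio and the dense Tensor ratio.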