Commit 6609c64

Author: Edoardo Holzl
Commit message: Merge conflict
2 parents d2f6b9e + 4a8e90a

2 files changed: 155 additions & 1 deletion

_data/authors.yml

Lines changed: 5 additions & 1 deletion
@@ -8,4 +8,8 @@ m_jaggi:
 e_hoelzl:
   name: Edoardo Hölzl
-  web: https://github.com/ehoelzl
+  web: https://github.com/ehoelzl
+
+m_milenkoski:
+  name: M. Milenkoski
+  web: https://github.com/mmilenkoski

_drafts/mlbench-vs-mlperf.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
---
layout: post
title: Comparison of MLBench and MLPerf
author: m_milenkoski
---
MLPerf is a broad benchmark suite for measuring the performance of machine learning (ML) software frameworks, ML hardware platforms, and ML cloud platforms.

In this post, we highlight the main differences between MLBench and MLPerf.
## Key Advantages of MLBench

- Reference implementations for distributed execution, instead of only single-node execution as in MLPerf
- Improved ease of use through the CLI or Dashboard, instead of the manual setup required by MLPerf
- Public cloud support (currently Google Cloud, AWS, and local deployment)
- Full PyTorch support
- Fine-grained metrics for the execution time of all relevant components
- Performance comparisons between different communication backends
- An additional light goal for each task, for quick iterations
- Reporting of scaling efficiency when increasing the number of nodes
- Many different optimization algorithms and communication patterns
- Easier local development of distributed machine learning methods (via Kubernetes in Docker)
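The scaling-efficiency metric mentioned above is simply the ratio of the observed speedup to ideal linear speedup. A minimal sketch (the function name and the timings are illustrative, not part of MLBench's API):

```python
def scaling_efficiency(t_single: float, t_multi: float, n_nodes: int) -> float:
    """Observed speedup divided by ideal linear speedup: T(1) / (N * T(N))."""
    return t_single / (n_nodes * t_multi)

# A job that takes 100 s on 1 node and 30 s on 4 nodes scales at ~83%:
# it achieves a 3.33x speedup where perfect scaling would give 4x.
print(f"{scaling_efficiency(100.0, 30.0, 4):.0%}")
```

An efficiency of 1.0 means perfect linear scaling; values well below 1.0 usually point at communication overhead, which is exactly what MLBench's fine-grained metrics help diagnose.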
## Results Reporting

Both MLBench and MLPerf use end-to-end time to accuracy as their headline metric. However, MLBench also reports how much of that time is spent on communication and how much on computation, while MLPerf does not. While MLPerf allows for distributed training, it does not distinguish results obtained from single-node and multi-node training, nor between single and multiple GPUs per node. Although the number of nodes and GPUs per node are reported, there does not seem to be any fine-grained reporting of the time spent on communication and computation: MLPerf reports a single number, time to accuracy, for all scenarios.

For this reason, MLPerf cannot accurately show the effects of scaling the number of nodes or GPUs, nor pinpoint the reason for improved or degraded performance of a model. MLBench, on the other hand, is fully focused on distributed training: it can show the effects of scaling, identify bottlenecks in model performance, and accurately show how hyperparameters affect individual components such as communication and computation. In this way, MLBench offers a more powerful and versatile benchmarking suite than MLPerf.
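The communication/computation breakdown can be approximated by wrapping each phase of a training step in a wall-clock timer and accumulating per-phase totals. A minimal sketch (the helper and phase names are illustrative; MLBench's own tracker API may differ, and the toy workloads stand in for a forward/backward pass and a gradient all-reduce):

```python
import time

def timed_phase(timings, phase, fn):
    """Run fn, accumulating its wall-clock time under the given phase label."""
    start = time.perf_counter()
    result = fn()
    timings[phase] = timings.get(phase, 0.0) + (time.perf_counter() - start)
    return result

timings = {}
for _ in range(3):  # three toy "training steps"
    timed_phase(timings, "compute", lambda: sum(i * i for i in range(10_000)))
    timed_phase(timings, "communication", lambda: time.sleep(0.001))

total = sum(timings.values())
for phase, t in timings.items():
    print(f"{phase}: {t / total:.1%} of step time")
```

Reporting the two fractions separately, rather than a single end-to-end number, is what lets a benchmark attribute a slowdown to the backend rather than the model.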
## Hyperparameter Tuning

MLPerf restricts the set of hyperparameters that can be tuned, and allows submitters to borrow hyperparameters from other submissions. MLBench currently provides exact values for all hyperparameters.
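Pinning exact values for every hyperparameter means a run is reproducible without any per-submitter tuning. As a hypothetical illustration only (these are not MLBench's reference values), such a fixed configuration might look like:

```python
# Illustrative only: every tunable knob is given an exact value up front,
# so two runs of the benchmark differ only in hardware and backend.
hyperparameters = {
    "optimizer": "sgd",
    "lr": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "batch_size_per_worker": 128,
    "lr_schedule": "multistep",
}
print(sorted(hyperparameters))
```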
## Benchmark Suites

In the table below, "/" marks a task that a suite does not currently include.

<table>
  <thead>
    <tr>
      <th rowspan="2">Benchmark</th>
      <th colspan="2">Dataset</th>
      <th colspan="2">Quality Target</th>
      <th colspan="2">Reference Implementation Model</th>
      <th colspan="2">Frameworks</th>
    </tr>
    <tr>
      <th>MLBench</th>
      <th>MLPerf</th>
      <th>MLBench</th>
      <th>MLPerf</th>
      <th>MLBench</th>
      <th>MLPerf</th>
      <th>MLBench</th>
      <th>MLPerf</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Image classification</td>
      <td>CIFAR10 (32x32)</td>
      <td>/</td>
      <td>80% Top-1 Accuracy</td>
      <td>/</td>
      <td>ResNet-20</td>
      <td>/</td>
      <td>PyTorch, TensorFlow</td>
      <td>/</td>
    </tr>
    <tr>
      <td>Image classification</td>
      <td colspan="2">ImageNet (224x224)</td>
      <td>TODO</td>
      <td>75.9% Top-1 Accuracy</td>
      <td>TODO</td>
      <td>ResNet-50 v1.5</td>
      <td>TODO</td>
      <td>MXNet, TensorFlow</td>
    </tr>
    <tr>
      <td>Object detection (light weight)</td>
      <td>/</td>
      <td>COCO 2017</td>
      <td>/</td>
      <td>23% mAP</td>
      <td>/</td>
      <td>SSD-ResNet34</td>
      <td>/</td>
      <td>TensorFlow, PyTorch</td>
    </tr>
    <tr>
      <td>Object detection (heavy weight)</td>
      <td>/</td>
      <td>COCO 2017</td>
      <td>/</td>
      <td>0.377 Box min AP, 0.339 Mask min AP</td>
      <td>/</td>
      <td>Mask R-CNN</td>
      <td>/</td>
      <td>TensorFlow, PyTorch</td>
    </tr>
    <tr>
      <td>Language modelling</td>
      <td>Wikitext2</td>
      <td>/</td>
      <td>Perplexity &lt;= 50</td>
      <td>/</td>
      <td>RNN-LM</td>
      <td>/</td>
      <td>PyTorch</td>
      <td>/</td>
    </tr>
    <tr>
      <td>Translation (recurrent)</td>
      <td>WMT16 EN-DE</td>
      <td>WMT English-German</td>
      <td colspan="2">24.0 BLEU</td>
      <td colspan="2">GNMT</td>
      <td>PyTorch</td>
      <td>TensorFlow, PyTorch</td>
    </tr>
    <tr>
      <td>Translation (non-recurrent)</td>
      <td>WMT17 EN-DE</td>
      <td>WMT English-German</td>
      <td colspan="2">25.0 BLEU</td>
      <td colspan="2">Transformer</td>
      <td>PyTorch</td>
      <td>TensorFlow, PyTorch</td>
    </tr>
    <tr>
      <td>Recommendation</td>
      <td>/</td>
      <td>Undergoing modification</td>
      <td>/</td>
      <td></td>
      <td>/</td>
      <td></td>
      <td>/</td>
      <td></td>
    </tr>
    <tr>
      <td>Reinforcement learning</td>
      <td>/</td>
      <td>N/A</td>
      <td>/</td>
      <td>Pre-trained checkpoint</td>
      <td>/</td>
      <td>Mini Go</td>
      <td>/</td>
      <td>TensorFlow</td>
    </tr>
  </tbody>
</table>
