# %% [markdown]
# This is a brief showcase of OpenML benchmark suites, which were introduced by
# [Bischl et al. (2019)](https://arxiv.org/abs/1708.03731v2). Benchmark suites standardize the
# datasets and splits to be used in an experiment or paper. They are fully integrated into OpenML
# and simplify both the sharing of the setup and the results.
# %%
import openml
# %% [markdown]
# ## OpenML-CC18
#
# As an example, we have a look at the OpenML-CC18, a suite of 72 classification datasets
# from OpenML which were carefully selected to be usable by many algorithms. These are all datasets
# from mid-2018 that satisfy a large set of clear requirements for thorough yet practical
# benchmarking:
#
# 1. the number of observations is between 500 and 100,000 to focus on medium-sized datasets,
# 2. the number of features does not exceed 5,000 to keep the runtime of the algorithms low,
# 3. the target attribute has at least two classes, with no class having fewer than 20
# observations, and
# 4. the ratio of the minority class to the majority class is above 0.05 (to eliminate highly
# imbalanced datasets, which require special treatment for both algorithms and evaluation
# measures).
#
# A full description can be found in the
# [OpenML benchmarking docs](https://docs.openml.org/benchmark/#openml-cc18).
#
# In this example, we'll focus on how to use benchmark suites in practice.
# %% [markdown]
# ## Downloading benchmark suites
# %%
suite = openml.study.get_suite(99)
print(suite)
# %% [markdown]
# The benchmark suite object does not download the included tasks and datasets itself; it only
# contains the list of task IDs that constitute the suite.
#
# Tasks can then be accessed via
# %%
tasks = suite.tasks
print(tasks)
# %% [markdown]
# and iterated over for benchmarking. For speed reasons, we only iterate over the first three tasks:
# %%
if tasks is not None:
    for task_id in tasks[:3]:
        task = openml.tasks.get_task(task_id)
        print(task)