Commit 5814b08

Merge pull request #675 from openml/fix596_dependencies
Fix596 dependencies
2 parents ab8a966 + 973d48a commit 5814b08

10 files changed

Lines changed: 139 additions & 101 deletions

CONTRIBUTING.md

Lines changed: 31 additions & 14 deletions
@@ -19,7 +19,7 @@ local disk:
 $ cd openml-python
 ```

-3. Swith to the ``develop`` branch:
+3. Switch to the ``develop`` branch:

 ```bash
 $ git checkout develop
@@ -31,7 +31,8 @@ local disk:
 $ git checkout -b feature/my-feature
 ```

-Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch! To make the nature of your pull request easily visible, please perpend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.
+Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch!
+To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.

 4. Develop the feature on your feature branch. Add changed files using ``git add`` and then ``git commit`` files:

@@ -59,7 +60,15 @@ We recommended that your contribution complies with the
 following rules before you submit a pull request:

 - Follow the
-[pep8 style guilde](https://www.python.org/dev/peps/pep-0008/).
+[pep8 style guide](https://www.python.org/dev/peps/pep-0008/).
+With the following exceptions or additions:
+- The max line length is 100 characters instead of 80.
+- When creating a multi-line expression with binary operators, break before the operator.
+- Add type hints to all function signatures.
+(note: not all functions have type hints yet, this is work in progress.)
+- Use the [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) over [`printf`](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting) style formatting.
+E.g. use `"{} {}".format('hello', 'world')` not `"%s %s" % ('hello', 'world')`.
+(note: old code may still use `printf`-formatting, this is work in progress.)

 - If your pull request addresses an issue, please use the pull request title
 to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is
@@ -105,18 +114,18 @@ tools:
 $ pytest --cov=. path/to/tests_for_package
 ```

-- No pyflakes warnings, check with:
+- No style warnings, check with:

 ```bash
-$ pip install pyflakes
-$ pyflakes path/to/module.py
+$ pip install flake8
+$ flake8 --ignore E402,W503 --show-source --max-line-length 100
 ```

-- No PEP8 warnings, check with:
+- No mypy (typing) issues, check with:

 ```bash
-$ pip install pep8
-$ pep8 path/to/module.py
+$ pip install mypy
+$ mypy openml --ignore-missing-imports --follow-imports skip
 ```

 Filing bugs
@@ -151,8 +160,8 @@ following rules before submitting:
 New contributor tips
 --------------------

-A great way to start contributing to scikit-learn is to pick an item
-from the list of [Easy issues](https://github.com/openml/openml-python/issues?q=label%3Aeasy)
+A great way to start contributing to openml-python is to pick an item
+from the list of [Good First Issues](https://github.com/openml/openml-python/labels/Good%20first%20issue)
 in the issue tracker. Resolving these issues allow you to start
 contributing to the project without much prior knowledge. Your
 assistance in this area will be greatly appreciated by the more
@@ -175,6 +184,14 @@ information.

 For building the documentation, you will need
 [sphinx](http://sphinx.pocoo.org/),
-[matplotlib](http://matplotlib.org/), and
-[pillow](http://pillow.readthedocs.io/en/latest/).
-[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/)
+[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/),
+[sphinx-gallery](https://sphinx-gallery.github.io/)
+and
+[numpydoc](https://numpydoc.readthedocs.io/en/latest/).
+```bash
+$ pip install sphinx sphinx-bootstrap-theme sphinx-gallery numpydoc
+```
+When dependencies are installed, run
+```bash
+$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY
+```
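The style rules added to CONTRIBUTING.md can be seen together in one small function. This is a hypothetical illustration (`describe_run` is not part of openml-python): the signature is fully type-hinted, the multi-line arithmetic expression breaks before the `/` operator, and the message is built with `str.format` rather than `%`-style formatting.

```python
from typing import List


def describe_run(flow_name: str, task_id: int, scores: List[float]) -> str:
    """Summarize per-fold scores of a run; assumes ``scores`` is non-empty."""
    # Multi-line expression: the line break comes *before* the binary operator.
    mean_score = (
        sum(scores)
        / len(scores)
    )
    # str.format instead of "%s"-style printf formatting.
    return "Flow {} on task {}: mean score {:.3f}".format(flow_name, task_id, mean_score)


print(describe_run("sklearn.tree.DecisionTreeClassifier", 31, [0.5, 0.7]))
```

Breaking before the operator is what pycodestyle reports as `W503`, which is why that code appears in the `--ignore` list of the `flake8` command in this diff.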

ci_scripts/flake8_diff.sh

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,7 @@
 #!/bin/bash

+# Update /CONTRIBUTING.md if these commands change.
+# The reason for not advocating using this script directly is that it
+# might not work out of the box on Windows.
 flake8 --ignore E402,W503 --show-source --max-line-length 100 $options
 mypy openml --ignore-missing-imports --follow-imports skip

doc/contributing.rst

Lines changed: 10 additions & 3 deletions
@@ -90,12 +90,19 @@ The package source code is available from
     git clone https://github.com/openml/openml-python.git


-Once you cloned the package, change into the new directory ``python`` and
-execute
+Once you cloned the package, change into the new directory.
+If you are a regular user, install with

 .. code:: bash

-    python setup.py install
+    pip install -e .
+
+If you are a contributor, you will also need to install test dependencies
+
+.. code:: bash
+
+    pip install -e ".[test]"
+

 Testing
 =======
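After `pip install -e ".[test]"`, one quick way to confirm that the optional test dependencies are importable is to ask `importlib.util.find_spec` about each top-level module. This is a sketch: the `test` extra itself is defined in `setup.py`, which this page does not show, so the module names in the example call (`pytest`, `oslo_concurrency`) are assumptions, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the names from ``module_names`` that cannot be imported here.

    Pass top-level import names (e.g. "oslo_concurrency"), not pip distribution
    names (e.g. "oslo.concurrency"): for a dotted name, find_spec imports the
    parent package first and raises ModuleNotFoundError if it is missing.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed contents of the "test" extra; check setup.py for the real list.
    missing = missing_modules(["pytest", "oslo_concurrency"])
    if missing:
        print("Missing test dependencies: {}".format(", ".join(missing)))
    else:
        print("All listed test dependencies are importable.")
```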

doc/progress.rst

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ Changelog
 0.9.0
 ~~~~~

+* MAINT #596: Fewer dependencies for regular pip install.
+* MAINT #652: Numpy and Scipy are no longer required before installation.
 * ADD #560: OpenML-Python can now handle regression tasks as well.
 * MAINT #184: Dropping Python2 support.

openml/datasets/functions.py

Lines changed: 29 additions & 39 deletions
@@ -1,7 +1,6 @@
 import io
 import os
 import re
-import warnings
 from typing import List, Dict, Union

 import numpy as np
@@ -10,11 +9,6 @@

 import xmltodict
 from scipy.sparse import coo_matrix
-# Currently, importing oslo raises a lot of warning that it will stop working
-# under python3.8; remove this once they disappear
-with warnings.catch_warnings():
-    warnings.simplefilter("ignore")
-    from oslo_concurrency import lockutils
 from collections import OrderedDict

 import openml.utils
@@ -29,8 +23,7 @@
 from ..utils import (
     _create_cache_directory,
     _remove_cache_dir_for_id,
-    _create_cache_directory_for_id,
-    _create_lockfiles_dir,
+    _create_cache_directory_for_id
 )


@@ -334,6 +327,7 @@ def get_datasets(
     return datasets


+@openml.utils.thread_safe_if_oslo_installed
 def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> OpenMLDataset:
     """ Download the OpenML dataset representation, optionally also download actual data file.

@@ -361,38 +355,34 @@ def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> Open
         raise ValueError("Dataset ID is neither an Integer nor can be "
                          "cast to an Integer.")

-    with lockutils.external_lock(
-        name='datasets.functions.get_dataset:%d' % dataset_id,
-        lock_path=_create_lockfiles_dir(),
-    ):
-        did_cache_dir = _create_cache_directory_for_id(
-            DATASETS_CACHE_DIR_NAME, dataset_id,
-        )
+    did_cache_dir = _create_cache_directory_for_id(
+        DATASETS_CACHE_DIR_NAME, dataset_id,
+    )

-        try:
-            remove_dataset_cache = True
-            description = _get_dataset_description(did_cache_dir, dataset_id)
-            features = _get_dataset_features(did_cache_dir, dataset_id)
-            qualities = _get_dataset_qualities(did_cache_dir, dataset_id)
-
-            arff_file = _get_dataset_arff(description) if download_data else None
-
-            remove_dataset_cache = False
-        except OpenMLServerException as e:
-            # if there was an exception,
-            # check if the user had access to the dataset
-            if e.code == 112:
-                raise OpenMLPrivateDatasetError(e.message) from None
-            else:
-                raise e
-        finally:
-            if remove_dataset_cache:
-                _remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
-                                         did_cache_dir)
-
-        dataset = _create_dataset_from_description(
-            description, features, qualities, arff_file
-        )
+    try:
+        remove_dataset_cache = True
+        description = _get_dataset_description(did_cache_dir, dataset_id)
+        features = _get_dataset_features(did_cache_dir, dataset_id)
+        qualities = _get_dataset_qualities(did_cache_dir, dataset_id)
+
+        arff_file = _get_dataset_arff(description) if download_data else None
+
+        remove_dataset_cache = False
+    except OpenMLServerException as e:
+        # if there was an exception,
+        # check if the user had access to the dataset
+        if e.code == 112:
+            raise OpenMLPrivateDatasetError(e.message) from None
+        else:
+            raise e
+    finally:
+        if remove_dataset_cache:
+            _remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
+                                     did_cache_dir)
+
+    dataset = _create_dataset_from_description(
+        description, features, qualities, arff_file
+    )
     return dataset

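In this file the explicit `lockutils.external_lock(...)` block is replaced by the `@openml.utils.thread_safe_if_oslo_installed` decorator, whose definition in `openml.utils` is not shown on this page. The following is a minimal sketch of how such a decorator could work, assuming it only locks when `oslo_concurrency` can be imported and otherwise returns the function untouched. The names `thread_safe_if_installed`, `LOCK_DIR` and `get_thing` are hypothetical; the lock-name scheme and the `external_lock(name=..., lock_path=...)` call are borrowed from the removed code.

```python
import functools
import os
import tempfile
from typing import Any, Callable, TypeVar, cast

try:
    # Optional dependency: used only if it happens to be installed.
    from oslo_concurrency import lockutils
except ImportError:
    lockutils = None  # type: ignore

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical lock directory; openml-python chooses its own location.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "example_openml_locks")


def thread_safe_if_installed(func: F) -> F:
    """Guard calls with an inter-process file lock if oslo_concurrency is available."""
    if lockutils is None:
        # No oslo: return the function unchanged, so no locking and no overhead.
        return func

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Assumes the entity id is the first positional argument, mirroring the
        # removed lock names such as 'datasets.functions.get_dataset:%d'.
        key = args[0] if args else "any"
        name = "{}.{}:{}".format(func.__module__, func.__qualname__, key)
        os.makedirs(LOCK_DIR, exist_ok=True)
        # File-based lock, as in the lockutils.external_lock blocks this commit removes.
        with lockutils.external_lock(name=name, lock_path=LOCK_DIR):
            return func(*args, **kwargs)

    return cast(F, wrapper)


@thread_safe_if_installed
def get_thing(thing_id: int) -> int:
    """Stand-in for get_dataset / get_task: pretend to fetch entity ``thing_id``."""
    return thing_id * 2
```

With the fallback branch, a regular install without `oslo.concurrency` pays no locking cost, at the price that concurrent processes sharing one cache directory are then not protected against each other.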

openml/flows/functions.py

Lines changed: 2 additions & 6 deletions
@@ -5,7 +5,6 @@
 import re
 import xmltodict
 from typing import Union, Dict
-from oslo_concurrency import lockutils

 from ..exceptions import OpenMLCacheException
 import openml._api_calls
@@ -70,6 +69,7 @@ def _get_cached_flow(fid: int) -> OpenMLFlow:
                                    "cached" % fid)


+@openml.utils.thread_safe_if_oslo_installed
 def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
     """Download the OpenML flow for a given flow ID.

@@ -87,11 +87,7 @@ def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
         the flow
     """
     flow_id = int(flow_id)
-    with lockutils.external_lock(
-        name='flows.functions.get_flow:%d' % flow_id,
-        lock_path=openml.utils._create_lockfiles_dir(),
-    ):
-        flow = _get_flow_description(flow_id)
+    flow = _get_flow_description(flow_id)

     if reinstantiate:
         flow.model = flow.extension.flow_to_model(flow)

openml/runs/functions.py

Lines changed: 1 addition & 0 deletions
@@ -466,6 +466,7 @@ def get_runs(run_ids):
     return runs


+@openml.utils.thread_safe_if_oslo_installed
 def get_run(run_id):
     """Gets run corresponding to run_id.

openml/tasks/functions.py

Lines changed: 24 additions & 34 deletions
@@ -2,13 +2,6 @@
 import io
 import re
 import os
-import warnings
-
-# Currently, importing oslo raises a lot of warning that it will stop working
-# under python3.8; remove this once they disappear
-with warnings.catch_warnings():
-    warnings.simplefilter("ignore")
-    from oslo_concurrency import lockutils
 import xmltodict

 from ..exceptions import OpenMLCacheException
@@ -300,6 +293,7 @@ def get_tasks(task_ids, download_data=True):
     return tasks


+@openml.utils.thread_safe_if_oslo_installed
 def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
     """Download OpenML task for a given task ID.

@@ -324,34 +318,30 @@ def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
         raise ValueError("Dataset ID is neither an Integer nor can be "
                          "cast to an Integer.")

-    with lockutils.external_lock(
-        name='task.functions.get_task:%d' % task_id,
-        lock_path=openml.utils._create_lockfiles_dir(),
-    ):
-        tid_cache_dir = openml.utils._create_cache_directory_for_id(
-            TASKS_CACHE_DIR_NAME, task_id,
-        )
+    tid_cache_dir = openml.utils._create_cache_directory_for_id(
+        TASKS_CACHE_DIR_NAME, task_id,
+    )

-        try:
-            task = _get_task_description(task_id)
-            dataset = get_dataset(task.dataset_id, download_data)
-            # List of class labels availaible in dataset description
-            # Including class labels as part of task meta data handles
-            # the case where data download was initially disabled
-            if isinstance(task, OpenMLClassificationTask):
-                task.class_labels = \
-                    dataset.retrieve_class_labels(task.target_name)
-            # Clustering tasks do not have class labels
-            # and do not offer download_split
-            if download_data:
-                if isinstance(task, OpenMLSupervisedTask):
-                    task.download_split()
-        except Exception as e:
-            openml.utils._remove_cache_dir_for_id(
-                TASKS_CACHE_DIR_NAME,
-                tid_cache_dir,
-            )
-            raise e
+    try:
+        task = _get_task_description(task_id)
+        dataset = get_dataset(task.dataset_id, download_data)
+        # List of class labels availaible in dataset description
+        # Including class labels as part of task meta data handles
+        # the case where data download was initially disabled
+        if isinstance(task, OpenMLClassificationTask):
+            task.class_labels = \
+                dataset.retrieve_class_labels(task.target_name)
+        # Clustering tasks do not have class labels
+        # and do not offer download_split
+        if download_data:
+            if isinstance(task, OpenMLSupervisedTask):
+                task.download_split()
+    except Exception as e:
+        openml.utils._remove_cache_dir_for_id(
+            TASKS_CACHE_DIR_NAME,
+            tid_cache_dir,
+        )
+        raise e

     return task
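With the lock moved into a decorator, `get_dataset` and `get_task` still keep the same safeguard: the per-id cache directory is created first and deleted again if any download step fails, presumably so that a half-populated directory is not picked up later as a valid cache entry. Below is a standalone sketch of that pattern using only the standard library, mirroring the flag-plus-`finally` form used in `get_dataset`. `fetch_into_cache` and its `download` callback are hypothetical stand-ins for the `_get_dataset_*` / `_get_task_description` calls, and unlike `get_dataset` the sketch does not translate server error code 112 into `OpenMLPrivateDatasetError`.

```python
import os
import shutil
import tempfile
from typing import Callable


def fetch_into_cache(cache_root: str, entity_id: int,
                     download: Callable[[str], None]) -> str:
    """Fill ``cache_root/<entity_id>`` via ``download``; remove it again on failure."""
    cache_dir = os.path.join(cache_root, str(entity_id))
    os.makedirs(cache_dir, exist_ok=True)
    remove_cache = True
    try:
        download(cache_dir)
        # Only reached when every download step succeeded.
        remove_cache = False
    finally:
        if remove_cache:
            # Do not leave a half-filled directory behind for later calls.
            shutil.rmtree(cache_dir, ignore_errors=True)
    return cache_dir


if __name__ == "__main__":
    root = tempfile.mkdtemp()

    def failing_download(path: str) -> None:
        raise RuntimeError("simulated server error")

    try:
        fetch_into_cache(root, 2, failing_download)
    except RuntimeError:
        pass
    print(os.path.exists(os.path.join(root, "2")))  # False: the directory was removed
```

The exception still propagates to the caller after the cleanup, which matches the `raise e` in `get_task`.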
