Commit c908993

Merge branch 'main' into update-tests-for-local
2 parents 30fd44d + da993f7 commit c908993

8 files changed

Lines changed: 299 additions & 11 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@ the contribution guidelines: https://github.com/openml/openml-python/blob/main/C
55
Please make sure that:
66
77
* the title of the pull request is descriptive
8-
* this pull request is against the `develop` branch
8+
* this pull request is against the `main` branch
99
* for any new functionality, consider adding a relevant example
1010
* add unit tests for new functionalities
1111
* collect files uploaded to test server using _mark_entity_for_removal()

CONTRIBUTING.md

Lines changed: 6 additions & 6 deletions
@@ -44,7 +44,7 @@ To contribute to the openml-python package, follow these steps:
4444

4545
0. Determine how you want to contribute (see above).
4646
1. Set up your local development environment.
47-
1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``develop`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE.
47+
1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``main`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE.
4848
2. [Install the local dependencies](#install-local-dependencies) to run the tests for your contribution.
4949
3. [Test your installation](#testing-your-installation) to ensure everything is set up correctly.
5050
4. Implement your contribution. If contributing to the documentation, see [here](#contributing-to-the-documentation).
@@ -91,7 +91,7 @@ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
9191
pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
9292
```
9393

94-
To test your new contribution, add [unit tests](https://github.com/openml/openml-python/tree/develop/tests), and, if needed, [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. Some notes on unit tests and examples:
94+
To test your new contribution, add [unit tests](https://github.com/openml/openml-python/tree/main/tests), and, if needed, [examples](https://github.com/openml/openml-python/tree/main/examples) for any new functionality being introduced. Some notes on unit tests and examples:
9595
* If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`.
9696
* Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`, which is done by default for tests derived from `TestBase`.
9797
* Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn.
@@ -109,7 +109,7 @@ export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
109109
110110
### Pull Request Checklist
111111
112-
You can go to the `openml-python` GitHub repository to create the pull request by [comparing the branch](https://github.com/openml/openml-python/compare) from your fork with the `develop` branch of the `openml-python` repository. When creating a pull request, make sure to follow the comments and structure provided by the template on GitHub.
112+
You can go to the `openml-python` GitHub repository to create the pull request by [comparing the branch](https://github.com/openml/openml-python/compare) from your fork with the `main` branch of the `openml-python` repository. When creating a pull request, make sure to follow the comments and structure provided by the template on GitHub.
113113
114114
**An incomplete contribution** -- where you expect to do more work before
115115
receiving a full review -- should be submitted as a `draft`. These may be useful
@@ -127,7 +127,7 @@ in the PR description.
127127
128128
The preferred workflow for contributing to openml-python is to
129129
fork the [main repository](https://github.com/openml/openml-python) on
130-
GitHub, clone, check out the branch `develop`, and develop on a new feature
130+
GitHub, clone, check out the branch `main`, and develop on a new feature
131131
branch. Steps:
132132
133133
0. Make sure you have git installed, and a GitHub account.
@@ -148,7 +148,7 @@ local disk:
148148
3. Switch to the ``main`` branch:
149149
150150
```bash
151-
git checkout develop
151+
git checkout main
152152
```
153153
154154
3. Create a ``feature`` branch to hold your development changes:
@@ -157,7 +157,7 @@ local disk:
157157
git checkout -b feature/my-feature
158158
```
159159
160-
Always use a ``feature`` branch. It's good practice to never work on the ``main`` or ``develop`` branch!
160+
Always use a ``feature`` branch. It's good practice to never work on the ``main`` branch!
161161
To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.
162162
163163
4. Develop the feature on your feature branch. Add changed files using ``git add`` and then ``git commit`` files:

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
2121
<!-- Add green badges for CI and precommit -->
2222

23-
[Installation](https://openml.github.io/openml-python/main/#how-to-get-openml-for-python) | [Documentation](https://openml.github.io/openml-python) | [Contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md)
23+
[Installation](https://openml.github.io/openml-python/main/#how-to-get-openml-for-python) | [Documentation](https://openml.github.io/openml-python) | [Contribution guidelines](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md)
2424
</div>
2525

2626
OpenML-Python provides an easy-to-use and straightforward Python interface for [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
@@ -94,7 +94,7 @@ Bibtex entry:
9494
We welcome contributions from both new and experienced developers!
9595

9696
If you would like to contribute to OpenML-Python, please read our
97-
[Contribution Guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).
97+
[Contribution Guidelines](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md).
9898

9999
If you are new to open-source development, a great way to get started is by
100100
looking at issues labeled **"good first issue"** in our GitHub issue tracker.

docs/developer_setup.md

Lines changed: 210 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,210 @@
1+
# OpenML Local Development Environment Setup
2+
3+
This guide outlines the standard procedures for setting up a local development environment for the OpenML ecosystem. It covers the configuration of the backend servers (API v1 and API v2) and the Python Client SDK.
4+
5+
OpenML currently has two backend architectures:
6+
7+
* **API v1**: The PHP-based server currently serving production traffic.
8+
* **API v2**: The Python-based server (FastAPI) currently under active development.
9+
10+
> Note on Migration: API v1 is projected to remain operational through at least 2026. API v2 is the target architecture for future development.
11+
12+
## 1. API v1 Setup (PHP Backend)
13+
14+
This section details the deployment of the legacy PHP backend.
15+
16+
### Prerequisites
17+
18+
* **Docker**: Docker Desktop (Ensure the daemon is running).
19+
* **Version Control**: Git.
20+
21+
### Installation Steps
22+
23+
#### 1. Clone the Repository
24+
25+
Retrieve the OpenML services source code:
26+
27+
```bash
28+
git clone https://github.com/openml/services
29+
cd services
30+
```
31+
32+
#### 2. Configure File Permissions
33+
34+
To ensure the containerized PHP service can write to the local filesystem, initialize the data directory permissions.
35+
36+
From the repository root:
37+
38+
```bash
39+
chown -R www-data:www-data data/php
40+
```
41+
42+
If the `www-data` user does not exist on the host system, grant full permissions as a fallback:
43+
44+
```bash
45+
chmod -R 777 data/php
46+
```
47+
48+
#### 3. Launch Services
49+
50+
Initialize the container stack:
51+
52+
```bash
53+
docker compose --profile all up -d
54+
```
55+
56+
#### Warning: Container Conflicts
57+
58+
If API v2 (Python backend) containers are present on the system, name conflicts may occur. To resolve this, stop and remove existing containers before launching API v1:
59+
60+
```bash
61+
docker compose --profile all down
62+
docker compose --profile all up -d
63+
```
64+
65+
#### 4. Verification
66+
67+
Validate the deployment by accessing the flow endpoint. A successful response will return structured JSON data.
68+
69+
* **Endpoint**: http://localhost:8080/api/v1/json/flow/181
70+
71+
### Client Configuration
72+
73+
To direct the `openml-python` client to the local API v1 instance, modify the configuration as shown below. The API key corresponds to the default key located in `services/config/php/.env`.
74+
75+
```python
76+
import openml
77+
from openml_sklearn.extension import SklearnExtension
78+
from sklearn.neighbors import KNeighborsClassifier
79+
80+
# Configure client to use local Docker instance
81+
openml.config.server = "http://localhost:8080/api/v1/xml"
82+
openml.config.apikey = "AD000000000000000000000000000000"
83+
84+
# Test flow publication
85+
clf = KNeighborsClassifier(n_neighbors=3)
86+
extension = SklearnExtension()
87+
knn_flow = extension.model_to_flow(clf)
88+
89+
knn_flow.publish()
90+
```
91+
92+
## 2. API v2 Setup (Python Backend)
93+
94+
This section details the deployment of the FastAPI backend.
95+
96+
### Prerequisites
97+
98+
* **Docker**: Docker Desktop (Ensure the daemon is running).
99+
* **Version Control**: Git.
100+
101+
### Installation Steps
102+
103+
#### 1. Clone the Repository
104+
105+
Retrieve the API v2 source code:
106+
107+
```bash
108+
git clone https://github.com/openml/server-api
109+
cd server-api
110+
```
111+
112+
#### 2. Launch Services
113+
114+
Build and start the container stack:
115+
116+
```bash
117+
docker compose --profile all up
118+
```
119+
120+
#### 3. Verification
121+
122+
Validate the deployment using the following endpoints:
123+
124+
* **Task Endpoint**: http://localhost:8001/tasks/31
125+
* **Swagger UI (Documentation)**: http://localhost:8001/docs
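The verification steps for both backends can also be scripted. The following is a minimal sketch, assuming the default ports from this guide and assuming that the API v2 task endpoint answers with JSON, as the API v1 flow endpoint is stated to do. The helper names `returns_json` and `check_all` are hypothetical and are not part of `openml-python`.

```python
import http.client
import json
import urllib.request

# URLs copied from the verification steps; adjust them if your ports differ.
ENDPOINTS = {
    "API v1 flow": "http://localhost:8080/api/v1/json/flow/181",
    "API v2 task": "http://localhost:8001/tasks/31",
}


def returns_json(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 and a parseable JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers connection errors, HTTP error statuses, and timeouts;
        # ValueError covers malformed URLs and bodies that are not valid JSON.
        return False


def check_all(endpoints: dict[str, str] = ENDPOINTS) -> dict[str, bool]:
    """Map each endpoint name to whether it currently serves JSON."""
    return {name: returns_json(url) for name, url in endpoints.items()}
```

Calling `check_all()` after `docker compose --profile all up` has finished starting should map each name to `True`. A `False` entry usually means the container is not ready yet or the port is already in use.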
126+
127+
## 3. Python SDK (`openml-python`) Setup
128+
129+
This section outlines the environment setup for contributing to the OpenML Python client.
130+
131+
### Installation Steps
132+
133+
#### 1. Clone the Repository
134+
135+
```bash
136+
git clone https://github.com/openml/openml-python
137+
cd openml-python
138+
```
139+
140+
#### 2. Environment Initialization
141+
142+
Create an isolated virtual environment (example using Conda):
143+
144+
```bash
145+
conda create -n openml-python-dev python=3.12
146+
conda activate openml-python-dev
147+
```
148+
149+
#### 3. Install Dependencies
150+
151+
Install the package in editable mode, including development and documentation dependencies:
152+
153+
```bash
154+
python -m pip install -e ".[dev,docs]"
155+
```
156+
157+
#### 4. Configure Quality Gates
158+
159+
Install pre-commit hooks to enforce coding standards:
160+
161+
```bash
162+
pre-commit install
163+
pre-commit run --all-files
164+
```
165+
166+
## 4. Testing Guidelines
167+
168+
The OpenML Python SDK utilizes `pytest` markers to categorize tests based on dependencies and execution context.
169+
170+
| Marker | Description |
171+
|-------------------|-----------------------------------------------------------------------------|
172+
| `sklearn` | Tests requiring `scikit-learn`. Skipped if the library is missing. |
173+
| `production` | Tests that interact with the live OpenML server (real API calls). |
174+
| `uses_test_server` | Tests requiring the OpenML test server environment. |
175+
176+
### Execution Examples
177+
178+
Run the full test suite:
179+
180+
```bash
181+
pytest
182+
```
183+
184+
Run a specific subset (e.g., `scikit-learn` tests):
185+
186+
```bash
187+
pytest -m sklearn
188+
```
189+
190+
Exclude production tests (local only):
191+
192+
```bash
193+
pytest -m "not production"
194+
```
195+
196+
### Admin Privilege Tests
197+
198+
Certain tests require administrative privileges on the test server. These are skipped automatically unless an admin API key is provided via environment variables.
199+
200+
#### Windows (PowerShell):
201+
202+
```powershell
203+
$env:OPENML_TEST_SERVER_ADMIN_KEY = "admin-key"
204+
```
205+
206+
#### Linux/macOS:
207+
208+
```bash
209+
export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
210+
```
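To illustrate how a test can be skipped when the admin key is missing, here is a minimal sketch. It uses only the standard library `unittest` skip mechanism, which `pytest` also honors. The helper `admin_key_from_env` and the class `AdminOnlyTest` are hypothetical names; the project's own skip logic may be implemented differently.

```python
import os
import unittest

ADMIN_KEY_ENV = "OPENML_TEST_SERVER_ADMIN_KEY"


def admin_key_from_env(environ=os.environ):
    """Return the admin key, or None when the variable is unset or empty."""
    return environ.get(ADMIN_KEY_ENV) or None


class AdminOnlyTest(unittest.TestCase):
    """Hypothetical test case that only runs when an admin key is configured."""

    def setUp(self):
        # Check at run time so the skip reflects the current environment.
        if admin_key_from_env() is None:
            self.skipTest(f"{ADMIN_KEY_ENV} is not set")

    def test_admin_key_is_nonempty(self):
        self.assertTrue(admin_key_from_env())
```

Running this file with `pytest` reports the test as skipped until the variable is exported as shown above.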

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,7 @@ nav:
6565
- Advanced User Guide: details.md
6666
- API: reference/
6767
- Contributing: contributing.md
68+
- Developer Setup: developer_setup.md
6869

6970
markdown_extensions:
7071
- pymdownx.highlight:

openml/utils/__init__.py

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
"""Utilities module."""
2+
3+
from openml.utils._openml import (
4+
    ProgressBar,
5+
    ReprMixin,
6+
    _create_cache_directory,
7+
    _create_cache_directory_for_id,
8+
    _create_lockfiles_dir,
9+
    _delete_entity,
10+
    _get_cache_dir_for_id,
11+
    _get_cache_dir_for_key,
12+
    _get_rest_api_type_alias,
13+
    _list_all,
14+
    _remove_cache_dir_for_id,
15+
    _tag_entity,
16+
    _tag_openml_base,
17+
    extract_xml_tags,
18+
    get_cache_size,
19+
    thread_safe_if_oslo_installed,
20+
)
21+
22+
__all__ = [
23+
    "ProgressBar",
24+
    "ReprMixin",
25+
    "_create_cache_directory",
26+
    "_create_cache_directory_for_id",
27+
    "_create_lockfiles_dir",
28+
    "_delete_entity",
29+
    "_get_cache_dir_for_id",
30+
    "_get_cache_dir_for_key",
31+
    "_get_rest_api_type_alias",
32+
    "_list_all",
33+
    "_remove_cache_dir_for_id",
34+
    "_tag_entity",
35+
    "_tag_openml_base",
36+
    "extract_xml_tags",
37+
    "get_cache_size",
38+
    "thread_safe_if_oslo_installed",
39+
]

openml/utils.py renamed to openml/utils/_openml.py

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,8 +26,7 @@
2626
import openml
2727
import openml._api_calls
2828
import openml.exceptions
29-
30-
from . import config
29+
from openml import config
3130

3231
# Avoid import cycles: https://mypy.readthedocs.io/en/latest/common_issues.html#import-cycles
3332
if TYPE_CHECKING:
@@ -436,6 +435,18 @@ def safe_func(*args: P.args, **kwargs: P.kwargs) -> R:
436435
return func
437436

438437

438+
def get_cache_size() -> int:
439+
    """Calculate the size of OpenML cache directory
440+
441+
    Returns
442+
    -------
443+
    cache_size: int
444+
        Total size of cache in bytes
445+
    """
446+
    path = Path(config.get_cache_directory())
447+
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
448+
449+
439450
def _create_lockfiles_dir() -> Path:
440451
path = Path(config.get_cache_directory()) / "locks"
441452
# TODO(eddiebergman): Not sure why this is allowed to error and ignore???
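
The `get_cache_size` function added in this commit sums the sizes of all regular files under the cache directory. The sketch below reproduces that expression on a temporary directory so it runs without `openml` installed. `directory_size` is a hypothetical stand-in, and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def directory_size(root: Path) -> int:
    """Sum the sizes, in bytes, of all regular files below `root`."""
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Two nested files of 10 and 5 bytes; directories themselves are not counted.
    (cache / "datasets").mkdir()
    (cache / "datasets" / "a.arff").write_bytes(b"x" * 10)
    (cache / "locks").mkdir()
    (cache / "locks" / "b.lock").write_bytes(b"y" * 5)
    total = directory_size(cache)
    print(total)  # 15
```

With the package installed, the equivalent call would be `openml.utils.get_cache_size()`, since the new `openml/utils/__init__.py` re-exports the function.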
