Commit 2e192e7

docs: API reference pages added
1 parent b730ea4 commit 2e192e7

8 files changed

Lines changed: 39 additions & 192 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.acquisition
+=================
+
+.. automodule:: modAL.acquisition
+    :members:
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.batch
+===========
+
+.. automodule:: modAL.batch
+    :members:
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.density
+=============
+
+.. automodule:: modAL.density
+    :members:
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.disagreement
+==================
+
+.. automodule:: modAL.disagreement
+    :members:
Lines changed: 2 additions & 5 deletions
@@ -1,8 +1,5 @@
-models
-======
-
-modAL models
-
+modAL.models
+============
 
 .. automodule:: modAL.models
     :members:
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.uncertainty
+=================
+
+.. automodule:: modAL.uncertainty
+    :members:
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+modAL.utils
+===========
+
+.. automodule:: modAL.utils
+    :members:
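
Review note: the six new pages and the updated ``modAL.models`` page all follow one template: a title equal to the module path, an ``=`` underline of the same length, and an ``automodule`` directive with ``:members:``. As a rough sketch, pages of this shape could be generated instead of written by hand. The helper name ``make_api_stub`` and the four-space option indent are assumptions made for illustration; neither appears in the commit.

```python
def make_api_stub(module_name: str, indent: str = "    ") -> str:
    """Return the text of a minimal API reference page for one module.

    Hypothetical helper: it mirrors the layout of the pages in this commit,
    but it is not part of modAL or Sphinx.
    """
    underline = "=" * len(module_name)  # RST underline must be at least as long as the title
    return (
        f"{module_name}\n"
        f"{underline}\n"
        "\n"
        f".. automodule:: {module_name}\n"
        f"{indent}:members:\n"
    )


if __name__ == "__main__":
    # Module names taken from the pages touched by this commit.
    for name in ("acquisition", "batch", "density", "disagreement",
                 "models", "uncertainty", "utils"):
        print(make_api_stub(f"modAL.{name}"))
```

The ``automodule`` directive is provided by the ``sphinx.ext.autodoc`` extension, so these pages only render if that extension is listed in the project's ``conf.py``.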

docs/source/index.rst

Lines changed: 7 additions & 187 deletions
@@ -1,195 +1,10 @@
 modAL: A modular active learning framework for Python3
 ======================================================
 
-| *modal: adjective, relating to structure as opposed to substance*
-| (Merriam-Webster Dictionary)
+Welcome to the documentation for modAL!
 
 modAL is an active learning framework for Python3, designed with *modularity, flexibility* and *extensibility* in mind. Built on top of scikit-learn, it allows you to rapidly create active learning workflows with nearly complete freedom. What is more, you can easily replace parts with your custom built solutions, allowing you to design novel algorithms with ease.
 
-Active learning from bird's-eye view
-------------------------------------
-
-With the recent explosion of available data, you have can have millions of unlabelled examples with a high cost to obtain labels. For instance, when trying to predict the sentiment of tweets, obtaining a training set
-can require immense manual labour. But worry not, active learning comes to the rescue! In general, AL is a framework allowing you to increase classification performance by intelligently querying you to label the most informative instances. To give an example, suppose that you have the following data and classifier with shaded regions signifying the classification probability.
-
-.. image:: content/overview/img/motivating-example.png
-    :align: center
-
-Suppose that you can query the label of an unlabelled instance, but it costs you a lot. Which one would you choose? By querying an instance in the uncertain region, surely you obtain more information than querying by random. Active learning gives you a set of tools to handle problems like this. In general, an active learning workflow looks like the following.
-
-.. image:: content/overview/img/active-learning.png
-    :align: center
-
-The key components of any workflow are the **model** you choose, the **uncertainty** measure you use and the **query** strategy you apply to request labels. With modAL, instead of choosing from a small set of built-in components, you have the freedom to seamlessly integrate scikit-learn or Keras models into your algorithm and easily tailor your custom query strategies and uncertainty measures.
-
-modAL in action
----------------
-
-Let's see what modAL can do for you!
-
-From zero to one in a few lines of code
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Active learning with a scikit-learn classifier, for instance RandomForestClassifier, can be as simple as the following.
-
-.. code:: python
-
-    from modAL.models import ActiveLearner
-    from sklearn.ensemble import RandomForestClassifier
-
-    # initializing the learner
-    learner = ActiveLearner(
-        estimator=RandomForestClassifier(),
-        X_training=X_training, y_training=y_training
-    )
-
-    # query for labels
-    query_idx, query_inst = learner.query(X_pool)
-
-    # ...obtaining new labels from the Oracle...
-
-    # supply label for queried instance
-    learner.teach(X_pool[query_idx], y_new)
-
-Replacing parts quickly
-^^^^^^^^^^^^^^^^^^^^^^^
-
-If you would like to use different uncertainty measures and query
-strategies than the default uncertainty sampling, you can either replace
-them with several built-in strategies or you can design your own by
-following a few very simple design principles. For instance, replacing
-the default uncertainty measure to classification entropy looks the
-following.
-
-.. code:: python
-
-    from modAL.models import ActiveLearner
-    from modAL.uncertainty import entropy_sampling
-    from sklearn.ensemble import RandomForestClassifier
-
-    learner = ActiveLearner(
-        estimator=RandomForestClassifier(),
-        query_strategy=entropy_sampling,
-        X_training=X_training, y_training=y_training
-    )
-
-Replacing parts with your own solutions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-modAL was designed to make it easy for you to implement your own query
-strategy. For example, implementing and using a simple random sampling
-strategy is as easy as the following.
-
-.. code:: python
-
-    import numpy as np
-
-    def random_sampling(classifier, X_pool):
-        n_samples = len(X_pool)
-        query_idx = np.random.choice(range(n_samples))
-        return query_idx, X_pool[query_idx]
-
-    learner = ActiveLearner(
-        estimator=RandomForestClassifier(),
-        query_strategy=random_sampling,
-        X_training=X_training, y_training=y_training
-    )
-
-For more details on how to implement your custom strategies, visit the page `Extending modAL <content/overview/Extending-modAL.ipynb>`_!
-
-
-An example with active regression
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-To see modAL in *real* action, let's consider an active regression
-problem with Gaussian processes! In this example, we shall try to learn
-the *noisy sine* function:
-
-.. code:: python
-
-    import numpy as np
-
-    X = np.random.choice(np.linspace(0, 20, 10000), size=200, replace=False).reshape(-1, 1)
-    y = np.sin(X) + np.random.normal(scale=0.3, size=X.shape)
-
-For active learning, we shall define a custom query strategy tailored to
-Gaussian processes. In a nutshell, a *query stategy* in modAL is a
-function taking (at least) two arguments (an estimator object and a pool
-of examples), outputting the index of the queried instance and the
-instance itself. In our case, the arguments are ``regressor`` and ``X``.
-
-.. code:: python
-
-    def GP_regression_std(regressor, X):
-        _, std = regressor.predict(X, return_std=True)
-        query_idx = np.argmax(std)
-        return query_idx, X[query_idx]
-
-After setting up the query strategy and the data, the active learner can
-be initialized.
-
-.. code:: python
-
-    from modAL.models import ActiveLearner
-    from sklearn.gaussian_process import GaussianProcessRegressor
-    from sklearn.gaussian_process.kernels import WhiteKernel, RBF
-
-    n_initial = 5
-    initial_idx = np.random.choice(range(len(X)), size=n_initial, replace=False)
-    X_training, y_training = X[initial_idx], y[initial_idx]
-
-    kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3)) \
-             + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))
-
-    regressor = ActiveLearner(
-        estimator=GaussianProcessRegressor(kernel=kernel),
-        query_strategy=GP_regression_std,
-        X_training=X_training.reshape(-1, 1), y_training=y_training.reshape(-1, 1)
-    )
-
-The initial regressor is not very accurate.
-
-.. image:: content/overview/img/gp-initial.png
-    :align: center
-
-The blue band enveloping the regressor represents the standard deviation
-of the Gaussian process at the given point. Now we are ready to do
-active learning!
-
-.. code:: python
-
-    # active learning
-    n_queries = 10
-    for idx in range(n_queries):
-        query_idx, query_instance = regressor.query(X)
-        regressor.teach(X[query_idx].reshape(1, -1), y[query_idx].reshape(1, -1))
-
-After a few queries, we can see that the prediction is much improved.
-
-.. image:: content/overview/img/gp-final.png
-    :align: center
-
-Citing
-------
-
-If you use modAL in your projects, you can cite it as
-
-::
-
-    @article{modAL2018,
-        title={mod{AL}: {A} modular active learning framework for {P}ython},
-        author={Tivadar Danka and Peter Horvath},
-        url={https://github.com/cosmic-cortex/modAL},
-        note={available on arXiv at \url{https://arxiv.org/abs/1805.00979}}
-    }
-
-About the developer
--------------------
-
-modAL is developed by me, `Tivadar
-Danka <https://www.tivadardanka.com>`__ (aka
-`cosmic-cortex <https://github.com/cosmic-cortex>`__ in GitHub). I have a PhD in pure mathematics, but I fell in love with biology and machine learning right after I finished my PhD. I have changed fields and now I work in the `Bioimage Analysis and Machine Learning Group of Peter Horvath <http://group.szbk.u-szeged.hu/sysbiol/horvath-peter-lab-index.html>`__, where I am working to develop active learning strategies for intelligent sample analysis in biology. During my work I realized that in Python, creating and prototyping active learning workflows can be made really easy and fast with scikit-learn, so I ended up developing a general framework for this. The result is modAL :) If you have any questions, requests or suggestions, you can contact me at 85a5187a@opayq.com! I hope you'll find modAL useful!
-
 .. toctree::
     :maxdepth: 1
     :caption: Overview
@@ -234,4 +49,9 @@ Danka <https://www.tivadardanka.com>`__ (aka
     :maxdepth: 1
     :caption: API reference
 
-    content/apireference/*
+    content/apireference/models.rst
+    content/apireference/uncertainty.rst
+    content/apireference/disagreement.rst
+    content/apireference/acquisition.rst
+    content/apireference/density.rst
+    content/apireference/utils.rst
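
Review note: the explicit toctree lists six pages, while this commit adds or updates seven module pages. The diff view does not show the paths of those pages, so the check below assumes they all live under ``content/apireference/`` and are named after their module. Under that assumption, a small sketch like the following shows which module has no toctree entry. The function name ``missing_from_toctree`` is illustrative only.

```python
from pathlib import PurePosixPath

# Entries copied from the updated "API reference" toctree in index.rst.
toctree_entries = [
    "content/apireference/models.rst",
    "content/apireference/uncertainty.rst",
    "content/apireference/disagreement.rst",
    "content/apireference/acquisition.rst",
    "content/apireference/density.rst",
    "content/apireference/utils.rst",
]

# Modules that get an automodule page in this commit.
documented_modules = [
    "acquisition", "batch", "density", "disagreement",
    "models", "uncertainty", "utils",
]


def missing_from_toctree(modules, entries):
    """Return the modules whose page name does not appear among the toctree entries."""
    listed = {PurePosixPath(entry).stem for entry in entries}
    return sorted(m for m in modules if m not in listed)


if __name__ == "__main__":
    print(missing_from_toctree(documented_modules, toctree_entries))
```

With the lists above, the only module reported is ``batch``. If the page is meant to be reachable from the sidebar, it would need its own toctree entry. Otherwise Sphinx typically warns that the document is not included in any toctree.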
