|
22 | 22 | "cell_type": "markdown", |
23 | 23 | "metadata": {}, |
24 | 24 | "source": [ |
25 | | - "### Enforce a reproducible result across runs" |
| 25 | + "To enforce a reproducible result across runs, we set a random seed." |
26 | 26 | ] |
27 | 27 | }, |
28 | 28 | { |
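The seeding step added above could be sketched as follows; the seed value `123` is an assumption, not taken from the notebook:

```python
import numpy as np

# Fix the global NumPy seed so that random draws (e.g. the random
# training indices chosen later) are identical across runs.
RANDOM_STATE_SEED = 123  # hypothetical value; the notebook's actual seed may differ
np.random.seed(RANDOM_STATE_SEED)
```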
|
42 | 42 | "cell_type": "markdown", |
43 | 43 | "metadata": {}, |
44 | 44 | "source": [ |
45 | | - "### Load our `iris` dataset\n", |
| 45 | + "## The dataset\n", |
46 | 46 | "\n", |
47 | | - "For more information on the iris dataset, see:\n", |
| 47 | + "Now we load the dataset. In this example, we are going to use the famous Iris dataset. For more information on the iris dataset, see:\n", |
48 | 48 | " - [The dataset documentation on Wikipedia](https://en.wikipedia.org/wiki/Iris_flower_data_set)\n", |
49 | 49 | " - [The scikit-learn interface](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)" |
50 | 50 | ] |
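Loading the dataset through the scikit-learn interface linked above might look like this (variable names are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X_raw = iris['data']    # 150 samples, 4 features
y_raw = iris['target']  # 3 classes, 50 samples each
```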
|
66 | 66 | "cell_type": "markdown", |
67 | 67 | "metadata": {}, |
68 | 68 | "source": [ |
69 | | - "### Apply PCA onto our features and extract the first 2 principle components" |
| 69 | + "For visualization purposes, we apply PCA to the original dataset." |
70 | 70 | ] |
71 | 71 | }, |
72 | 72 | { |
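Extracting the first two principal components for the visualization could be done as below (a sketch; variable names are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_raw = load_iris()['data']

# Project the 4-dimensional feature space onto its first 2 principal components.
pca = PCA(n_components=2)
transformed_iris = pca.fit_transform(X_raw)
```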
|
86 | 86 | "cell_type": "markdown", |
87 | 87 | "metadata": {}, |
88 | 88 | "source": [ |
89 | | - "### Visualize the principle components" |
| 89 | + "This is what the dataset looks like." |
90 | 90 | ] |
91 | 91 | }, |
92 | 92 | { |
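The picture the cell refers to is a scatter plot of the two components, colored by class; a minimal sketch using a non-interactive backend:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
components = PCA(n_components=2).fit_transform(iris['data'])

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(components[:, 0], components[:, 1], c=iris['target'])
ax.set_title('Iris classes after PCA transformation')
```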
|
124 | 124 | "cell_type": "markdown", |
125 | 125 | "metadata": {}, |
126 | 126 | "source": [ |
127 | | - "### Partition our `iris` dataset\n", |
128 | | - "\n", |
129 | | - "We first specify our training set $\\mathcal{L}$ consisting of 3 random examples. The remaining examples go to our \"unlabeled\" pool $\\mathcal{U}$." |
| 127 | + "Now we partition our `iris` dataset into a training set $\mathcal{L}$ and an unlabeled pool $\mathcal{U}$. We first specify our training set $\mathcal{L}$, consisting of 3 random examples. The remaining examples go to our \"unlabeled\" pool $\mathcal{U}$." |
130 | 128 | ] |
131 | 129 | }, |
132 | 130 | { |
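One way to carve out the 3-example training set described above; the seed and the use of `np.random.choice` (which guarantees 3 distinct examples) are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X_raw, y_raw = iris['data'], iris['target']

np.random.seed(123)  # assumed seed for reproducibility

# Pick 3 distinct random examples for the labeled training set L.
training_indices = np.random.choice(len(X_raw), size=3, replace=False)
X_train, y_train = X_raw[training_indices], y_raw[training_indices]

# Everything else goes to the "unlabeled" pool U.
X_pool = np.delete(X_raw, training_indices, axis=0)
y_pool = np.delete(y_raw, training_indices, axis=0)
```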
|
151 | 149 | "cell_type": "markdown", |
152 | 150 | "metadata": {}, |
153 | 151 | "source": [ |
154 | | - "## Define our models" |
| 152 | + "## Active learning with pool-based sampling\n", |
| 153 | + "\n", |
| 154 | + "For classification, we are going to use a simple k-nearest neighbors classifier. In this step, we also initialize the `ActiveLearner`." |
155 | 155 | ] |
156 | 156 | }, |
157 | 157 | { |
|
172 | 172 | "cell_type": "markdown", |
173 | 173 | "metadata": {}, |
174 | 174 | "source": [ |
175 | | - "## Predict class labels based on our limited dataset $\\mathcal{L}$" |
| 175 | + "Let's see how our classifier performs on the initial training set!" |
176 | 176 | ] |
177 | 177 | }, |
178 | 178 | { |
|
242 | 242 | "\n", |
243 | 243 | "As we can see, our model is unable to properly learn the underlying data distribution. All of its predictions are for the third class label, and as such it is only as competitive as defaulting its predictions to a single class – if only we had more data!\n", |
244 | 244 | "\n", |
245 | | - "Below, we tune our classifier by allowing it to query 20 instances it hasn't seen before. Using uncertainty sampling, our classifier aims to reduce the amount of uncertainty in its predictions using a variety of measures — see the documentation for more on specific [classification uncertainty measures](https://cosmic-cortex.github.io/modAL/Uncertainty-sampling#uncertainty). With each requested query, we remove that record from our pool $\\mathcal{U}$ and record our model's accuracy on the raw dataset." |
| 245 | + "Below, we tune our classifier by allowing it to query 20 instances it hasn't seen before. Using uncertainty sampling, our classifier aims to reduce the amount of uncertainty in its predictions using a variety of measures — see the documentation for more on specific [classification uncertainty measures](https://modal-python.readthedocs.io/en/latest/content/query_strategies/Uncertainty-sampling.html). With each requested query, we remove that record from our pool $\\mathcal{U}$ and record our model's accuracy on the raw dataset." |
246 | 246 | ] |
247 | 247 | }, |
248 | 248 | { |
|
388 | 388 | "name": "python", |
389 | 389 | "nbconvert_exporter": "python", |
390 | 390 | "pygments_lexer": "ipython3", |
391 | | - "version": "3.6.6" |
| 391 | + "version": "3.6.5" |
392 | 392 | } |
393 | 393 | }, |
394 | 394 | "nbformat": 4, |
|