Commit 23f123f

author
Tania Allard
committed
Add testing solutions
1 parent b7482dc commit 23f123f

10 files changed

Lines changed: 6296 additions & 21 deletions

03_ProcessData.ipynb

Lines changed: 1 addition & 8 deletions
@@ -228,7 +228,7 @@
228228
"\n",
229229
"Note that you might need to run this from the shell like so:\n",
230230
"```\n",
231-
"python -m scripts.runall-wine-analysis\n",
231+
"python -m src.runall-wine-analysis\n",
232232
"``` \n"
233233
]
234234
},
@@ -360,13 +360,6 @@
360360
" return HTML(styles)\n",
361361
"css_styling()"
362362
]
363-
},
364-
{
365-
"cell_type": "code",
366-
"execution_count": null,
367-
"metadata": {},
368-
"outputs": [],
369-
"source": []
370363
}
371364
],
372365
"metadata": {

04_Testing.ipynb

Lines changed: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {
6+
"slideshow": {
7+
"slide_type": "slide"
8+
}
9+
},
10+
"source": [
11+
"# Testing\n",
12+
"\n",
13+
"We now have a fully automated script! 🎉👏🏻🦄\n",
14+
"\n",
15+
"The next step is to include **tests**. Testing should be a core part of our development process: our **reproducible workflows** are analogous to experimental design in the scientific world.\n",
16+
"\n",
17+
"![science](./assets/the_difference.png)\n",
18+
"\n",
19+
"<small> https://xkcd.com/242/ </small>"
20+
]
21+
},
22+
{
23+
"cell_type": "markdown",
24+
"metadata": {},
25+
"source": [
26+
"There are various approaches to testing software:\n",
27+
"- Assertions\n",
28+
"- Exceptions: serve as ⚠️ within the code\n",
29+
"- Unit tests: investigate the behaviour of units of code (e.g. functions)\n",
30+
"- Regression tests: defend against 🐛\n",
31+
"- Integration tests: ⚙️ check that the pieces work together as expected"
32+
]
33+
},
34+
{
35+
"cell_type": "markdown",
36+
"metadata": {},
37+
"source": [
38+
"We will start by testing some of our functions.\n",
39+
"Open `03_country-subset.py` and add the following function:\n",
40+
" \n",
41+
"```python \n",
42+
"def get_mean_price(filename):\n",
43+
" \"\"\" function to get the mean price of the wines\n",
44+
" rounded to 4 decimals\"\"\"\n",
45+
" wine = pd.read_csv(filename)\n",
46+
" mean_price = wine['price'].mean()\n",
47+
" return round(mean_price, 4)\n",
48+
"```\n",
49+
"\n",
50+
"And we will modify this function too:\n",
51+
"```python\n",
52+
"def get_country(filename, country):\n",
53+
" \n",
54+
"\n",
55+
" # Load table\n",
56+
" wine = pd.read_csv(filename)\n",
57+
"\n",
58+
" # Use the country name to subset data\n",
59+
" subset_country = wine[wine['country'] == country ].copy()\n",
60+
"\n",
61+
" # Subset the\n",
62+
"\n",
63+
" # Constructing the fname\n",
64+
" today = datetime.datetime.today().strftime('%Y-%m-%d')\n",
65+
" fname = f'data/processed/{today}-winemag_{country}.csv'\n",
66+
"\n",
67+
" # Saving the csv\n",
68+
" subset_country.to_csv(fname)\n",
69+
" print(fname) # print the fname from here\n",
70+
"\n",
71+
" return subset_country  # return the data frame\n",
72+
"```"
73+
]
74+
},
75+
{
76+
"cell_type": "markdown",
77+
"metadata": {},
78+
"source": [
79+
"Now we need to create our testing scripts. \n",
80+
"Some resources:\n",
81+
"- Pytest usage examples can be found [here](http://doc.pytest.org/en/latest/usage.html)\n",
82+
"- Rules for [test discovery](http://doc.pytest.org/en/latest/goodpractices.html)\n",
83+
"\n",
84+
"Now we can create our tests:\n",
85+
"```\n",
86+
"$ touch tests/__init__.py\n",
87+
"$ touch tests/test_03_country-subset.py\n",
88+
"```\n",
89+
"The file names of your test scripts should start with `test_`"
90+
]
91+
},
92+
{
93+
"cell_type": "markdown",
94+
"metadata": {},
95+
"source": [
96+
"Your test script should look like this:\n",
97+
"``` python\n",
98+
"import importlib\n",
99+
"\n",
100+
"country = importlib.import_module('.data.03_country-subset', 'src')\n",
101+
"\n",
102+
"interim_data = \"data/interim/2018-04-30-winemag_priceGBP.csv\"\n",
103+
"processed_data = \"data/processed/2018-04-30-winemag_Chile.csv\"\n",
104+
"\n",
105+
"def test_get_mean_price():\n",
106+
" mean_price = country.get_mean_price(processed_data)\n",
107+
" assert mean_price == 20.7865\n",
108+
"```\n",
109+
"\n",
110+
"And you can run it from the shell using:\n",
111+
"```\n",
112+
"$ python -m pytest tests/test_03_country-subset.py\n",
113+
"```"
114+
]
115+
},
116+
{
117+
"cell_type": "markdown",
118+
"metadata": {},
119+
"source": [
120+
"## What if you want all the decimal numbers?\n",
121+
"\n",
122+
"``` python\n",
123+
"import importlib\n",
124+
"import numpy.testing as npt\n",
125+
"\n",
126+
"country = importlib.import_module('.data.03_country-subset', 'src')\n",
127+
"\n",
128+
"interim_data = \"data/interim/2018-04-30-winemag_priceGBP.csv\"\n",
129+
"processed_data = \"data/processed/2018-04-30-winemag_Chile.csv\"\n",
130+
"\n",
131+
"def test_get_mean_price():\n",
132+
" mean_price = country.get_mean_price(processed_data)\n",
133+
" assert mean_price == 20.7865\n",
134+
" npt.assert_allclose(mean_price, 20.787, rtol=0.01)\n",
135+
"```\n",
136+
"\n",
137+
"`numpy.testing.assert_allclose` lets you set a relative (`rtol`) and an absolute (`atol`) tolerance for the comparison."
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"metadata": {},
143+
"source": [
144+
"### What else could go wrong?\n",
145+
"\n",
146+
"What if we created a data set and want to make sure that our interim or raw data has not changed, and therefore that our dataframes have not changed either?\n",
147+
"\n",
148+
"```python \n",
149+
"import pandas.testing as pdt\n",
150+
"import pandas as pd\n",
151+
"import importlib\n",
152+
"country = importlib.import_module('.data.03_country-subset', 'src')\n",
153+
"interim_data = \"data/interim/2018-05-09-winemag_priceGBP.csv\"\n",
154+
"processed_data = \"data/processed/2018-05-09-winemag_Chile.csv\"\n",
155+
"\n",
156+
"def test_get_country():\n",
157+
" # call the function\n",
158+
" df = country.get_country(interim_data, 'Chile')\n",
159+
" \n",
160+
" # load my previous dataset\n",
161+
" base = pd.read_csv(processed_data, index_col=0)  # to_csv saved the index as the first column\n",
162+
" \n",
163+
" # check if I am getting a dataframe\n",
164+
" assert isinstance(df, pd.DataFrame)\n",
165+
" assert isinstance(base, pd.DataFrame)\n",
166+
" \n",
167+
" # check that they are the same dataframes\n",
168+
" pdt.assert_frame_equal(df, base)\n",
169+
"``` "
170+
]
171+
},
172+
{
173+
"cell_type": "markdown",
174+
"metadata": {},
175+
"source": [
176+
"### See what we did in the previous steps?\n",
177+
"\n",
178+
"We tested each of the functions in our module...\n",
179+
"we did *unit testing*!\n",
180+
"Notice something in the functions we just wrote? \n",
181+
"- Set-up: `mean_price = country.get_mean_price(processed_data)`\n",
182+
"- Assertions: `assert mean_price == 20.7865`"
183+
]
184+
},
185+
{
186+
"cell_type": "code",
187+
"execution_count": 1,
188+
"metadata": {},
189+
"outputs": [
190+
{
191+
"data": {
192+
"text/html": [
193+
"<link href=\"https://fonts.googleapis.com/css?family=Didact+Gothic|Dosis:400,500,700\" rel=\"stylesheet\"><style>\n",
194+
"@font-face {\n",
195+
" font-family: \"Computer Modern\";\n",
196+
" src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
197+
"}\n",
198+
"/* div.cell{\n",
199+
"width:800px;\n",
200+
"margin-left:16% !important;\n",
201+
"margin-right:auto;\n",
202+
"} */\n",
203+
"h1 {\n",
204+
" font-family: 'Dosis', \"Helvetica Neue\", Arial, sans-serif;\n",
205+
" color: #0B132B;\n",
206+
"}\n",
207+
"h2 {\n",
208+
" font-family: 'Dosis', sans-serif;\n",
209+
" color: #1C2541;\n",
210+
"}\n",
211+
"h3{\n",
212+
" font-family: 'Dosis', sans-serif;\n",
213+
" margin-top:12px;\n",
214+
" margin-bottom: 3px;\n",
215+
" color: #40a8a6;\n",
216+
"}\n",
217+
"h4{\n",
218+
" font-family: 'Dosis', sans-serif;\n",
219+
" color: #40a8a6;\n",
220+
"}\n",
221+
"h5 {\n",
222+
" font-family: 'Dosis', sans-serif;\n",
223+
" color: #40a8a6;\n",
224+
"}\n",
225+
"div.text_cell_render{\n",
226+
" font-family: 'Didact Gothic',Computer Modern, \"Helvetica Neue\", Arial, Helvetica,\n",
227+
" Geneva, sans-serif;\n",
228+
" line-height: 130%;\n",
229+
" font-size: 110%;\n",
230+
" /* width:600px; */\n",
231+
" /* margin-left:auto;\n",
232+
" margin-right:auto; */\n",
233+
"}\n",
234+
"\n",
235+
".text_cell_render h1 {\n",
236+
" font-weight: 200;\n",
237+
" font-size: 30pt;\n",
238+
" /* font-size: 50pt */\n",
239+
" line-height: 100%;\n",
240+
" color:#0B132B;\n",
241+
" margin-bottom: 0.5em;\n",
242+
" margin-top: 0.5em;\n",
243+
" display: block;\n",
244+
"}\n",
245+
"\n",
246+
".text_cell_render h2{\n",
247+
" font-weight: 500;\n",
248+
"}\n",
249+
"\n",
250+
".text_cell_render h3{\n",
251+
" font-weight: 500;\n",
252+
"}\n",
253+
"\n",
254+
"\n",
255+
".warning{\n",
256+
" color: rgb( 240, 20, 20 )\n",
257+
"}\n",
258+
"\n",
259+
"div.warn {\n",
260+
" background-color: #FF5A5F;\n",
261+
" border-color: #FF5A5F;\n",
262+
" border-left: 5px solid #C81D25;\n",
263+
" padding: 0.5em;\n",
264+
"\n",
265+
" color: #fff;\n",
266+
" opacity: 0.8;\n",
267+
"}\n",
268+
"\n",
269+
"div.info {\n",
270+
" background-color: #087E8B;\n",
271+
" border-color: #087E8B;\n",
272+
" border-left: 5px solid #0B3954;\n",
273+
" padding: 0.5em;\n",
274+
" color: #fff;\n",
275+
" opacity: 0.8;\n",
276+
"}\n",
277+
"\n",
278+
"</style>\n",
279+
"<script>\n",
280+
"MathJax.Hub.Config({\n",
281+
" TeX: {\n",
282+
" extensions: [\"AMSmath.js\"]\n",
283+
" },\n",
284+
" tex2jax: {\n",
285+
" inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
286+
" displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
287+
" },\n",
288+
" displayAlign: 'center', // Change this to 'center' to center equations.\n",
289+
" \"HTML-CSS\": {\n",
290+
" styles: {'.MathJax_Display': {\"margin\": 4}}\n",
291+
" }\n",
292+
" });\n",
293+
" </script>\n"
294+
],
295+
"text/plain": [
296+
"<IPython.core.display.HTML object>"
297+
]
298+
},
299+
"execution_count": 1,
300+
"metadata": {},
301+
"output_type": "execute_result"
302+
}
303+
],
304+
"source": [
305+
"from IPython.core.display import HTML\n",
306+
"\n",
307+
"\n",
308+
"def css_styling():\n",
309+
" styles = open(\"styles/custom.css\", \"r\").read()\n",
310+
" return HTML(styles)\n",
311+
"css_styling()"
312+
]
313+
}
314+
],
315+
"metadata": {
316+
"kernelspec": {
317+
"display_name": "Python 3",
318+
"language": "python",
319+
"name": "python3"
320+
},
321+
"language_info": {
322+
"codemirror_mode": {
323+
"name": "ipython",
324+
"version": 3
325+
},
326+
"file_extension": ".py",
327+
"mimetype": "text/x-python",
328+
"name": "python",
329+
"nbconvert_exporter": "python",
330+
"pygments_lexer": "ipython3",
331+
"version": "3.6.5"
332+
}
333+
},
334+
"nbformat": 4,
335+
"nbformat_minor": 2
336+
}
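
The two unit-test patterns from `04_Testing.ipynb` (a tolerance-based comparison and a data frame equality check) can be tried without the workshop data. The following is a minimal sketch, not the workshop's solution: `make_sample_csv`, the file name `sample_wine.csv`, and the three-row table are made up for illustration, and `get_mean_price` is copied from the notebook so that the block is self-contained.

```python
import tempfile
from pathlib import Path

import numpy.testing as npt
import pandas as pd
import pandas.testing as pdt


def get_mean_price(filename):
    """Mean of the 'price' column, rounded to 4 decimals (as in the notebook)."""
    wine = pd.read_csv(filename)
    return round(wine["price"].mean(), 4)


def make_sample_csv(directory):
    """Hypothetical helper: write a tiny made-up wine table, return (path, frame)."""
    sample = pd.DataFrame(
        {"country": ["Chile", "Chile", "Spain"], "price": [10.0, 20.5, 31.25]}
    )
    path = Path(directory) / "sample_wine.csv"
    # index=False avoids an extra unnamed index column when reading back
    sample.to_csv(path, index=False)
    return path, sample


def test_get_mean_price():
    with tempfile.TemporaryDirectory() as tmp:
        path, _ = make_sample_csv(tmp)
        # (10.0 + 20.5 + 31.25) / 3 = 20.58333..., which rounds to 20.5833
        npt.assert_allclose(get_mean_price(path), 20.5833, rtol=1e-6)


def test_round_trip():
    with tempfile.TemporaryDirectory() as tmp:
        path, sample = make_sample_csv(tmp)
        # Reading the file back should reproduce the frame that was written
        pdt.assert_frame_equal(pd.read_csv(path), sample)
```

Saved as a file whose name starts with `test_`, this can be run with `python -m pytest`. Writing the sample data to a temporary directory keeps the tests independent of dated files under `data/`.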

assets/the_difference.png

31.9 KB
