trallard
diff --git a/‎03_ProcessData.ipynb‎
Lines changed: 1 addition & 8 deletions b/‎03_ProcessData.ipynb‎
Lines changed: 1 addition & 8 deletions
diff --git a/‎04_Testing.ipynb‎
Lines changed: 336 additions & 0 deletions b/‎04_Testing.ipynb‎
Lines changed: 336 additions & 0 deletions
diff --git a/‎assets/the_difference.png‎
31.9 KB b/‎assets/the_difference.png‎
31.9 KB
@@ -228,7 +228,7 @@
     "\n",
     "Note you might need to run this from the shell like so \n",
     "```\n",
-    "python -m scripts.runall-wine-analysis\n",
+    "python -m src.runall-wine-analysis\n",
     "``` \n"
    ]
   },
@@ -360,13 +360,6 @@
     "    return HTML(styles)\n",
     "css_styling()"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
 
@@ -0,0 +1,336 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Testing\n",
+    "\n",
+    "We now have a fully automated script! 🎉👏🏻🦄\n",
+    "\n",
+    "The next step is to include **tests**... in fact testing should be a core part of our development process. In fact all of our **reproducible workflows** are analogous to experimental design in the scientific world\n",
+    "\n",
+    "![science](./assets/the_difference.png)\n",
+    "\n",
+    "<small> https://xkcd.com/242/ </small>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are various approaches to tests software:\n",
+    "- Assertions\n",
+    "- Exceptions: within the code serve as ⚠️\n",
+    "- Unit tests: investigate the behaviour of units of code (e.g functions)\n",
+    "- Regression tests: defends against 🐛\n",
+    "- Integration tests: ⚙️ checks that the pieces work together as expected"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We will start by testing some of our functions:\n",
+    "Open `03_country-subset.py` and add the following function:\n",
+    "    \n",
+    "```python \n",
+    "def get_mean_price(filename):\n",
+    "    \"\"\" function to get the mean price of the wines\n",
+    "    rounded to 4 decimals\"\"\"\n",
+    "    wine = pd.read_csv(filename)\n",
+    "    mean_price = wine['price'].mean()\n",
+    "    return round(mean_price, 4)\n",
+    "```\n",
+    "\n",
+    "And we will modify this function too:\n",
+    "```python\n",
+    "def get_country(filename, country):\n",
+    "   \n",
+    "\n",
+    "    # Load table\n",
+    "    wine = pd.read_csv(filename)\n",
+    "\n",
+    "    # Use the country name to subset data\n",
+    "    subset_country = wine[wine['country'] == country ].copy()\n",
+    "\n",
+    "    # Subset the\n",
+    "\n",
+    "    # Constructing the fname\n",
+    "    today = datetime.datetime.today().strftime('%Y-%m-%d')\n",
+    "    fname = f'data/processed/{today}-winemag_{country}.csv'\n",
+    "\n",
+    "    # Saving the csv\n",
+    "    subset_country.to_csv(fname)\n",
+    "    print(fname)  # print the fname from here\n",
+    "\n",
+    "    return(subset_country)  #returns the data frame\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we need to create out testing scripts. \n",
+    "Some resources:\n",
+    "- Pytest usage examples can be found [here](http://doc.pytest.org/en/latest/usage.html)\n",
+    "- Rules for [test discovery](http://doc.pytest.org/en/latest/goodpractices.html)\n",
+    "\n",
+    "Now we can create our tests:\n",
+    "```\n",
+    "$ touch tests/__init__.py\n",
+    "$ touch test_03_country_subset.py\n",
+    "```\n",
+    "Your test scrips should start with `test`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your test script should look like this:\n",
+    "``` python\n",
+    "import importlib\n",
+    "\n",
+    "country = importlib.import_module('.data.03_country-subset', 'src')\n",
+    "\n",
+    "interim_data = \"data/interim/2018-04-30-winemag_priceGBP.csv\"\n",
+    "processed_data = \"data/processed/2018-04-30-winemag_Chile.csv\"\n",
+    "\n",
+    "def test_get_mean_price():\n",
+    "    mean_price = country.get_mean_price(processed_data)\n",
+    "    assert mean_price == 20.7865\n",
+    "```\n",
+    "\n",
+    "And you can run it from the shell using:\n",
+    "```\n",
+    "$ python -m pytest tests/test_03_country-subset.py\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What if you want all the decimal numbers?\n",
+    "\n",
+    "``` python\n",
+    "import importlib\n",
+    "import numpy.testing as npt\n",
+    "\n",
+    "country = importlib.import_module('.data.03_country-subset', 'src')\n",
+    "\n",
+    "interim_data = \"data/interim/2018-04-30-winemag_priceGBP.csv\"\n",
+    "processed_data = \"data/processed/2018-04-30-winemag_Chile.csv\"\n",
+    "\n",
+    "def test_get_mean_price():\n",
+    "    mean_price = country.get_mean_price(processed_data)\n",
+    "    assert mean_price == 20.7865\n",
+    "    npt.assert_allclose(country.get_mean_price(processed_data) , 20.787, rtol = 0.01)\n",
+    "```\n",
+    "\n",
+    "The `numpy.testing.assert_allclose` allows you to set a tolerance "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### What else could go wrong?\n",
+    "\n",
+    "What if we created a data set and we want to make sure that my interim or raw data has not changed? -> Thus my dataframes have not changes either?\n",
+    "\n",
+    "```python \n",
+    "import pandas.testing as pdt\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "interim_data = \"data/interim/2018-05-09-winemag_priceGBP.csv\"\n",
+    "processed_data = \"data/processed/2018-05-09-winemag_Chile.csv\"\n",
+    "\n",
+    "def test_get_country():\n",
+    "    # call the function\n",
+    "    df = country.get_country(interim_data, 'Chile')\n",
+    "    \n",
+    "    # load my previous dataset\n",
+    "    base = pd.read_csv(processed_data)\n",
+    "    \n",
+    "    # check if I am getting a dataframe\n",
+    "    assert isinstance(df, pd.DataFrame)\n",
+    "    assert isinstance(base, pd.DataFrame)\n",
+    "    \n",
+    "    # check that they are the same dataframes\n",
+    "    pdt.assert_frame_equal(df, base)\n",
+    "```    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### See what we did in the previous steps?\n",
+    "\n",
+    "We tested each of the functions in our module...\n",
+    "we did *unit testing*!\n",
+    "Notice something in the functions we just wrote? \n",
+    "- Set-up: `mean = country.get_mean(interim_data)`\n",
+    "- Assertions: `assert mean_price == 20.786`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<link href=\"https://fonts.googleapis.com/css?family=Didact+Gothic|Dosis:400,500,700\" rel=\"stylesheet\"><style>\n",
+       "@font-face {\n",
+       "  font-family: \"Computer Modern\";\n",
+       "  src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
+       "}\n",
+       "/* div.cell{\n",
+       "width:800px;\n",
+       "margin-left:16% !important;\n",
+       "margin-right:auto;\n",
+       "} */\n",
+       "h1 {\n",
+       "  font-family: 'Dosis', \"Helvetica Neue\", Arial, sans-serif;\n",
+       "  color: #0B132B;\n",
+       "}\n",
+       "h2 {\n",
+       "  font-family: 'Dosis', sans-serif;\n",
+       "  color: #1C2541;\n",
+       "}\n",
+       "h3{\n",
+       "  font-family: 'Dosis', sans-serif;\n",
+       "  margin-top:12px;\n",
+       "  margin-bottom: 3px;\n",
+       "  color: #40a8a6;\n",
+       "}\n",
+       "h4{\n",
+       "  font-family: 'Dosis', sans-serif;\n",
+       "  color: #40a8a6;\n",
+       "}\n",
+       "h5 {\n",
+       "  font-family: 'Dosis', sans-serif;\n",
+       "  color: #40a8a6;\n",
+       "}\n",
+       "div.text_cell_render{\n",
+       "  font-family: 'Didact Gothic',Computer Modern, \"Helvetica Neue\", Arial, Helvetica,\n",
+       "  Geneva, sans-serif;\n",
+       "  line-height: 130%;\n",
+       "  font-size: 110%;\n",
+       "  /* width:600px; */\n",
+       "  /* margin-left:auto;\n",
+       "  margin-right:auto; */\n",
+       "}\n",
+       "\n",
+       ".text_cell_render h1 {\n",
+       "  font-weight: 200;\n",
+       "  font-size: 30pt;\n",
+       "  /* font-size: 50pt */\n",
+       "  line-height: 100%;\n",
+       "  color:#0B132B;\n",
+       "  margin-bottom: 0.5em;\n",
+       "  margin-top: 0.5em;\n",
+       "  display: block;\n",
+       "}\n",
+       "\n",
+       ".text_cell_render h2{\n",
+       "  font-weight: 500;\n",
+       "}\n",
+       "\n",
+       ".text_cell_render h3{\n",
+       "  font-weight: 500;\n",
+       "}\n",
+       "\n",
+       "\n",
+       ".warning{\n",
+       "  color: rgb( 240, 20, 20 )\n",
+       "}\n",
+       "\n",
+       "div.warn {\n",
+       "  background-color: #FF5A5F;\n",
+       "  border-color: #FF5A5F;\n",
+       "  border-left: 5px solid #C81D25;\n",
+       "  padding: 0.5em;\n",
+       "\n",
+       "  color: #fff;\n",
+       "  opacity: 0.8;\n",
+       "}\n",
+       "\n",
+       "div.info {\n",
+       "  background-color: #087E8B;\n",
+       "  border-color: #087E8B;\n",
+       "  border-left: 5px solid #0B3954;\n",
+       "  padding: 0.5em;\n",
+       "  color: #fff;\n",
+       "  opacity: 0.8;\n",
+       "}\n",
+       "\n",
+       "</style>\n",
+       "<script>\n",
+       "MathJax.Hub.Config({\n",
+       "  TeX: {\n",
+       "    extensions: [\"AMSmath.js\"]\n",
+       "    },\n",
+       "    tex2jax: {\n",
+       "      inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
+       "      displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
+       "      },\n",
+       "      displayAlign: 'center', // Change this to 'center' to center equations.\n",
+       "      \"HTML-CSS\": {\n",
+       "        styles: {'.MathJax_Display': {\"margin\": 4}}\n",
+       "      }\n",
+       "      });\n",
+       "      </script>\n"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from IPython.core.display import HTML\n",
+    "\n",
+    "\n",
+    "def css_styling():\n",
+    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
+    "    return HTML(styles)\n",
+    "css_styling()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}