trallard
diff --git a/02_WorkingWithData.ipynb b/02_WorkingWithData.ipynb
Lines changed: 26 additions & 3 deletions
diff --git a/03_ProcessData.ipynb b/03_ProcessData.ipynb
Lines changed: 81 additions & 5 deletions
diff --git a/assets/allthefiles.png b/assets/allthefiles.png
1.01 MB
diff --git a/assets/meta.jpg b/assets/meta.jpg
115 KB
diff --git a/assets/metadata.png b/assets/metadata.png
62.4 KB
diff --git a/solutions/01_exploredata.ipynb b/solutions/00_exploredata.ipynb (solutions/01_exploredata.ipynb renamed to solutions/00_exploredata.ipynb)
diff --git a/solutions/01_subset-data-GBP-Copy1.py b/solutions/01_subset-data-GBP.py (solutions/01_subset-data-GBP-Copy1.py renamed to solutions/01_subset-data-GBP.py)
Lines changed: 11 additions & 1 deletion
diff --git a/solutions/02_visualize-wines.py b/solutions/02_visualize-wines.py
Lines changed: 4 additions & 0 deletions
diff --git a/solutions/03_country-subset.py b/solutions/03_country-subset.py
Lines changed: 6 additions & 2 deletions
@@ -1,5 +1,14 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Raw data are **sacrosanct**\n",
+    "\n",
+    "<blockquote class=\"twitter-tweet\" data-lang=\"en-gb\"><p lang=\"en\" dir=\"ltr\"><a href=\"https://twitter.com/tomjwebb?ref_src=twsrc%5Etfw\">@tomjwebb</a> don&#39;t, not even with a barge pole, not for one second, touch or otherwise edit the raw data files. Do any manipulations in script</p>&mdash; Gavin Simpson (@ucfagls) <a href=\"https://twitter.com/ucfagls/status/556107371634634755?ref_src=twsrc%5Etfw\">16 January 2015</a></blockquote> <script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script> \n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -37,6 +46,20 @@
     "Othwerwise you can get a copy at [https://drive.google.com/drive/u/1/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing](https://drive.google.com/drive/u/1/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# You got data... is it enough? Data without documentation has no value\n",
+    "### metadata = data about data \n",
+    "\n",
+    "Information that describes, explains, locates or makes it easier\n",
+    "to <strong>find, access, and use</strong> a resource\n",
+    "\n",
+    "<img src=\"assets/meta.jpg\" alt=\"metadata\" width='300px'>\n",
+    "<img src=\"assets/metadata.png\" alt=\"metadata\" >"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -66,7 +89,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [
     {
@@ -178,7 +201,7 @@
        "<IPython.core.display.HTML object>"
       ]
      },
-     "execution_count": 1,
+     "execution_count": 2,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -210,7 +233,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.6.4"
   }
  },
  "nbformat": 4,
 
@@ -15,7 +15,7 @@
     "We want to do the following:\n",
     "- Create a Jupyter notebook for exploratory analysis\n",
     "- Generate the following outputs using python scripts:\n",
-    "    - Generate a subset of `winemag-130k-v2.csv` containing only the following columns:`Country, designation, points, price (in GBP)`. Save in a .csv file\n",
+    "    - Generate a subset of `winemag-130k-v2.csv` containing only the following columns: `country, designation, points, price (in GBP)`. Save in a .csv file\n",
     "    - Generate and save a table of wines only produced in Chile\n",
     "    - Save a scatterplot of the wines points vs price and a distribution plot of wine scores"
    ]
@@ -28,7 +28,7 @@
     }
    },
    "source": [
-    "Don't worry you do not have to generate all of the scripts... we have provided the bases for you to start working.\n",
+    "Don't worry, you do not have to generate all of the scripts... we have provided some scripts for you to get started.\n",
     "You should now have a directory called `SupportScripts`\n",
     "\n",
     "You need to make sure that they are in the appropriate directory inside your newly created project.\n",
@@ -45,7 +45,11 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "slideshow": {
+     "slide_type": "subslide"
+    }
+   },
    "source": [
     "# Documentation\n",
     "\n",
@@ -56,6 +60,20 @@
     "A good point to start is checking the [Google Python style guidelines](https://google.github.io/styleguide/pyguide.html#Comments)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "subslide"
+    }
+   },
+   "source": [
+    "Let's face it... there are going to be files\n",
+    "**LOTS** of files\n",
+    "\n",
+    "![files](assets/allthefiles.png)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -76,11 +94,69 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "slideshow": {
+     "slide_type": "subslide"
+    }
+   },
    "source": [
     "![](./assets/dates_ISO.png)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "subslide"
+    }
+   },
+   "source": [
+    "## What works and what doesn't\n",
+    "\n",
+    "<table>\n",
+    "  <tr>\n",
+    "    <th>NO</th>\n",
+    "    <th>YES</th>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>report.docx</td>\n",
+    "    <td>2018-02-03_report-for-sla.docx</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>Joey's filename has spaces and punctuation.xlsx</td>\n",
+    "    <td>joeys-filenames-are-getting-better.xlsx</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td>fig 1.png</td>\n",
+    "    <td>fig01_scatterplot-talk-length-vs-interest.png</td>\n",
+    "  </tr>\n",
+    "</table>\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# The scripts\n",
+    "\n",
+    "Let's start by checking the scripts and notebooks:\n",
+    "- **00_explore-data.ipynb**: exploratory analysis \n",
+    "- **01_subset-data-GBP.py**: subset of winemag-130k-v2.csv containing only the following columns: country, designation, points, price (in GBP). Save in a .csv file\n",
+    "- **02_visualize-wines.py**\n",
+    "- **03_country-subset.py**\n",
+    "\n",
+    "From the root of your project you can run the scripts as follows:\n",
+    "```\n",
+    "$ python src/data/01_subset-data-GBP.py data/raw/winemag-data-130k-v2.csv \n",
+    "$ python src/visualization/02_visualize-wines.py data/interim/2018-05-09-winemag_priceGBP.csv \n",
+    "$ python src/data/03_country-subset.py data/interim/2018-05-09-winemag_priceGBP.csv Chile\n",
+    "```\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -227,7 +303,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.6.4"
   }
  },
  "nbformat": 4,
 
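The naming rules in the "What works and what doesn't" table (ISO 8601 date prefix, no spaces, no punctuation) can be sketched as a small helper. This is only an illustration: `make_filename` is a hypothetical name and is not part of the scripts in this change.

```python
import datetime
import re


def make_filename(description, extension, date=None):
    """Build a filename such as 2018-02-03_report-for-sla.docx."""
    # Hypothetical helper: default to today's date when none is given
    date = date or datetime.date.today()
    # Lowercase, drop apostrophes, then turn every run of other
    # non-alphanumeric characters (spaces, punctuation) into one hyphen
    cleaned = description.lower().replace("'", "")
    slug = re.sub(r"[^a-z0-9]+", "-", cleaned).strip("-")
    return f"{date.isoformat()}_{slug}.{extension}"


print(make_filename("Report for SLA", "docx", datetime.date(2018, 2, 3)))
```

Putting the date first in ISO 8601 form means that an alphabetical listing of the directory is also a chronological one.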
@@ -1,5 +1,13 @@
 #!/usr/bin/env python
 
+
+"""
+Module containing functions to subset the raw data:
+keeps description, country, price, points and adds
+column for price in GBP
+
+"""
+
 import sys
 import datetime
 
@@ -8,15 +16,17 @@
 import matplotlib.pyplot as plt
 
 
+
 def process_data_GBP(filename):
     """
     Get only the needed subset from the data.
     Args:
+    -----
     filename: str
         Path to the filename containing the wine data
 
     Returns:
-
+    -----
     data_path: str
         Path to the created data set
     """
 
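The body of `process_data_GBP` is not shown in this diff. As a rough sketch of the kind of subsetting its docstring describes, the example below keeps a few columns and adds a GBP price using only the standard library. The exchange rate, the `price_gbp` column name, the `subset_rows` helper, the assumption that raw prices are in USD, and the two sample rows are all made up for illustration; the real script may work differently.

```python
import csv
import io

# Assumed exchange rate, for illustration only; the rate used by the
# actual script is not shown in the diff
USD_TO_GBP = 0.75

# Columns listed in the notebook text for the subset
KEEP = ["country", "designation", "points", "price"]


def subset_rows(rows, rate=USD_TO_GBP):
    """Yield dicts with only the KEEP columns plus a price_gbp column."""
    for row in rows:
        out = {col: row[col] for col in KEEP}
        # A raw price may be missing; leave the converted value empty then
        out["price_gbp"] = round(float(row["price"]) * rate, 2) if row["price"] else ""
        yield out


# Two made-up rows standing in for the raw file, which is never modified
raw = io.StringIO(
    "country,description,designation,points,price\n"
    "Chile,Fresh and fruity,Reserva,88,20.0\n"
    "Italy,Dry,Riserva,90,\n"
)
subset = list(subset_rows(csv.DictReader(raw)))
print(subset)
```

The raw input is only read, and the derived column lives in a new structure, which matches the "raw data are sacrosanct" rule from the first notebook.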
@@ -1,4 +1,8 @@
 #!/usr/bin/env python
+"""
+Module containing the functions to visualize the
+wines distribution using a subset data
+"""
 
 import sys
 import datetime
 
@@ -1,5 +1,8 @@
 #!/usr/bin/env python
-
+"""
+Module containing the functions to subset the data
+according to a given country name
+"""
 
 import sys
 import datetime
@@ -13,13 +16,14 @@ def get_country(filename, country):
     """
     Do a simple analysis per country
     Args:
+    -----
     filename: str
         Path to the filename containing the wine data
     country: str
         Country to be used to subset
 
     Returns:
-
+    -----
     data_path: str
         Path to the created data set
     """