Commit b254e59

Tania Allard committed
Add content to scripts and metadata
1 parent 0a06023 commit b254e59

9 files changed

Lines changed: 128 additions & 11 deletions

02_WorkingWithData.ipynb

Lines changed: 26 additions & 3 deletions
@@ -1,5 +1,14 @@
 {
 "cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Raw data are **sacrosanct**\n",
+"\n",
+"<blockquote class=\"twitter-tweet\" data-lang=\"en-gb\"><p lang=\"en\" dir=\"ltr\"><a href=\"https://twitter.com/tomjwebb?ref_src=twsrc%5Etfw\">@tomjwebb</a> don&#39;t, not even with a barge pole, not for one second, touch or otherwise edit the raw data files. Do any manipulations in script</p>&mdash; Gavin Simpson (@ucfagls) <a href=\"https://twitter.com/ucfagls/status/556107371634634755?ref_src=twsrc%5Etfw\">16 January 2015</a></blockquote> <script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script> \n"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
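One way to make the "never touch the raw data" rule hard to break by accident is to strip write permission from the raw files once they have been downloaded. A minimal sketch, assuming the raw files live under a directory such as `data/raw` (the layout used by the commands in 03_ProcessData.ipynb); `protect_raw_data` is a hypothetical helper, not part of the workshop scripts:

```python
import stat
from pathlib import Path

# Write-permission bits for owner, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def protect_raw_data(raw_dir):
    """Make every file under raw_dir read-only.

    Returns the list of files that are now read-only.
    """
    protected = []
    for path in sorted(Path(raw_dir).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep the read/execute bits, clear all write bits.
            path.chmod(stat.S_IMODE(mode) & ~WRITE_BITS)
            protected.append(path)
    return protected
```

Scripts then read from `data/raw` and write their results somewhere else (for example `data/interim`). On Windows `Path.chmod` only toggles the read-only flag, and a superuser can still write to read-only files, so treat this as a guard against accidents rather than a security control.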
@@ -37,6 +46,20 @@
 "Otherwise you can get a copy at [https://drive.google.com/drive/u/1/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing](https://drive.google.com/drive/u/1/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# You got data... is it enough? Data without documentation has no value\n",
+"### metadata = data about data \n",
+"\n",
+"Information that describes, explains, locates or makes it easier\n",
+"to <strong>find, access, and use</strong> a resource\n",
+"\n",
+"<img src=\"assets/meta.jpg\" alt=\"metadata\" width='300px'>\n",
+"<img src=\"assets/metadata.png\" alt=\"metadata\" >"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
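Metadata can be as simple as a small file that travels with the data. A minimal sketch, assuming a JSON "sidecar" file stored next to each data set; the field names (`description`, `source`, `created`, `columns`) and the `.meta.json` suffix are illustrative choices, not a standard schema:

```python
import json
from datetime import date
from pathlib import Path


def write_metadata(data_path, description, source, columns, created=None):
    """Write a JSON sidecar describing data_path and return its path.

    The sidecar shares the data file's name, e.g.
    winemag.csv -> winemag.csv.meta.json.
    """
    data_path = Path(data_path)
    metadata = {
        "file": data_path.name,
        "description": description,
        "source": source,  # where the file came from
        "created": (created or date.today()).isoformat(),
        "columns": columns,  # column name -> what it means (units, etc.)
    }
    meta_path = data_path.with_name(data_path.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path
```

For example, `write_metadata("data/raw/winemag-data-130k-v2.csv", "Wine reviews", "<where the file came from>", {"points": "review score", "price": "bottle price"})` would create `winemag-data-130k-v2.csv.meta.json` alongside the CSV, where both people and scripts can find it.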
@@ -66,7 +89,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [
 {
@@ -178,7 +201,7 @@
 "<IPython.core.display.HTML object>"
 ]
 },
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -210,7 +233,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.5"
+"version": "3.6.4"
 }
 },
 "nbformat": 4,

03_ProcessData.ipynb

Lines changed: 81 additions & 5 deletions
@@ -15,7 +15,7 @@
 "We want to do the following:\n",
 "- Create a Jupyter notebook for exploratory analysis\n",
 "- Generate the following outputs using python scripts:\n",
-" - Generate a subset of `winemag-130k-v2.csv` containing only the following columns:`Country, designation, points, price (in GBP)`. Save in a .csv file\n",
+" - Generate a subset of `winemag-130k-v2.csv` containing only the following columns: `country, designation, points, price (in GBP)`. Save in a .csv file\n",
 " - Generate and save a table of wines only produced in Chile\n",
 " - Save a scatterplot of the wines points vs price and a distribution plot of wine scores"
 ]
@@ -28,7 +28,7 @@
 }
 },
 "source": [
-"Don't worry you do not have to generate all of the scripts... we have provided the bases for you to start working.\n",
+"Don't worry you do not have to generate all of the scripts... we have provided some scripts for you to get started.\n",
 "You should now have a directory called `SupportScripts`\n",
 "\n",
 "You need to make sure that they are in the appropriate directory inside your newly created project.\n",
@@ -45,7 +45,11 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"slideshow": {
+"slide_type": "subslide"
+}
+},
 "source": [
 "# Documentation\n",
 "\n",
@@ -56,6 +60,20 @@
 "A good point to start is checking the [Google Python style guidelines](https://google.github.io/styleguide/pyguide.html#Comments)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"slideshow": {
+"slide_type": "subslide"
+}
+},
+"source": [
+"Let's face it.... there are going to be files\n",
+"**LOTS** of files\n",
+"\n",
+"![files](assets/allthefiles.png)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -76,11 +94,69 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"slideshow": {
+"slide_type": "subslide"
+}
+},
 "source": [
 "![](./assets/dates_ISO.png)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"slideshow": {
+"slide_type": "subslide"
+}
+},
+"source": [
+"## What works and what doesn't\n",
+"\n",
+"<table>\n",
+" <tr>\n",
+" <th>NO</th>\n",
+" <th>YES</th>\n",
+" </tr>\n",
+" <tr>\n",
+" <td>report.docx</td>\n",
+" <td>2018-02-03_report-for-sla.docx</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <td>Joey's filename has spaces and punctuation.xlsx</td>\n",
+" <td>joeys-filenames-are-getting-better.xlsx</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <td>fig 1.png</td>\n",
+" <td>fig01_scatterplot-talk-length-vs-interest.png</td>\n",
+" </tr>\n",
+"</table>\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"slideshow": {
+"slide_type": "slide"
+}
+},
+"source": [
+"# The scripts\n",
+"\n",
+"Let's start by checking the scripts and notebooks:\n",
+"- **00_explore-data.ipynb**: exploratory analysis \n",
+"- **01_subset-data-GBP.py**: subset of winemag-130k-v2.csv containing only the following columns: country, designation, points, price (in GBP). Save in a .csv file\n",
+"- **02_visualize-wines.py**\n",
+"- **03_country-subset.py**\n",
+"\n",
+"From the root of your project directory you can run the scripts as follows:\n",
+"```\n",
+"$ python src/data/01_subset-data-GBP.py data/raw/winemag-data-130k-v2.csv \n",
+"$ python src/visualization/02_visualize-wines.py data/interim/2018-05-09-winemag_priceGBP.csv \n",
+"$ python src/data/03_country-subset.py data/interim/2018-05-09-winemag_priceGBP.csv Chile\n",
+"```\n"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 1,
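The naming rules in the "What works and what doesn't" table (ISO 8601 date first, lower case, hyphens instead of spaces and punctuation) are easy to apply by hand and easier still to automate. A sketch, assuming a hypothetical helper `make_filename` that builds date-prefixed names like the report example in the table:

```python
import re
from datetime import date


def make_filename(description, extension, on=None):
    """Build a machine- and human-readable file name.

    The result starts with an ISO 8601 date (YYYY-MM-DD), so files sort
    chronologically, followed by a lower-case, hyphen-separated description.
    """
    on = on or date.today()
    # Drop apostrophes so "Joey's" becomes "joeys" rather than "joey-s".
    slug = description.lower().replace("'", "")
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    return f"{on.isoformat()}_{slug}.{extension.lstrip('.')}"
```

`make_filename("Report for SLA", "docx", date(2018, 2, 3))` returns `2018-02-03_report-for-sla.docx`. Figures in the table use a number prefix (`fig01_...`) instead of a date; the same slug logic applies to either prefix.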
@@ -227,7 +303,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.5"
+"version": "3.6.4"
 }
 },
 "nbformat": 4,

assets/allthefiles.png

1.01 MB

assets/meta.jpg

115 KB

assets/metadata.png

62.4 KB

Lines changed: 11 additions & 1 deletion
@@ -1,5 +1,13 @@
 #!/usr/bin/env python
 
+
+"""
+Module containing functions to subset the raw data:
+keeps description, country, price, points and adds
+column for price in GBP
+
+"""
+
 import sys
 import datetime
 
@@ -8,15 +16,17 @@
 import matplotlib.pyplot as plt
 
 
+
 def process_data_GBP(filename):
 """
 Get only the needed subset from the data.
 Args:
+-----
 filename: str
 Path to the filename containing the wine data
 
 Returns:
-
+-----
 data_path: str
 Path to the created data set
 """
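For orientation, here is a minimal sketch of what a function like `process_data_GBP` could do, using only the standard library. Assumptions: the raw `price` column is in US dollars, `USD_TO_GBP` is a placeholder fixed rate, `price_GBP` is a hypothetical column name, the kept columns follow the task list in 03_ProcessData.ipynb, and the output name copies the `YYYY-MM-DD-winemag_priceGBP.csv` pattern from the commands shown there. The solution script in this commit may work differently.

```python
import csv
from datetime import date
from pathlib import Path

# Placeholder exchange rate (assumption): raw prices taken to be in USD.
USD_TO_GBP = 0.74

# Columns requested in 03_ProcessData.ipynb.
KEEP = ["country", "designation", "points", "price"]


def process_data_GBP(filename, out_dir="data/interim", rate=USD_TO_GBP, on=None):
    """
    Get only the needed subset from the data and add a GBP price column.

    Parameters
    ----------
    filename : str
        Path to the file containing the raw wine data.
    out_dir : str or pathlib.Path
        Directory for the derived data set (the raw file is never modified).
    rate : float
        Multiplier converting the raw price into GBP.
    on : datetime.date, optional
        Date used in the output file name; defaults to today.

    Returns
    -------
    pathlib.Path
        Path to the created data set.
    """
    on = on or date.today()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"{on.isoformat()}-winemag_priceGBP.csv"

    with open(filename, newline="", encoding="utf-8") as src, \
            open(data_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=KEEP + ["price_GBP"])
        writer.writeheader()
        for row in reader:
            subset = {col: row[col] for col in KEEP}
            price = row["price"]
            # Some wines have no price; keep them, with an empty GBP value.
            subset["price_GBP"] = f"{float(price) * rate:.2f}" if price else ""
            writer.writerow(subset)
    return data_path
```

On docstring style: the Google guide linked in 03_ProcessData.ipynb uses `Args:` and `Returns:` headers without underlines, while numpydoc (used in the sketch) uses `Parameters` and `Returns` with dashed underlines. Either works with documentation tools as long as one convention is applied consistently.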

solutions/02_visualize-wines.py

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/usr/bin/env python
+"""
+Module containing the functions to visualize the
+wines distribution using a subset of the data
+"""
 
 import sys
 import datetime
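The plots requested in 03_ProcessData.ipynb (a scatterplot of points vs price and a distribution of scores) take only a few lines of matplotlib, which the solution scripts already import. A sketch, assuming the subset CSV has a `points` column and a hypothetical `price_GBP` column; `plot_wines` and its output paths are illustrative, and the actual 02_visualize-wines.py may differ:

```python
import csv

import matplotlib
matplotlib.use("Agg")  # draw straight to image files, no display needed
import matplotlib.pyplot as plt


def plot_wines(data_path, scatter_path, hist_path):
    """Save a points-vs-price scatterplot and a histogram of scores.

    Returns (wines in the scatterplot, wines in the histogram).
    """
    scores, priced_scores, prices = [], [], []
    with open(data_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            score = int(row["points"])
            scores.append(score)
            # Wines without a price are scored but cannot go on the scatterplot.
            if row["price_GBP"]:
                priced_scores.append(score)
                prices.append(float(row["price_GBP"]))

    fig, ax = plt.subplots()
    ax.scatter(prices, priced_scores, s=5, alpha=0.5)
    ax.set_xlabel("Price (GBP)")
    ax.set_ylabel("Points")
    fig.savefig(scatter_path)
    plt.close(fig)

    fig, ax = plt.subplots()
    # One bin per integer score.
    ax.hist(scores, bins=range(min(scores), max(scores) + 2))
    ax.set_xlabel("Points")
    ax.set_ylabel("Number of wines")
    fig.savefig(hist_path)
    plt.close(fig)

    return len(prices), len(scores)
```

Closing each figure after saving keeps memory flat when a script produces many plots in one run.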

solutions/03_country-subset.py

Lines changed: 6 additions & 2 deletions
@@ -1,5 +1,8 @@
 #!/usr/bin/env python
-
+"""
+Module containing the functions to subset the data
+according to a given country name
+"""
 
 import sys
 import datetime
@@ -13,13 +16,14 @@ def get_country(filename, country):
 """
 Do a simple analysis per country
 Args:
+-----
 filename: str
 Path to the filename containing the wine data
 country: str
 Country to be used to subset
 
 Returns:
-
+-----
 data_path: str
 Path to the created data set
 """
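A sketch of the country subset behind `03_country-subset.py ... Chile`, assuming the input is the interim CSV with a `country` column and that matching should ignore case and stray spaces. The `data/processed` directory and the `<stem>_<country>.csv` output name are hypothetical choices, and `main` is an illustrative entry point:

```python
import csv
import sys
from pathlib import Path


def get_country(filename, country, out_dir="data/processed"):
    """Write the rows of `filename` whose country matches `country`.

    Matching ignores case and surrounding spaces. Returns the output path.
    """
    wanted = country.strip().lower()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hyphens instead of spaces keep names like "New Zealand" file-friendly.
    data_path = out_dir / f"{Path(filename).stem}_{wanted.replace(' ', '-')}.csv"

    with open(filename, newline="", encoding="utf-8") as src, \
            open(data_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Short rows get None for missing fields, hence the `or ""`.
            if (row["country"] or "").strip().lower() == wanted:
                writer.writerow(row)
    return data_path


def main(argv):
    """Command-line entry point: <script> <subset.csv> <country>."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <subset.csv> <country>", file=sys.stderr)
        return 1
    print(get_country(argv[1], argv[2]))
    return 0
```

To run it as `python src/data/03_country-subset.py data/interim/2018-05-09-winemag_priceGBP.csv Chile`, end the script with `if __name__ == "__main__": sys.exit(main(sys.argv))`.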
