Commit a1cfcd5

05-loops-and-functions: fix metadata & code blocks
1 parent 20bda05 commit a1cfcd5

1 file changed

Lines changed: 73 additions & 48 deletions

_episodes/05-loops-and-functions.md

@@ -3,14 +3,16 @@ title: Data Workflows and Automation
 teaching: 40
 exercises: 50
 questions:
-- " Can I automate operations in Python? "
-- " What are functions and why should I use them? "
+- "Can I automate operations in Python?"
+- "What are functions and why should I use them?"
 objectives:
-- Describe why for loops are used in Python.
-- Employ for loops to automate data analysis.
-- Write unique filenames in Python.
-- Build reusable code in Python.
-- Write functions using conditional statements (if, then, else).
+- "Describe why for loops are used in Python."
+- "Employ for loops to automate data analysis."
+- "Write unique filenames in Python."
+- "Build reusable code in Python."
+- "Write functions using conditional statements (if, then, else)."
+keypoints:
+- "FIXME"
 ---
 
 So far, we've used Python and the pandas library to explore and manipulate
@@ -30,7 +32,7 @@ errors by making mistakes while processing each file by hand.
 Let's write a simple for loop that simulates what a kid might see during a
 visit to the zoo:
 
-```python
+~~~
 >>> animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
 >>> print(animals)
 ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
@@ -42,7 +44,8 @@ tiger
 crocodile
 vulture
 hippo
-```
+~~~
+{: .language-python}
 
 The line defining the loop must start with `for` and end with a colon, and the
 body of the loop must be indented.
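As a minimal sketch of those two syntax rules (the `for` line ending in a colon, and the indented body), the loop below collects one message per animal. The `sightings` list is a hypothetical name introduced only for this illustration.

```python
animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
sightings = []  # hypothetical helper list, not part of the lesson

# The loop line starts with `for` and ends with a colon
for creature in animals:
    # Indented lines form the loop body and run once per entry
    sightings.append('I saw a ' + creature)

# This line is not indented, so it runs once, after the loop has finished
print(len(sightings), 'sightings; last one:', sightings[-1])
```

Dedenting the `print` call is what takes it out of the loop; indenting it would print a running total on every pass instead.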
@@ -52,14 +55,15 @@ entry in `animals` every time the loop goes around. We can call the loop variabl
 anything we like. After the loop finishes, the loop variable will still exist
 and will have the value of the last entry in the collection:
 
-```python
+~~~
 >>> animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
 >>> for creature in animals:
 ...     pass
 
 >>> print('The loop variable is now: ' + creature)
 The loop variable is now: hippo
-```
+~~~
+{: .language-python}
 
 We are not asking python to print the value of the loop variable anymore, but
 the for loop still runs and the value of `creature` changes on each pass through
@@ -83,16 +87,17 @@ file.
 Let's start by making a new directory inside the folder `data` to store all of
 these files using the module `os`:
 
-```python
+~~~
 import os
 
 os.mkdir('data/yearly_files')
-```
+~~~
+{: .language-python}
 
 The command `os.mkdir` is equivalent to `mkdir` in the shell. Just so we are
 sure, we can check that the new directory was created within the `data` folder:
 
-```python
+~~~
 >>> os.listdir('data')
 ['plots.csv',
  'portal_mammals.sqlite',
@@ -102,7 +107,8 @@ sure, we can check that the new directory was created within the `data` folder:
  'surveys.csv',
  'surveys2002_temp.csv',
  'yearly_files']
-```
+~~~
+{: .language-python}
 
 The command `os.listdir` is equivalent to `ls` in the shell.
 
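`os.mkdir` raises `FileExistsError` if the directory already exists, which is easy to hit when re-running a notebook cell. One way around that, assuming it is acceptable to silently reuse an existing directory, is `os.makedirs` with `exist_ok=True`. In this minimal sketch, a temporary directory stands in for the lesson's `data` folder so the example does not touch real files.

```python
import os
import tempfile

# A throwaway directory stands in for the lesson's `data` folder (assumption)
with tempfile.TemporaryDirectory() as data_dir:
    target = os.path.join(data_dir, 'yearly_files')

    # exist_ok=True makes the call safe to repeat
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)  # the second call does not raise

    # List the stand-in folder to confirm the new directory is there
    contents = os.listdir(data_dir)
    print(contents)
```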
@@ -111,7 +117,7 @@ data into memory as a DataFrame, how to select a subset of the data using some
 criteria, and how to write the DataFrame into a CSV file. Let's write a script
 that performs those three steps in sequence for the year 2002:
 
-```python
+~~~
 import pandas as pd
 
 # Load the data into a DataFrame
@@ -122,7 +128,8 @@ surveys2002 = surveys_df[surveys_df.year == 2002]
 
 # Write the new DataFrame to a CSV file
 surveys2002.to_csv('data/yearly_files/surveys2002.csv')
-```
+~~~
+{: .language-python}
 
 To create yearly data files, we could repeat the last two commands over and
 over, once for each year of data. Repeating code is neither elegant nor
@@ -138,7 +145,7 @@ confirm that the loop is behaving as we expect.
 We have seen that we can loop over a list of items, so we need a list of years
 to loop over. We can get the years in our DataFrame with:
 
-```python
+~~~
 >>> surveys_df['year']
 
 0        1977
@@ -150,21 +157,23 @@ to loop over. We can get the years in our DataFrame with:
 35546    2002
 35547    2002
 35548    2002
-```
+~~~
+{: .language-python}
 
 but we want only unique years, which we can get using the `unique` method
-which we have already seen.
+which we have already seen.
 
-```python
+~~~
 >>> surveys_df['year'].unique()
 array([1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987,
        1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
        1999, 2000, 2001, 2002], dtype=int64)
-```
+~~~
+{: .language-python}
 
 Putting this into our for loop we get
 
-```python
+~~~
 >>> for year in surveys_df['year'].unique():
 ...     filename='data/yearly_files/surveys' + str(year) + '.csv'
 ...     print(filename)
@@ -195,11 +204,12 @@ data/yearly_files/surveys1999.csv
 data/yearly_files/surveys2000.csv
 data/yearly_files/surveys2001.csv
 data/yearly_files/surveys2002.csv
-```
+~~~
+{: .language-python}
 
 We can now add the rest of the steps we need to create separate text files:
 
-```python
+~~~
 # Load the data into a DataFrame
 surveys_df = pd.read_csv('data/surveys.csv')
 
@@ -211,7 +221,8 @@ for year in surveys_df['year'].unique():
     # Write the new DataFrame to a CSV file
     filename = 'data/yearly_files/surveys' + str(year) + '.csv'
     surveys_year.to_csv(filename)
-```
+~~~
+{: .language-python}
 
 Look inside the `yearly_files` directory and check a couple of the files you
 just created to confirm that everything worked as expected.
@@ -220,7 +231,10 @@ just created to confirm that everything worked as expected.
 
 Notice that the code above created a unique filename for each year.
 
+~~~
 filename = 'data/yearly_files/surveys' + str(year) + '.csv'
+~~~
+{: .language-python}
 
 Let's break down the parts of this name:
 
@@ -272,7 +286,7 @@ easy to write functions that can be used by different programs.
 
 Functions are declared following this general structure:
 
-```python
+~~~
 def this_is_the_function_name(input_argument1, input_argument2):
 
     # The body of the function is indented
@@ -281,7 +295,8 @@ def this_is_the_function_name(input_argument1, input_argument2):
 
     # And returns their product
     return input_argument1 * input_argument2
-```
+~~~
+{: .language-python}
 
 The function declaration starts with the word `def`, followed by the function
 name and any arguments in parenthesis, and ends in a colon. The body of the
@@ -290,13 +305,14 @@ it is called, it includes a return statement at the end.
 
 This is how we call the function:
 
-```python
+~~~
 >>> product_of_inputs = this_is_the_function_name(2,5)
 The function arguments are: 2 5 (this is done inside the function!)
 
 >>> print('Their product is:', product_of_inputs, '(this is done outside the function!)')
 Their product is: 10 (this is done outside the function!)
-```
+~~~
+{: .language-python}
 
 > ## Challenge - Functions
 >
@@ -315,7 +331,7 @@ many different "chunks" of this code that we can turn into functions, and we can
 even create functions that call other functions inside them. Let's first write a
 function that separates data for just one year and saves that data to a file:
 
-```python
+~~~
 def one_year_csv_writer(this_year, all_data):
     """
     Writes a csv file for data from a given year.
@@ -330,21 +346,24 @@ def one_year_csv_writer(this_year, all_data):
     # Write the new DataFrame to a csv file
     filename = 'data/yearly_files/function_surveys' + str(this_year) + '.csv'
     surveys_year.to_csv(filename)
-```
+~~~
+{: .language-python}
 
 The text between the two sets of triple double quotes is called a docstring and
 contains the documentation for the function. It does nothing when the function
 is running and is therefore not necessary, but it is good practice to include
 docstrings as a reminder of what the code does. Docstrings in functions also
 become part of their 'official' documentation:
 
-```python
+~~~
 one_year_csv_writer?
-```
+~~~
+{: .language-python}
 
-```python
+~~~
 one_year_csv_writer(2002, surveys_df)
-```
+~~~
+{: .language-python}
 
 We changed the root of the name of the CSV file so we can distinguish it from
 the one we wrote before. Check the `yearly_files` directory for the file. Did it
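The `?` suffix in the docstring example only works in IPython and Jupyter. In plain Python the same text is stored on the function's `__doc__` attribute, and it is what `help()` displays. The sketch below uses a hypothetical stand-in function, `describe_year`, rather than the lesson's `one_year_csv_writer`.

```python
import inspect


def describe_year(this_year):
    """
    Return a short label for a survey year.

    this_year -- year as an integer
    """
    return 'surveys' + str(this_year)


# The raw docstring is a plain string attribute on the function
raw_doc = describe_year.__doc__

# inspect.getdoc returns the same text with the indentation cleaned up
clean_doc = inspect.getdoc(describe_year)
print(clean_doc)
```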
@@ -356,7 +375,7 @@ the entire For loop by simply looping through a sequence of years and repeatedly
 calling the function we just wrote, `one_year_csv_writer`:
 
 
-```python
+~~~
 def yearly_data_csv_writer(start_year, end_year, all_data):
     """
     Writes separate CSV files for each year of data.
@@ -369,7 +388,8 @@ def yearly_data_csv_writer(start_year, end_year, all_data):
     # "end_year" is the last year of data we want to pull, so we loop to end_year+1
     for year in range(start_year, end_year+1):
         one_year_csv_writer(year, all_data)
-```
+~~~
+{: .language-python}
 
 Because people will naturally expect that the end year for the files is the last
 year with data, the for loop inside the function ends at `end_year + 1`. By
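The `+ 1` is needed because `range` stops one short of its second argument. A quick sketch, using arbitrary example years:

```python
start_year, end_year = 1999, 2002  # arbitrary example values

# range() excludes its stop value, so this misses 2002
without_offset = list(range(start_year, end_year))

# Adding 1 to the stop value makes end_year the last year produced
with_offset = list(range(start_year, end_year + 1))

print(without_offset)  # [1999, 2000, 2001]
print(with_offset)     # [1999, 2000, 2001, 2002]
```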
@@ -379,13 +399,14 @@ first and last year for which we want files, we can even use this function to
 create files for a subset of the years available. This is how we call this
 function:
 
-```python
+~~~
 # Load the data into a DataFrame
 surveys_df = pd.read_csv('data/surveys.csv')
 
 # Create CSV files
 yearly_data_csv_writer(1977, 2002, surveys_df)
-```
+~~~
+{: .language-python}
 
 BEWARE! If you are using IPython Notebooks and you modify a function, you MUST
 re-run that cell in order for the changed function to be available to the rest
@@ -422,7 +443,7 @@ sign in the function declaration. Any arguments in the function without default
 values (here, `all_data`) is a required argument and MUST come before the
 argument with default values (which are optional in the function call).
 
-```python
+~~~
 def yearly_data_arg_test(all_data, start_year = 1977, end_year = 2002):
     """
     Modified from yearly_data_csv_writer to test default argument values!
@@ -440,7 +461,8 @@ argument with default values (which are optional in the function call).
 
 start,end = yearly_data_arg_test (surveys_df)
 print('Default values:\t\t\t', start, end)
-```
+~~~
+{: .language-python}
 
 ```
 Both optional arguments: 1988 1993
@@ -454,7 +476,7 @@ But what if our dataset doesn't start in 1977 and end in 2002? We can modify the
 function so that it looks for the start and end years in the dataset if those
 dates are not provided:
 
-```python
+~~~
 def yearly_data_arg_test(all_data, start_year = None, end_year = None):
     """
     Modified from yearly_data_csv_writer to test default argument values!
@@ -477,7 +499,8 @@ dates are not provided:
 
 start,end = yearly_data_arg_test (surveys_df)
 print('Default values:\t\t\t', start, end)
-```
+~~~
+{: .language-python}
 ```
 Both optional arguments: 1988 1993
 Default values: 1977 2002
@@ -510,7 +533,7 @@ The body of the test function now has two conditionals (if statements) that
 check the values of `start_year` and `end_year`. If statements execute a segment
 of code when some condition is met. They commonly look something like this:
 
-```python
+~~~
 a = 5
 
 if a<0: # Meets first condition?
@@ -527,7 +550,8 @@ of code when some condition is met. They commonly look something like this:
 
     # if a ISN'T less than zero and ISN'T more than zero
     print('a must be zero!')
-```
+~~~
+{: .language-python}
 
 Which would return:
 
@@ -556,7 +580,7 @@ calling the function using keyword arguments, where each of the arguments in the
 function definition is associated with a keyword and the function call passes
 values to the function using these keywords:
 
-```python
+~~~
 start,end = yearly_data_arg_test (surveys_df)
 print('Default values:\t\t\t', start, end)
 
@@ -574,7 +598,8 @@ values to the function using these keywords:
 
 start,end = yearly_data_arg_test (surveys_df, end_year = 1993)
 print('One keyword, default start:\t', start, end)
-```
+~~~
+{: .language-python}
 ```
 Default values: 1977 2002
 No keywords: 1988 1993
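
The pattern exercised by these last hunks is optional arguments that default to `None` and are filled in from the data, called either positionally or by keyword. It can be sketched without pandas. The function `year_bounds` and the list `years` are hypothetical names for illustration only; the lesson's `yearly_data_arg_test` works on a DataFrame, and its body is not shown in this diff.

```python
def year_bounds(years, start_year=None, end_year=None):
    """Return (start, end), falling back to the data when a bound is omitted."""
    # `is None` distinguishes "not provided" from any real year value
    if start_year is None:
        start_year = min(years)
    if end_year is None:
        end_year = max(years)
    return start_year, end_year


years = [1977, 1988, 1993, 2002]  # stand-in for surveys_df['year']

defaults = year_bounds(years)                                  # both bounds from the data
positional = year_bounds(years, 1988, 1993)                    # no keywords
one_keyword = year_bounds(years, end_year=1993)                # default start
any_order = year_bounds(years, end_year=1993, start_year=1988) # keywords in any order

print(defaults, positional, one_keyword, any_order)
```

The required argument `years` comes first, and keyword arguments can be given in any order because each value is matched by name rather than by position.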
