datacarpentry
diff --git a/_episodes/00-before-we-start.md b/_episodes/00-before-we-start.md
Lines changed: 7 additions & 5 deletions
diff --git a/_episodes/01-short-introduction-to-Python.md b/_episodes/01-short-introduction-to-Python.md
Lines changed: 3 additions & 4 deletions
diff --git a/_episodes/02-starting-with-data.md b/_episodes/02-starting-with-data.md
Lines changed: 15 additions & 14 deletions
diff --git a/_episodes/04-data-types-and-format.md b/_episodes/04-data-types-and-format.md
Lines changed: 1 addition & 1 deletion
diff --git a/_episodes/05-merging-data.md b/_episodes/05-merging-data.md
Lines changed: 11 additions & 10 deletions
@@ -44,7 +44,7 @@ mean it is easier for new members of the community to get up to speed.
 Reproducibility is the ability to obtain the same results using the same dataset(s) and analysis.
 
 Data analysis written as a Python script can be reproduced on any platform.  Moreover, if you
-collect more or correct existing data, you can quickly and easily re-run your analysis!
+collect more or correct existing data, you can quickly re-run your analysis!
 
 An increasing number of journals and funding agencies expect analyses to be reproducible,
 so knowing Python will give you an edge with these requirements.
@@ -77,7 +77,7 @@ such as the IPython console, Jupyter Notebook, and Spyder IDE.
 Have a quick look around the Anaconda Navigator. You can launch programs from the Navigator or use the command line.
 
 The [Jupyter Notebook](https://jupyter.org) is an open-source web application that allows you to create
-and share documents that allow one to easilty create documents that combine code, graphs, and narrative text.
+and share documents that combine code, graphs, and narrative text.
 [Spyder][spyder-ide] is an **Integrated Development Environment** that
 allows one to write Python scripts and interact with the Python software from within a single interface.
 
@@ -147,7 +147,7 @@ default.
 
 Since we want our code and workflow to be reproducible, it is better to type the commands in
 the script editor, and save them as a script. This way, there is a complete record of what we did,
-and anyone (including our future selves!) can easily reproduce the results on their computer.
+and anyone (including our future selves!) has an easier time reproducing the results on their computer.
 
 Spyder allows you to execute commands directly from the script editor by using the run buttons on
 top.  To run the entire script click _Run file_ or press <kbd>F5</kbd>, to run the current line
@@ -189,6 +189,7 @@ code to suit your purpose might make it easier for you to get started.
 * type `help()`
 * type `?object` or `help(object)` to get information about an object
 * [Python documentation][python-docs]
+* [Pandas documentation][pandas-docs]
 
 Finally, a generic Google or internet search "Python task" will often either send you to the
 appropriate module documentation or a helpful forum where someone else has already asked your
@@ -201,7 +202,7 @@ messages that might not be very helpful to diagnose a problem (e.g. "subscript o
 the message is very generic, you might also include the name of the function or package you’re using
 in your query.
 
-However, you should check Stack Overflow. Search using the `python` tag. Most questions have already
+However, you should check Stack Overflow. Search using the `[python]` tag. Most questions have already
 been answered, but the challenge is to use the right words in the search to find the answers:
 <https://stackoverflow.com/questions/tagged/python?tab=Votes>
 
@@ -245,7 +246,8 @@ ask a good question.
 [anaconda]: https://www.anaconda.com
 [anaconda-community]: https://www.anaconda.com/community
 [dive-into-python3]: https://finderiko.com/python-book
-[pypi]: https://pypi.python.org/pypi
+[pandas-docs]: https://pandas.pydata.org/pandas-docs/stable/
+[pypi]: https://pypi.org/
 [python-docs]: https://www.python.org/doc
 [python-guide]: https://docs.python-guide.org
 [python-mailing-lists]: https://www.python.org/community/lists
 
@@ -1,6 +1,6 @@
 ---
 title: Short Introduction to Programming in Python
-teaching: 0
+teaching: 30
 exercises: 0
 questions:
     - "What is Python?"
@@ -290,9 +290,8 @@ A `for` loop can be used to access the elements in a list or other Python data
 structure one at a time:
 
 ~~~
->>> for num in numbers:
-...     print(num)
-...
+for num in numbers:
+    print(num)
 ~~~
 {: .language-python}
 
 
@@ -18,7 +18,7 @@ objectives:
     - "Perform basic mathematical operations and summary statistics on data in a Pandas DataFrame."
     - "Create simple plots."
 keypoints:
-    - "Libraries enable us to extend the functionality of Python." 
+    - "Libraries enable us to extend the functionality of Python."
     - "Pandas is a popular library for working with data."
     - "A DataFrame is a Pandas data structure that allows one to access data by column (name or index) or row."
     - "Aggregating data using the `groupby()` function enables you to generate useful summaries of data quickly."
@@ -37,7 +37,7 @@ and they can replicate the same analysis.
 
 To help the lesson run smoothly, let's ensure everyone is in the same directory.
 This should help us avoid path and file name issues. At this time please
-navigate to the workshop directory. If you working in IPython Notebook be sure
+navigate to the workshop directory. If you are working in IPython Notebook, be sure
 that you start your notebook in the workshop directory.
 
 A quick aside that there are Python libraries like [OS Library][os-lib] that can work with our
@@ -93,7 +93,8 @@ record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
 A library in Python contains a set of tools (called functions) that perform
 tasks on our data. Importing a library is like getting a piece of lab equipment
 out of a storage locker and setting it up on the bench for use in a project.
-Once a library is set up, it can be used or called to perform many tasks.
+Once a library is set up, it can be used or called to perform the task(s)
+it was built to do.
 
 ## Pandas in Python
 One of the best options for working with tabular data in Python is to use the
@@ -124,7 +125,7 @@ time we call a Pandas function.
 # Reading CSV Data Using Pandas
 
 We will begin by locating and reading our survey data which are in CSV format. CSV stands for
-Comma-Separated Values and is a common way store formatted data. Other symbols may also be used, so
+Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
 you might see tab-separated, colon-separated or space-separated files. It is quite easy to replace
 one separator with another, to match your application. The first line in the file often has headers
 to explain what is in each column. CSV (and other separators) make it easy to share data, and can be
@@ -486,8 +487,8 @@ summary stats.
 >
 > 1. How many recorded individuals are female `F` and how many male `M`?
 > 2. What happens when you group by two columns using the following syntax and
->    then grab mean values?
->   - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
+>    then calculate mean values?
+>   - `grouped_data2 = surveys_df.groupby(['plot_id', 'sex'])`
 >   - `grouped_data2.mean()`
 > 3. Summarize weight values for each site in your data. HINT: you can use the
 >   following syntax to only create summary statistics for one column in your data.
@@ -536,7 +537,7 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
 > ## Challenge - Make a list
 >
 >  What's another way to create a list of species and associated `count` of the
->  records in the data? Hint: you can perform `count`, `min`, etc functions on
+>  records in the data? Hint: you can perform `count`, `min`, etc. functions on
 >  groupby DataFrames in the same way you can perform them on regular DataFrames.
 {: .challenge}
 
@@ -589,13 +590,13 @@ total_count.plot(kind='bar');
 > being sex. The plot should show total weight by sex for each site. Some
 > tips are below to help you solve this challenge:
 >
-> * For more on Pandas plots, visit this [link][pandas-plot].
+> * For more information on pandas plots, see [pandas' documentation page on visualization][pandas-plot].
 > * You can use the code that follows to create a stacked bar plot but the data to stack
 >  need to be in individual columns.  Here's a simple example with some data where
 >  'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups.
 >
 > ~~~
-> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
+> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
 > pd.DataFrame(d)
 > ~~~
 > {: .language-python }
@@ -616,7 +617,7 @@ total_count.plot(kind='bar');
 > ~~~
 > # Plot stacked data so columns 'one' and 'two' are stacked
 > my_df = pd.DataFrame(d)
-> my_df.plot(kind='bar',stacked=True,title="The title of my graph")
+> my_df.plot(kind='bar', stacked=True, title="The title of my graph")
 > ~~~
 > {: .language-python }
 >
@@ -635,7 +636,7 @@ total_count.plot(kind='bar');
 >> First we group data by site and by sex, and then calculate a total for each site.
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> ~~~
 >> {: .language-python}
@@ -660,7 +661,7 @@ total_count.plot(kind='bar');
 >> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each site.
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> site_sex_count.unstack()
 >> ~~~
@@ -684,10 +685,10 @@ total_count.plot(kind='bar');
 >> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> spc = site_sex_count.unstack()
->> s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by site and sex")
+>> s_plot = spc.plot(kind='bar', stacked=True, title="Total weight by site and sex")
 >> s_plot.set_ylabel("Weight")
 >> s_plot.set_xlabel("Plot")
 >> ~~~
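
The group, sum, and unstack steps in the solution above can be sketched on a tiny table. This is only an illustration: the `surveys_df` built here is a hypothetical stand-in for the lesson's survey data, and the plotting call is left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's surveys_df (values are made up)
surveys_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2],
    "sex": ["F", "M", "M", "F", "F"],
    "weight": [10.0, 20.0, 5.0, 7.0, 3.0],
})

# Total weight for every (plot_id, sex) pair
by_site_sex = surveys_df.groupby(["plot_id", "sex"])
site_sex_count = by_site_sex["weight"].sum()

# Move the inner index level (sex) into columns: one row per plot, one column per sex
spc = site_sex_count.unstack()
print(spc)
```

Plot 2 has no males in this toy table, so its `M` cell comes out as NaN after `unstack()`. A stacked bar plot simply draws nothing for such a cell.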
 
@@ -275,7 +275,7 @@ with weight values > 0 (i.e., select meaningful weight values):
 ~~~
 len(surveys_df[pd.isnull(surveys_df.weight)])
 # How many rows have weight values?
-len(surveys_df[surveys_df.weight> 0])
+len(surveys_df[surveys_df.weight > 0])
 ~~~
 {: .language-python}
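
A small sketch of how the two counts above behave, assuming a hypothetical four-row `surveys_df` rather than the real survey file. It shows that a missing weight is counted by `pd.isnull` and is excluded by the `> 0` comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for surveys_df: one missing weight, one zero weight
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "weight": [40.0, np.nan, 0.0, 52.0],
})

# Rows where weight is missing (NaN)
n_missing = len(surveys_df[pd.isnull(surveys_df.weight)])

# Rows where weight is greater than zero; NaN > 0 evaluates to False
n_positive = len(surveys_df[surveys_df.weight > 0])

print(n_missing, n_positive)
```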
 
 
@@ -20,7 +20,7 @@ keypoints:
 In many "real world" situations, the data that we want to use come in multiple
 files. We often need to combine these files into a single DataFrame to analyze
 the data. The pandas package provides [various methods for combining
-DataFrames](http://pandas.pydata.org/pandas-docs/stable/merging.html) including
+DataFrames](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) including
 `merge` and `concat`.
 
 To work through the examples below, we first need to load the species and
@@ -71,7 +71,7 @@ Take note that the `read_csv` method we used can take some additional options wh
 we didn't use previously. Many functions in Python have a set of options that
 can be set by the user if needed. In this case, we have told pandas to assign
 empty values in our CSV to NaN using `keep_default_na=False, na_values=[""]`.
-[More about all of the read_csv options here.](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html)
+[More about all of the read_csv options here.](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)
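
As a rough illustration of what those two options do, the sketch below reads a made-up two-row CSV from a string instead of a file. The column names and values are assumptions chosen for the example. With these options only truly empty fields become NaN, so a literal code such as `NA` survives as text.

```python
import io

import pandas as pd

# Made-up CSV text: the first row uses the literal code "NA", the second row
# has an empty species_id field
csv_text = "species_id,genus\nNA,Neotoma\n,Dipodomys\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,  # do not treat strings such as "NA" as missing
    na_values=[""],         # only empty fields are read as NaN
)
print(df)
```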
 
 # Concatenating DataFrames
 
@@ -85,18 +85,18 @@ survey_sub = surveys_df.head(10)
 # Grab the last 10 rows
 survey_sub_last10 = surveys_df.tail(10)
 # Reset the index values so the second dataframe appends properly
-survey_sub_last10=survey_sub_last10.reset_index(drop=True)
+survey_sub_last10 = survey_sub_last10.reset_index(drop=True)
 # drop=True option avoids adding new index column with old index values
 ~~~
 {: .language-python}
 
 When we concatenate DataFrames, we need to specify the axis. `axis=0` tells
-pandas to stack the second DataFrame under the first one. It will automatically
+pandas to stack the second DataFrame UNDER the first one. It will automatically
 detect whether the column names are the same and will stack accordingly.
 `axis=1` will stack the columns in the second DataFrame to the RIGHT of the
 first DataFrame. To stack the data vertically, we need to make sure we have the
 same columns and associated column format in both datasets. When we stack
-horizonally, we want to make sure what we are doing makes sense (ie the data are
+horizontally, we want to make sure what we are doing makes sense (i.e. the data are
 related in some way).
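
The difference between the two axes can be sketched with two tiny DataFrames. The names `first` and `second` and their values are hypothetical, chosen only to keep the shapes easy to check.

```python
import pandas as pd

# Two hypothetical DataFrames with identical columns
first = pd.DataFrame({"record_id": [1, 2], "weight": [40.0, 48.0]})
second = pd.DataFrame({"record_id": [3, 4], "weight": [29.0, 46.0]})

# axis=0: rows of `second` go under the rows of `first`
vertical_stack = pd.concat([first, second], axis=0)

# axis=1: columns of `second` go to the right of the columns of `first`
horizontal_stack = pd.concat([first, second], axis=1)

print(vertical_stack.shape)
print(horizontal_stack.shape)
```

Stacking vertically keeps the two columns and doubles the rows. Stacking horizontally keeps the two rows and repeats the column names side by side, which is why it only makes sense when the rows of both tables describe the same things.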
 
 ~~~
@@ -225,7 +225,7 @@ identifier, which is called `species_id`.
 
 Now that we know the fields with the common species ID attributes in each
 DataFrame, we are almost ready to join our data. However, since there are
-[different types of joins](http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/), we
+[different types of joins][join-types], we
 also need to decide which type of join makes sense for our analysis.
 
 ## Inner joins
@@ -236,16 +236,15 @@ two DataFrames based on a join key and returns a new DataFrame that contains
 DataFrames.
 
 Inner joins yield a DataFrame that contains only rows where the value being
-joins exists in BOTH tables. An example of an inner join, adapted from [this
-page](http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/) is below:
+joined exists in BOTH tables. An example of an inner join, adapted from [Jeff Atwood's blog post about SQL joins][join-types], is below:
 
 ![Inner join -- courtesy of codinghorror.com](../fig/inner-join.png)
 
 The pandas function for performing joins is called `merge` and an Inner join is
 the default option:
 
 ~~~
-merged_inner = pd.merge(left=survey_sub,right=species_sub, left_on='species_id', right_on='species_id')
+merged_inner = pd.merge(left=survey_sub, right=species_sub, left_on='species_id', right_on='species_id')
 # In this case `species_id` is the only column name shared by both dataframes, so if we skipped the
 # `left_on` and `right_on` arguments we would still get the same result
 
@@ -326,7 +325,7 @@ A left join is performed in pandas by calling the same `merge` function used for
 inner join, but using the `how='left'` argument:
 
 ~~~
-merged_left = pd.merge(left=survey_sub,right=species_sub, how='left', left_on='species_id', right_on='species_id')
+merged_left = pd.merge(left=survey_sub, right=species_sub, how='left', left_on='species_id', right_on='species_id')
 merged_left
 ~~~
 {: .language-python}
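
To see how the inner and left joins differ, here is a minimal sketch on two hand-made tables. The `survey_sub` and `species_sub` built here are hypothetical stand-ins for the lesson's subsets, and the code `XX` is deliberately absent from the species table.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's survey_sub and species_sub
survey_sub = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["NL", "DM", "XX"],  # "XX" has no match in species_sub
})
species_sub = pd.DataFrame({
    "species_id": ["NL", "DM", "PF"],
    "genus": ["Neotoma", "Dipodomys", "Perognathus"],
})

# Inner join (the default): keep only rows whose species_id is in BOTH tables
merged_inner = pd.merge(left=survey_sub, right=species_sub,
                        left_on="species_id", right_on="species_id")

# Left join: keep every row of survey_sub, filling unmatched columns with NaN
merged_left = pd.merge(left=survey_sub, right=species_sub, how="left",
                       left_on="species_id", right_on="species_id")

print(len(merged_inner), len(merged_left))
print(merged_left)
```

The inner join drops the `XX` record, while the left join keeps it with NaN in `genus`. Checking for such NaN values after a left join is one way to find survey records whose species code is missing from the lookup table.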
@@ -421,4 +420,6 @@ The pandas `merge` function supports two other join types:
 >    the number of species in the plot / the total number of individuals in the plot = Biodiversity index.
 {: .challenge}
 
+[join-types]: http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
+
 {% include links.md %}