@@ -18,7 +18,7 @@ objectives:
  - "Perform basic mathematical operations and summary statistics on data in a Pandas DataFrame."
  - "Create simple plots."
 keypoints:
- - "Libraries enable us to extend the functionality of Python."
+ - "Libraries enable us to extend the functionality of Python."
  - "Pandas is a popular library for working with data."
  - "A Dataframe is a Pandas data structure that allows one to access data by column (name or index) or row."
  - "Aggregating data using the `groupby()` function enables you to generate useful summaries of data quickly."
@@ -37,7 +37,7 @@ and they can replicate the same analysis.

 To help the lesson run smoothly, let's ensure everyone is in the same directory.
 This should help us avoid path and file name issues. At this time please
-navigate to the workshop directory. If you working in IPython Notebook be sure
+navigate to the workshop directory. If you are working in IPython Notebook be sure
 that you start your notebook in the workshop directory.
 
 A quick aside that there are Python libraries like [OS Library][os-lib] that can work with our
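The hunk above notes that libraries such as the OS library can work with the directory structure. A minimal sketch using Python's standard `os` module to check and change the working directory, assuming a freshly created temporary directory stands in for the workshop directory (the real workshop path will differ on each machine):

```python
import os
import tempfile

# Show where Python is currently looking for files.
print("Current directory:", os.getcwd())

# Hypothetical stand-in for the workshop directory: a fresh temporary
# directory, so the example runs anywhere. Replace with your real path.
workshop_dir = tempfile.mkdtemp()

# Move into that directory and confirm the change.
os.chdir(workshop_dir)
print("Now working in:", os.getcwd())

# List its contents (empty, because the directory was just created).
print(os.listdir("."))
```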
@@ -93,7 +93,8 @@ record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
 A library in Python contains a set of tools (called functions) that perform
 tasks on our data. Importing a library is like getting a piece of lab equipment
 out of a storage locker and setting it up on the bench for use in a project.
-Once a library is set up, it can be used or called to perform many tasks.
+Once a library is set up, it can be used or called to perform the task(s)
+it was built to do.
 
 ## Pandas in Python
 One of the best options for working with tabular data in Python is to use the
@@ -124,7 +125,7 @@ time we call a Pandas function.
 # Reading CSV Data Using Pandas
 
 We will begin by locating and reading our survey data which are in CSV format. CSV stands for
-Comma-Separated Values and is a common way store formatted data. Other symbols may also be used, so
+Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
 you might see tab-separated, colon-separated or space separated files. It is quite easy to replace
 one separator with another, to match your application. The first line in the file often has headers
 to explain what is in each column. CSV (and other separators) make it easy to share data, and can be
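Since `pd.read_csv` assumes commas by default, a file that uses a different separator needs the `sep` argument. A minimal sketch, assuming a few made-up rows with survey-like column names (not real survey records), held in memory with `io.StringIO` so no file on disk is needed:

```python
import io

import pandas as pd

# A few made-up rows using survey-like columns (illustrative values only).
csv_text = """record_id,plot_id,species_id,sex,weight
1,2,NL,M,40
2,3,NL,F,52
3,2,DM,F,41
"""

# The same rows, tab-separated instead of comma-separated.
tsv_text = csv_text.replace(",", "\t")

# read_csv uses sep="," by default; pass sep="\t" for tab-separated data.
csv_df = pd.read_csv(io.StringIO(csv_text))
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both separators produce the same DataFrame.
print(csv_df.equals(tsv_df))
```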
@@ -486,8 +487,8 @@ summary stats.
 >
 > 1. How many recorded individuals are female `F` and how many male `M`?
 > 2. What happens when you group by two columns using the following syntax and
-> then grab mean values?
-> - `grouped_data2 = surveys_df.groupby(['plot_id','sex'])`
+> then calculate mean values?
+> - `grouped_data2 = surveys_df.groupby(['plot_id', 'sex'])`
 > - `grouped_data2.mean()`
 > 3. Summarize weight values for each site in your data. HINT: you can use the
 > following syntax to only create summary statistics for one column in your data.
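A sketch of what challenge questions 2 and 3 produce, using a small made-up DataFrame with survey-like columns (illustrative values only). Grouping by two columns yields one row per (`plot_id`, `sex`) pair, indexed by a two-level index. The `numeric_only=True` argument is an addition not in the lesson text: in pandas 2.x, calling `.mean()` on groups that still contain text columns such as `species_id` may raise an error instead of skipping them.

```python
import pandas as pd

# Made-up rows shaped like the survey data (illustrative values only).
surveys_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2],
    "sex": ["F", "F", "M", "F", "M"],
    "species_id": ["NL", "DM", "DM", "NL", "DM"],
    "weight": [40.0, 42.0, 50.0, 60.0, 30.0],
})

# Group by two columns: each group is one (plot_id, sex) pair, and the
# result has a two-level (MultiIndex) row index.
grouped_data2 = surveys_df.groupby(["plot_id", "sex"])

# numeric_only=True averages only numeric columns, skipping species_id.
print(grouped_data2.mean(numeric_only=True))

# Summary statistics for a single column (weight) within each plot.
print(surveys_df.groupby("plot_id")["weight"].describe())
```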
@@ -536,7 +537,7 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
 > ## Challenge - Make a list
 >
 > What's another way to create a list of species and associated `count` of the
-> records in the data? Hint: you can perform `count`, `min`, etc functions on
+> records in the data? Hint: you can perform `count`, `min`, etc. functions on
 > groupby DataFrames in the same way you can perform them on regular DataFrames.
 {: .challenge}

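One possible approach to this challenge, sketched on made-up rows (illustrative values only): count `record_id` within each `species_id` group, or use `value_counts()`, which tallies each distinct value of a single column directly. Note that `value_counts()` orders its result by count, largest first, while the grouped count is ordered by species name.

```python
import pandas as pd

# Made-up records (illustrative values only).
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "species_id": ["NL", "DM", "DM", "NL", "DM"],
})

# Count records per species by grouping, ordered by species_id.
by_group = surveys_df.groupby("species_id")["record_id"].count()

# Alternative: tally each distinct species_id value, ordered by count.
by_values = surveys_df["species_id"].value_counts()

print(by_group)
print(by_values)
```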
@@ -589,13 +590,13 @@ total_count.plot(kind='bar');
 > being sex. The plot should show total weight by sex for each site. Some
 > tips are below to help you solve this challenge:
 >
-> * For more on Pandas plots, visit this [link][pandas-plot].
+> * For more information on pandas plots, see [pandas' documentation page on visualization][pandas-plot].
 > * You can use the code that follows to create a stacked bar plot but the data to stack
 > need to be in individual columns. Here's a simple example with some data where
 > 'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups.
 >
 > ~~~
-> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
+> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
 > pd.DataFrame(d)
 > ~~~
 > {: .language-python }
@@ -616,7 +617,7 @@ total_count.plot(kind='bar');
 > ~~~
 > # Plot stacked data so columns 'one' and 'two' are stacked
 > my_df = pd.DataFrame(d)
-> my_df.plot(kind='bar',stacked=True,title="The title of my graph")
+> my_df.plot(kind='bar', stacked=True, title="The title of my graph")
 > ~~~
 > {: .language-python }
 >
@@ -635,7 +636,7 @@ total_count.plot(kind='bar');
 >> First we group data by site and by sex, and then calculate a total for each site.
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> ~~~
 >> {: .language-python}
@@ -660,7 +661,7 @@ total_count.plot(kind='bar');
 >> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each site.
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> site_sex_count.unstack()
 >> ~~~
@@ -684,10 +685,10 @@ total_count.plot(kind='bar');
 >> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:
 >>
 >> ~~~
->> by_site_sex = surveys_df.groupby(['plot_id','sex'])
+>> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])
 >> site_sex_count = by_site_sex['weight'].sum()
 >> spc = site_sex_count.unstack()
->> s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by site and sex")
+>> s_plot = spc.plot(kind='bar', stacked=True, title="Total weight by site and sex")
 >> s_plot.set_ylabel("Weight")
 >> s_plot.set_xlabel("Plot")
 >> ~~~